v9.0.0 — Now available

Run 70B models
on a MacBook

Squish compresses model weights into memory-mapped tensors that load in milliseconds — served through a fully OpenAI-compatible API on Apple Silicon. No GPU required.

~/squish — zsh

Install via Homebrew

brew install squishai/squish/squish

Pull a pre-squished model (18 GB, < 2s)

squish pull llama3.3:70b

✓ Pulled llama3.3:70b in 1.4s

Start the OpenAI-compatible server

squish serve &

→ http://localhost:11435

Or chat interactively

squish run llama3.3:70b

You: Hello!

Squish: Hi! How can I help…

< 2 s cold start for 70B
~18 GB RAM for a 70B model
Faster load vs Ollama
100% OpenAI API compatible
How it works

Three steps to local AI

Squish handles everything from compression to serving. You just pull and run.

1
Install Squish

One Homebrew or pip command. No Docker, no CUDA drivers, no Python environment wrestling.

2
Pull a Model

Stream any pre-squished model from HuggingFace. Weights arrive as memory-mapped INT8 tensors — ready to load instantly.

3
Run or Serve

Chat in the terminal with squish run, or start a full OpenAI-compatible API with squish serve.
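The "ready to load instantly" claim in step 2 rests on memory mapping: raw INT8 bytes on disk are mapped into the address space, so "loading" is page-table setup rather than a read-and-decode pass. A minimal stdlib sketch of that idea (the file layout here is a stand-in, not Squish's actual format):

```python
import mmap
import os
import tempfile

# Stand-in for a pulled model: raw INT8 weight bytes written straight to disk.
path = os.path.join(tempfile.mkdtemp(), "weights.int8")
payload = bytes(range(256)) * 4
with open(path, "wb") as f:
    f.write(payload)

# "Load" the weights: mmap touches no data until a page is actually accessed.
with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

assert mapped[:8] == payload[:8]  # same bytes, no decode step in between
```

Because nothing is decoded or copied up front, the cost of opening the file is independent of model size, which is why a 70B model can be "loaded" in well under the time a full read would take.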

Features

Built for speed at every layer

From storage format to HTTP serving, every decision in Squish is optimised for Apple Silicon performance.

Instant Cold Start

Memory-mapped weights bypass all decode overhead. 70B models are ready in under 2 seconds — every time.

mmap → zero decode
🔌
Drop-in OpenAI API

Works with LangChain, LlamaIndex, OpenAI SDK, and any tool that speaks /v1/chat/completions.

/v1/chat/completions
📦
Batch Inference

Process multiple requests in parallel in a single call, something neither Ollama nor LM Studio offers.

batch: [req1, req2 …]
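For illustration, a batch request body might look like the sketch below. The `batch` field name is an assumption based on the snippet above; Squish's actual batch schema is not documented here.

```python
import json

# Hypothetical batch request body: several independent chat requests
# submitted in one call. Field names beyond "model" and "messages" are
# assumptions, not a documented schema.
batch_body = {
    "model": "llama3.3:70b",
    "batch": [
        {"messages": [{"role": "user", "content": "Summarise document A"}]},
        {"messages": [{"role": "user", "content": "Summarise document B"}]},
    ],
}
encoded = json.dumps(batch_body)
assert len(json.loads(encoded)["batch"]) == 2
```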
🗜️
INT8 + INT4 Compression

Two quantisation tiers. INT8 for near-lossless accuracy; INT4 for maximum density on 16 GB Macs.

squish push model:8b
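To see why INT8 is "near-lossless", here is a toy sketch of symmetric per-tensor INT8 quantisation, the general technique this tier refers to. Squish's actual scheme (per-channel scales, calibration, and so on) is not specified here.

```python
# Toy symmetric INT8 quantisation: one scale for the whole tensor,
# values snapped to the nearest of 255 signed steps.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.5, -1.2, 3.9, -0.01]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# "Near-lossless": every value lands within half a quantisation step.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
```

The worst-case error is half the step size, so accuracy degrades gracefully as the range of the weights grows; INT4 trades a coarser step for half the bytes.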
🖥️
Clean CLI

Pull, run, serve, quantise, push, list, remove — composable commands for every workflow.

squish pull llama3.3:70b
🤗
HuggingFace Hub

Pre-squished models hosted on HuggingFace. Pull any model directly — no manual conversion needed.

hf://squish-community/…
Comparison

Why choose Squish?

See how Squish stacks up against the most popular local inference tools.

| Feature | Ollama | LM Studio | Squish ✦ |
| --- | --- | --- | --- |
| Cold start (70B) | ~30 s | ~20 s | < 2 s |
| RAM for 70B | ~40 GB | ~40 GB | ~18 GB |
| OpenAI API | ✓ | ✓ | ✓ |
| Batch requests | ✗ | ✗ | ✓ |
| Pre-compressed weights | ✗ | ✗ | ✓ HuggingFace |
| Zero-copy mmap | ✗ | ✗ | ✓ |
| Weight format | GGUF | GGUF | INT8 mmap |
| Platform | macOS / Linux | macOS / Windows | macOS (M1–M5) |
Quick Start

Up and running in 30 seconds

Install
Pull a model
Chat
API server
Quantise

macOS via Homebrew (recommended)

brew install squishai/squish/squish

✓ squish 9.0.0 installed

Or from PyPI (Python 3.10+)

pip install squish

Verify

squish --version

squish 9.0.0

Browse available models

squish search llama

llama3.3:70b 18.2 GB INT8 ★ popular

llama3.2:3b 1.5 GB INT8

Pull a model

squish pull llama3.3:70b

████████████████████ 100% — 18.2 GB

✓ Pulled in 1.4s — ready

List local models

squish ls

llama3.3:70b 18.2 GB INT8 ✓ loaded

Interactive REPL

squish run llama3.3:70b

Loading model… 1.4s

You: Explain quantum entanglement like I'm 12

Squish: Imagine two magic coins that always

land on opposite sides, no matter how far apart…

Pass a system prompt

squish run llama3.3:70b --system "You are a pirate"

Start the server

squish serve &

→ Listening on http://localhost:11435

Query exactly like OpenAI

curl http://localhost:11435/v1/chat/completions \

-H "Content-Type: application/json" \

-d '{"model":"llama3.3:70b",

"messages":[{"role":"user","content":"Hello!"}]}'
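The curl call above boils down to a small JSON body. Building the same request with only the Python standard library makes the shape explicit; this sketch constructs the request without sending it, so no server is needed to follow along:

```python
import json
from urllib import request

# The same request body the curl example sends.
body = {
    "model": "llama3.3:70b",
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = request.Request(
    "http://localhost:11435/v1/chat/completions",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)

# request.urlopen(req) would send it, assuming `squish serve` is running.
assert json.loads(req.data)["messages"][0]["content"] == "Hello!"
```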

Works with the OpenAI Python SDK too

python3 -c "import openai; c=openai.OpenAI(base_url='http://localhost:11435/v1',api_key='x'); print(c.chat.completions.create(model='llama3.3:70b',messages=[{'role':'user','content':'Hi'}]).choices[0].message.content)"

Quantise any HuggingFace model to INT8

squish push meta-llama/Llama-3.3-70B-Instruct --bits 8

Downloading weights… 70B params

Quantising to INT8…

✓ Pushed to hf://squish-community/llama3.3-70b-int8

Or INT4 for half the size

squish push meta-llama/Llama-3.3-70B-Instruct --bits 4

✓ Pushed to hf://squish-community/llama3.3-70b-int4
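Why INT4 halves the size: two 4-bit values pack into each byte. A toy sketch of that packing step only; Squish's real on-disk container is not described here.

```python
# Pack signed 4-bit ints (range -8..7) two per byte: low nibble first.
def pack_int4(values):
    out = bytearray()
    for i in range(0, len(values), 2):
        lo = values[i] & 0x0F
        hi = (values[i + 1] & 0x0F) if i + 1 < len(values) else 0
        out.append(lo | (hi << 4))
    return bytes(out)

# Unpack: nibbles >= 8 are negative in two's complement.
def unpack_int4(data, count):
    vals = []
    for b in data:
        for nibble in (b & 0x0F, b >> 4):
            vals.append(nibble - 16 if nibble >= 8 else nibble)
    return vals[:count]

vals = [3, -8, 7, -1, 0]
packed = pack_int4(vals)
assert len(packed) == 3  # five values fit in three bytes
assert unpack_int4(packed, 5) == vals
```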

Ready to squish your models?

Install in 30 seconds and run 70B models on your MacBook today. Free for personal use, open-source on Jan 1, 2030.