TurboQuant Tested: Is Google’s “Lossless” AI Compression Really Lossless?
TLDR:
- YouTuber xCreate tests TurboQuant live — finds 4-bit turbo performs similarly to traditional 4-bit quantization, not “lossless” as claimed
- 9-bit quantization is genuinely near-lossless (99.6% token accuracy, 0.09 MAE)
- 2-bit turbo completely fails; model cannot compile or produce coherent output
- Google/NYU paper’s spectacular results may be model-specific or optimized for certain benchmarks
- Academic drama: Rabbit R claims TurboQuant used their C++ implementation without credit
The Hype vs Reality

When Google and NYU researchers published TurboQuant, the AI community got excited. The paper promised “lossless” quantization — running massive AI models at 2-4 bit precision with virtually no quality loss. The implications are huge: a 70B parameter model could run on a MacBook instead of a server rack.
But YouTuber xCreate actually tested it. The results? Not quite the revolution the paper claimed.
The Testing Method
xCreate ran TurboQuant through real-world tests on two models: MiniMax and Llama 1B. Rather than just benchmark numbers, he actually prompted the model to write a complex 3D graphics program — a spaceship game with context windows and game logic. Then he compared outputs at different precision levels.
Memory usage tells one story:
| Precision | Memory Used |
| --- | --- |
| Full 16-bit | 1.92 GiB |
| 9-bit | 1.1 GiB (43% reduction) |
| 4.5-bit | 0.5 GiB (74% reduction) |
| 4-bit turbo | 0.51 GiB (73% reduction) |
The memory savings are real — but so are the quality losses.
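As a rough sanity check on those numbers, the weight memory at each bit width can be estimated. The group size and per-group scale overhead below are generic assumptions, not TurboQuant's actual storage layout:

```python
# Rough memory estimate for quantized weights at different bit widths.
# Group size and scale width are illustrative assumptions, not the
# TurboQuant implementation's actual format.
def weight_memory_gib(n_params: float, bits: float, group_size: int = 64,
                      scale_bits: int = 16) -> float:
    """Approximate weight memory: packed weights plus per-group scales."""
    weight_bits = n_params * bits
    scale_overhead = (n_params / group_size) * scale_bits  # one scale per group
    return (weight_bits + scale_overhead) / 8 / 1024**3

n = 1.0e9  # ~1B parameters, similar in scale to the Llama 1B test
for bits in (16, 9, 4.5, 4):
    print(f"{bits}-bit: {weight_memory_gib(n, bits):.2f} GiB")
```

For a ~1B-parameter model this lands close to the measured figures above, which suggests the reported savings are mostly just the arithmetic of smaller weights plus a modest scale overhead.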
Token Accuracy Results
Here’s where TurboQuant’s claims start falling apart. The critical metric is “top token accuracy” — how often the quantized model picks the exact same next word as the full-precision version:
| Quantization | Token Accuracy | Perplexity |
| --- | --- | --- |
| Full 16-bit | 100% | 0.07 |
| 9-bit | 99.6% | Very low |
| 4-bit affine (traditional) | 97.2% | 1.117 |
| 4-bit turbo (1-pass) | ~97% | 1.1171 |
| 3-bit mixed precision | 95% | Higher |
| 2-bit turbo | FAILED | N/A |
The 4-bit turbo version performs nearly identically to traditional 4-bit affine quantization. It’s not lossless. At 9-bit, however, quantization is genuinely near-lossless — that’s the sweet spot.
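The metric itself is straightforward to compute if you have logits from both models at each step. A toy sketch with synthetic logits (the noise level here is arbitrary, not calibrated to any real quantizer, and the numbers it prints are not the video's results):

```python
import numpy as np

# Toy top-token agreement check between a "full-precision" and a
# "quantized" model. Logits are random stand-ins, not real model outputs.
rng = np.random.default_rng(0)
vocab, steps = 1000, 500
full_logits = rng.normal(size=(steps, vocab))
# Model quantization error modeled as small additive noise on the logits.
quant_logits = full_logits + rng.normal(scale=0.1, size=(steps, vocab))

# Fraction of steps where both models pick the same next token.
agreement = np.mean(full_logits.argmax(axis=1) == quant_logits.argmax(axis=1))
print(f"top-token accuracy: {agreement:.1%}")
```

The appeal of this metric over perplexity is that it directly answers the practical question: how often would the compressed model have generated a different word?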
The 2-Bit Disaster
Going below 3 bits breaks everything. At 2-bit precision, xCreate's model couldn't even compile: the Metal shaders failed to build, the context window collapsed, and the output was completely incoherent. The tokenizer started emitting garbage tokens that didn't correspond to any valid text.
This isn’t surprising — you’re cramming 16 bits of precision into 2 bits. But TurboQuant’s paper implied even 2-bit would work “losslessly.” That claim doesn’t hold up.
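The intuition is easy to demonstrate with plain uniform affine quantization on synthetic weights (this is a generic scheme for illustration, not TurboQuant's):

```python
import numpy as np

# Illustrates why very low bit widths break: uniform affine quantization
# of a synthetic Gaussian weight tensor at decreasing precision.
def quantize_dequantize(w, bits):
    levels = 2 ** bits
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / (levels - 1)
    return np.round((w - lo) / scale) * scale + lo

rng = np.random.default_rng(1)
w = rng.normal(scale=0.02, size=100_000)  # small-magnitude weights

errs = {bits: np.abs(w - quantize_dequantize(w, bits)).mean()
        for bits in (9, 4, 2)}
for bits, err in errs.items():
    print(f"{bits}-bit mean abs error: {err:.6f}")
```

Each bit removed roughly doubles the rounding error, so the jump from 4 bits (16 levels) to 2 bits (4 levels) is brutal: every weight gets snapped to one of four values, and the accumulated error swamps the signal.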
The Second Pass Problem
TurboQuant’s paper describes a two-pass approach: one pass for the main quantization, a second pass using a quantized Johnson-Lindenstrauss transform for error correction. This is where the paper’s best results come from.
But xCreate found that implementing the full two-pass method actually hurt performance. With the second pass enabled, token accuracy dropped to 88.2% and perplexity spiked to 1.5 — worse than the single-pass version.
The open-source implementations currently available (in MLX LM and MLX VLM) only implement the first pass anyway. So even if the paper’s two-pass method works in specific scenarios, nobody can actually use it yet.
The Academic Drama
TurboQuant’s GitHub page features prominent links to “A Recipe for Pretraining Data Efficiency” — which Rabbit R interprets as a diss. More seriously, Rabbit R claims the TurboQuant team contacted them in January 2025 asking for help debugging their C++ implementation, then transferred that code to Python without giving credit.
TurboQuant responded that Rabbit R’s guarantees are “suboptimal.” Academic discourse gets spicy on GitHub.
Our Take
TurboQuant is genuinely useful — but it’s not the revolution the headlines suggest. The 4-bit implementation is solid, practical, and about equivalent to existing 4-bit quantization methods. If you’re running local AI models on Apple Silicon, it’s worth using, but don’t expect “lossless” results.
The real breakthrough might be in the 9-bit range. That’s where xCreate saw genuinely near-lossless performance: 99.6% token accuracy with half the memory usage. For users who need more precision than 4-bit but can’t run full 16-bit models, 9-bit could be the sweet spot.
As for the paper’s spectacular claims — treat them as aspirational. Real-world results on consumer hardware tell a more modest story. The research is legitimate, but the gap between “works in the paper” and “works on your MacBook” remains significant.
Sources:
- YouTube: How to Run TurboQuant – “Lossless” Quantization for Local AI TESTED by xCreate
- Google TurboQuant: How Extreme Compression Makes AI Models 8x Faster (Helloexpress)
- TurboQuant GitHub / Paper