PolarQuant + QJL: The Two-Stage Secret Behind TurboQuant's Zero Loss

How PolarQuant's random rotation + polar transform and QJL's 1-bit error correction work together to achieve what single-stage quantization cannot
Sk Jabedul Haque
Apr 28, 2026

    TurboQuant achieves zero accuracy loss through two complementary algorithms: PolarQuant (random rotation + polar transform) and QJL (1-bit residual correction). Together, they compress KV cache 6x with no quality degradation.

    We've covered TurboQuant's results — 6x memory reduction, 8x faster attention, zero accuracy loss. But how does it actually work? The secret lies in two algorithms that TurboQuant combines: PolarQuant and QJL (Quantized Johnson-Lindenstrauss).

    Understanding these two stages explains why TurboQuant achieves what traditional quantization cannot: compression at aggressive bit-widths with no measured accuracy loss.

    Why Two Stages? The Quantization Problem

    Traditional quantization faces a fundamental trade-off: aggressive compression loses accuracy, while conservative compression doesn't save enough memory. At 3-bit, most methods lose 5-10% accuracy because they treat each dimension independently.

    TurboQuant solves this with a two-stage approach:

    • Stage 1 — PolarQuant: Rotate and transform the vector so it's easy to compress
    • Stage 2 — QJL: Correct any remaining errors with 1-bit precision

    This combination — simple transform, tiny correction — achieves what single-stage methods cannot.

    Stage 1: PolarQuant — Making Vectors Easy to Quantize

    PolarQuant (to be presented at AISTATS 2026) uses two tricks: random rotation and polar transform.

    Trick 1: Random Rotation

    High-dimensional vectors (like KV cache entries) often have axes with very different scales — some directions have huge values, others tiny. Standard quantization treats all axes equally, wasting bits on meaningless variations.
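    To make the scale problem concrete, here is a minimal sketch of a plain 3-bit uniform quantizer with a single shared scale. The function name uniform_quantize and the sample vector are illustrative assumptions, not taken from any TurboQuant implementation.

```python
def uniform_quantize(x, bits):
    """Symmetric uniform quantizer with one scale shared by every coordinate."""
    levels = 2 ** (bits - 1) - 1              # 3 positive levels for 3-bit signed codes
    scale = max(abs(v) for v in x) / levels   # the largest coordinate sets the step size
    return [scale * round(v / scale) for v in x]

x = [0.4, -0.3, 0.2, 8.0]       # three small coordinates and one outlier
x_hat = uniform_quantize(x, bits=3)
print(x_hat)                    # the small coordinates all collapse to zero
```

    Because the outlier fixes the step size, every small coordinate rounds to zero and its information is lost.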

    PolarQuant applies a random rotation to the vector. In high dimensions, a random rotation "spreads out" the energy, so no single coordinate is likely to dominate. The result is a vector whose dimensions all have similar scales.
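    The effect of a rotation can be checked with a small sketch. It builds a random orthogonal matrix by Gram-Schmidt on Gaussian vectors, which is one standard construction and only an assumption here; how PolarQuant actually generates its rotation is not described in this article. The helper names random_rotation and rotate are hypothetical.

```python
import math
import random

def random_rotation(d, rng):
    """Random d x d orthogonal matrix: Gram-Schmidt applied to Gaussian rows."""
    rows = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:                              # remove components along earlier rows
            proj = sum(a * b for a, b in zip(v, u))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows

def rotate(matrix, x):
    """Matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in matrix]

rng = random.Random(0)
d = 64
x = [0.01] * d
x[3] = 10.0                                         # one coordinate holds almost all the energy

y = rotate(random_rotation(d, rng), x)

norm_x = math.sqrt(sum(a * a for a in x))
norm_y = math.sqrt(sum(a * a for a in y))
peak_before = max(abs(a) for a in x) / norm_x       # close to 1: a single axis dominates
peak_after = max(abs(a) for a in y) / norm_y        # much smaller: energy is spread out
print(f"norm {norm_x:.4f} -> {norm_y:.4f}, peak share {peak_before:.3f} -> {peak_after:.3f}")
```

    The rotation preserves the vector's length, so nothing is lost; only the way the energy is distributed across coordinates changes.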

    Trick 2: Polar Transform

    After rotation, PolarQuant applies a polar transform — converting the vector into magnitude and direction. Imagine converting rectangular XYZ coordinates to spherical coordinates (radius + angles).

    Why does this help? The magnitude (radius) tends to concentrate around a small range, while the angular coordinates have predictable distributions. Both are easier to quantize than the original raw values.
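    As a rough illustration, the sketch below converts consecutive coordinate pairs to (radius, angle) and quantizes only the angle to 3 bits. This single-level pairing is a simplifying assumption for illustration; the published PolarQuant transform may differ, and the radii are kept at full precision here. All function names are hypothetical.

```python
import math

def to_polar_pairs(x):
    """Convert consecutive coordinate pairs (a, b) into (radius, angle)."""
    return [(math.hypot(a, b), math.atan2(b, a)) for a, b in zip(x[0::2], x[1::2])]

def from_polar_pairs(pairs):
    """Inverse of to_polar_pairs."""
    out = []
    for r, theta in pairs:
        out.extend([r * math.cos(theta), r * math.sin(theta)])
    return out

def quantize_angle(theta, bits):
    """Map an angle in [-pi, pi] to the index of one of 2**bits equal bins."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    return int((theta + math.pi) / step) % levels

def dequantize_angle(index, bits):
    """Return the centre of the bin with the given index."""
    step = 2 * math.pi / (2 ** bits)
    return -math.pi + (index + 0.5) * step

x = [3.0, 4.0, -1.0, 2.0]
pairs = to_polar_pairs(x)
restored = from_polar_pairs(pairs)                  # exact round trip, no quantization

bits = 3
codes = [(r, quantize_angle(theta, bits)) for r, theta in pairs]
approx = from_polar_pairs([(r, dequantize_angle(i, bits)) for r, i in codes])
print(pairs)
print(approx)
```

    With uniform bins, the angle error is bounded by half a bin width (pi/8 at 3 bits) regardless of how large the original coordinates were.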

    Stage    | What It Does        | Result
    Input    | Raw KV cache vector | d dimensions, FP16
    Step 1   | Random rotation     | Energy spread equally
    Step 2   | Polar transform     | Easy-to-quantize format
    Quantize | 3-bit encoding      | ~5x compression

    After PolarQuant's two steps, the vector is much easier to compress. But some small errors remain — this is where QJL comes in.

    Stage 2: QJL — Cleaning Up the Residual

    QJL (Quantized Johnson-Lindenstrauss) was presented at AISTATS 2026. It tackles the error that PolarQuant leaves behind.

    The Johnson-Lindenstrauss Lemma

    The Johnson-Lindenstrauss lemma is a mathematical result from 1984: you can project high-dimensional points to far fewer dimensions while approximately preserving distances.
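    A quick numerical check of the lemma's claim, assuming a Gaussian projection matrix scaled by 1/sqrt(k) (a common choice, not necessarily the one QJL uses). The dimensions 300 and 150 and the helper name jl_project are arbitrary picks for the demo.

```python
import math
import random

def jl_project(x, k, seed):
    """Project x to k dimensions with a Gaussian matrix scaled by 1/sqrt(k).

    The matrix is regenerated from the seed, so equal seeds give equal matrices.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [scale * sum(rng.gauss(0.0, 1.0) * xi for xi in x) for _ in range(k)]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

rng = random.Random(1)
d, k = 300, 150
u = [rng.gauss(0.0, 1.0) for _ in range(d)]
v = [rng.gauss(0.0, 1.0) for _ in range(d)]

# Both vectors must go through the same matrix, hence the shared seed.
ratio = distance(jl_project(u, k, seed=7), jl_project(v, k, seed=7)) / distance(u, v)
print(f"{d} -> {k} dims, projected/original distance = {ratio:.3f}")
```

    The ratio lands close to 1 even though half the dimensions were dropped, and it tightens as k grows.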

    QJL applies this idea to error correction. Here's how:

    1. Extract the residual: After 3-bit quantization, calculate the reconstruction error (the difference between the original and quantized vectors)
    2. Random project: Multiply the residual by a random projection matrix
    3. Quantize the projection: Store only the sign of each projected coordinate, which costs 1 bit each
    4. Reconstruct: During decoding, use the stored sign bits to add back an estimate of the residual's contribution
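    The four steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: plain rounding (coarse_quantize) stands in for the PolarQuant stage, the correction is applied to a query-key inner product (the quantity attention needs), and the number of sign bits m is set well above d only to keep the demo's estimate stable. With m equal to d the cost is one extra bit per value and the estimate is noisier. All names are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def matvec(S, x):
    return [dot(row, x) for row in S]

def coarse_quantize(x, step):
    """Stand-in first stage: round each coordinate to a multiple of step."""
    return [step * round(v / step) for v in x]

def qjl_encode(residual, S):
    """Steps 2-3: project the residual, keep one sign bit per row plus its norm."""
    signs = [1 if p >= 0 else -1 for p in matvec(S, residual)]
    return signs, math.sqrt(dot(residual, residual))

def qjl_inner_product(query, signs, norm, S):
    """Step 4: estimate <query, residual> from the sign bits.

    For Gaussian rows s, E[sign(s.r) * (s.q)] = sqrt(2/pi) * <q, r> / |r|,
    so rescaling by sqrt(pi/2) * |r| gives an unbiased estimate.
    """
    return math.sqrt(math.pi / 2) / len(signs) * norm * dot(matvec(S, query), signs)

rng = random.Random(0)
d, m = 32, 2048
S = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
key = [rng.gauss(0.0, 1.0) for _ in range(d)]
query = [rng.gauss(0.0, 1.0) for _ in range(d)]

key_hat = coarse_quantize(key, step=1.0)                 # first-stage reconstruction
residual = [a - b for a, b in zip(key, key_hat)]         # step 1
signs, norm = qjl_encode(residual, S)

exact = dot(query, key)
stage1 = dot(query, key_hat)
corrected = stage1 + qjl_inner_product(query, signs, norm, S)
print(f"exact {exact:.3f}  stage 1 only {stage1:.3f}  with sign-bit correction {corrected:.3f}")
```

    The decoder never sees the residual itself, only its sign bits and norm, yet it can still estimate how much the residual contributes to each attention score.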

    The magic: a 1-bit correction nearly eliminates the residual error from the first stage. Total bits used: 3 + 1 = 4 bits per value, yet the quality matches the 16-bit baseline.

    Component  | Input                | Output       | Purpose
    PolarQuant | 16-bit vector        | 3-bit code   | Main compression
    QJL        | Reconstruction error | 1-bit code   | Error correction
    Combined   | 16-bit vector        | 4 bits total | Zero-loss output
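    The bit budgets in the table translate into memory as in the sketch below. The model shape is hypothetical and chosen only for round numbers, and the calculation counts payload bits alone, ignoring any per-vector metadata such as stored norms.

```python
def kv_cache_bytes(tokens, layers, heads, head_dim, bits_per_value):
    """Bytes needed for keys and values, counting only the per-value payload."""
    values = 2 * tokens * layers * heads * head_dim   # 2 = keys + values
    return values * bits_per_value / 8

# Hypothetical model shape, for illustration only.
shape = dict(tokens=32_768, layers=32, heads=8, head_dim=128)

fp16 = kv_cache_bytes(bits_per_value=16, **shape)
stage1 = kv_cache_bytes(bits_per_value=3, **shape)     # 3-bit code alone
combined = kv_cache_bytes(bits_per_value=4, **shape)   # 3-bit code + 1-bit correction

for name, size in [("FP16", fp16), ("3-bit", stage1), ("3+1-bit", combined)]:
    print(f"{name:8s} {size / 2**30:.2f} GiB  ({fp16 / size:.2f}x smaller than FP16)")
```

    On payload bits alone, 3 bits per value gives 16/3, or about 5.3x, and 4 bits per value gives 4x relative to FP16.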

    Why This Works: The Mathematical Insight

    Traditional quantization treats each dimension independently — but high-dimensional vectors have structure that compression can exploit. PolarQuant and QJL work together because:

    • Rotation removes worst-case axes: Random rotation makes it unlikely that any coordinate has outlier values
    • Polar transform concentrates: After the transform, the radius falls in a narrow range and the angles follow predictable distributions
    • JL correction is efficient: One bit of error correction goes far because it targets the right error

    The key insight: you don't need many bits to correct errors if you project those errors into the right space. QJL does exactly that.

    How It Compares to Alternative Approaches

    Method            | Bits | Accuracy | Notes
    FP16 baseline     | 16   | 100%     | Standard full precision
    INT8 quantization | 8    | 100%     | Standard approach
    INT4 quantization | 4    | 95-98%   | 2-5% loss typical
    Standard 3-bit    | 3    | 90-93%   | Poor quality
    TurboQuant (3+1)  | 4    | 100%     | Zero loss achieved

    This comparison explains why TurboQuant's paper generated excitement. At 4 bits total (3 for PolarQuant + 1 for QJL), it matches the 16-bit baseline, which the standard 3-bit and 4-bit methods in the table do not.

    Implementation Status

    Both algorithms are available as open source:

    • turbo-quant (Rust): Production-ready implementation from RecursiveIntell — supports both TurboQuant and separate PolarQuant/QJL
    • llama.cpp: Community integration available
    • vLLM: Integration in progress
    • PyTorch: Reference implementation

    According to Google Research's official blog, TurboQuant was presented at ICLR 2026, PolarQuant at AISTATS 2026, and QJL at AISTATS 2026.

    PolarQuant + QJL FAQ

    Why does PolarQuant need random rotation?
    Raw high-dimensional vectors often have axes with wildly different scales. Random rotation spreads the "energy" evenly across all coordinates, making each equally compressible. Without rotation, some axes dominate and waste quantization bits.
    What exactly does QJL correct?
    After 3-bit quantization, there's a small difference (residual error) between the original vector and its reconstructed version. QJL captures this error with 1 bit per value: the sign of each coordinate of a random projection of the residual, which the decoder uses to estimate the correction.
    Is 1 bit enough for error correction?
    Surprisingly yes. The Johnson-Lindenstrauss lemma proves you can preserve distance relationships with far fewer dimensions. QJL applies this insight: a well-chosen 1-bit signal corrects most of the residual error because it's targeting the right "direction" of error.
    Can I use just PolarQuant without QJL?
    Yes, but you'll lose some accuracy. PolarQuant alone achieves about 5x compression with small degradation. Adding QJL recovers that last bit of quality to achieve true zero-loss at 4-bit total.
    Does this work for model weights too?
    TurboQuant targets KV cache specifically. For model weights, different techniques work better. But the same principles (rotation + JL correction) could potentially apply; research is ongoing.
    What's the performance overhead?
    Near zero. The random rotation and projection matrices are fixed: they are generated once and reused for every vector. QJL uses lookup tables. Google reports "zero-overhead" in their paper; the encoding/decoding cost is negligible compared to attention computation.
    How does this compare to KV cache pruning?
    Pruning removes cache entries entirely — losing information. TurboQuant preserves all entries but compresses them. They're complementary: you could prune first (remove least important entries), then TurboQuant compress what remains.
    Is this available in Hugging Face Transformers?
    Not directly yet. The turbo-quant Rust library is the main production option. Community ports are in progress for llama.cpp and vLLM. PyTorch has reference implementations you can integrate.

    For more on TurboQuant, explore our articles on TurboQuant Explained (FP4), TurboQuant 3-Bit Explained, DeepSeek Engram Memory, and Context Engineering Guide.

    Last Updated: April 29, 2026 | Source: Google Research, GitHub, TurboQuant.net

    Sk Jabedul Haque

    Founder & Chief Editor