PCA encoder + quadratic ridge decoder in ~150 lines of numpy achieves 4x embedding compression with only a 0.85 p.p. NDCG@10 loss on FiQA, no SGD required.
Key Takeaways
Poly-AE beats PCA by +1 to +4.4 p.p. NDCG at d=128 (8x compression) and +0.03 to +2.7 p.p. at d=256 (4x) across four models.
The quadratic decoder works by lifting the PCA coordinates into all degree-2 monomials, then fitting a ridge regression in a single np.linalg.solve call over corpus statistics.
Non-MRL models (bge-base, e5-base) see the biggest lift; nomic-v1.5 at d=256 sees near-zero gain, likely due to more isotropic training.
Requires transductive corpus fit: not suitable for multi-tenant SaaS, streaming indices, or edge inference where per-corpus PCA is impractical.
Method traces to the quadratic-manifold literature in dynamical systems (Jain 2017; Geelen & Willcox 2022/2023), not ML.
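The encode/lift/solve pipeline above can be sketched in a few lines of numpy. This is a minimal illustration, not the post's actual ~150-line implementation: function names (`lift_degree2`, `fit_poly_ae`, `decode`) and the ridge strength `lam` are made up here, and PCA is done via a plain SVD of the centered corpus.

```python
import numpy as np

def lift_degree2(Z):
    """Append all degree-2 monomials z_i * z_j (i <= j) to the PCA coordinates."""
    iu = np.triu_indices(Z.shape[1])
    quad = Z[:, :, None] * Z[:, None, :]           # (n, d, d) outer products
    return np.hstack([Z, quad[:, iu[0], iu[1]]])   # (n, d + d*(d+1)/2)

def fit_poly_ae(X, d, lam=1e-3):
    """PCA encoder + quadratic ridge decoder, fit in closed form (no SGD)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:d].T                                   # (D, d) PCA encoder
    Phi = lift_degree2(Xc @ V)                     # lifted corpus features
    # Ridge normal equations: (Phi^T Phi + lam*I) W = Phi^T Xc, one solve
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    W = np.linalg.solve(A, Phi.T @ Xc)             # (p, D) decoder weights
    return mu, V, W

def decode(Z, mu, W):
    """Reconstruct full-dimensional embeddings from d-dim PCA coordinates."""
    return lift_degree2(Z) @ W + mu
```

Storage cost is only the d-dimensional codes `(X - mu) @ V`; the decoder `W` is a small per-corpus matrix applied at query time. On data with curvature off the linear PCA subspace, the quadratic decoder's reconstruction error drops below plain PCA's, which is the mechanism behind the NDCG gains reported above.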