Two-Tower Architecture

Last Updated: May 29, 2026

Ashish Pratap Singh

11 min read

Embedding-based retrieval works at serving time by encoding queries and items into vectors, searching an approximate nearest neighbor (ANN) index, and returning candidates. What that leaves open is how to train encoders whose vectors actually capture relevance. The quality of every candidate depends on that, and on whether the similarity measure the serving index searches by matches the training objective.

The two-tower architecture is the answer most large-scale retrieval systems converge on. It gives you a clean serving split: item representations are precomputed offline, while query or user representations are computed online.
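To make the serving split concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not a production design: `LinearTower` is a hypothetical stand-in for a trained encoder (a single random linear layer), the feature sizes and item count are arbitrary, and an exact dot-product scan stands in for the ANN index.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # embedding dimension (illustrative value)


def l2_normalize(x):
    """Scale each row to unit length so dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


class LinearTower:
    """Hypothetical stand-in for a trained tower: one linear layer + L2 norm."""

    def __init__(self, in_dim, out_dim, rng):
        # Random weights; a real tower would be learned from interaction data.
        self.w = rng.normal(size=(in_dim, out_dim))

    def encode(self, features):
        return l2_normalize(features @ self.w)


# Two separate encoders that map into the same embedding space.
item_tower = LinearTower(in_dim=16, out_dim=DIM, rng=rng)
query_tower = LinearTower(in_dim=12, out_dim=DIM, rng=rng)

# Offline: encode the whole catalog once and store the vectors.
item_features = rng.normal(size=(1000, 16))
item_index = item_tower.encode(item_features)  # shape (1000, DIM)


def retrieve(query_features, k=5):
    """Online: encode one query, score it against the precomputed item vectors."""
    q = query_tower.encode(query_features[None, :])[0]
    scores = item_index @ q  # exact scan; an ANN index would approximate this
    top = np.argsort(-scores)[:k]  # indices of the k highest-scoring items
    return top, scores[top]


candidate_ids, candidate_scores = retrieve(rng.normal(size=12))
```

Only `retrieve` runs per request. The item tower never executes on the request path, which is why the item side can be as expensive as the offline budget allows while the query side has to fit the latency budget.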
