
Vector Databases

Last Updated: January 12, 2026

Ashish Pratap Singh

When a user searches for "comfortable work from home chair" on an e-commerce site, they are not looking for those exact words.

They want results like "ergonomic office seating" or "lumbar support desk chair," even though these share no keywords with the query.

Traditional databases with keyword matching cannot solve this problem. They find exact matches, not semantic matches.

This is where vector databases come in. They store data as high-dimensional vectors called embeddings, which capture the meaning of text, images, or other data. Similar concepts end up close together in this vector space, even if they use different words or representations.

A vector database can find the most similar vectors to a query vector in milliseconds, enabling semantic search, recommendations, and AI-powered applications.

The rise of large language models (LLMs) like GPT has made vector databases far more important, since LLM-powered applications rely on vector search to retrieve relevant context.

Understanding Embeddings

An embedding is a numerical representation of data in a high-dimensional space. Machine learning models learn to map text, images, audio, or other data types into vectors where similar items are close together.

How Embeddings Work

Consider text embeddings. An embedding model (like OpenAI's text-embedding-ada-002 or Sentence-BERT) converts text into a fixed-size vector:
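As a minimal sketch, here is that comparison using the all-MiniLM-L6-v2 model from Sentence Transformers (listed in the table below); exact similarity scores vary by model:

```python
# Sketch: embed three phrases and compare them pairwise.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings

sentences = [
    "comfortable work from home chair",
    "ergonomic office seating",
    "running shoes",
]
embeddings = model.encode(sentences)  # shape: (3, 384)

# Cosine similarity between pairs of embeddings
print(util.cos_sim(embeddings[0], embeddings[1]))  # high: similar concepts
print(util.cos_sim(embeddings[0], embeddings[2]))  # low: unrelated concepts
```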

The first two vectors are close together because the concepts are similar. The third is far away because running shoes are unrelated to office chairs.

Similarity Metrics

Vector databases use mathematical distance metrics to measure similarity:

| Metric | Formula | Use Case |
| --- | --- | --- |
| Cosine similarity | cos(θ) = (A·B) / (‖A‖ ‖B‖) | Text embeddings (normalized) |
| Euclidean distance | √Σ(aᵢ − bᵢ)² | General purpose |
| Dot product | A·B = Σ(aᵢ × bᵢ) | When magnitude matters |

Cosine similarity is most common for text because it measures angle, not magnitude. Two vectors pointing in the same direction have similarity 1.0, regardless of their length.
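This magnitude invariance is easy to verify directly in NumPy (a standalone sketch, independent of any database):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(θ) = (A·B) / (‖A‖ ‖B‖)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, 2 * a))  # 1.0: same direction, different magnitude
print(cosine_similarity(a, -a))     # -1.0: opposite directions
```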

Embedding Models

Different models produce embeddings for different data types:

| Model | Dimensions | Data Type | Provider |
| --- | --- | --- | --- |
| text-embedding-ada-002 | 1536 | Text | OpenAI |
| text-embedding-3-small | 1536 | Text | OpenAI |
| all-MiniLM-L6-v2 | 384 | Text | Sentence Transformers |
| CLIP | 512 | Text + Images | OpenAI |
| Cohere embed | 4096 | Text | Cohere |

More dimensions generally capture more nuance but require more storage and computation.

Vector Database Architecture

Vector databases are optimized for one primary operation: finding the K most similar vectors to a query vector (K-nearest neighbors, or KNN).

The Challenge

Consider a database with 10 million vectors of 1536 dimensions each:

  • Storage: 10M × 1536 × 4 bytes (float32) ≈ 61 GB
  • Brute-force search: comparing the query against all 10M vectors is far too slow

For a single query, computing cosine similarity with all 10 million vectors takes seconds. Real-time applications need milliseconds.
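For reference, the brute-force baseline is a single matrix-vector product over the whole collection; this NumPy sketch uses a scaled-down size, but the O(n × d) scan per query is exactly what becomes too slow at 10M vectors:

```python
import numpy as np

n, d, k = 10_000, 1536, 5  # illustrative scale; 10M vectors is ~1000x more work
db = np.random.rand(n, d).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)  # normalize: dot product = cosine

query = np.random.rand(d).astype(np.float32)
query /= np.linalg.norm(query)

scores = db @ query              # one similarity per stored vector: O(n × d)
top_k = np.argsort(-scores)[:k]  # indices of the k most similar vectors
```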

Approximate Nearest Neighbors (ANN)

The solution is approximate algorithms that trade exactness for speed. Instead of finding the exact K nearest neighbors, they find vectors that are very likely to be among the nearest.

| Approach | Exact KNN | Approximate KNN |
| --- | --- | --- |
| Recall | 100% | 95-99% |
| Speed | O(n × d) | O(log n) to O(√n) |
| Use case | Small datasets, offline | Production, real-time |

The metric "recall@K" measures what fraction of true K nearest neighbors are found. A recall of 0.95 means 95% of the true nearest neighbors are in the result set.

Indexing Algorithms

Vector databases use specialized indexes for fast ANN search. The most common approaches are HNSW, IVF, and product quantization, covered in turn below.

HNSW (Hierarchical Navigable Small World)

HNSW is the most popular algorithm, offering excellent recall with fast queries. It builds a multi-layer graph where each layer is a sparser version of the layer below.

How HNSW search works:

  1. Start at the top layer with the entry point
  2. Greedily navigate to the closest node to the query
  3. When no closer node exists, drop to the next layer
  4. Repeat until reaching layer 0
  5. Search more thoroughly in layer 0 to find K nearest

HNSW parameters:

| Parameter | Effect |
| --- | --- |
| M | Connections per node. Higher = better recall, more memory |
| efConstruction | Search depth during build. Higher = better index, slower build |
| efSearch | Search depth during query. Higher = better recall, slower search |
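Here is a minimal sketch of these parameters in use with the hnswlib library (dataset size and parameter values are illustrative):

```python
import hnswlib
import numpy as np

dim, num_elements = 384, 10_000
data = np.random.rand(num_elements, dim).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data, np.arange(num_elements))

index.set_ef(50)  # efSearch: raise for better recall, lower for faster queries
labels, distances = index.knn_query(data[:1], k=5)
```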

Pros: High recall, fast queries, works in-memory.
Cons: Memory-intensive, slow to build.

IVF (Inverted File Index)

IVF partitions vectors into clusters and searches only relevant clusters:

How IVF search works:

  1. Build time: Cluster vectors using k-means
  2. Query time: Find nprobe nearest cluster centroids
  3. Search only within those clusters for nearest neighbors

IVF parameters:

| Parameter | Effect |
| --- | --- |
| nlist | Number of clusters. Higher = more partitions |
| nprobe | Clusters to search. Higher = better recall, slower |
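A minimal IVF sketch with the faiss library (nlist and nprobe values are illustrative):

```python
import faiss
import numpy as np

d, n = 128, 100_000
xb = np.random.rand(n, d).astype(np.float32)

nlist = 1024                      # number of k-means clusters
quantizer = faiss.IndexFlatL2(d)  # assigns vectors to their nearest centroid
index = faiss.IndexIVFFlat(quantizer, d, nlist)

index.train(xb)  # the k-means training step, required before adding vectors
index.add(xb)

index.nprobe = 16  # clusters to scan per query: higher = better recall, slower
distances, ids = index.search(xb[:1], k=5)
```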

Pros: Lower memory than HNSW, scales to disk.
Cons: Lower recall at same speed, requires training step.

Product Quantization (PQ)

Product quantization compresses vectors to reduce memory and enable disk-based search:

How PQ works:

  1. Split vector into segments (e.g., 1536 dims → 192 segments of 8 dims)
  2. For each segment, train a codebook of representative vectors
  3. Replace each segment with the codebook ID of the nearest representative
  4. Store only the IDs, not the full vectors

Compression ratio: 32x (each 8-dimensional float32 segment, 32 bytes, becomes a single 1-byte code)

IVF-PQ: Combines IVF partitioning with PQ compression for large-scale, disk-based search.
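A sketch of that combination in faiss; m is the number of PQ segments and must divide the dimension (all values are illustrative):

```python
import faiss
import numpy as np

d, n, nlist = 128, 100_000, 1024
m, nbits = 16, 8  # 16 segments of 8 dims each, one 1-byte code per segment

xb = np.random.rand(n, d).astype(np.float32)

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)  # trains both the k-means partitions and the PQ codebooks
index.add(xb)    # stores 16-byte codes instead of 512-byte float32 vectors

index.nprobe = 16
distances, ids = index.search(xb[:1], k=5)
```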

Popular Vector Databases

Pinecone

Fully managed vector database service:

  • Deployment: Cloud-only (serverless or pod-based)
  • Scaling: Automatic horizontal scaling
  • Features: Metadata filtering, namespaces, hybrid search
  • Strengths: Zero operations, simple API

Milvus

Open-source vector database designed for scale:

  • Deployment: Self-hosted or Zilliz Cloud
  • Scaling: Distributed architecture, separates compute and storage
  • Features: Multiple index types, GPU acceleration, hybrid search
  • Strengths: Open source, highly scalable

Weaviate

Vector database with built-in vectorization:

  • Deployment: Self-hosted or Weaviate Cloud
  • Features: Built-in embedding models, GraphQL API, hybrid search
  • Strengths: Can vectorize data automatically, schema-based

Qdrant

Open-source vector database with filtering:

  • Deployment: Self-hosted or Qdrant Cloud
  • Features: Rich filtering, payload storage, quantization
  • Strengths: Fast filtered search, Rust-based performance

pgvector (PostgreSQL Extension)

Vector search as a PostgreSQL extension:

  • Deployment: Any PostgreSQL instance
  • Features: Full SQL, joins with other tables, transactions
  • Strengths: Use existing PostgreSQL infrastructure
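A minimal pgvector sketch from Python via psycopg2; the table, column names, and connection string are hypothetical, and `<=>` is pgvector's cosine-distance operator:

```python
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector")  # requires pgvector installed
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        body text,
        embedding vector(1536)
    )
""")

# query_embedding: 1536 floats from your embedding model (placeholder values here)
query_embedding = [0.1] * 1536
vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
cur.execute(
    "SELECT id, body FROM documents ORDER BY embedding <=> %s::vector LIMIT 5",
    (vector_literal,),
)
rows = cur.fetchall()
```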

Comparison

| Feature | Pinecone | Milvus | Weaviate | Qdrant | pgvector |
| --- | --- | --- | --- | --- | --- |
| Managed | Yes | Optional | Optional | Optional | No |
| Open source | No | Yes | Yes | Yes | Yes |
| Built-in vectorization | No | No | Yes | No | No |
| SQL support | No | No | No | No | Full |
| Scaling | Automatic | Manual | Manual | Manual | Limited |
| Best for | Simplicity | Scale | Auto-vectorization | Filtering | SQL integration |

Use Cases

Semantic Search

Find documents by meaning, not just keywords.

Typical implementation (see the code sketch after these steps):

  1. Pre-compute embeddings for all documents
  2. Store embeddings with document IDs in vector database
  3. At query time, embed the query
  4. Find K nearest document embeddings
  5. Return corresponding documents
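An end-to-end sketch of these five steps, with a brute-force NumPy scan standing in for the vector database (any of the databases above would replace the search step):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Steps 1-2: pre-compute and store document embeddings (here, an in-memory array)
documents = [
    "How to set up an ergonomic home office",
    "Best running shoes for marathon training",
    "Choosing a desk chair with lumbar support",
]
doc_embeddings = model.encode(documents, normalize_embeddings=True)

# Steps 3-5: embed the query, find the K nearest documents, return them
query = model.encode(["comfortable work from home chair"], normalize_embeddings=True)
scores = doc_embeddings @ query[0]  # cosine similarity (embeddings are normalized)
top_k = np.argsort(-scores)[:2]
print([documents[i] for i in top_k])
```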

Retrieval-Augmented Generation (RAG)

Give LLMs access to private knowledge by retrieving relevant documents at query time and passing them to the model as context (sketched in code below).

Why RAG:

  • LLMs have knowledge cutoff dates
  • LLMs do not know your private data
  • LLMs can hallucinate; RAG grounds them in real documents
  • Cheaper than fine-tuning
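Schematically, the retrieval-then-generation loop looks like the sketch below; embed, vector_search, and llm are hypothetical placeholders for your embedding model, vector database client, and LLM API:

```python
def answer_with_rag(question: str) -> str:
    # 1. Embed the question and retrieve the most relevant documents
    query_embedding = embed(question)                   # hypothetical embedding call
    context_docs = vector_search(query_embedding, k=5)  # hypothetical DB client call
    context = "\n\n".join(context_docs)

    # 2. Ground the LLM's answer in the retrieved documents
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)                                  # hypothetical LLM API call
```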

Recommendations

Find similar items based on content or user behavior. For content-based recommendations, embed item descriptions and return the nearest neighbors of the item a user is viewing.

For collaborative filtering, learn user and item embeddings from interaction data (for example, via matrix factorization) and recommend the items whose vectors are closest to the user's vector.

Multimodal Search

Search images with text queries. CLIP and similar models create aligned embeddings for text and images, enabling cross-modal search.
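A cross-modal sketch using the clip-ViT-B-32 checkpoint exposed through sentence-transformers (the image files are hypothetical):

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # aligned text/image embedding space

# Embed images and a text query into the same 512-dimensional space
image_embeddings = model.encode(
    [Image.open("desk_chair.jpg"), Image.open("sneakers.jpg")]  # hypothetical files
)
text_embedding = model.encode(["an ergonomic office chair"])

# Rank the images by similarity to the text query
print(util.cos_sim(text_embedding, image_embeddings))
```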

Anomaly Detection

Detect outliers by finding points far from their nearest neighbors:
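A minimal sketch with scikit-learn: score each point by its distance to its 5th nearest neighbor and flag the largest scores (the threshold choice is illustrative):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

embeddings = np.random.rand(1000, 64).astype(np.float32)  # illustrative data

nn = NearestNeighbors(n_neighbors=6).fit(embeddings)  # 5 neighbors + the point itself
distances, _ = nn.kneighbors(embeddings)

outlier_score = distances[:, -1]              # distance to the 5th nearest neighbor
threshold = np.percentile(outlier_score, 99)  # flag the most isolated 1% of points
anomalies = np.where(outlier_score > threshold)[0]
```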

Hybrid Search

Pure vector search can miss exact keyword matches. Hybrid search combines vector similarity with keyword matching:

Reciprocal Rank Fusion (RRF) is a common way to merge the two ranked lists: each document's fused score is the sum of 1/(k + rank) over its positions in the individual rankings, where k is typically 60.
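A minimal sketch of RRF over two ranked lists of document IDs:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # rankings: one ranked list of document IDs per retriever
    scores: dict[str, float] = {}
    for ranked_list in rankings:
        for rank, doc_id in enumerate(ranked_list, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([
    ["doc3", "doc1", "doc7"],  # vector search ranking
    ["doc1", "doc9", "doc3"],  # keyword (e.g., BM25) ranking
])
print(fused)  # doc1 and doc3 rise to the top: they rank high in both lists
```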

When to use hybrid:

  • Product search (exact model numbers + semantic description)
  • Technical documentation (exact terms + conceptual queries)
  • When recall is critical and you cannot miss exact matches

Performance Considerations

Memory vs Disk Trade-offs

| Storage | Latency | Cost | Best For |
| --- | --- | --- | --- |
| In-memory (HNSW) | < 10 ms | $$$ | Small-medium datasets |
| Memory-mapped | 10-50 ms | $$ | Medium-large datasets |
| Disk-based (IVF-PQ) | 50-200 ms | $ | Very large datasets |

Scaling Considerations

| Factor | Impact | Mitigation |
| --- | --- | --- |
| Vector count | Linear memory/storage | Quantization, sharding |
| Vector dimensions | Linear memory, √n search | Dimensionality reduction |
| Query throughput | CPU/GPU bound | Horizontal scaling |
| Index build time | O(n log n) typical | Incremental updates |

Quantization

Reduce memory by compressing vectors:

| Precision | Bytes/Dimension | Memory (1M × 1536) | Quality Impact |
| --- | --- | --- | --- |
| float32 | 4 | 6.1 GB | Baseline |
| float16 | 2 | 3.1 GB | Minimal |
| int8 | 1 | 1.5 GB | Small |
| Binary | 0.125 | 192 MB | Significant |

Most applications can use float16 or int8 with negligible quality loss.
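For intuition, here is a sketch of simple per-vector symmetric int8 quantization in NumPy; this is one common scheme, not any particular database's implementation:

```python
import numpy as np

def quantize_int8(vectors: np.ndarray):
    # Map each float32 vector into [-127, 127] using one scale per vector
    scale = np.abs(vectors).max(axis=1, keepdims=True) / 127.0
    return np.round(vectors / scale).astype(np.int8), scale

def dequantize(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return codes.astype(np.float32) * scale

vectors = np.random.randn(1000, 1536).astype(np.float32)  # ~6.1 MB at this size
codes, scale = quantize_int8(vectors)                     # ~1.5 MB: 4x smaller
print(np.abs(dequantize(codes, scale) - vectors).max())   # small reconstruction error
```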

When to Choose Vector Databases

Vector databases are the right choice when:

  • Semantic similarity matters. You want to find similar items by meaning, not exact match.
  • Building AI applications. RAG, semantic search, and AI-powered features need vector search.
  • Recommendations by content. Find similar products, articles, or content.
  • Unstructured data search. Search through text, images, audio, or video.

When to Consider Alternatives

Vector databases may not fit when:

  • Exact match is sufficient. If keyword search works, it is simpler.
  • Small scale. For < 100K vectors, pgvector or in-memory solutions may suffice.
  • No AI/ML component. If you are not using embeddings, you do not need a vector database.
  • Structured queries dominate. For filtering and aggregation, relational databases are better.

Summary

Vector databases enable similarity search over high-dimensional embeddings:

| Aspect | Vector Database Approach |
| --- | --- |
| Data model | Vectors (embeddings) with metadata |
| Primary operation | K-nearest neighbor search |
| Indexing | HNSW, IVF, product quantization |
| Similarity | Cosine, Euclidean, dot product |
| Trade-off | Recall vs speed vs memory |

Key concepts:

  • Embeddings: Neural network representations capturing semantic meaning
  • ANN (Approximate Nearest Neighbors): Trade exactness for speed
  • HNSW: Graph-based, high recall, memory-intensive
  • IVF-PQ: Clustering + quantization, scales to disk
  • Hybrid search: Combine vector and keyword for best results

The next chapter explores full-text search engines, which optimize for a different kind of search: keyword-based search with relevance ranking, faceted filtering, and linguistic analysis.