Last Updated: January 12, 2026
When a user searches for "comfortable work from home chair" on an e-commerce site, they are not looking for those exact words.
They want results like "ergonomic office seating" or "lumbar support desk chair," even though these share no keywords with the query.
Traditional databases with keyword matching cannot solve this problem. They find exact matches, not semantic matches.
This is where vector databases come in. They store data as high-dimensional vectors called embeddings, which capture the meaning of text, images, or other data. Similar concepts end up close together in this vector space, even if they use different words or representations.
A vector database can find the most similar vectors to a query vector in milliseconds, enabling semantic search, recommendations, and AI-powered applications.
The rise of large language models (LLMs) like GPT has accelerated adoption of vector databases, which provide the retrieval layer for many LLM applications.
An embedding is a numerical representation of data in a high-dimensional space. Machine learning models learn to map text, images, audio, or other data types into vectors where similar items are close together.
Consider text embeddings. An embedding model (like OpenAI's text-embedding-ada-002 or Sentence-BERT) converts text into a fixed-size vector:
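For instance, here is a minimal sketch using the open-source all-MiniLM-L6-v2 model from the table below (the exact similarity values depend on the model; any embedding model works the same way):

```python
from sentence_transformers import SentenceTransformer, util

# A small open-source embedding model that outputs 384-dimensional vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "comfortable work from home chair",
    "ergonomic office seating",
    "running shoes for marathon training",
]
embeddings = model.encode(sentences)  # shape: (3, 384)

print(util.cos_sim(embeddings[0], embeddings[1]))  # high similarity
print(util.cos_sim(embeddings[0], embeddings[2]))  # low similarity
```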
The first two vectors are close together because the concepts are similar. The third is far away because running shoes are unrelated to office chairs.
Vector databases use mathematical distance metrics to measure similarity:
| Metric | Formula | Use Case |
|---|---|---|
| Cosine similarity | cos(θ) = (A·B) / (‖A‖‖B‖) | Text embeddings (normalized) |
| Euclidean distance | √(Σ(aᵢ − bᵢ)²) | General purpose |
| Dot product | A·B = Σ aᵢbᵢ | When magnitude matters |
Cosine similarity is most common for text because it measures angle, not magnitude. Two vectors pointing in the same direction have similarity 1.0, regardless of their length.
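A minimal numpy sketch makes the magnitude invariance concrete (not tied to any particular database):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(θ) = (A·B) / (‖A‖‖B‖)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, 2 * a))  # 1.0: same direction, length ignored
print(cosine_similarity(a, -a))     # -1.0: opposite direction
```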
Different models produce embeddings for different data types:
| Model | Dimensions | Data Type | Provider |
|---|---|---|---|
| text-embedding-ada-002 | 1536 | Text | OpenAI |
| text-embedding-3-small | 1536 | Text | OpenAI |
| all-MiniLM-L6-v2 | 384 | Text | Sentence Transformers |
| CLIP | 512 | Text + Images | OpenAI |
| Cohere embed | 4096 | Text | Cohere |
More dimensions generally capture more nuance but require more storage and computation.
Vector databases are optimized for one primary operation: finding the K most similar vectors to a query vector (K-nearest neighbors, or KNN).
Consider a database with 10 million vectors of 1536 dimensions each. For a single query, computing cosine similarity against all 10 million vectors takes seconds; real-time applications need milliseconds.
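A back-of-envelope calculation shows why (assuming float32 vectors; the scan itself is left commented because it would need all 61 GB in memory):

```python
n, d = 10_000_000, 1536

# Raw storage: 10M vectors x 1536 dims x 4 bytes ≈ 61 GB.
print(n * d * 4 / 1e9)  # 61.44 (GB)

# One exact query must compare against every vector:
# ~1.5e10 multiply-adds, plus streaming all 61 GB through the CPU.
print(n * d)  # 15_360_000_000 operations
# scores = database @ query  # O(n * d): the brute-force scan
```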
The solution is approximate algorithms that trade exactness for speed. Instead of finding the exact K nearest neighbors, they find vectors that are very likely to be among the nearest.
| Approach | Exact KNN | Approximate KNN |
|---|---|---|
| Recall | 100% | 95-99% |
| Speed | O(n × d) | O(log n) to O(√n) |
| Use case | Small datasets, offline | Production, real-time |
The metric "recall@K" measures what fraction of true K nearest neighbors are found. A recall of 0.95 means 95% of the true nearest neighbors are in the result set.
Vector databases use specialized indexes for fast approximate nearest neighbor (ANN) search:
HNSW (Hierarchical Navigable Small World) is the most popular indexing algorithm, offering excellent recall with fast queries. It builds a multi-layer graph in which each layer is a sparser version of the layer below.
How HNSW search works:

1. Start at an entry point in the sparse top layer.
2. Greedily move to whichever neighbor is closest to the query.
3. When no neighbor is closer, drop down to the next layer and repeat.
4. In the bottom layer, run a best-first search over a candidate list of size efSearch.
5. Return the K closest vectors found.
HNSW parameters:
| Parameter | Effect |
|---|---|
| M | Connections per node. Higher = better recall, more memory |
| efConstruction | Search depth during build. Higher = better index, slower build |
| efSearch | Search depth during query. Higher = better recall, slower search |
Pros: High recall, fast queries, works in-memory

Cons: Memory-intensive, slow to build
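A minimal sketch of building and querying an HNSW index with the hnswlib library (parameter values are illustrative, not tuned):

```python
import hnswlib
import numpy as np

dim, num_elements = 384, 10_000
data = np.random.rand(num_elements, dim).astype(np.float32)

# Build: M and ef_construction trade index quality against memory/build time.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data, np.arange(num_elements))

# Query: efSearch (set_ef) trades recall against latency.
index.set_ef(50)
labels, distances = index.knn_query(data[:1], k=10)
```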
IVF (inverted file index) partitions vectors into clusters and searches only the relevant clusters:
How IVF search works:

1. At build time, cluster the vectors (typically with k-means) into nlist partitions, each represented by a centroid.
2. At query time, find the nprobe nearest cluster centroids.
3. Search only the vectors in those clusters and return the K best.

IVF parameters:
| Parameter | Effect |
|---|---|
| nlist | Number of clusters. Higher = more partitions |
| nprobe | Clusters to search. Higher = better recall, slower |
Pros: Lower memory than HNSW, scales to disk

Cons: Lower recall at same speed, requires training step
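A minimal sketch of the same workflow with faiss's IVF index; note the explicit training step (values are illustrative):

```python
import faiss
import numpy as np

d, nlist = 384, 100
xb = np.random.rand(10_000, d).astype(np.float32)

quantizer = faiss.IndexFlatL2(d)              # assigns vectors to centroids
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(xb)                               # k-means learns the nlist centroids
index.add(xb)

index.nprobe = 10                             # clusters searched per query
distances, ids = index.search(xb[:1], k=10)
```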
Product quantization compresses vectors to reduce memory and enable disk-based search:
How PQ works:

1. Split each vector into m sub-vectors (for example, a 1536-dimensional vector into 192 sub-vectors of 8 dimensions each).
2. Run k-means on each sub-space to learn a codebook of 256 centroids.
3. Replace each sub-vector with the 1-byte index of its nearest centroid.
4. At query time, approximate distances using precomputed lookup tables over the codebooks.

Compression ratio: 32x (each 8-dimensional float32 sub-vector, 32 bytes, becomes a single 1-byte code)
IVF-PQ: Combines IVF partitioning with PQ compression for large-scale, disk-based search.
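In faiss this is a one-line change from the IVF example above: swap IndexIVFFlat for IndexIVFPQ (a sketch with illustrative parameters):

```python
import faiss
import numpy as np

d, nlist, m = 384, 100, 48   # m sub-vectors of 8 dimensions each
xb = np.random.rand(100_000, d).astype(np.float32)

# Each vector is stored as m 1-byte codes: 384 floats (1536 bytes)
# shrink to 48 bytes, the 32x compression mentioned above.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)  # 8 bits per code
index.train(xb)
index.add(xb)

index.nprobe = 10
distances, ids = index.search(xb[:1], k=10)
```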
Pinecone is a fully managed vector database service.
Milvus is an open-source vector database designed for scale.
Weaviate is a vector database with built-in vectorization.
Qdrant is an open-source vector database known for its filtering capabilities.
pgvector provides vector search as a PostgreSQL extension.
| Feature | Pinecone | Milvus | Weaviate | Qdrant | pgvector |
|---|---|---|---|---|---|
| Managed | Yes | Optional | Optional | Optional | No |
| Open source | No | Yes | Yes | Yes | Yes |
| Built-in vectorization | No | No | Yes | No | No |
| SQL support | No | No | No | No | Full |
| Scaling | Automatic | Manual | Manual | Manual | Limited |
| Best for | Simplicity | Scale | Auto-vectorization | Filtering | SQL integration |
Semantic search finds documents by meaning, not just keywords. A typical implementation embeds every document offline, stores the vectors alongside metadata, then embeds each incoming query and retrieves its nearest neighbors.
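A minimal sketch of that pipeline (the vector_db calls are a hypothetical client API; substitute your database's SDK):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Ingestion: embed each document once and store vector + metadata.
docs = ["Ergonomic office chair with lumbar support", "Trail running shoes"]
doc_vectors = model.encode(docs)
# vector_db.upsert(ids=range(len(docs)), vectors=doc_vectors, metadata=docs)

# Query time: embed the query and retrieve its nearest neighbors.
query_vector = model.encode("comfortable work from home chair")
# results = vector_db.query(vector=query_vector, top_k=5)
```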
Retrieval-augmented generation (RAG) gives LLMs access to private knowledge: documents are chunked, embedded, and stored in a vector database; at question time, the most relevant chunks are retrieved and inserted into the prompt so the model answers from them.

Why RAG:

- LLMs have a training cutoff and no access to private or recent data.
- Retrieval grounds answers in actual sources, reducing hallucination.
- Updating the database is far cheaper than fine-tuning the model on new data.
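The retrieval half of RAG is exactly the semantic search pipeline above; the only new step is pasting the retrieved chunks into the prompt. A minimal sketch (the retrieval call is hypothetical):

```python
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    # Retrieved chunks ground the model's answer in real sources.
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# chunks = vector_db.query(embed(question), top_k=3)  # hypothetical retrieval
print(build_rag_prompt("What is the refund policy?", ["Refunds within 30 days..."]))
```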
Recommendation systems find similar items based on content or user behavior: embed each item's description or image, and an item's nearest neighbors become its content-based recommendations.

For collaborative filtering, user and item vectors learned from interaction history (for example, via matrix factorization) live in the same space, so recommending to a user becomes a nearest-neighbor query with that user's vector.
Multimodal search lets users search images with text queries.
CLIP and similar models create aligned embeddings for text and images, enabling cross-modal search.
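A minimal sketch using the CLIP checkpoint shipped with sentence-transformers (the image path is illustrative):

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP maps text and images into the same 512-dimensional space.
model = SentenceTransformer("clip-ViT-B-32")

image_emb = model.encode(Image.open("chair.jpg"))     # illustrative path
text_emb = model.encode("a comfortable office chair")

print(util.cos_sim(image_emb, text_emb))
```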
Detect outliers by finding points far from their nearest neighbors: a point whose distance to its Kth nearest neighbor is much larger than typical is likely an anomaly.
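A minimal brute-force sketch of that idea (fine for small datasets; use an ANN index at scale):

```python
import numpy as np

def knn_outlier_scores(x: np.ndarray, k: int = 5) -> np.ndarray:
    # Score each point by its distance to its k-th nearest neighbor;
    # unusually large scores indicate likely outliers.
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    dists.sort(axis=1)
    return dists[:, k]  # column 0 is each point's zero self-distance

x = np.random.randn(200, 8)
x[0] += 10  # plant an outlier
print(knn_outlier_scores(x).argmax())  # 0
```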
Pure vector search can miss exact keyword matches. Hybrid search combines vector similarity with keyword matching:
Reciprocal Rank Fusion (RRF) is a common way to merge the two ranked lists: each document's fused score is score(d) = Σᵢ 1/(k + rankᵢ(d)), where rankᵢ(d) is the document's position in result list i and k is a smoothing constant (60 is a common default).
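A minimal sketch of RRF in Python (the example document IDs are illustrative):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    # Each ranking is an ordered list of document IDs, best first.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

keyword_hits = ["doc3", "doc1", "doc7"]   # e.g. from keyword search
vector_hits = ["doc1", "doc4", "doc3"]    # e.g. from the vector index
fused = rrf([keyword_hits, vector_hits])
print(sorted(fused, key=fused.get, reverse=True))  # doc1 and doc3 rise to the top
```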
When to use hybrid:

- Queries contain exact identifiers (product codes, error messages, names) that embeddings match poorly.
- Users mix keyword queries with natural-language questions.
- Missing an exact match is more costly than missing a semantic one.
| Storage | Latency | Cost | Best For |
|---|---|---|---|
| In-memory (HNSW) | < 10ms | $$$ | Small-medium datasets |
| Memory-mapped | 10-50ms | $$ | Medium-large datasets |
| Disk-based (IVF-PQ) | 50-200ms | $ | Very large datasets |
| Factor | Impact | Mitigation |
|---|---|---|
| Vector count | Linear memory/storage | Quantization, sharding |
| Vector dimensions | Linear memory and per-comparison cost | Dimensionality reduction |
| Query throughput | CPU/GPU bound | Horizontal scaling |
| Index build time | O(n log n) typical | Incremental updates |
Reduce memory by compressing vectors:
| Precision | Bytes/Dimension | Memory (1M × 1536) | Quality Impact |
|---|---|---|---|
| float32 | 4 | 6.1 GB | Baseline |
| float16 | 2 | 3.1 GB | Minimal |
| int8 | 1 | 1.5 GB | Small |
| Binary | 0.125 | 192 MB | Significant |
Most applications can use float16 or int8 with negligible quality loss.
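A minimal sketch of scalar int8 quantization (real systems usually compute the scale per vector or per dimension; one global scale keeps the example short):

```python
import numpy as np

def quantize_int8(vectors: np.ndarray) -> tuple[np.ndarray, float]:
    # Map the symmetric float range onto [-127, 127] and remember the scale.
    scale = float(np.abs(vectors).max()) / 127.0
    return np.round(vectors / scale).astype(np.int8), scale

vecs = np.random.randn(1000, 1536).astype(np.float32)
q, scale = quantize_int8(vecs)
print(vecs.nbytes / q.nbytes)          # 4.0: float32 -> int8
print(np.abs(q * scale - vecs).max())  # small reconstruction error
```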
Vector databases are the right choice when:

- Search is about meaning or similarity: semantic search, recommendations, RAG.
- Data is naturally represented as embeddings (text, images, audio).
- The dataset is too large for a brute-force scan at query time.

Vector databases may not fit when:

- Exact keyword matching is all you need; a full-text search engine is simpler.
- The dataset is small enough that a brute-force scan in application code meets latency targets.
- You primarily need relational queries and transactions (though pgvector adds vector search to PostgreSQL).
Vector databases enable similarity search over high-dimensional embeddings:
| Aspect | Vector Database Approach |
|---|---|
| Data model | Vectors (embeddings) with metadata |
| Primary operation | K-nearest neighbor search |
| Indexing | HNSW, IVF, product quantization |
| Similarity | Cosine, Euclidean, dot product |
| Trade-off | Recall vs speed vs memory |
The next chapter explores full-text search engines, which optimize for a different kind of search: keyword-based search with relevance ranking, faceted filtering, and linguistic analysis.