
What are Embeddings?

Last Updated: March 14, 2026


Ashish Pratap Singh

Language models work with numbers, not words. Before a model can understand text, every word, phrase, or token must be converted into a numerical representation. That representation is called an embedding.

In this chapter, we will explore what embeddings are, how they are created, and why they are a foundational building block for many modern AI applications.

From Words to Coordinates

Think of GPS coordinates: two numbers, latitude and longitude, are enough to pin down any location on Earth, and places that are physically near each other have similar coordinates. Embeddings do the same thing, but for meaning instead of geography. An embedding model takes a piece of text, whether it is a single word, a sentence, or an entire paragraph, and maps it to a list of numbers. Instead of two dimensions like GPS, embeddings typically use hundreds or thousands of dimensions.

Each dimension captures some aspect of meaning, and the resulting coordinates place the text at a specific point in a high-dimensional "meaning space."

The key property is that texts with similar meanings end up near each other in this space, even when they share zero words. Example: "How do I fix a broken pipe?" and "Plumbing repair guide" land in roughly the same neighborhood because their meanings overlap, not because their characters do.

Imagine three different text inputs going into an embedding model: two about plumbing and one cake recipe. The two plumbing-related texts come out with similar coordinate values (close together), while the cake recipe ends up in a completely different region of the space. This is what makes semantic search possible: instead of matching keywords, you compute embeddings for the query and all documents, then find the documents whose coordinates are closest to the query's coordinates.

What Do the Dimensions Mean?

A natural question is: what does each number in an embedding vector actually represent? The honest answer is that individual dimensions do not have clean, human-interpretable meanings. They are not things like "dimension 47 = how much this text is about plumbing." Instead, meaning emerges from the combination of all dimensions together.

One dimension might partially capture formality, another might partially capture topic area, and yet another might partially capture sentiment. But these are distributed across many dimensions simultaneously, entangled in ways that resist simple labels.

In practice, this does not matter. You do not need to interpret individual dimensions to use embeddings effectively. What matters is that the distances between points in this space correspond to semantic relationships. Similar meanings are close, different meanings are far apart. That is the contract, and embedding models are remarkably good at honoring it.

How Embeddings Are Created

You do not need to train your own embedding model to use embeddings (just like you do not need to build a GPS satellite to use GPS). But understanding roughly how these models are trained helps you anticipate their strengths and weaknesses.

The Training Process

Modern embedding models are neural networks, usually based on the transformer architecture you learned previously. They are trained on massive amounts of text with a clever objective: learn to produce similar vectors for texts that mean similar things, and dissimilar vectors for texts that mean different things.

The training data typically consists of pairs or groups of related texts. These might be:

  • A search query and the document that answers it
  • Two paraphrases of the same sentence
  • A question and its corresponding answer
  • Sentences from the same paragraph (proximity as a signal of relatedness)

During training, the model learns to push related texts closer together in embedding space and push unrelated texts farther apart. This is called contrastive learning, and it is the key technique behind most modern embedding models.

The result of this training process is a model that has internalized an enormous amount of semantic knowledge. It knows that "dog" and "puppy" are related, that "bank" can mean a financial institution or the edge of a river, and that "running a company" and "running a marathon" use the word "running" in completely different senses. All of this is encoded in the coordinate system it learned.

From Tokens to a Single Vector

One detail worth understanding is how the model goes from a sequence of tokens to a single embedding vector. The transformer processes each token and produces a vector for every token position. But you want one vector for the whole sentence, not one per word.

The most common approach is called mean pooling: you simply average all the token vectors together. Some models use the vector from a special [CLS] token that is designed to capture the overall meaning. Others use more sophisticated pooling strategies. The specific method depends on the model, but the end result is always the same: one fixed-size vector that represents the entire input text.

Generating Embeddings with Python

With that background covered, here is how to generate embeddings in practice.

Using OpenAI's Embedding API

OpenAI offers an embedding model called text-embedding-3-small that produces 1536-dimensional vectors. It is fast, cheap, and quite good for most use cases.

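Here is a minimal sketch using the official openai Python SDK. It assumes the package is installed (`pip install openai`) and that an OPENAI_API_KEY environment variable is set.

```python
# Sketch: one embedding via the OpenAI API.
# Assumes `pip install openai` and the OPENAI_API_KEY environment variable.

def embed_text(text: str, model: str = "text-embedding-3-small") -> list[float]:
    """Return the embedding vector for a single piece of text."""
    from openai import OpenAI  # imported lazily so importing this module needs no API key

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding

if __name__ == "__main__":
    vector = embed_text("How do I fix a broken pipe?")
    print(len(vector))  # 1536 dimensions for text-embedding-3-small
```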

That is it. You send text in, you get a list of 1536 floating-point numbers back. Every text you embed with the same model will produce a vector of the same length, which is what allows you to compare them.

Using Sentence-Transformers (Open Source)

If you prefer not to depend on an external API, the sentence-transformers library gives you access to excellent open-source embedding models that run locally on your machine.

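A comparable sketch with sentence-transformers; the model downloads automatically on first use, and the helper name here is illustrative.

```python
# Sketch: local embeddings with the all-MiniLM-L6-v2 model.
# Assumes `pip install sentence-transformers`.

def embed_locally(texts: list[str], model_name: str = "all-MiniLM-L6-v2"):
    """Return a numpy array of embeddings, one row per input text."""
    from sentence_transformers import SentenceTransformer  # lazy import

    model = SentenceTransformer(model_name)
    return model.encode(texts)

if __name__ == "__main__":
    embeddings = embed_locally(["How do I fix a broken pipe?", "Plumbing repair guide"])
    print(embeddings.shape)  # (2, 384): each text becomes a 384-dimensional vector
```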


Notice the difference in dimensionality: OpenAI's model produces 1536-dimensional vectors, while all-MiniLM-L6-v2 produces 384-dimensional vectors. Higher dimensionality can capture finer-grained distinctions but costs more to store and compare. We will discuss this trade-off later in the chapter.

Batch Processing

In real applications, you rarely embed one text at a time. You might have thousands of documents to embed when building a search index. Both OpenAI and sentence-transformers support batch processing.

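As a sketch, batching might look like the following. The helper names are illustrative; the OpenAI path assumes an OPENAI_API_KEY environment variable, and the local path assumes sentence-transformers is installed.

```python
# Hypothetical helpers for embedding many texts at once.

def embed_batch_openai(texts: list[str], model: str = "text-embedding-3-small") -> list[list[float]]:
    """Embed up to 2048 texts in a single OpenAI API call."""
    from openai import OpenAI  # lazy import; needs OPENAI_API_KEY set

    client = OpenAI()
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]

def embed_batch_local(texts: list[str], batch_size: int = 64):
    """Embed texts locally; returns a numpy array with one row per text."""
    from sentence_transformers import SentenceTransformer  # lazy import

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts, batch_size=batch_size)
```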

For OpenAI, you can send up to 2048 texts in a single API call. For local models with sentence-transformers, the batch_size parameter controls how many texts are processed together on the GPU (or CPU). Larger batches are faster but use more memory.

Measuring Similarity: How Close Is Close?

Now you have embedding vectors. The next question is: how do you quantify how similar two vectors are?

There are three main distance/similarity metrics used in practice, each with different properties and use cases.

Cosine Similarity

Cosine similarity measures the angle between two vectors, ignoring their magnitude. It answers the question: "Are these two vectors pointing in roughly the same direction?"

The formula is:

cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)

Where A · B is the dot product, and ||A|| is the magnitude (length) of vector A. The result ranges from -1 (opposite directions) to 1 (same direction), with 0 meaning the vectors are orthogonal (unrelated).

Here is the intuition: imagine two arrows starting from the origin. Cosine similarity only cares about the angle between them. If they point in the same direction, cosine similarity is 1, even if one arrow is twice as long as the other. This makes it robust to differences in text length, which is why it is the most popular metric for text embeddings.

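A minimal implementation with numpy, plus an illustrative comparison of the three example texts (the sentence-transformers usage in the main block is an assumption; any embedding model works):

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Angle-based similarity: 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

if __name__ == "__main__":
    from sentence_transformers import SentenceTransformer  # illustrative model choice

    model = SentenceTransformer("all-MiniLM-L6-v2")
    pipe, guide, cake = model.encode([
        "How do I fix a broken pipe?",
        "Plumbing repair guide",
        "Chocolate cake recipe",
    ])
    print("pipe vs guide:", cosine_similarity(pipe, guide))  # high: related meanings
    print("pipe vs cake: ", cosine_similarity(pipe, cake))   # low: unrelated topics
```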

The two plumbing-related sentences have a much higher similarity score than either one has with the cake recipe. The embedding model captured the semantic relationship even though "How do I fix a broken pipe?" and "Plumbing repair guide" share zero words.

Dot Product

The dot product is the simplest similarity measure. No normalization, no division. Just multiply corresponding elements and sum them up.

Unlike cosine similarity, the dot product is affected by vector magnitude. A longer vector paired with another long vector produces a larger dot product, even if they are not particularly similar in direction. This means the dot product conflates two things: how similar the directions are, and how large the magnitudes are.

This can be useful or harmful depending on your application. If magnitude encodes something meaningful (like document importance or confidence), the dot product captures both similarity and importance in one number. If magnitude is just noise (which is common), it adds unwanted variance.

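A small sketch showing the magnitude sensitivity with toy vectors:

```python
import numpy as np

def dot_product(a, b) -> float:
    """Multiply corresponding elements and sum; sensitive to vector magnitude."""
    return float(np.dot(np.asarray(a, dtype=float), np.asarray(b, dtype=float)))

# Same direction, different magnitudes: cosine similarity is 1.0 for both
# pairs, but the dot product doubles with the longer vector.
print(dot_product([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(dot_product([2.0, 0.0], [1.0, 0.0]))  # 2.0
```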

Important note: If your embeddings are normalized (magnitude = 1), the dot product and cosine similarity produce identical results. Many embedding models, including OpenAI's, return normalized embeddings by default. This means you can use the cheaper dot product computation and still get cosine similarity behavior.

Euclidean Distance

Euclidean distance measures the straight-line distance between two points in the embedding space. It is the familiar distance formula from geometry, extended to high dimensions.

Unlike the previous two metrics where higher means more similar, with Euclidean distance, lower means more similar. Two identical vectors have a distance of zero, and the distance grows as they diverge.

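A numpy sketch, again comparing the example texts (the model usage in the main block is illustrative):

```python
import numpy as np

def euclidean_distance(a, b) -> float:
    """Straight-line distance between two points; 0.0 means identical vectors."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

if __name__ == "__main__":
    from sentence_transformers import SentenceTransformer  # illustrative model choice

    model = SentenceTransformer("all-MiniLM-L6-v2")
    pipe, guide, cake = model.encode([
        "How do I fix a broken pipe?",
        "Plumbing repair guide",
        "Chocolate cake recipe",
    ])
    print("pipe vs guide:", euclidean_distance(pipe, guide))  # smaller: more similar
    print("pipe vs cake: ", euclidean_distance(pipe, cake))   # larger: less similar
```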

The plumbing pair has a smaller distance (more similar) than the pipe-cake pair, as expected.

When to Use Which Metric

Choosing the right metric depends on your embedding model and your use case. Here is a practical guide:

| Metric | Best When | Watch Out For | Used By |
| --- | --- | --- | --- |
| Cosine Similarity | Embeddings are NOT normalized; you want pure directional similarity | Ignores magnitude, which sometimes carries useful signal | Most text search applications; FAISS (inner product with normalized vectors) |
| Dot Product | Embeddings ARE normalized (equivalent to cosine), or magnitude is meaningful | Can give misleading results if vectors have very different magnitudes | OpenAI recommends this for their normalized embeddings; Pinecone default |
| Euclidean Distance | You need true geometric distance; clustering applications | Sensitive to magnitude differences; computationally slightly more expensive | Some clustering algorithms; HNSW indexes |

The practical rule of thumb: Check whether your embedding model returns normalized vectors. If it does (and most popular models do), use dot product. It gives you the same ranking as cosine similarity but is faster to compute because you skip the normalization step. If your vectors are not normalized and you do not want magnitude to influence results, use cosine similarity. Use Euclidean distance when you are doing clustering or when the actual distance magnitude matters to your application.

Visualizing Embedding Spaces

You cannot directly visualize a 384-dimensional or 1536-dimensional space. The human brain tops out at three dimensions, and even that requires squinting. But there are mathematical techniques that can compress high-dimensional data into 2D or 3D while preserving the relative distances between points. These are called dimensionality reduction techniques, and they are invaluable for understanding what your embeddings actually look like.

t-SNE (t-Distributed Stochastic Neighbor Embedding)

t-SNE is the classic algorithm for embedding visualization. It works by finding a 2D arrangement of points that preserves the neighborhood structure of the original high-dimensional data: points that were close together in 1536 dimensions stay close together in 2D. Distances between far-apart points are preserved less faithfully, which is a known limitation of the method.

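A sketch using scikit-learn's TSNE, sentence-transformers, and matplotlib (all three are assumptions about your environment, and the example sentences are illustrative):

```python
# Sketch: project sentence embeddings to 2D with t-SNE and plot them.

def tsne_2d(embeddings, perplexity: float = 5.0):
    """Project an (n_samples, n_dims) embedding matrix down to 2D."""
    from sklearn.manifold import TSNE  # lazy import

    # perplexity must be smaller than the number of samples
    return TSNE(n_components=2, perplexity=perplexity, random_state=42).fit_transform(embeddings)

if __name__ == "__main__":
    from sentence_transformers import SentenceTransformer
    import matplotlib.pyplot as plt

    sentences = [
        "Python is a great programming language",      # programming
        "Learning to code takes practice",
        "Debugging is half the job of a developer",
        "Preheat the oven before baking the cake",     # cooking
        "Add a pinch of salt to the sauce",
        "Knead the dough until it is smooth",
        "The striker scored in the final minute",      # sports
        "She trains for the marathon every morning",
        "The home team won the championship",
    ]
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(sentences)
    points = tsne_2d(embeddings, perplexity=3)
    plt.scatter(points[:, 0], points[:, 1])
    for (x, y), s in zip(points, sentences):
        plt.annotate(s, (x, y), fontsize=7)
    plt.title("t-SNE projection of sentence embeddings")
    plt.show()
```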

When you run this code, you will see three distinct clusters. The programming sentences huddle together, the cooking sentences form their own cluster, and the sports sentences group up elsewhere. This happens even though sentences like "Python is a great programming language" and "Learning to code takes practice" share very few words. The embedding model understands they are about the same topic.

UMAP (Uniform Manifold Approximation and Projection)

UMAP is a newer alternative to t-SNE that tends to better preserve global structure. While t-SNE is great at showing local neighborhoods (which points are near each other), it sometimes distorts the distances between clusters. UMAP generally keeps both local and global relationships more intact.

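A sketch of the equivalent projection with UMAP. It assumes the umap-learn package is installed, and the n_neighbors value is illustrative.

```python
# Sketch: project an (n_samples, n_dims) embedding matrix to 2D with UMAP.
# Assumes `pip install umap-learn`.

def umap_2d(embeddings, n_neighbors: int = 10):
    """Return a 2D projection that preserves both local and global structure."""
    import umap  # lazy import; the PyPI package is named umap-learn

    reducer = umap.UMAP(n_components=2, n_neighbors=n_neighbors, random_state=42)
    return reducer.fit_transform(embeddings)
```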

t-SNE vs UMAP: When to Use Which

| Feature | t-SNE | UMAP |
| --- | --- | --- |
| Local structure (neighborhoods) | Excellent | Excellent |
| Global structure (cluster distances) | Poor; distorts inter-cluster distances | Better; preserves relative cluster positions |
| Speed | Slow on large datasets (O(n²)) | Faster; scales better |
| Reproducibility | Results vary between runs | More stable with random_state |
| Best for | Exploring local patterns, small datasets | Production visualizations, large datasets |

A practical tip: always try both and compare. If you are presenting results to others, UMAP is usually the safer choice because it does not mislead about how far apart clusters really are.

Practical Considerations

Before you start building with embeddings, there are a few practical details that will save you headaches in production.

Dimensionality

Embedding models produce vectors of different sizes. Common dimensions include 384 (MiniLM), 768 (BERT-base), 1024 (many modern models), and 1536 (OpenAI's text-embedding-3-small). OpenAI's text-embedding-3-large goes up to 3072 dimensions.

Higher dimensionality means the model can encode finer distinctions between texts, but it comes at a cost:

  • Storage: 1 million 1536-dimensional vectors at 32-bit floats takes about 5.7 GB of RAM. At 3072 dimensions, that doubles to 11.4 GB.
  • Search speed: Computing similarity between higher-dimensional vectors takes more time. This matters when you are searching through millions of vectors.
  • Diminishing returns: Beyond a certain point, adding more dimensions does not improve quality meaningfully. A 3072-dimensional model is not twice as good as a 1536-dimensional one.

OpenAI's text-embedding-3 models support a neat feature called Matryoshka embeddings: you can truncate the vector to a shorter length (say, 512 dimensions instead of 1536) and it still works reasonably well. This lets you trade quality for speed and storage.

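A sketch of truncation plus re-normalization with numpy (the 512 is an arbitrary target length; truncated vectors must be re-normalized before computing similarities):

```python
import numpy as np

def truncate_embedding(vector, dims: int):
    """Keep the first `dims` dimensions and rescale back to unit length."""
    truncated = np.asarray(vector, dtype=float)[:dims]
    return truncated / np.linalg.norm(truncated)

# The OpenAI API can also truncate server-side via the `dimensions` parameter:
#   client.embeddings.create(model="text-embedding-3-small",
#                            input=text, dimensions=512)

short = truncate_embedding(np.arange(1, 1537, dtype=float), 512)
print(short.shape)  # (512,)
```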

Normalization

Normalization means scaling a vector so its magnitude (length) equals 1. This is important because, as we discussed, cosine similarity and dot product give the same results for normalized vectors, and many vector databases and indexes assume normalized inputs.

Most embedding APIs return normalized vectors by default. But if you are using a model that does not normalize, or if you have truncated the vectors, you should normalize them yourself:

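A minimal sketch with numpy:

```python
import numpy as np

def normalize(vector):
    """Scale a vector so its magnitude (length) is 1."""
    v = np.asarray(vector, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return v / norm

v = normalize([3.0, 4.0])
print(v)  # [0.6 0.8]
print(np.linalg.norm(v))  # magnitude is now 1 (up to floating-point rounding)
```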

Batch Processing and Rate Limits

When you have thousands or millions of texts to embed, you need a strategy for efficient batch processing.

For OpenAI, the main constraints are:

  • Maximum 2048 texts per API call
  • Token limit per request (about 8191 tokens per text for text-embedding-3-small)
  • Rate limits on requests per minute and tokens per minute

For local models with sentence-transformers, GPU memory is the bottleneck. Larger batch sizes are faster but require more memory.

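A sketch of tuning batch_size for a local model (the values are illustrative; the right number depends on your GPU or RAM):

```python
# Sketch: controlling memory use via batch_size in sentence-transformers.

def embed_with_batch_size(texts: list[str], batch_size: int = 32):
    """Encode texts locally, `batch_size` at a time.

    Larger batches mean fewer forward passes (faster) but higher peak
    memory; if you hit out-of-memory errors, reduce batch_size.
    """
    from sentence_transformers import SentenceTransformer  # lazy import

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts, batch_size=batch_size, show_progress_bar=True)
```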

For OpenAI, here is a pattern for processing large batches with rate limit handling:

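A sketch of chunking plus retry-on-rate-limit; the chunk size mirrors the 2048-text limit, and the exponential-backoff values are illustrative. It assumes the openai SDK and an OPENAI_API_KEY environment variable.

```python
import time

def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def embed_corpus(texts, model="text-embedding-3-small", chunk_size=2048, max_retries=5):
    """Embed a large corpus, retrying with exponential backoff on rate limits."""
    from openai import OpenAI, RateLimitError  # lazy import; needs OPENAI_API_KEY

    client = OpenAI()
    all_embeddings = []
    for chunk in chunked(texts, chunk_size):
        for attempt in range(max_retries):
            try:
                response = client.embeddings.create(model=model, input=chunk)
                all_embeddings.extend(item.embedding for item in response.data)
                break
            except RateLimitError:
                time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    return all_embeddings
```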

Putting it together, the typical embedding pipeline for production use looks like this: raw text goes in, gets split into manageable batches, passed through the embedding model, normalized, and stored in a vector database for later retrieval. We will cover vector databases in detail later in this module.

Caching Embeddings

Generating embeddings costs money (if using an API) or compute time (if running locally). Never re-embed text that has not changed. Store your embeddings alongside the source text, and only regenerate when the text is updated.

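A sketch of a simple on-disk cache keyed by a hash of the text; the cache file path and the `embed_fn` callable are placeholders for whatever embedding function you use.

```python
import hashlib
import json
import os
import tempfile

def cached_embedding(text: str, embed_fn, cache_path: str = "embedding_cache.json"):
    """Return a cached embedding if the text was seen before, else compute and store it."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    if key not in cache:
        cache[key] = embed_fn(text)  # only pay for texts we have not seen before
        with open(cache_path, "w") as f:
            json.dump(cache, f)
    return cache[key]

# Demo with a stand-in embedding function that counts how often it is called.
demo_path = os.path.join(tempfile.mkdtemp(), "cache.json")
calls = []

def fake_embed(text):
    calls.append(text)  # stands in for a paid API call
    return [0.1, 0.2, 0.3]

cached_embedding("hello world", fake_embed, demo_path)
cached_embedding("hello world", fake_embed, demo_path)
print(len(calls))  # 1: the second lookup came from the cache
```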
