
What are Embeddings?

Last Updated: March 14, 2026


Ashish Pratap Singh

Language models work with numbers, not words. Before a model can understand text, every word, phrase, or token must be converted into a numerical representation. That representation is called an embedding.

In this chapter, we will explore what embeddings are, how they are created, and why they are a foundational building block for many modern AI applications.

From Words to Coordinates

Think of GPS coordinates: two numbers, latitude and longitude, are enough to pin down any location on Earth, and places that are physically near each other have similar coordinates. Embeddings do the same thing, but for meaning instead of geography. An embedding model takes a piece of text, whether it is a single word, a sentence, or an entire paragraph, and maps it to a list of numbers. Instead of two dimensions like GPS, embeddings typically use hundreds or thousands of dimensions.

Each dimension captures some aspect of meaning, and the resulting coordinates place the text at a specific point in a high-dimensional "meaning space."

The key property is that texts with similar meanings end up near each other in this space, even when they share zero words. Example: "How do I fix a broken pipe?" and "Plumbing repair guide" land in roughly the same neighborhood because their meanings overlap, not because their characters do.

Imagine three different text inputs going into an embedding model: two about plumbing and one cake recipe. The two plumbing-related texts come out with similar coordinate values (close together), while the cake recipe ends up in a completely different region of the space. This is what makes semantic search possible: instead of matching keywords, you compute embeddings for the query and all documents, then find the documents whose coordinates are closest to the query's coordinates.

What Do the Dimensions Mean?

A natural question is: what does each number in an embedding vector actually represent? The honest answer is that individual dimensions do not have clean, human-interpretable meanings. They are not things like "dimension 47 = how much this text is about plumbing." Instead, meaning emerges from the combination of all dimensions together.

One dimension might partially capture formality, another might partially capture topic area, and yet another might partially capture sentiment. But these are distributed across many dimensions simultaneously, entangled in ways that resist simple labels.

In practice, this does not matter. You do not need to interpret individual dimensions to use embeddings effectively. What matters is that the distances between points in this space correspond to semantic relationships. Similar meanings are close, different meanings are far apart. That is the contract, and embedding models are remarkably good at honoring it.

How Embeddings Are Created

You do not need to train your own embedding model to use embeddings (just like you do not need to build a GPS satellite to use GPS). But understanding roughly how these models are trained helps you anticipate their strengths and weaknesses.

The Training Process

Modern embedding models are neural networks, usually based on the transformer architecture you learned previously. They are trained on massive amounts of text with a clever objective: learn to produce similar vectors for texts that mean similar things, and dissimilar vectors for texts that mean different things.

The training data typically consists of pairs or groups of related texts. These might be:

  • A search query and the document that answers it
  • Two paraphrases of the same sentence
  • A question and its corresponding answer
  • Sentences from the same paragraph (proximity as a signal of relatedness)

During training, the model learns to push related texts closer together in embedding space and push unrelated texts farther apart. This is called contrastive learning, and it is the key technique behind most modern embedding models.

The result of this training process is a model that has internalized an enormous amount of semantic knowledge. It knows that "dog" and "puppy" are related, that "bank" can mean a financial institution or the edge of a river, and that "running a company" and "running a marathon" use the word "running" in completely different senses. All of this is encoded in the coordinate system it learned.

From Tokens to a Single Vector

One detail worth understanding is how the model goes from a sequence of tokens to a single embedding vector. The transformer processes each token and produces a vector for every token position. But you want one vector for the whole sentence, not one per word.

The most common approach is called mean pooling: you simply average all the token vectors together. Some models use the vector from a special [CLS] token that is designed to capture the overall meaning. Others use more sophisticated pooling strategies. The specific method depends on the model, but the end result is always the same: one fixed-size vector that represents the entire input text.

Generating Embeddings with Python

With that background covered, here is how to generate embeddings in practice.

Using OpenAI's Embedding API

OpenAI offers an embedding model called text-embedding-3-small that produces 1536-dimensional vectors. It is fast, cheap, and quite good for most use cases.

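Here is a minimal sketch using the official openai Python SDK. It assumes the package is installed (`pip install openai`) and that an OPENAI_API_KEY environment variable is set.

```python
# Sketch: one embedding via the OpenAI API.
# Assumes `pip install openai` and the OPENAI_API_KEY environment variable.

def embed_text(text: str, model: str = "text-embedding-3-small") -> list[float]:
    """Return the embedding vector for a single piece of text."""
    from openai import OpenAI  # imported lazily so importing this module needs no API key

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding

if __name__ == "__main__":
    vector = embed_text("How do I fix a broken pipe?")
    print(len(vector))  # 1536 dimensions for text-embedding-3-small
```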

That is it. You send text in, you get a list of 1536 floating-point numbers back. Every text you embed with the same model will produce a vector of the same length, which is what allows you to compare them.

Using Sentence-Transformers (Open Source)

If you prefer not to depend on an external API, the sentence-transformers library gives you access to excellent open-source embedding models that run locally on your machine.

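A comparable sketch with sentence-transformers; the model downloads automatically on first use, and the helper name here is illustrative.

```python
# Sketch: local embeddings with the all-MiniLM-L6-v2 model.
# Assumes `pip install sentence-transformers`.

def embed_locally(texts: list[str], model_name: str = "all-MiniLM-L6-v2"):
    """Return a numpy array of embeddings, one row per input text."""
    from sentence_transformers import SentenceTransformer  # lazy import

    model = SentenceTransformer(model_name)
    return model.encode(texts)

if __name__ == "__main__":
    embeddings = embed_locally(["How do I fix a broken pipe?", "Plumbing repair guide"])
    print(embeddings.shape)  # (2, 384): each text becomes a 384-dimensional vector
```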


Notice the difference in dimensionality: OpenAI's model produces 1536-dimensional vectors, while all-MiniLM-L6-v2 produces 384-dimensional vectors. Higher dimensionality can capture finer-grained distinctions but costs more to store and compare. We will discuss this trade-off later in the chapter.

Batch Processing

In real applications, you rarely embed one text at a time. You might have thousands of documents to embed when building a search index. Both OpenAI and sentence-transformers support batch processing.

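As a sketch, batching might look like the following. The helper names are illustrative; the OpenAI path assumes an OPENAI_API_KEY environment variable, and the local path assumes sentence-transformers is installed.

```python
# Hypothetical helpers for embedding many texts at once.

def embed_batch_openai(texts: list[str], model: str = "text-embedding-3-small") -> list[list[float]]:
    """Embed up to 2048 texts in a single OpenAI API call."""
    from openai import OpenAI  # lazy import; needs OPENAI_API_KEY set

    client = OpenAI()
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]

def embed_batch_local(texts: list[str], batch_size: int = 64):
    """Embed texts locally; returns a numpy array with one row per text."""
    from sentence_transformers import SentenceTransformer  # lazy import

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts, batch_size=batch_size)
```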

For OpenAI, you can send up to 2048 texts in a single API call. For local models with sentence-transformers, the batch_size parameter controls how many texts are processed together on the GPU (or CPU). Larger batches are faster but use more memory.

Measuring Similarity: How Close Is Close?

Now you have embedding vectors. The next question is: how do you quantify how similar two vectors are?

There are three main distance/similarity metrics used in practice, each with different properties and use cases.

Cosine Similarity

Cosine similarity measures the angle between two vectors, ignoring their magnitude. It answers the question: "Are these two vectors pointing in roughly the same direction?"

The formula is:

cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)

Where A · B is the dot product, and ||A|| is the magnitude (length) of vector A. The result ranges from -1 (opposite directions) to 1 (same direction), with 0 meaning the vectors are orthogonal (unrelated).

Here is the intuition: imagine two arrows starting from the origin. Cosine similarity only cares about the angle between them. If they point in the same direction, cosine similarity is 1, even if one arrow is twice as long as the other. This makes it robust to differences in text length, which is why it is the most popular metric for text embeddings.

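A minimal implementation with numpy, plus an illustrative comparison of the three example texts (the sentence-transformers usage in the main block is an assumption; any embedding model works):

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Angle-based similarity: 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

if __name__ == "__main__":
    from sentence_transformers import SentenceTransformer  # illustrative model choice

    model = SentenceTransformer("all-MiniLM-L6-v2")
    pipe, guide, cake = model.encode([
        "How do I fix a broken pipe?",
        "Plumbing repair guide",
        "Chocolate cake recipe",
    ])
    print("pipe vs guide:", cosine_similarity(pipe, guide))  # high: related meanings
    print("pipe vs cake: ", cosine_similarity(pipe, cake))   # low: unrelated topics
```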

The two plumbing-related sentences have a much higher similarity score than either one has with the cake recipe. The embedding model captured the semantic relationship even though "How do I fix a broken pipe?" and "Plumbing repair guide" share zero words.

Dot Product

The dot product is the simplest similarity measure. No normalization, no division. Just multiply corresponding elements and sum them up.

Unlike cosine similarity, the dot product is affected by vector magnitude. A longer vector paired with another long vector produces a larger dot product, even if they are not particularly similar in direction. This means the dot product conflates two things: how similar the directions are, and how large the magnitudes are.

This can be useful or harmful depending on your application. If magnitude encodes something meaningful (like document importance or confidence), the dot product captures both similarity and importance in one number. If magnitude is just noise (which is common), it adds unwanted variance.

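A small sketch showing the magnitude sensitivity with toy vectors:

```python
import numpy as np

def dot_product(a, b) -> float:
    """Multiply corresponding elements and sum; sensitive to vector magnitude."""
    return float(np.dot(np.asarray(a, dtype=float), np.asarray(b, dtype=float)))

# Same direction, different magnitudes: cosine similarity is 1.0 for both
# pairs, but the dot product doubles with the longer vector.
print(dot_product([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(dot_product([2.0, 0.0], [1.0, 0.0]))  # 2.0
```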

Important note: If your embeddings are normalized (magnitude = 1), the dot product and cosine similarity produce identical results. Many embedding models, including OpenAI's, return normalized embeddings by default. This means you can use the cheaper dot product computation and still get cosine similarity behavior.

Euclidean Distance

Euclidean distance measures the straight-line distance between two points in the embedding space. It is the familiar distance formula from geometry, extended to high dimensions.

Unlike the previous two metrics where higher means more similar, with Euclidean distance, lower means more similar. Two identical vectors have a distance of zero, and the distance grows as they diverge.

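A numpy sketch, again comparing the example texts (the model usage in the main block is illustrative):

```python
import numpy as np

def euclidean_distance(a, b) -> float:
    """Straight-line distance between two points; 0.0 means identical vectors."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

if __name__ == "__main__":
    from sentence_transformers import SentenceTransformer  # illustrative model choice

    model = SentenceTransformer("all-MiniLM-L6-v2")
    pipe, guide, cake = model.encode([
        "How do I fix a broken pipe?",
        "Plumbing repair guide",
        "Chocolate cake recipe",
    ])
    print("pipe vs guide:", euclidean_distance(pipe, guide))  # smaller: more similar
    print("pipe vs cake: ", euclidean_distance(pipe, cake))   # larger: less similar
```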

The plumbing pair has a smaller distance (more similar) than the pipe-cake pair, as expected.

When to Use Which Metric

Choosing the right metric depends on your embedding model and your use case. Here is a practical guide:

| Metric | Best When | Watch Out For | Used By |
| --- | --- | --- | --- |
| Cosine Similarity | Embeddings are NOT normalized; you want pure directional similarity | Ignores magnitude, which sometimes carries useful signal | Most text search applications; FAISS (inner product with normalized vectors) |
| Dot Product | Embeddings ARE normalized (equivalent to cosine), or magnitude is meaningful | Can give misleading results if vectors have very different magnitudes | OpenAI recommends this for their normalized embeddings; Pinecone default |
| Euclidean Distance | You need true geometric distance; clustering applications | Sensitive to magnitude differences; computationally slightly more expensive | Some clustering algorithms; HNSW indexes |

The practical rule of thumb: Check whether your embedding model returns normalized vectors. If it does (and most popular models do), use dot product. It gives you the same ranking as cosine similarity but is faster to compute because you skip the normalization step. If your vectors are not normalized and you do not want magnitude to influence results, use cosine similarity. Use Euclidean distance when you are doing clustering or when the actual distance magnitude matters to your application.

Visualizing Embedding Spaces

You cannot directly visualize a 384-dimensional or 1536-dimensional space. The human brain tops out at three dimensions, and even that requires squinting. But there are mathematical techniques that can compress high-dimensional data into 2D or 3D while preserving the relative distances between points. These are called dimensionality reduction techniques, and they are invaluable for understanding what your embeddings actually look like.

t-SNE (t-Distributed Stochastic Neighbor Embedding)

t-SNE is the classic algorithm for embedding visualization. It works by finding a 2D arrangement of points that preserves the neighborhood structure of the original high-dimensional data: points that were close together in 1536 dimensions stay close together in 2D. Distances between far-apart points are preserved less faithfully, which is a known limitation of the method.

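A sketch using scikit-learn's TSNE, sentence-transformers, and matplotlib (all three are assumptions about your environment, and the example sentences are illustrative):

```python
# Sketch: project sentence embeddings to 2D with t-SNE and plot them.

def tsne_2d(embeddings, perplexity: float = 5.0):
    """Project an (n_samples, n_dims) embedding matrix down to 2D."""
    from sklearn.manifold import TSNE  # lazy import

    # perplexity must be smaller than the number of samples
    return TSNE(n_components=2, perplexity=perplexity, random_state=42).fit_transform(embeddings)

if __name__ == "__main__":
    from sentence_transformers import SentenceTransformer
    import matplotlib.pyplot as plt

    sentences = [
        "Python is a great programming language",      # programming
        "Learning to code takes practice",
        "Debugging is half the job of a developer",
        "Preheat the oven before baking the cake",     # cooking
        "Add a pinch of salt to the sauce",
        "Knead the dough until it is smooth",
        "The striker scored in the final minute",      # sports
        "She trains for the marathon every morning",
        "The home team won the championship",
    ]
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(sentences)
    points = tsne_2d(embeddings, perplexity=3)
    plt.scatter(points[:, 0], points[:, 1])
    for (x, y), s in zip(points, sentences):
        plt.annotate(s, (x, y), fontsize=7)
    plt.title("t-SNE projection of sentence embeddings")
    plt.show()
```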

When you run this code, you will see three distinct clusters. The programming sentences huddle together, the cooking sentences form their own cluster, and the sports sentences group up elsewhere. This happens even though sentences like "Python is a great programming language" and "Learning to code takes practice" share very few words. The embedding model understands they are about the same topic.

UMAP (Uniform Manifold Approximation and Projection)

UMAP is a newer alternative to t-SNE that tends to better preserve global structure. While t-SNE is great at showing local neighborhoods (which points are near each other), it sometimes distorts the distances between clusters. UMAP generally keeps both local and global relationships more intact.

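A sketch of the equivalent projection with UMAP. It assumes the umap-learn package is installed, and the n_neighbors value is illustrative.

```python
# Sketch: project an (n_samples, n_dims) embedding matrix to 2D with UMAP.
# Assumes `pip install umap-learn`.

def umap_2d(embeddings, n_neighbors: int = 10):
    """Return a 2D projection that preserves both local and global structure."""
    import umap  # lazy import; the PyPI package is named umap-learn

    reducer = umap.UMAP(n_components=2, n_neighbors=n_neighbors, random_state=42)
    return reducer.fit_transform(embeddings)
```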

t-SNE vs UMAP: When to Use Which

| Feature | t-SNE | UMAP |
| --- | --- | --- |
| Local structure (neighborhoods) | Excellent | Excellent |
| Global structure (cluster distances) | Poor; distorts inter-cluster distances | Better; preserves relative cluster positions |
| Speed | Slow on large datasets (O(n²)) | Faster; scales better |
| Reproducibility | Results vary between runs | More stable with random_state |
| Best for | Exploring local patterns, small datasets | Production visualizations, large datasets |

A practical tip: always try both and compare. If you are presenting results to others, UMAP is usually the safer choice because it does not mislead about how far apart clusters really are.

Practical Considerations

Before you start building with embeddings, there are a few practical details that will save you headaches in production.

Dimensionality

Embedding models produce vectors of different sizes. Common dimensions include 384 (MiniLM), 768 (BERT-base), 1024 (many modern models), and 1536 (OpenAI's text-embedding-3-small). OpenAI's text-embedding-3-large goes up to 3072 dimensions.

Higher dimensionality means the model can encode finer distinctions between texts, but it comes at a cost:

  • Storage: 1 million 1536-dimensional vectors at 32-bit floats takes about 5.7 GB of RAM. At 3072 dimensions, that doubles to 11.4 GB.
  • Search speed: Computing similarity between higher-dimensional vectors takes more time. This matters when you are searching through millions of vectors.
  • Diminishing returns: Beyond a certain point, adding more dimensions does not improve quality meaningfully. A 3072-dimensional model is not twice as good as a 1536-dimensional one.

OpenAI's text-embedding-3 models support a neat feature called Matryoshka embeddings: you can truncate the vector to a shorter length (say, 512 dimensions instead of 1536) and it still works reasonably well. This lets you trade quality for speed and storage.

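A sketch of truncation plus re-normalization with numpy (the 512 is an arbitrary target length; truncated vectors must be re-normalized before computing similarities):

```python
import numpy as np

def truncate_embedding(vector, dims: int):
    """Keep the first `dims` dimensions and rescale back to unit length."""
    truncated = np.asarray(vector, dtype=float)[:dims]
    return truncated / np.linalg.norm(truncated)

# The OpenAI API can also truncate server-side via the `dimensions` parameter:
#   client.embeddings.create(model="text-embedding-3-small",
#                            input=text, dimensions=512)

short = truncate_embedding(np.arange(1, 1537, dtype=float), 512)
print(short.shape)  # (512,)
```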

Normalization

Normalization means scaling a vector so its magnitude (length) equals 1. This is important because, as we discussed, cosine similarity and dot product give the same results for normalized vectors, and many vector databases and indexes assume normalized inputs.

Most embedding APIs return normalized vectors by default. But if you are using a model that does not normalize, or if you have truncated the vectors, you should normalize them yourself:

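A minimal sketch with numpy:

```python
import numpy as np

def normalize(vector):
    """Scale a vector so its magnitude (length) is 1."""
    v = np.asarray(vector, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return v / norm

v = normalize([3.0, 4.0])
print(v)  # [0.6 0.8]
print(np.linalg.norm(v))  # magnitude is now 1 (up to floating-point rounding)
```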

Batch Processing and Rate Limits

When you have thousands or millions of texts to embed, you need a strategy for efficient batch processing.

For OpenAI, the main constraints are:

  • Maximum 2048 texts per API call
  • Token limit per request (about 8191 tokens per text for text-embedding-3-small)
  • Rate limits on requests per minute and tokens per minute

For local models with sentence-transformers, GPU memory is the bottleneck. Larger batch sizes are faster but require more memory.

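A sketch of tuning batch_size for a local model (the values are illustrative; the right number depends on your GPU or RAM):

```python
# Sketch: controlling memory use via batch_size in sentence-transformers.

def embed_with_batch_size(texts: list[str], batch_size: int = 32):
    """Encode texts locally, `batch_size` at a time.

    Larger batches mean fewer forward passes (faster) but higher peak
    memory; if you hit out-of-memory errors, reduce batch_size.
    """
    from sentence_transformers import SentenceTransformer  # lazy import

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts, batch_size=batch_size, show_progress_bar=True)
```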

For OpenAI, here is a pattern for processing large batches with rate limit handling:

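A sketch of chunking plus retry-on-rate-limit; the chunk size mirrors the 2048-text limit, and the exponential-backoff values are illustrative. It assumes the openai SDK and an OPENAI_API_KEY environment variable.

```python
import time

def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def embed_corpus(texts, model="text-embedding-3-small", chunk_size=2048, max_retries=5):
    """Embed a large corpus, retrying with exponential backoff on rate limits."""
    from openai import OpenAI, RateLimitError  # lazy import; needs OPENAI_API_KEY

    client = OpenAI()
    all_embeddings = []
    for chunk in chunked(texts, chunk_size):
        for attempt in range(max_retries):
            try:
                response = client.embeddings.create(model=model, input=chunk)
                all_embeddings.extend(item.embedding for item in response.data)
                break
            except RateLimitError:
                time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    return all_embeddings
```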

Putting it together, the typical embedding pipeline for production use looks like this: raw text goes in, gets split into manageable batches, passed through the embedding model, normalized, and stored in a vector database for later retrieval. We will cover vector databases in detail later in this module.

Caching Embeddings

Generating embeddings costs money (if using an API) or compute time (if running locally). Never re-embed text that has not changed. Store your embeddings alongside the source text, and only regenerate when the text is updated.

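A sketch of a simple on-disk cache keyed by a hash of the text; the cache file path and the `embed_fn` callable are placeholders for whatever embedding function you use.

```python
import hashlib
import json
import os
import tempfile

def cached_embedding(text: str, embed_fn, cache_path: str = "embedding_cache.json"):
    """Return a cached embedding if the text was seen before, else compute and store it."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    if key not in cache:
        cache[key] = embed_fn(text)  # only pay for texts we have not seen before
        with open(cache_path, "w") as f:
            json.dump(cache, f)
    return cache[key]

# Demo with a stand-in embedding function that counts how often it is called.
demo_path = os.path.join(tempfile.mkdtemp(), "cache.json")
calls = []

def fake_embed(text):
    calls.append(text)  # stands in for a paid API call
    return [0.1, 0.2, 0.3]

cached_embedding("hello world", fake_embed, demo_path)
cached_embedding("hello world", fake_embed, demo_path)
print(len(calls))  # 1: the second lookup came from the cache
```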
