What Are Embeddings? Cosine Similarity Explained for Developers

Semantic search, recommendations, and Retrieval-Augmented Generation (RAG) all rest on the same idea: turn text into vectors of numbers so a computer can measure how similar two pieces of meaning are. Those vectors are embeddings, and the measurement is almost always cosine similarity.

Most explanations jump straight to vector databases. This one starts a level lower: what a vector actually is, why "similar meaning" becomes "small angle", and why cosine is the metric most people use. You can try the math on real vectors in the Cosine Similarity Calculator as you go.

From words to vectors

An embedding model takes a piece of text and outputs a fixed-length list of numbers, a vector. A small model might produce 384 numbers; larger ones produce 1,536 or more. Each number is a dimension, and you can think of each as capturing some learned feature of meaning.

The key property is this: the model is trained so that text with similar meaning lands close together in this high-dimensional space, and unrelated text lands far apart. "dog" and "puppy" end up near each other; "dog" and "tax return" don't. You don't get to read what each dimension "means", but you can measure distances, and that turns out to be enough.

Why similarity becomes an angle

Once two pieces of text are vectors, "how similar are they?" becomes a geometry question. The most useful answer is the angle between the vectors.

Cosine similarity is the cosine of that angle:

cosine_similarity(A, B) = (A · B) / (‖A‖ × ‖B‖)

Here A · B is the dot product (multiply matching components, sum them up) and ‖A‖ is the vector's length (its magnitude). The result runs from -1 to 1:

1 means same direction, as similar as it gets.
0 means perpendicular, unrelated.
-1 means opposite direction.

Because it divides by the magnitudes, cosine similarity measures direction only. Two vectors pointing the same way score 1 no matter how long they are.
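
The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name cosine_similarity and the sample vectors are made up for the example, and a real system would typically use a vectorized library instead of Python loops.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: multiply matching components, then sum.
    dot = sum(x * y for x, y in zip(a, b))
    # Magnitudes (lengths) of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Same direction, very different lengths: the score is 1 (up to rounding).
print(cosine_similarity([1.0, 2.0], [10.0, 20.0]))
# Perpendicular vectors: the score is 0.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
# Opposite directions: the score is -1 (up to rounding).
print(cosine_similarity([1.0, 2.0], [-1.0, -2.0]))
```

The zero-vector check is a design choice for this sketch: a vector with no length has no direction, so the division would otherwise fail.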

Cosine vs. Euclidean vs. dot product

Three metrics show up constantly. The difference is what each one pays attention to:

Metric	Measures	Sensitive to vector length?
Cosine similarity	Angle (direction)	No
Dot product	Direction and magnitude	Yes
Euclidean distance	Straight-line distance	Yes

For comparing meaning, you usually want cosine. A long document and a short one about the same topic should count as similar even though their vectors have different magnitudes, and cosine ignores that magnitude. One useful fact: if you normalize every vector to unit length, the dot product equals the cosine similarity. That's why a lot of vector databases store normalized embeddings and then use a fast dot product under the hood.
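
To make the comparison concrete, here is a small sketch that runs all three metrics on two vectors that point the same way but differ in magnitude, then checks the normalization fact. The helper names and the vectors short_doc and long_doc are invented for illustration; they are not output from a real embedding model.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(a):
    """Scale a nonzero vector to unit length, keeping its direction."""
    n = norm(a)
    return [x / n for x in a]

short_doc = [1.0, 2.0]
long_doc = [3.0, 6.0]  # same direction, three times the magnitude

print(cosine(short_doc, long_doc))     # about 1: only direction matters
print(dot(short_doc, long_doc))        # 15.0: grows with magnitude
print(euclidean(short_doc, long_doc))  # about 4.47: nonzero despite same direction

# After normalizing both vectors, the dot product matches cosine similarity.
a, b = [0.9, 0.1], [0.1, 0.9]
print(math.isclose(dot(normalize(a), normalize(b)), cosine(a, b)))  # True
```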

A worked example

Suppose three short sentences embed (in a toy 2-D space) to:

Query: "How do I reset my password?" maps to [0.9, 0.1]
A: "Steps to recover your account login" maps to [0.8, 0.2]
B: "Best pizza toppings, ranked" maps to [0.1, 0.9]

Computing cosine similarity to the query:

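Candidate	Cosine to query	Meaning
A (account recovery)	≈ 0.99	very similar
B (pizza)	≈ 0.22	unrelated

For A, the dot product is 0.9 × 0.8 + 0.1 × 0.2 = 0.74 and the magnitudes multiply to about 0.747, giving 0.74 / 0.747 ≈ 0.99. For B, the dot product is 0.18 and the magnitudes multiply to 0.82, giving ≈ 0.22. The query points in almost the same direction as A (about 8° apart) and far away from B (about 77° apart), so a semantic search ranks A first, which is what you'd expect. Real embeddings live in hundreds of dimensions, but the principle is identical. Paste any two vectors into the calculator to see it.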
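
If you'd rather check the example in code than in the calculator, the following sketch scores both candidates against the query and sorts them. The vectors are the toy 2-D values from above; the dictionary layout and variable names are just one way to organize it.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1]  # "How do I reset my password?"
candidates = {
    "A (account recovery)": [0.8, 0.2],
    "B (pizza)": [0.1, 0.9],
}

# Score every candidate against the query, then rank highest first.
scores = {name: cosine_similarity(query, vec) for name, vec in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(f"{name}: {scores[name]:.2f}")  # A is listed before B
```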

How this scales into RAG

Semantic search and RAG are this idea applied at scale:

Chunk your documents into passages.
Embed each chunk with an embedding model and store the vectors in a vector database.
At query time, embed the user's question the same way.
Retrieve the top-k chunks by cosine similarity to the question.
Feed those chunks to a language model as context, so it answers grounded in your data.

That retrieval step, finding the nearest vectors, is pure cosine similarity (or its normalized dot-product equivalent), just run efficiently over millions of vectors with an approximate-nearest-neighbor index.
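
The embed, store, and retrieve steps can be sketched end to end as below. Two loud assumptions: toy_embed is a hypothetical stand-in that counts words from a tiny fixed vocabulary (a real embedding model returns learned dense vectors), and the search is a brute-force scan rather than an approximate-nearest-neighbor index. The chunk texts are invented, and the final step of passing the chunks to a language model is left out.

```python
import heapq
import math

# Hypothetical stand-in for an embedding model: word counts over a fixed
# vocabulary. Used here only so the example runs without external services.
VOCAB = ["password", "reset", "account", "login", "pizza", "toppings"]

def toy_embed(text):
    words = text.lower().replace("?", "").replace(",", "").split()
    return [float(words.count(term)) for term in VOCAB]

def normalize(vec):
    n = math.sqrt(sum(x * x for x in vec))
    return [x / n for x in vec] if n > 0 else vec

def build_index(chunks):
    """Embed each chunk and store (unit vector, text) pairs."""
    return [(normalize(toy_embed(chunk)), chunk) for chunk in chunks]

def retrieve(index, question, k=2):
    """Embed the question the same way, then return the top-k chunks.

    Vectors are unit length, so the dot product equals cosine similarity.
    This scans every vector; a vector database would use an ANN index.
    """
    q = normalize(toy_embed(question))
    scored = [(sum(x * y for x, y in zip(q, vec)), text) for vec, text in index]
    return heapq.nlargest(k, scored)

chunks = [
    "To reset your password, open the account login page",
    "Best pizza toppings, ranked",
    "Account login requires a password",
]
index = build_index(chunks)

for score, text in retrieve(index, "How do I reset my password?", k=2):
    print(f"{score:.2f}  {text}")
```

With these toy inputs, the password-reset chunk scores highest, the login chunk comes second, and the pizza chunk scores 0 and is dropped by k=2.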

So what?

Embeddings turn meaning into geometry. Similar text becomes nearby vectors, and cosine similarity scores how aligned two of them are, ignoring length so a short query can match a long document. It's the idea under semantic search, dedup, recommendations, and RAG. To build the intuition, run a few vectors through the Cosine Similarity Calculator. And if you're wiring up RAG, the LLM Token Counter and VRAM Calculator help you budget the model that consumes those retrieved chunks.