Similarity & Distance
Quantifying Semantic Similarity
The standard metric for comparing word vectors is cosine similarity: the cosine of the angle between two vectors in high-dimensional space. This measure is magnitude-invariant, capturing only the directional relationship between representations.
- 1.0 (100%) — Identical direction (maximum similarity)
- 0.5 (50%) — Partial relatedness
- 0.0 (0%) — Orthogonal (no directional relationship)
- Negative — Opposing semantic content
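The scale above can be reproduced directly. A minimal sketch, using NumPy and toy 3-dimensional vectors in place of real 50+ dimensional embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: dot product over magnitudes."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors for illustration; real word embeddings have far more dimensions.
cat = np.array([1.0, 2.0, 0.5])
dog = np.array([0.9, 1.8, 0.6])   # nearly the same direction as `cat`
car = np.array([-1.0, 0.2, 2.0])  # a very different direction

print(cosine_similarity(cat, cat))  # identical direction: exactly 1.0
print(cosine_similarity(cat, dog))  # similar direction: close to 1.0
print(cosine_similarity(cat, car))  # different direction: much lower
```

Because the magnitudes are divided out, scaling any vector by a positive constant leaves its similarities unchanged.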

Dimensionality Reduction and Visualization
Although word vectors occupy spaces of 50 or more dimensions, projection techniques (e.g., PCA, t-SNE) reduce them to two dimensions for visualization. Semantically related words form observable clusters in the projected plane.
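A projection of this kind can be sketched with PCA via the singular value decomposition. The 5-dimensional "embeddings" below are synthetic, constructed purely to illustrate that nearby vectors stay nearby after projection:

```python
import numpy as np

def pca_2d(vectors: np.ndarray) -> np.ndarray:
    """Project row vectors onto their top two principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Synthetic 5-dimensional embeddings: two near one direction, two near its opposite.
rng = np.random.default_rng(0)
base = rng.normal(size=5)
embeddings = np.stack([
    base + rng.normal(scale=0.1, size=5),   # "cat"
    base + rng.normal(scale=0.1, size=5),   # "dog"
    -base + rng.normal(scale=0.1, size=5),  # "car"
    -base + rng.normal(scale=0.1, size=5),  # "truck"
])
points = pca_2d(embeddings)  # shape (4, 2): one (x, y) point per word
```

In the resulting 2-D points, the "cat"/"dog" pair lands far from the "car"/"truck" pair, which is the clustering behavior a word map visualizes.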
Applications of Vector Similarity
Semantic similarity measurement underlies several core capabilities of modern AI systems:
- Semantic search: Retrieving documents by meaning rather than keyword matching
- Recommendation systems: Identifying related items via representational proximity
- Synonym resolution: Recognizing that "automobile" and "car" share a referent
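Semantic search, the first application above, reduces to ranking documents by similarity to a query vector. A minimal sketch with hand-made 3-dimensional embeddings (a real system would obtain these from a trained encoder):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical document embeddings, hand-crafted for illustration.
docs = {
    "buying a used automobile": np.array([0.9, 0.1, 0.0]),
    "adopting a rescue dog":    np.array([0.0, 0.8, 0.3]),
    "weekend hiking trails":    np.array([0.7, 0.0, 0.5]),
}
query = np.array([0.85, 0.05, 0.1])  # hypothetical embedding of "car shopping tips"

# Rank documents by semantic proximity to the query, not by shared keywords.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked[0])  # the automobile document, despite sharing no words with the query
```

Note that the top result never mentions "car": the match happens in vector space, where "automobile" and "car" point in nearly the same direction.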
Key Takeaways
- Cosine similarity measures directional alignment between two vectors
- Semantic relatedness corresponds to high similarity scores
- This metric enables semantic search, recommendation, and synonym resolution