Foundation Lesson 3 of 4

Similarity & Distance

Quantifying Semantic Similarity

The standard metric for comparing word vectors is cosine similarity: the cosine of the angle between two vectors in high-dimensional space. This measure is magnitude-invariant, capturing only the directional relationship between representations. Values range from -1 to 1:

  • 1.0 — Identical direction (maximum similarity)
  • 0.5 — Partial relatedness
  • 0.0 — Orthogonal (no measurable relationship)
  • Negative — Opposing semantic content
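The metric above is straightforward to compute: divide the dot product of two vectors by the product of their lengths. A minimal sketch in NumPy, using made-up 3-dimensional "word vectors" (real embeddings have far more dimensions, and these values are purely illustrative):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between vectors a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors (illustrative values only, not real embeddings)
king = np.array([0.9, 0.8, 0.1])
queen = np.array([0.85, 0.75, 0.2])
banana = np.array([0.1, 0.05, 0.9])

print(cosine_similarity(king, queen))   # high: similar direction
print(cosine_similarity(king, banana))  # low: mostly unrelated direction
```

Because the norms appear in the denominator, scaling either vector by any positive constant leaves the score unchanged, which is exactly the magnitude-invariance described above.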

Calculate Similarity

[Interactive: cosine similarity calculator]

Dimensionality Reduction and Visualization

Although word vectors typically have 50 or more dimensions, projection techniques such as PCA and t-SNE reduce them to two dimensions for visualization. Semantically related words then form observable clusters.
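A PCA projection of this kind can be sketched in a few lines of NumPy via the singular value decomposition; the data here is random stand-in vectors, not real embeddings:

```python
import numpy as np

def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project the rows of X onto their top two principal components."""
    Xc = X - X.mean(axis=0)                        # center each dimension
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                           # 2-D coordinates per row

rng = np.random.default_rng(0)
vectors = rng.normal(size=(10, 50))  # ten toy 50-dimensional "word vectors"
coords = pca_2d(vectors)
print(coords.shape)  # (10, 2)
```

PCA preserves global variance with a linear map; t-SNE is nonlinear and tends to emphasize local neighborhoods, so the two can paint noticeably different pictures of the same embedding space.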

Word Map

Select two words to highlight them on the map. Notice how similar words cluster together.

Categories: Royalty/Gender · Animals · Places · Concepts

Applications of Vector Similarity

Semantic similarity measurement underlies several core capabilities of modern AI systems:

  • Semantic search: Retrieving documents by meaning rather than keyword matching
  • Recommendation systems: Identifying related items via representational proximity
  • Synonym resolution: Recognizing that "automobile" and "car" share a referent
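The semantic-search case reduces to ranking document vectors by their cosine similarity to a query vector. A minimal sketch with toy hand-written embeddings (the vectors and the doc descriptions are invented for illustration):

```python
import numpy as np

def top_matches(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 2) -> np.ndarray:
    """Return indices of the k documents most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = D @ q                       # cosine similarity per document
    return np.argsort(scores)[::-1][:k]  # highest scores first

# Toy embeddings: row 0 ~ "car review", row 1 ~ "banana recipe",
# row 2 ~ "automobile repair guide"
docs = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.9],
    [0.8, 0.3, 0.1],
])
query = np.array([1.0, 0.0, 0.0])        # ~ "automobile"
print(top_matches(query, docs))          # car-related rows rank first
```

Note that the query never has to share a single keyword with the documents; proximity in the embedding space does all the matching, which is also why "automobile" and "car" resolve to the same results.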

Key Takeaways

  • Cosine similarity measures directional alignment between two vectors
  • Semantic relatedness corresponds to high similarity scores
  • This metric enables semantic search, recommendation, and synonym resolution