What Are Word Vectors?
The Representation Problem
When a computer stores the string "cat", it holds only a sequence of
character codes (e.g., 01100011 01100001 01110100 in ASCII).
This encoding carries no semantic information: the system cannot
distinguish "cat" the animal from "cat" the Unix command.
This constitutes a fundamental barrier: language models require numeric representations that preserve semantic relationships.
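A short Python sketch makes this concrete; the comparison words are illustrative only:

```python
# Inspect the raw character codes behind a few strings.
for word in ["cat", "dog", "car"]:
    bits = " ".join(format(b, "08b") for b in word.encode("ascii"))
    print(f"{word!r} -> {bits}")

# Byte-level similarity does not track meaning: "cat" and "car" differ
# in only one byte, while "cat" and "dog" (semantically close) share none.
```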
The Vector Solution
The foundational approach is to represent each word as a fixed-length vector of real numbers, often called an embedding, that encodes its meaning.
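As a purely illustrative example (the values below are invented, and real embeddings typically have hundreds or thousands of dimensions):

```python
# Hypothetical 4-dimensional word vectors; real systems learn these values from data.
word_vectors = {
    "cat": [0.21, -0.47, 0.83, 0.05],
    "dog": [0.19, -0.40, 0.79, 0.10],
}

# Every word maps to a vector of the same fixed length.
assert all(len(vec) == 4 for vec in word_vectors.values())
```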
These values are not arbitrary. They are learned by analyzing statistical co-occurrence patterns across billions of tokens in text corpora. Words that appear in similar distributional contexts converge to similar vector representations.
This is the distributional hypothesis in action: a word's meaning is reflected in the company it keeps. "Cat" and "dog" receive similar vectors because they co-occur with overlapping context words (pets, animals, veterinary), while "cat" and "democracy" occupy distant regions of the vector space.
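The sketch below shows the counting idea in miniature. The corpus and window size are invented for illustration, and production systems learn dense vectors with models such as word2vec or GloVe rather than using raw co-occurrence counts, but the qualitative result is the same: "cat" and "dog" end up far more similar to each other than to "democracy".

```python
import math
from collections import Counter

# Tiny invented corpus; real training data spans billions of tokens.
corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "my cat and my dog are pets",
    "the vet examined the dog",
    "the vet examined the cat",
    "democracy requires free and fair elections",
    "voters value democracy and elections",
]

def context_counts(target, sentences, window=2):
    """Count words appearing within `window` positions of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(t for t in tokens[lo:hi] if t != target)
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

cat, dog, democracy = (context_counts(w, corpus) for w in ["cat", "dog", "democracy"])
print("cat ~ dog:      ", round(cosine(cat, dog), 2))        # high: shared contexts
print("cat ~ democracy:", round(cosine(cat, democracy), 2))  # low: little overlap
```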
Relevance to Language Models
Large language models such as Claude are built upon this representational foundation. All input text is first converted into vector representations before any computation occurs. This transformation is the prerequisite for semantic processing.
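A heavily simplified sketch of that first step is shown below. The vocabulary, vector size, and whole-word tokenization are stand-ins for illustration; production models split text into learned subword tokens and use trained embedding matrices with thousands of dimensions.

```python
import random

random.seed(0)  # reproducible illustrative values

# Hypothetical vocabulary and embedding table; real models learn these during training.
vocab = ["the", "cat", "sat", "on", "mat", "<unk>"]
dim = 8
embeddings = {word: [random.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}

def embed(text):
    """Tokenize text, then look up a vector for each token."""
    tokens = text.lower().split()
    return [embeddings.get(tok, embeddings["<unk>"]) for tok in tokens]

vectors = embed("The cat sat on the mat")
print(len(vectors), "tokens, each mapped to a vector of length", len(vectors[0]))
```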
Understanding word vectors provides insight into:
- Why semantically similar prompts produce similar outputs
- How models capture relational structure between concepts (see the vector-arithmetic sketch after this list)
- The representational basis of language understanding in neural networks
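On the relational-structure point, the classic illustration is vector arithmetic over pretrained embeddings, where vec("king") - vec("man") + vec("woman") lands near vec("queen"). The 3-dimensional vectors below are hand-picked toys chosen to make that relationship visible, not learned values:

```python
# Hypothetical 3-d vectors: dimensions loosely track (royalty, gender, person-ness).
toy = {
    "king":  [0.9,  0.8, 1.0],
    "queen": [0.9, -0.8, 1.0],
    "man":   [0.1,  0.8, 1.0],
    "woman": [0.1, -0.8, 1.0],
}

# king - man + woman should land closest to queen.
target = [k - m + w for k, m, w in zip(toy["king"], toy["man"], toy["woman"])]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

closest = min(toy, key=lambda word: sq_dist(toy[word], target))
print(closest)  # "queen" for these toy vectors
```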
Key Takeaways
- Word vectors map lexical items to fixed-length numeric representations
- Semantic similarity corresponds to geometric proximity in vector space
- This representational scheme is the foundation of all modern language models