What Are Word Vectors?
The Representation Problem
When a computer stores the string "cat", it holds only a sequence of
character codes (e.g., 01100011 01100001 01110100 in ASCII).
This encoding carries no semantic information: the system cannot
distinguish "cat" the animal from "cat" the Unix command.
This constitutes a fundamental barrier: language models require numeric representations that preserve semantic relationships.
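A short Python sketch makes this concrete; the comparison words are illustrative only:

```python
# Inspect the raw character codes behind a few strings.
for word in ["cat", "dog", "car"]:
    bits = " ".join(format(b, "08b") for b in word.encode("ascii"))
    print(f"{word!r} -> {bits}")

# Byte-level similarity does not track meaning: "cat" and "car" differ
# in only one byte, while "cat" and "dog" (semantically close) share none.
```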
The Vector Solution
The foundational approach is to represent each word as a fixed-length vector of real numbers, often called an embedding, that encodes its meaning.
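As a purely illustrative example (the values below are invented, and real embeddings typically have hundreds or thousands of dimensions):

```python
# Hypothetical 4-dimensional word vectors; real systems learn these values from data.
word_vectors = {
    "cat": [0.21, -0.47, 0.83, 0.05],
    "dog": [0.19, -0.40, 0.79, 0.10],
}

# Every word maps to a vector of the same fixed length.
assert all(len(vec) == 4 for vec in word_vectors.values())
```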
These values are not arbitrary. They are learned by analyzing statistical co-occurrence patterns across billions of tokens in text corpora. Words that appear in similar distributional contexts converge to similar vector representations.
This is the distributional hypothesis in action: a word's meaning is reflected in the company it keeps. "Cat" and "dog" receive similar vectors because they co-occur with overlapping context words (pets, animals, veterinary), while "cat" and "democracy" occupy distant regions of the vector space.
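The sketch below shows the counting idea in miniature. The corpus and window size are invented for illustration, and production systems learn dense vectors with models such as word2vec or GloVe rather than using raw co-occurrence counts, but the qualitative result is the same: "cat" and "dog" end up far more similar to each other than to "democracy".

```python
import math
from collections import Counter

# Tiny invented corpus; real training data spans billions of tokens.
corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "my cat and my dog are pets",
    "the vet examined the dog",
    "the vet examined the cat",
    "democracy requires free and fair elections",
    "voters value democracy and elections",
]

def context_counts(target, sentences, window=2):
    """Count words appearing within `window` positions of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(t for t in tokens[lo:hi] if t != target)
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

cat, dog, democracy = (context_counts(w, corpus) for w in ["cat", "dog", "democracy"])
print("cat ~ dog:      ", round(cosine(cat, dog), 2))        # high: shared contexts
print("cat ~ democracy:", round(cosine(cat, democracy), 2))  # low: little overlap
```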
Relevance to Language Models
Large language models such as Claude are built upon this representational foundation. All input text is first converted into vector representations before any computation occurs. This transformation is the prerequisite for semantic processing.
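A heavily simplified sketch of that first step is shown below. The vocabulary, vector size, and whole-word tokenization are stand-ins for illustration; production models split text into learned subword tokens and use trained embedding matrices with thousands of dimensions.

```python
import random

random.seed(0)  # reproducible illustrative values

# Hypothetical vocabulary and embedding table; real models learn these during training.
vocab = ["the", "cat", "sat", "on", "mat", "<unk>"]
dim = 8
embeddings = {word: [random.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}

def embed(text):
    """Tokenize text, then look up a vector for each token."""
    tokens = text.lower().split()
    return [embeddings.get(tok, embeddings["<unk>"]) for tok in tokens]

vectors = embed("The cat sat on the mat")
print(len(vectors), "tokens, each mapped to a vector of length", len(vectors[0]))
```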
Understanding word vectors provides insight into:
- Why semantically similar prompts produce similar outputs
- How models capture relational structure between concepts (see the vector-arithmetic sketch after this list)
- The representational basis of language understanding in neural networks
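On the relational-structure point, the classic illustration is vector arithmetic over pretrained embeddings, where vec("king") - vec("man") + vec("woman") lands near vec("queen"). The 3-dimensional vectors below are hand-picked toys chosen to make that relationship visible, not learned values:

```python
# Hypothetical 3-d vectors: dimensions loosely track (royalty, gender, person-ness).
toy = {
    "king":  [0.9,  0.8, 1.0],
    "queen": [0.9, -0.8, 1.0],
    "man":   [0.1,  0.8, 1.0],
    "woman": [0.1, -0.8, 1.0],
}

# king - man + woman should land closest to queen.
target = [k - m + w for k, m, w in zip(toy["king"], toy["man"], toy["woman"])]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

closest = min(toy, key=lambda word: sq_dist(toy[word], target))
print(closest)  # "queen" for these toy vectors
```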
Key Takeaways
- Word vectors map lexical items to fixed-length numeric representations
- Semantic similarity corresponds to geometric proximity in vector space
- This representational scheme is the foundation of all modern language models