What is Attention?
Paying attention to what matters
Attention is a mechanism that allows a model to focus on relevant words when processing each word in a sentence. Think of it like highlighting the most important parts of a text.
For each word, attention calculates a set of weights that indicate how much to "pay attention" to every other word. These weights:
- Range from 0 to 1
- Sum to 1 (like a probability distribution)
- Higher weight = more relevant
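The three properties above are exactly what the softmax function guarantees. A minimal sketch (the relevance scores here are made up for illustration):

```python
import math

def softmax(scores):
    """Convert raw relevance scores into attention weights."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical relevance scores of "bank" toward each word in
# "I went to the bank to deposit money"
scores = [0.1, 0.3, 0.2, 0.2, 0.5, 0.2, 2.0, 1.8]
weights = softmax(scores)

# Every weight is between 0 and 1, and together they sum to 1
assert all(0.0 <= w <= 1.0 for w in weights)
assert abs(sum(weights) - 1.0) < 1e-9
```

Because the scores pass through an exponential before normalizing, the most relevant words ("deposit", "money") end up with much larger weights than the rest.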
[Interactive visualization: Attention Weights. Clicking a word shows how much attention it pays to the other words; thicker lines indicate stronger attention.]
How it helps understanding
When processing the word "bank" in "I went to the bank to deposit money", attention assigns high weights to words like "deposit" and "money". These words provide the context that determines "bank" means a financial institution.
In "I sat on the river bank", attention would focus on "river" and "sat", leading to a different understanding of "bank".
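The two "bank" examples can be sketched as a weighted average: the contextual vector for "bank" is a blend of the vectors it attends to. The embeddings and attention weights below are made up for illustration:

```python
# Toy 2-d "embeddings" (invented values, not from a real model)
vecs = {
    "bank": [1.0, 1.0],
    "deposit": [0.9, 0.1],
    "money": [0.8, 0.2],
    "river": [0.1, 0.9],
    "sat": [0.2, 0.8],
}

def contextual(weights):
    """Weighted average of the vectors that 'bank' attends to."""
    out = [0.0, 0.0]
    for word, w in weights.items():
        out[0] += w * vecs[word][0]
        out[1] += w * vecs[word][1]
    return out

# Hypothetical attention weights for "bank" in each sentence
financial = contextual({"bank": 0.2, "deposit": 0.5, "money": 0.3})
geographic = contextual({"bank": 0.2, "river": 0.5, "sat": 0.3})

# The same word gets two different context-aware vectors
assert financial != geographic
```

Attending to "deposit" and "money" pulls the vector for "bank" toward the financial words; attending to "river" and "sat" pulls it the other way.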
Without Attention
"bank" → the same static vector in every sentence
With Attention
"bank" → a context-aware vector shaped by the surrounding words
Key Takeaways
- Attention calculates weights for each word pair
- Weights indicate relevance (0 to 1, sum to 1)
- This allows context-dependent word representations
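The takeaways above fit together in one small routine: each word scores every other word, softmax turns the scores into weights, and the weights blend the word vectors. A minimal sketch of scaled dot-product self-attention, using each word's vector as its own query, key, and value (a simplification; real models learn separate projections):

```python
import math

def self_attention(vectors):
    """Scaled dot-product self-attention over a list of word vectors."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        # One relevance score per word pair
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        # Softmax: weights in [0, 1] that sum to 1
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Context-aware output: weighted average of all word vectors
        out.append([sum(w * v[j] for w, v in zip(weights, vectors))
                    for j in range(d)])
    return out

sentence = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy word vectors
contextual = self_attention(sentence)  # one new vector per word
```

Each output vector now reflects not just its own word but how strongly that word attends to every other word in the sentence.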