Attention Lesson 2 of 4

What is Attention?

Paying attention to what matters

Attention is a mechanism that allows a model to focus on relevant words when processing each word in a sentence. Think of it like highlighting the most important parts of a text.

For each word, attention calculates a set of weights that indicate how much to "pay attention" to every other word. These weights:

  • Range from 0 to 1
  • Sum to 1 (like a probability distribution)
  • Higher weight = more relevant
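The properties above come from applying a softmax to similarity scores. A minimal sketch (with made-up toy vectors, not real word embeddings):

```python
import numpy as np

def attention_weights(query, keys):
    """Score a query vector against every key vector, then
    softmax-normalize the scores so they sum to 1."""
    scores = keys @ query / np.sqrt(query.shape[0])  # scaled dot products
    scores -= scores.max()                           # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()

# Toy 4-word "sentence": one 3-dim vector per word (illustrative numbers).
words = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

w = attention_weights(words[0], words)
print(w.round(3))  # word 0's weights over all four words
print(w.sum())     # sums to 1, like a probability distribution
```

Word 0 ends up attending most to itself and to word 1 (the most similar vector), exactly the "higher weight = more relevant" behavior described above.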

[Interactive demo: Attention Weights Visualization — clicking a word shows how much attention it pays to every other word; thicker lines indicate stronger attention.]

How it helps understanding

When processing the word "bank" in "I went to the bank to deposit money", attention assigns high weights to words like "deposit" and "money". These words provide the context that determines "bank" means a financial institution.

In "I sat on the river bank", attention would focus on "river" and "sat", leading to a different understanding of "bank".

Without Attention

"bank" → same vector always

With Attention

"bank" → context-aware vector
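This contrast can be sketched numerically. Below, each word gets a hypothetical 2-dimensional "meaning" vector (axis 0 = finance, axis 1 = nature; the vectors and weights are illustrative, not learned), and the context-aware vector is just the attention-weighted average:

```python
import numpy as np

# Hypothetical 2-dim meaning vectors: axis 0 = finance, axis 1 = nature.
vectors = {
    "bank":    np.array([0.5, 0.5]),  # ambiguous on its own
    "deposit": np.array([1.0, 0.0]),
    "money":   np.array([1.0, 0.0]),
    "river":   np.array([0.0, 1.0]),
}

def contextualize(word, context, weights):
    # New vector = attention-weighted average of the word and its context.
    vecs = [vectors[word]] + [vectors[c] for c in context]
    return sum(w * v for w, v in zip(weights, vecs))

# Same word "bank", different contexts, different resulting vectors.
financial = contextualize("bank", ["deposit", "money"], [0.2, 0.4, 0.4])
nature    = contextualize("bank", ["river"],            [0.3, 0.7])
print(financial)  # leans toward the finance axis
print(nature)     # leans toward the nature axis
```

Without attention, "bank" would stay at its fixed vector in both sentences; with attention, the weighted context pulls it toward the right meaning.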

Key Takeaways

  • Attention calculates weights for each word pair
  • Weights indicate relevance (0 to 1, sum to 1)
  • This allows context-dependent word representations