LLM Basics Lesson 1 of 3

How LLMs Work

What this lesson teaches

Before using tools like Claude Code effectively, you need to understand what's actually happening inside. This lesson explains the core mechanics of Large Language Models (LLMs) in simple terms.

What is an LLM?

A Large Language Model is a program that predicts the next word (or "token") based on what came before. That's it. No magic, no thinking—just very sophisticated pattern matching.

Key insight: LLMs don't "understand" anything. They predict what text is likely to come next based on patterns learned from training data.
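You can get a feel for "predict the next word from patterns" with a toy word-level predictor. Real LLMs use neural networks over tokens rather than raw word counts, and the tiny corpus below is made up, but the spirit is the same: look at what has come before and pick the most likely continuation.

```python
from collections import Counter

# Count which word follows each word in a tiny made-up corpus.
corpus = "the cat sat on the mat the cat ran".split()
following = {}
for word, nxt in zip(corpus, corpus[1:]):
    following.setdefault(word, Counter())[nxt] += 1

def predict_next(word: str) -> str:
    # Return the most frequent follower seen in the corpus.
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often here
```

Swap in a larger corpus and the predictions get less silly, which is essentially the story of LLMs: same idea, vastly more data and a far more powerful pattern-matcher.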

Tokens: The building blocks

LLMs don't read words; they read tokens. A token is a chunk of text, roughly 3-4 characters of English on average. The word "understanding" might be split into "under" + "stand" + "ing".

Why does this matter? Because you pay for tokens, and the model's memory (context window) is measured in tokens. More tokens = more cost, and eventually you hit limits.
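Because cost and limits are measured in tokens, it helps to estimate counts before sending a prompt. The "characters divided by 4" heuristic below is only an approximation; real tokenizers (byte-pair encoders and similar) split text differently depending on its content.

```python
# Crude token estimate: ~4 characters per token is a common rule of
# thumb for English text. Real tokenizers give exact counts.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

prompt = "Explain how large language models generate text."
print(estimate_tokens(prompt))
```

For billing-accurate numbers you would use the provider's own tokenizer or token-counting endpoint, but a heuristic like this is enough to spot when a prompt is getting expensive.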

Probability: How responses are generated

When you ask Claude a question, it doesn't retrieve an answer from a database. It generates text one token at a time, each time asking: "Given everything so far, what token is most likely next?"

This is why the same prompt can produce slightly different outputs: the selection step is partly random, and a setting called "temperature" controls how much randomness is allowed.

Context window: The memory limit

The context window is everything the model can "see" at once: your prompt, the conversation history, any files you've shared. Claude's context window is large (200K tokens), but it's not infinite.

Important: Bigger context doesn't mean better results. More information can actually confuse the model and reduce output quality.
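One practical consequence: tools that hold long conversations must trim history to stay inside the window. A minimal sketch, assuming a crude characters-per-token estimate and a drop-oldest-first policy (real clients use the provider's tokenizer and often summarize rather than drop):

```python
# Keep a conversation under a token budget by dropping the oldest
# messages first. Token costs use a rough chars/4 estimate.
def trim_history(messages: list, budget_tokens: int) -> list:
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest-first
        cost = max(1, len(msg) // 4)
        if used + cost > budget_tokens:
            break                           # oldest messages fall off
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order

history = ["first question", "first answer", "follow-up question"]
print(trim_history(history, budget_tokens=8))
```

Notice that trimming is also a quality tool, not just a limit workaround: keeping only the relevant recent context often produces better answers than stuffing the window full.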

Key Takeaways

  • LLMs predict the next token based on patterns—they don't "think"
  • Tokens are the unit of measurement for input, output, and cost
  • Context window = the model's working memory (limited)
  • More context ≠ better output