LLM Basics Lesson 1 of 3

How LLMs Work

Overview

Effective use of LLM-based tools such as Claude Code requires a foundational understanding of their internal mechanisms. This lesson examines three core aspects of Large Language Models: tokenization, probabilistic generation, and context windows.

Definition and Operating Principle

A Large Language Model is a probabilistic system that predicts the next token given an input sequence. Trained on large-scale text corpora, the model reproduces statistically likely continuations rather than performing semantic comprehension. The fundamental operation is pattern matching, not reasoning in the human sense.
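This prediction step can be sketched in miniature. The following is a toy illustration, not a real model: the probability table is invented for demonstration, and a real LLM computes such distributions from learned parameters over a vocabulary of tens of thousands of tokens.

```python
# Toy next-token prediction: given a context, return the continuation
# with the highest estimated probability. The probabilities here are
# invented for illustration only.
next_token_probs = {
    "The cat sat on the": {"mat": 0.62, "floor": 0.21, "roof": 0.09, "piano": 0.08},
}

def predict_next(context: str) -> str:
    """Return the most probable next token for a known context."""
    probs = next_token_probs[context]
    return max(probs, key=probs.get)

print(predict_next("The cat sat on the"))  # → mat
```

A real model repeats this step in a loop, appending each chosen token to the context before predicting the next one.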

Note: Anthropic's current Claude 4 family comprises three tiers: Opus (highest-accuracy reasoning), Sonnet (balanced accuracy and speed), and Haiku (optimized for speed and cost efficiency). Model selection should be guided by the requirements of the task at hand.

Tokenization

LLMs process text in units called tokens. In English, one token corresponds to approximately 3–4 characters; for example, the word "understanding" is typically segmented into "under" + "stand" + "ing". In Japanese, a single character often maps to 1–2 tokens.

Token count directly affects both cost calculation and context window consumption. Managing input volume is therefore a practical concern in production workflows.
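For budgeting purposes, the 3–4 characters-per-token rule gives a quick estimate. The sketch below assumes that heuristic; an accurate count requires the model's own tokenizer, which this example does not use.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough English token estimate using the ~4-characters-per-token
    rule of thumb. Accurate counts require the model's own tokenizer."""
    return math.ceil(len(text) / chars_per_token)

prompt = "Effective use of LLM-based tools requires understanding tokenization."
print(estimate_tokens(prompt))
```

An estimate like this is useful for sanity-checking input size before sending a request, even though the exact count will differ.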

Probabilistic Generation

LLMs do not retrieve answers from a database. Instead, they generate output one token at a time, computing at each step: "Given the sequence so far, which token has the highest probability of occurring next?"

A temperature parameter controls the degree of randomness in token selection: lower values concentrate probability on the most likely tokens, while higher values flatten the distribution. Consequently, the same prompt may yield different outputs across invocations.
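The standard mechanism is to divide the model's raw scores (logits) by the temperature before applying softmax. The sketch below uses invented logits for three candidate tokens to show the effect.

```python
import math
import random

def sample_with_temperature(logits: dict[str, float], temperature: float) -> str:
    """Sample a token from softmax(logits / temperature).
    Low temperature concentrates probability on the top-scoring token;
    high temperature flattens the distribution toward uniform."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    m = max(scaled.values())                                  # for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    r = random.random()
    cum = 0.0
    for tok, p in probs.items():
        cum += p
        if r < cum:
            return tok
    return tok  # fallback for floating-point rounding

# Invented logits for illustration: "mat" has the highest score.
logits = {"mat": 5.0, "floor": 2.0, "roof": 1.0}
print(sample_with_temperature(logits, temperature=0.1))   # almost always "mat"
print(sample_with_temperature(logits, temperature=2.0))   # varies across runs
```

At a temperature near zero the sampling is effectively deterministic, which is why repeated calls with the same prompt can still diverge once the temperature is raised.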

Context Window

The context window defines the total number of tokens a model can reference in a single inference pass. This includes the prompt, conversation history, and any shared files. The Claude 4 family supports a context window of 200K tokens. However, as discussed in subsequent lessons, output quality begins to degrade well before this upper bound is reached.

Important: Increasing context volume does not guarantee improved output quality. The inclusion of irrelevant information disperses the model's attention and can reduce response accuracy.
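A simple budget check captures the practical concern above. This is a minimal sketch: the 200K figure comes from this lesson, while the reserved headroom for the model's reply is an assumed parameter, not a fixed rule.

```python
CONTEXT_WINDOW = 200_000  # Claude 4 family limit, per this lesson

def fits_in_context(prompt_tokens: int, history_tokens: int,
                    file_tokens: int, reserved_output: int = 8_000) -> bool:
    """Check whether the combined input leaves room for the response.
    reserved_output is an assumed headroom value for the model's reply."""
    used = prompt_tokens + history_tokens + file_tokens
    return used + reserved_output <= CONTEXT_WINDOW

# 2K prompt + 50K history + 120K files = 172K, plus 8K headroom: fits.
print(fits_in_context(2_000, 50_000, 120_000))   # → True
# 150K of files pushes the total past the window.
print(fits_in_context(2_000, 50_000, 150_000))   # → False
```

Staying well under the limit matters twice over: it avoids hard failures, and, as noted above, trimming irrelevant material tends to improve response accuracy.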

Key Takeaways

  • LLMs are token-prediction models driven by statistical patterns, not semantic understanding
  • The Claude 4 family consists of three tiers: Opus, Sonnet, and Haiku
  • Tokens serve as the fundamental unit for both cost and context capacity
  • Larger context does not necessarily produce higher-quality output