Tokens: The Currency of AI
Every interaction you have with an AI model — every question you ask, every document you paste, every answer you receive — is measured in tokens. If you want to understand AI costs, speed, and limitations, you need to understand tokens. They are the fundamental unit of account for all three.
What Exactly Is a Token?
A token is a chunk of text that the AI model processes as a single unit. Tokens are not the same as words. They are not the same as characters. They are sub-word units created by a process called tokenization, which breaks text into pieces that balance vocabulary size with efficiency.
Here is how tokenization plays out with typical examples:
- "cat" = 1 token (common short words are single tokens)
- "hello" = 1 token
- "unconstitutional" = ~4 tokens (e.g., "un" + "const" + "itution" + "al"; exact splits vary by tokenizer)
- "AI" = 1 token
- "artificial intelligence" = 2 tokens
- "New York City" = 3 tokens
- "123456789" = 3-4 tokens (numbers are tokenized in chunks)
- Emojis like "😀" = 1-3 tokens depending on the emoji
- Code like `function calculateTotal(items) {` = roughly 7-8 tokens
The key insight: common words and word fragments get their own token, while rare or long words get split into multiple tokens. This is why "the" is always 1 token but "pneumonoultramicroscopicsilicovolcanoconiosis" might be 12+ tokens.
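The longest-match splitting described above can be sketched with a toy tokenizer. The vocabulary below is a hypothetical hand-picked set, not a real model's vocabulary, and real tokenizers use byte-pair encoding with vocabularies learned from data — this is only meant to show why rare words break into several pieces:

```python
# Toy greedy longest-match tokenizer over a hypothetical vocabulary.
# Real tokenizers (e.g. BPE) learn tens of thousands of subwords from
# data; this sketch only illustrates why rare words split into pieces.
TOY_VOCAB = {"the", "cat", "hello", "un", "const", "itution", "al"}

def toy_tokenize(word: str) -> list[str]:
    """Split a word by repeatedly taking the longest vocabulary prefix."""
    tokens, rest = [], word
    while rest:
        for end in range(len(rest), 0, -1):  # try longest prefix first
            if rest[:end] in TOY_VOCAB:
                tokens.append(rest[:end])
                rest = rest[end:]
                break
        else:
            # Unknown character: emit it alone (real tokenizers fall
            # back to bytes instead of raising an error).
            tokens.append(rest[0])
            rest = rest[1:]
    return tokens

print(toy_tokenize("cat"))               # ['cat'] -> 1 token
print(toy_tokenize("unconstitutional"))  # ['un', 'const', 'itution', 'al'] -> 4 tokens
```

A common word like "cat" survives intact, while "unconstitutional" falls apart into whatever fragments the vocabulary happens to contain.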
Token Counting Rules of Thumb
You do not need a calculator. Use this table for quick estimates:
| Content Type | Approximate Token Count |
|---|---|
| 1 English word | ~1.3 tokens (on average) |
| 1 page of text (~500 words) | ~650-700 tokens |
| A short email (100 words) | ~130 tokens |
| A one-page memo (400 words) | ~520 tokens |
| A 10-page report | ~6,500-7,000 tokens |
| A full-length novel (80,000 words) | ~100,000 tokens |
| 1 line of Python code | ~10-15 tokens |
| 100 lines of code | ~1,000-1,500 tokens |
| 1,000 tokens | ~750 words |
| 1 token | ~4 characters (in English) |
Important caveats:
- Non-English languages typically use more tokens per word. Chinese, Japanese, and Korean can use 2-3x more tokens for the same meaning.
- Code tends to use more tokens per "concept" than prose because of syntax characters.
- Structured data (JSON, XML) is token-heavy due to brackets, keys, and formatting characters.
- Whitespace and punctuation consume tokens too.
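The two conversion rules at the bottom of the table (~4 characters per token, ~1.3 tokens per English word) are enough for a quick estimator. This is a rough sketch for English prose; per the caveats above, code, JSON, and non-English text usually run higher:

```python
# Rough token estimators based on the rules of thumb in the table:
# ~4 characters per token, or ~1.3 tokens per English word.
# Estimates only — code, structured data, and non-English text
# typically consume more tokens than these formulas suggest.

def estimate_tokens_by_chars(text: str) -> int:
    """Estimate using the ~4 characters-per-token rule."""
    return max(1, round(len(text) / 4))

def estimate_tokens_by_words(text: str) -> int:
    """Estimate using the ~1.3 tokens-per-word rule."""
    return max(1, round(len(text.split()) * 1.3))

memo = "word " * 400  # a stand-in for a 400-word memo
print(estimate_tokens_by_words(memo))  # ~520, matching the table row
```

For quick budgeting, either rule lands close enough; when the two estimates disagree badly, the text is probably code or structured data.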
Why Tokens Matter: The Three Costs
Cost #1: Money. You pay per token — both input tokens (what you send) and output tokens (what the model generates). Output tokens are typically 3-5x more expensive than input tokens. A typical GPT-4 API call processing a 2,000-word document and generating a 500-word response costs roughly $0.05-0.15 depending on the model. That sounds small, but at scale — say 10,000 calls per day — it adds up to $500-1,500 daily.
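The billing arithmetic is simple enough to sketch. The per-token prices below are illustrative placeholders, not current rates for any specific model — only the structure (separate input and output rates, with output priced higher) reflects how API providers actually bill:

```python
# Hypothetical prices for illustration only -- check your provider's
# current pricing page before budgeting.
INPUT_PRICE_PER_1K = 0.01    # $ per 1,000 input tokens (assumed)
OUTPUT_PRICE_PER_1K = 0.03   # $ per 1,000 output tokens (assumed, 3x input)

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call: input and output billed at separate rates."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# A 2,000-word document (~2,600 tokens in) plus a 500-word reply (~650 out):
cost = call_cost(2600, 650)
print(f"${cost:.4f} per call, ${cost * 10_000:.2f} per day at 10,000 calls")
```

Even at these placeholder rates, a few cents per call compounds into hundreds of dollars a day at volume, which is the point of the paragraph above.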
Cost #2: Speed. More tokens = slower responses. The model generates tokens one at a time (even if it displays them in chunks). A 100-token response takes about 1-2 seconds. A 2,000-token response takes 15-30 seconds. If you ask for a 5,000-word essay, you will wait.
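Because generation is one token at a time, latency grows roughly linearly with output length. The 100 tokens-per-second rate below is an illustrative assumption consistent with the figures above, not a measured number for any particular model:

```python
# Back-of-envelope generation latency: roughly linear in output tokens.
# The default rate is an assumed illustrative figure, not a benchmark.

def estimated_seconds(output_tokens: int, tokens_per_sec: float = 100.0) -> float:
    """Estimate wall-clock generation time for a response."""
    return output_tokens / tokens_per_sec

print(estimated_seconds(100))   # 1.0 s  -- a short answer
print(estimated_seconds(2000))  # 20.0 s -- a long response
```

The takeaway: asking for less output is the single easiest latency optimization.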
Cost #3: Limits. Every model has a maximum context window (more on this below). Tokens consumed by your input reduce the space available for the model's output. If you paste a 50-page document into a model with a 4K token limit, it literally cannot process it.
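The limit check itself is a single inequality: input tokens plus the output you want must fit inside the context window. Using the table's estimate of ~700 tokens per page, a 50-page document is roughly 35,000 tokens; the window sizes below are illustrative:

```python
# Context-window budgeting: input plus requested output must fit.
# Window sizes here are illustrative -- check your model's documentation.

def fits_in_context(input_tokens: int, max_output_tokens: int,
                    context_window: int) -> bool:
    """True if the request fits within the model's context window."""
    return input_tokens + max_output_tokens <= context_window

# A 50-page document (~35,000 tokens) against a 4K-token model:
print(fits_in_context(35_000, 500, 4_096))    # False -- it cannot fit
# The same document against a 128K-token model:
print(fits_in_context(35_000, 500, 128_000))  # True
```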