Six questions, one per core idea of the chapter. Answer before peeking; every option explains itself.
Your turn
What does a trained LLM actually output after one forward pass?
Your turn
What does lowering the temperature toward 0 do to the output distribution?
Your turn
In self-attention, what role does a token's query vector play?
Your turn
Why can a 2,000-token prompt be processed in one fast pass, while generating 2,000 tokens takes 2,000 passes?
Your turn
What exactly does the KV cache store, and what does it trade?
Your turn
Why does doubling the context length more than double attention's cost?
You have finished Chapter 2. Next we ask where these weights come from: training.