Chapter quiz · hexahype

Six questions, one per core idea of the chapter. Answer before peeking; every option explains itself.

Your turn

What does a trained LLM actually output after one forward pass?

Your turn

What does lowering the temperature toward 0 do to the output distribution?

Your turn

In self-attention, what role does a token's query vector play?

Your turn

Why can a 2,000-token prompt be processed in one fast pass, while generating 2,000 tokens takes 2,000 passes?

Your turn

What exactly does the KV cache store, and what does it trade?

Your turn

Why does doubling the context length more than double attention's cost?

You have finished Chapter 2. Next we ask where these weights come from: training.