Chapter quiz · hexahype

Six questions, one per core idea of the chapter. Answer before peeking; every option explains itself.

Your turn

Why do modern tokenizers use subwords instead of whole words?

In one round of BPE training, what gets merged?

You encode text with GPT-2's tokenizer and feed the IDs into Llama. What happens?

The embedding matrix of a model is E ∈ R^(V×d). What does one row of E hold?

Cosine similarity between vectors (1, 2) and (2, 4) equals 1. Why?
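
If you would like to confirm the stated value numerically before working out the reason, here is a minimal sketch in plain Python. The helper name `cosine_similarity` is an illustrative choice, not something defined in the chapter, and the comparison uses a tolerance because floating-point square roots can be off by a tiny rounding error.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


similarity = cosine_similarity((1, 2), (2, 4))

# Compare with a tolerance rather than ==, since the result is a float.
print(math.isclose(similarity, 1.0))
```

Trying other pairs, such as (1, 2) and (2, 1), is a quick way to test your explanation.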

Why must positional information be added to token embeddings?

You have finished Chapter 1. In the next chapter we open the model itself and follow these vectors through the transformer.