Six questions, one per core idea of the chapter. Answer before peeking; every option explains itself.
Your turn
Why do modern tokenizers use subwords instead of whole words?
Your turn
In one round of BPE training, what gets merged?
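Once you have committed to an answer, a toy sketch can help you check it. The code below runs a single round of BPE training on a made-up corpus: it counts adjacent symbol pairs and merges the most frequent one. The corpus, its word frequencies, and the helper names `most_frequent_pair` and `merge_pair` are illustrative assumptions, not part of any particular tokenizer library.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

# Hypothetical toy corpus: each word is split into characters, with a count.
corpus = {
    ("h", "u", "g"): 10,
    ("p", "u", "g"): 5,
    ("p", "u", "n"): 12,
    ("b", "u", "n"): 4,
    ("h", "u", "g", "s"): 5,
}

best = most_frequent_pair(corpus)      # the pair seen most often
new_corpus = merge_pair(corpus, best)  # corpus after one merge round
print(best, new_corpus)
```

Real tokenizers differ in details such as byte-level pre-tokenization and tie-breaking, so treat this as a sketch of the idea rather than a reference implementation.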
Your turn
You encode text with GPT-2's tokenizer and feed the IDs into Llama. What happens?
Your turn
The embedding matrix of a model is E ∈ R^(V×d). What does one row of E hold?
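If it helps to see the shapes after answering, here is a minimal sketch of an embedding lookup using plain Python lists. It assumes the usual reading of the notation, V rows (one per vocabulary entry) and d columns; the values of `V`, `d`, the random entries, and the name `embed` are made up for illustration.

```python
import random

random.seed(0)

V, d = 5, 3  # hypothetical vocabulary size and embedding dimension

# E has V rows and d columns; the entries here are random placeholders.
E = [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(V)]

def embed(token_ids):
    """Look up one row of E per token ID."""
    return [E[t] for t in token_ids]

vectors = embed([2, 0, 2])
print(len(vectors), len(vectors[0]))  # number of tokens, then d
```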
Your turn
Cosine similarity between vectors (1, 2) and (2, 4) equals 1. Why?
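To check the arithmetic yourself after answering, the sketch below computes cosine similarity from its definition, the dot product divided by the product of the vector lengths. The function name `cosine_similarity` is an illustrative choice.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

same_direction = cosine_similarity((1, 2), (2, 4))  # (2, 4) is 2 * (1, 2)
orthogonal = cosine_similarity((1, 0), (0, 1))      # perpendicular vectors
print(same_direction, orthogonal)
```

Because of floating-point rounding, the first value may differ from 1 in the last decimal places, so compare with a tolerance such as `math.isclose` rather than `==`.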
Your turn
Why must positional information be added to token embeddings?
You have finished Chapter 1. In the next chapter we open the model itself and follow these vectors through the transformer.