BERT (an AI model) reads an entire sentence at once rather than word by word, so every word is interpreted in the context of the whole sentence. Because it looks left and right at the same time, it is good at predicting missing words and picking up subtle, context-dependent meanings.
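To make "looking left and right at the same time" concrete, here is a minimal sketch comparing what a left-to-right model and BERT can see when filling in a blank. The example sentence is a hypothetical illustration, not necessarily the demo's input, and `visible_tokens` is an illustrative helper rather than part of any library.

```python
# Minimal sketch: which tokens can the [MASK] position use as context?
# The sentence below is a hypothetical example chosen for illustration.

def visible_tokens(tokens: list[str], position: int, bidirectional: bool) -> list[str]:
    """Return the tokens that `position` may attend to under each reading scheme."""
    if bidirectional:
        return list(tokens)              # BERT: every token sees the whole sentence
    return tokens[: position + 1]        # left-to-right: only itself and earlier tokens

tokens = ["the", "cat", "sat", "on", "the", "[MASK]", "near", "the", "door"]
mask_pos = tokens.index("[MASK]")

left_only = visible_tokens(tokens, mask_pos, bidirectional=False)
both_sides = visible_tokens(tokens, mask_pos, bidirectional=True)

print("left-to-right sees:", left_only)
print("BERT sees:         ", both_sides)
```

The left-to-right reader never sees "near the door", while BERT can use those words as clues for the blank.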
Step through a live visual of how BERT processes a sentence — from raw tokens to attention patterns to final word predictions.
This is the actual bert-base-uncased model, the checkpoint originally released by Google. It downloads once (~90 MB) and runs entirely in your browser.
💡 This is exactly what the next slides explain step by step: how BERT gets from a sentence to these probabilities.
An AI will predict the missing word. But first — what’s your guess?
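Tokenization: the text is split into words and sub-word pieces, and each piece is mapped to a numeric ID from BERT's fixed vocabulary (30,522 entries for bert-base-uncased). The model computes with these IDs, not with raw letters.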
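BERT's tokenizer uses WordPiece: each word is split greedily into the longest pieces found in the vocabulary, with continuation pieces marked by `##`. The sketch below assumes a tiny hypothetical vocabulary with made-up IDs (the real vocabulary and IDs differ) and skips the punctuation splitting the real tokenizer also performs; `TOY_VOCAB`, `wordpiece` and `tokenize` are illustrative names only.

```python
# Hypothetical toy vocabulary: real bert-base-uncased has 30,522 entries and different IDs.
TOY_VOCAB = {
    "[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "[MASK]": 3,
    "the": 4, "cat": 5, "sat": 6, "on": 7,
    "play": 8, "##ful": 9, "##ing": 10,
}
SPECIAL = {"[UNK]", "[CLS]", "[SEP]", "[MASK]"}

def wordpiece(word: str, vocab: dict[str, int]) -> list[str]:
    """Split one word greedily into the longest vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:
            # pieces after the first one carry a "##" continuation prefix
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                match = piece
                break
            end -= 1                      # try a shorter piece
        if match is None:
            return ["[UNK]"]              # nothing fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

def tokenize(text: str, vocab: dict[str, int]) -> list[str]:
    tokens = ["[CLS]"]                    # BERT inputs start with [CLS]...
    for word in text.split():             # (real BERT also splits off punctuation)
        if word in SPECIAL:
            tokens.append(word)           # keep special tokens as-is
        else:
            tokens.extend(wordpiece(word.lower(), vocab))  # uncased model: lowercase first
    tokens.append("[SEP]")                # ...and end with [SEP]
    return tokens

tokens = tokenize("The playful cat sat on the [MASK]", TOY_VOCAB)
ids = [TOY_VOCAB[t] for t in tokens]
print(tokens)
print(ids)
```

Here "playful" becomes `play` + `##ful`, which is how a fixed vocabulary can still cover words it never stored whole.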
The AI converts every token into a vector (768 numbers in bert-base), which you can think of as a location in “meaning space”. Words that are used in similar ways end up close together.
✨ Notice: cat lives in Animal Alley, sat is in Action Street, and ??? floats in Furniture Lane, right next to mat, floor and chair. The AI is already leaning toward something you sit on.
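"Close together" is usually measured with cosine similarity, the cosine of the angle between two vectors. The sketch below assumes hand-made 3-D vectors whose axes are labelled animal, action and furniture purely for illustration; real bert-base vectors have 768 learned dimensions with no human-readable labels, and the demo's map may use a different projection.

```python
import math

# Hypothetical 3-D coordinates: (animal-ness, action-ness, furniture-ness).
# Real bert-base embeddings have 768 learned, unlabelled dimensions.
EMB = {
    "cat":   [0.9, 0.1, 0.0],
    "dog":   [0.8, 0.2, 0.1],
    "sat":   [0.1, 0.9, 0.2],
    "ran":   [0.2, 0.9, 0.0],
    "mat":   [0.0, 0.1, 0.9],
    "chair": [0.1, 0.2, 0.9],
    "floor": [0.0, 0.0, 1.0],
}

def cosine(a: list[float], b: list[float]) -> float:
    """1.0 means same direction, 0.0 means unrelated (perpendicular)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word: str, k: int = 2) -> list[str]:
    """Return the k words whose vectors point most nearly the same way as `word`."""
    scores = [(other, cosine(EMB[word], vec)) for other, vec in EMB.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scores[:k]]

print(nearest("mat"))
print(nearest("cat", 1))
```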
Before guessing the blank, the AI asks every word: “Who should I pay attention to?”
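The question each word asks is answered with scaled dot-product attention: each token's query vector is compared with every token's key vector, the scores are turned into weights with softmax, and the output is a weighted average of the value vectors. This minimal sketch assumes tiny hypothetical 2-D vectors for three tokens; in BERT the queries, keys and values are produced by learned projections of each token's 768-number vector.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V, one row per token."""
    d = len(K[0])
    outputs, weights = [], []
    for q in Q:                           # each token asks its question...
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)               # ...and decides how much to listen to each token
        weights.append(w)
        # output = attention-weighted average of the value vectors
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights

# Hypothetical 2-D query/key/value vectors for three tokens: "cat", "sat", "[MASK]"
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

outputs, weights = attention(Q, K, V)
for name, row in zip(["cat", "sat", "[MASK]"], weights):
    print(name, [round(x, 2) for x in row])
```

With these made-up vectors, "cat" pays most attention to itself, while "[MASK]" scores every token equally and ends up with an even blend of all three.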
In each of its 12 layers, BERT runs 12 attention heads at the same time. During training, heads tend to specialise in different patterns (some link a verb to its subject, others track grammatical structure), and then all 12 answers are merged back into one rich representation.
Each card is one head, specialising in a different feature. By splitting the 768 dimensions into 12 channels of 64, BERT doesn't have to choose between tracking the noun and the verb: it can track both simultaneously.
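The 768 = 12 × 64 split is simple bookkeeping, sketched below on a single token vector. This is a simplification: in BERT the split is applied to the query, key and value projections, each head runs attention on its own 64-number slice, and the merged result then passes through one more learned projection. `split_heads` and `merge_heads` are illustrative names.

```python
HIDDEN, HEADS = 768, 12
HEAD_DIM = HIDDEN // HEADS               # 64 numbers per head

def split_heads(vec: list[float]) -> list[list[float]]:
    """Cut one 768-number vector into 12 consecutive chunks of 64."""
    return [vec[h * HEAD_DIM:(h + 1) * HEAD_DIM] for h in range(HEADS)]

def merge_heads(chunks: list[list[float]]) -> list[float]:
    """Concatenate the 12 per-head chunks back into one 768-number vector."""
    return [x for chunk in chunks for x in chunk]

token_vec = [float(i) for i in range(HIDDEN)]   # stand-in values for one token
chunks = split_heads(token_vec)                 # each head would work on its own chunk
merged = merge_heads(chunks)

print(len(chunks), "heads x", len(chunks[0]), "dims")
```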
Here is the actual code that predicts the missing word. Click Run to execute it right here — no server needed.
Every token in BERT's vocabulary gets a probability, and the token with the highest probability is the prediction. But before we reveal it…
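Probabilities come from applying softmax to the model's raw scores (logits) for every vocabulary entry. The sketch below uses a handful of made-up logits rather than real model output, so it says nothing about what BERT actually predicts here.

```python
import math

def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits.values())              # subtract the max for numerical stability
    exps = {w: math.exp(s - m) for w, s in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

# Hypothetical logits for a few vocabulary entries (not real BERT output).
logits = {"chair": 6.0, "floor": 5.2, "mat": 4.8, "bed": 4.1, "banana": -2.0}

probs = softmax(logits)
top3 = sorted(probs.items(), key=lambda pair: pair[1], reverse=True)[:3]
for word, p in top3:
    print(f"{word:>6}: {p:.1%}")
```

Because softmax only rescales, the ranking of words is the same as the ranking of their logits; the probabilities just make the scores comparable.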
Our demo showed you the key ideas — here's how BERT puts them all together in one pass.
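One pass runs the token IDs through embedding tables, then through 12 identical encoder layers (multi-head attention plus a feed-forward network, each followed by layer normalisation). As a sense of scale, this sketch tallies the learnable numbers in those pieces using the published bert-base-uncased configuration values. It counts the pooler because it ships in the checkpoint, even though fill-mask doesn't use it, and it leaves out the word-prediction head on top.

```python
# bert-base-uncased configuration values
VOCAB, HIDDEN, LAYERS, INTERMEDIATE, MAX_POS, SEGMENTS = 30522, 768, 12, 3072, 512, 2

def linear(n_in: int, n_out: int) -> int:
    """Parameters in a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out

layer_norm = 2 * HIDDEN                  # scale + shift per dimension

# token, position and segment embedding tables, then a layer norm
embeddings = (VOCAB + MAX_POS + SEGMENTS) * HIDDEN + layer_norm

per_layer = (
    3 * linear(HIDDEN, HIDDEN)           # query, key, value projections (all 12 heads at once)
    + linear(HIDDEN, HIDDEN)             # merge the heads back into one 768-d vector
    + layer_norm
    + linear(HIDDEN, INTERMEDIATE)       # feed-forward: expand 768 -> 3072
    + linear(INTERMEDIATE, HIDDEN)       # feed-forward: contract 3072 -> 768
    + layer_norm
)

pooler = linear(HIDDEN, HIDDEN)          # [CLS] pooler (in the checkpoint, unused by fill-mask)
total = embeddings + LAYERS * per_layer + pooler
print(f"{total:,}")                      # prints 109,482,240
```

That total is the "110M parameters" usually quoted for bert-base.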
BERT and its descendants power these tools you use right now:
Toggle sentence groups on or off. Watch the AI change its prediction live. Same model. Same architecture. Different data = different answer.