An LLM does not write a whole answer all at once. It builds the answer one token at a time.
At each step, the model reads the tokens it has so far, runs them through the transformer, and produces a raw score (a logit) for every token in its vocabulary. A softmax function turns those scores into probabilities that sum to 1. A decoding strategy then chooses the next token from that distribution.
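A single step can be sketched in a few lines of Python. This is a minimal illustration, not how any particular library does it: the four-token vocabulary and the scores are made up, and the helper names (softmax, greedy, sample) are hypothetical.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(probs):
    """Decoding strategy 1: always pick the most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sample(probs, rng):
    """Decoding strategy 2: draw a token at random, weighted by probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Made-up vocabulary and raw scores, for illustration only.
vocab = ["cat", "dog", "fish", "<eos>"]
scores = [2.0, 1.0, 0.5, -1.0]

probs = softmax(scores)
greedy_id = greedy(probs)
sampled_id = sample(probs, random.Random(0))

print("greedy pick:", vocab[greedy_id])
print("sampled pick:", vocab[sampled_id])
```

Greedy decoding returns the same token every time for the same scores, while sampling can return any token with nonzero probability. That difference is why two runs with the same prompt can produce different text.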
That loop is the mechanical core of text generation:
context -> model -> token scores -> decoding -> next token -> updated context
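The loop above can be written out as a short sketch. Everything here is an assumption made for illustration: toy_model is a hypothetical stand-in for a real transformer, the vocabulary has four tokens, and decoding is greedy so the result is reproducible.

```python
import math

VOCAB = ["the", "cat", "sat", "<eos>"]
EOS_ID = 3  # hypothetical stop token

def toy_model(context):
    """Hypothetical stand-in for a transformer.

    Returns one raw score per vocabulary token, favoring the token
    that follows the last one in VOCAB order.
    """
    favored = min(context[-1] + 1, EOS_ID)
    return [5.0 if i == favored else 0.0 for i in range(len(VOCAB))]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def generate(context, max_new_tokens=10):
    """context -> model -> token scores -> decoding -> next token -> updated context"""
    context = list(context)  # copy so the caller's list is not modified
    for _ in range(max_new_tokens):          # output limit
        scores = toy_model(context)          # model -> token scores
        probs = softmax(scores)              # scores -> probabilities
        next_id = max(range(len(probs)), key=lambda i: probs[i])  # greedy decoding
        context.append(next_id)              # updated context
        if next_id == EOS_ID:                # stop token ends generation
            break
    return context

full = generate([0])
cut_short = generate([0], max_new_tokens=2)
print([VOCAB[i] for i in full])       # ['the', 'cat', 'sat', '<eos>']
print([VOCAB[i] for i in cut_short])  # ['the', 'cat', 'sat']
```

The second call shows one way an answer can end abruptly: the output limit is reached before the model emits its stop token.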
The details matter in real applications. Temperature, top-p, stop tokens, context length, and output limits all affect this loop. When a model repeats itself, drifts away from the task, invents a citation, or stops mid-sentence, the reason is often connected to how generation is configured.
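To make two of these settings concrete, here is a rough sketch of how temperature and top-p are commonly applied to the token scores before a token is chosen. The scores are made up, and the function names are hypothetical rather than any library's API.

```python
import math

def softmax(scores, temperature=1.0):
    """Divide scores by the temperature, then convert to probabilities."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches top_p, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered

scores = [2.0, 1.0, 0.5, -1.0]  # made-up raw scores for four tokens

cold = softmax(scores, temperature=0.5)  # sharper: top token dominates
hot = softmax(scores, temperature=2.0)   # flatter: more variety
nucleus = top_p_filter(softmax(scores), top_p=0.8)

print("T=0.5:", [round(p, 3) for p in cold])
print("T=2.0:", [round(p, 3) for p in hot])
print("top-p=0.8:", [round(p, 3) for p in nucleus])
```

A low temperature concentrates probability on the top-scoring token, which makes output more predictable and can encourage repetition. A high temperature spreads probability across more tokens, which adds variety but makes drift more likely. Top-p removes the unlikely tail entirely, so those tokens can never be sampled.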