Last Updated: March 14, 2026
Not all prompts are created equal. Two people can ask the same model to perform the same task and get dramatically different results. The difference usually comes down to how the prompt is structured.
An effective prompt is not just a question. It is a carefully constructed input that tells the model what role it should take, what task it should perform, what constraints it should follow, and what kind of output it should produce.
This chapter covers the building blocks that make prompts work, and more importantly, a systematic approach to writing and improving them instead of relying on trial and error.
Every effective prompt is built from some combination of five components. Not every prompt needs all five, but knowing what each one does lets you reach for the right tool when the model is not giving you what you want.
Each component is worth looking at with concrete before-and-after examples.
The role tells the model what perspective to adopt. Surprisingly powerful, because it activates different "regions" of the model's training data. Asking a question as-is versus asking it from the perspective of a specific expert produces noticeably different outputs.
Without role:
A generic answer. It might cover mobile push notifications, email, or even physical alarm systems. The model has no idea what kind of notification system you mean.
With role:
Now the model frames the answer around message queues, delivery guarantees, fan-out patterns, and retry strategies. The role did not just change the tone. It changed the entire substance of the answer.
The task is the core of your prompt. It tells the model exactly what to do. The most common mistake here is being too vague about the desired action.
Compare these two versions:
Vague: "Tell me about database indexing."
Clear: "Explain how B-tree indexes work in PostgreSQL. Cover how the index is structured, how lookups traverse it, and what types of queries benefit most from B-tree indexes."
The vague version could produce anything from a one-paragraph summary to a textbook chapter. The clear version tells the model exactly what ground to cover.
Context is the background information the model needs to do its job well. Without context, the model fills in the blanks with assumptions, and those assumptions are often wrong.
Without context:
With context:
The first version might produce something like "Upload failed. Error code 413." The second version produces something a nurse can actually understand and act on.
Format specification tells the model how to structure its output. In production, this matters a lot because you often need to parse the model's response programmatically.
Without the format specification, the model might give you a paragraph of prose. With it, you get a structured response you can present directly in a UI or parse into sections.
Constraints tell the model what NOT to do. They are your safety net against the model's tendency to be overly helpful, verbose, or off-topic.
Constraints are especially important in production applications where the model interacts directly with users. Without guardrails, models will happily make things up, reveal information they should not, or go on tangents.
A prompt that uses all five components together:
Not every prompt needs all five components. A quick question might only need a task. A production prompt for a customer-facing chatbot probably needs all five. The skill is knowing which components to add when the output is not what you need.
In the OpenAI client library, the messages array came up with its three roles: system, user, and assistant. But when should instructions go in the system prompt versus the user prompt?
The system prompt sets the model's persistent behavior. Think of it as configuring the model for a session. It stays the same across many user interactions.
The user prompt contains the specific request. It changes with every interaction.
A practical way to think about it:
An important detail: the system prompt is sent with every API call. There is no "session" on the server side. If your application has a multi-turn conversation, you send the system prompt at the beginning of the messages array every single time. This means your system prompt consumes tokens on every request.
For applications with long conversations, a lengthy system prompt becomes expensive. If your system prompt is 500 tokens and you make 20 turns in a conversation, that is 10,000 tokens just for system prompt repetition. Worth thinking about when designing production systems.
There is a tension in prompt writing. Too vague, and the model guesses what you want. Too rigid, and you get robotic, over-constrained output that misses the spirit of the request. The best prompts land somewhere in the middle.
The same task at five levels of specificity shows how this plays out.
"Write something about APIs."
This could produce anything. An essay, a poem, a tutorial, a rant. The model has zero signal about what you actually want.
"Write a blog post about REST APIs."
Now we have a format (blog post) and a topic (REST APIs), but we still do not know the audience, length, depth, or purpose.
"Write a 500-word blog post explaining REST APIs to junior developers. Cover what REST stands for, the main HTTP methods, and a simple example using a bookstore API."
This gives the model a clear target. Audience, length, scope, and an example domain are all specified.
"Write a 500-word blog post explaining REST APIs to junior developers who know basic HTTP but have never built an API. Cover: (1) what REST stands for and the core principles, (2) GET, POST, PUT, DELETE with one-sentence descriptions, (3) a bookstore API example showing a GET request and response. Use a conversational tone. Do not use jargon without defining it first."
For most production applications, this is the sweet spot. The model knows exactly what to produce.
"Write exactly 500 words. First paragraph must be 3 sentences. Use the word 'API' exactly 12 times. Every section header must be a question. The third paragraph must contain a code block of exactly 4 lines. End with a question starting with 'Have you...'."
At this level, the model spends all its effort satisfying arbitrary constraints instead of producing good content. The output reads like a machine following rules, because it was.
The right level of specificity depends on your use case:
The general principle: specify WHAT you want clearly, but leave room for HOW the model delivers it.
Vague instructions produce vague outputs. A few concrete techniques go a long way toward making instructions unambiguous.
Start your instructions with action verbs. Instead of "It would be nice if you could explain...", write "Explain...". Instead of "Can you maybe list some...", write "List the top 5...".
LLMs respond well to direct instructions. You are not being rude. You are being clear.
When a task has multiple parts, break it into numbered steps. Numbering helps the model organize its thinking and makes it less likely to skip steps.
Monolithic instruction (prone to missing parts):
Decomposed instruction (reliable):
The decomposed version is longer, but it produces a complete, structured analysis every time. The monolithic version often skips one of the three requested items.
If you need structured output, show the model exactly what you want. The most reliable approach is to include a template or example in your prompt.
When the format is this explicit, the model follows it consistently. Critical for production systems where downstream code needs to parse the response.
When your prompt includes data the model should process (not interpret as instructions), use clear delimiters. This prevents prompt injection and makes it obvious where instructions end and data begins.
The delimiters ---BEGIN REVIEW--- and ---END REVIEW--- make it clear that the text between them is data, not instructions. Without delimiters, a malicious review could attempt to hijack the model's behavior.
Sometimes the most efficient way to show the model what you want is to give it examples. Known as few-shot prompting, this is one of the most effective techniques available.
The terminology is simple:
The difference is clearest with a real task: classifying customer support tickets.
For straightforward cases, zero-shot works fine. But ambiguous tickets are trickier. The model might classify a billing-related login issue as TECHNICAL when your team considers it BILLING.
The fifth example is the key one. It shows a login issue that is classified as BILLING because the root cause was billing-related. This teaches the model to look deeper than surface-level keywords.
Few-shot prompting works best when:
More examples are not always better. Here are the tradeoffs:
A good rule of thumb: start with zero-shot. If the output is not consistent enough, add one or two examples. Only go beyond three examples if you are dealing with a genuinely ambiguous classification task.
The diagram above shows the decision flow. Start simple and only add complexity when you have evidence that it is needed. We will go much deeper into advanced few-shot techniques, chain-of-thought prompting, and self-consistency in the next chapter.
The biggest thing to internalize about prompts: they are code, not magic. You do not write a prompt once and hope it works. You write a draft, test it, analyze the failures, and refine. Just like debugging software.
It follows a simple loop:
A concrete example makes this tangible. Suppose you are building a feature that generates product descriptions from raw specifications.
Running this against several products reveals three problems:
Run v2 against the same products plus new edge cases:
If v2 still has issues, add more constraints or examples. Maybe you discover that the model struggles with pricing formats, so you add a constraint: "Display the price as stated, do not convert or reformat it." Maybe technical specs need a rule: "For technical terms, add a brief parenthetical explanation."
This iterative process never really ends. As new edge cases surface in production, you update the prompt. Treating prompts as code, version-controlled, tested, and reviewed, pays off quickly.