
Structured Output from LLMs

Last Updated: March 14, 2026

Ashish Pratap Singh

LLMs return free-form text by default. So if you ask one to extract product information from a customer review, the response might look something like this:

"The product is the Sony WH-1000XM5 headphones. They cost $349.99, and the reviewer gave them a rating of 4 out of 5 stars."

That is easy for a human to read, but applications usually need structured data instead: a Python dictionary, a JSON object, or a database row, where the product name is stored in one field, the price is a float, and the rating is an integer.

A common workaround is to parse the response using regex, split on colons, or search for patterns like dollar signs. But this is fragile because LLMs are non-deterministic. The wording can change from one call to the next, and even a small variation can break your parser.

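Here is a sketch of what that fragile parsing looks like. The response string is a hypothetical example of free-form LLM output:

```python
import re

# A free-form LLM response we want to parse (example text, not a real API result)
response = (
    "The product is the Sony WH-1000XM5 headphones. "
    "They cost $349.99, and the reviewer gave them a rating of 4 out of 5 stars."
)

def parse_review(text: str) -> dict:
    """Fragile extraction: relies on exact phrasing the LLM may not repeat."""
    name_match = re.search(r"product is the (.+?)[.,]", text)
    price_match = re.search(r"\$(\d+(?:\.\d+)?)", text)
    rating_match = re.search(r"rating of (\d+) out of (\d+)", text)
    return {
        "product_name": name_match.group(1) if name_match else None,
        "price": float(price_match.group(1)) if price_match else None,
        "rating": int(rating_match.group(1)) if rating_match else None,
    }

print(parse_review(response))
# Breaks as soon as the model says "rated it 4/5" instead of "a rating of 4 out of 5"
```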

This approach has three major problems. First, LLM outputs are non-deterministic, so the phrasing may vary across calls. Second, regex-based parsing does not scale well to nested or more complex structures. Third, there is no real type safety, so values that look correct may still be malformed or inconsistent.

The better approach is to stop parsing free-form text and start telling the LLM exactly what structure you want. That is the core idea behind structured output.

The Structured Output Pipeline

Before we dive into specific approaches, here is the overall flow for getting structured data from an LLM.

You provide a prompt along with a schema that describes the expected structure. The LLM generates output (ideally JSON). A validation layer checks whether the output matches your schema. If it does, you have clean structured data. If not, you feed the validation error back to the LLM and try again.

Every approach we cover in this chapter is a variation on this pipeline. The approaches differ in where the structure enforcement happens: at the prompt level, at the decoding level, or at the validation level.

Approach 1: JSON Mode

The simplest way to get structured output is to ask the LLM to respond in JSON and tell the API to enforce it. Most providers call this "JSON mode."

How It Works

You set response_format={"type": "json_object"} in your API call. The provider then constrains the model's token generation so that the output is always valid JSON. No more half-finished strings or random prose mixed in.


Limitations

JSON mode guarantees valid JSON, but it does not guarantee the right JSON. The model might return {"name": "Sony"} instead of {"product_name": "Sony"}. It might use a string for the price instead of a number. You have no schema enforcement, just a guarantee that json.loads() will not throw an error.

This is good enough for quick prototypes, but production applications need more control.

Approach 2: Structured Outputs with JSON Schema

JSON Schema mode goes beyond basic JSON mode. Instead of just guaranteeing valid JSON, it guarantees JSON that matches a specific schema you define. OpenRouter supports this through the response_format parameter with type: "json_schema".

How It Works: Constrained Decoding

This is not just a prompt trick. The underlying provider uses constrained decoding at the token generation level. The model literally cannot produce tokens that would violate your schema. If your schema says rating is an integer, the model will never output "rating": "four". The constraint is applied during generation, not after.

Think of it like autocomplete on a form. You can only type in the fields that exist, and each field only accepts its specified data type.


Now you are guaranteed to get exactly these fields, with exactly these types, every single time. No missing keys, no wrong types, no surprise fields.

Approach 3: Tool-Based Extraction (OpenAI Tools Format via OpenRouter)

OpenAI's tool calling system is a powerful path to structured output. Instead of a dedicated structured output mode, you define a "tool" that the model can call, and the tool's input schema becomes your structured output schema. You force the model to call it, then read the arguments as your structured data.

Because OpenRouter exposes the OpenAI tools format for all its models, you can use this pattern with Claude, Llama, Mistral, or any other model OpenRouter supports — no provider-specific SDK needed.

How It Works

You define a tool with the structure you want. Then you tell the model to use that tool. The model "calls" the tool with the structured data as the argument. You never actually execute the tool. You just grab the structured input the model generated.

It is a clever reuse of the tool calling infrastructure. Any model that supports function calling via the OpenAI tools format can do this, and OpenRouter makes that consistent across providers.


The key line is tool_choice={"type": "function", "function": {"name": "extract_product_review"}}. This forces the model to call that specific function, which means it must produce output matching the function's parameter schema. Because this uses the standard OpenAI tools format via OpenRouter, the same code works if you swap "anthropic/claude-haiku-4-5-20251001" for any other OpenRouter model string.

Comparing Structured Output Approaches

Now that you have seen three approaches, here is how they stack up.

| Approach | Reliability | Complexity | Best For |
|---|---|---|---|
| JSON Mode | Medium | Low | Quick prototypes, flexible schemas |
| JSON Schema | High | Medium | Strict schema compliance, production use |
| Tool-Based | High | Medium | Models that support function calling but not JSON Schema mode |

Since everything goes through OpenRouter, you do not need to worry about provider-specific APIs. All three approaches use the same OpenAI SDK with the same client setup. The only difference is which response_format or tools parameter you pass.

The bottom line: use JSON Schema (Approach 2) when you need strict structure guarantees. Use tool-based extraction (Approach 3) as a fallback for models that do not support json_schema response format. JSON mode is fine for quick experiments but will cause problems in production if your downstream code depends on specific field names or types.

Pydantic for Validation

So far, we have been working with raw dictionaries. That is fine for simple cases, but real applications need type checking, field validation, and clean serialization. This is where Pydantic comes in.

Why Pydantic?

Pydantic is Python's most popular data validation library. You define a model class with typed fields, and Pydantic validates incoming data against that model. If the data is wrong, you get a clear error message explaining exactly what failed.

For LLM applications, Pydantic solves three problems:

  1. Type safety: A price field is a float, not a string that looks like a float
  2. Validation rules: A rating must be between 0 and 5, not 42
  3. Missing field detection: If the LLM forgets a field, you know immediately
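A minimal example of a Pydantic model for the product review, showing what validation success and failure look like (Pydantic v2 syntax):

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError

class ProductReview(BaseModel):
    product_name: str
    price: float
    rating: int = Field(ge=0, le=5)       # must be between 0 and 5
    reviewer_name: Optional[str] = None   # may be missing or null

# Valid data passes, and values are coerced to the declared types
review = ProductReview(product_name="Sony WH-1000XM5", price="349.99", rating=4)
print(review.price)  # 349.99 — coerced from string to float

# Invalid data fails loudly with a clear error message
try:
    ProductReview(product_name="Sony WH-1000XM5", price=349.99, rating=42)
except ValidationError as e:
    print(e)  # explains that rating must be less than or equal to 5
```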

Notice the Field(ge=0, le=5) on rating. That constrains the value between 0 and 5. The Optional[str] on reviewer_name means the field can be missing or null. These constraints catch problems that JSON schema alone might miss.

The Instructor Library: Pydantic Meets LLMs

Writing the glue code between LLM API calls and Pydantic validation gets repetitive. The instructor library eliminates that boilerplate. It patches the OpenAI client so you can pass a Pydantic model directly and get a validated object back.


Under the hood, instructor converts your Pydantic model to a JSON schema, sends it to the API using structured outputs or tool calling, parses the response, and validates it against your model. If validation fails, it automatically retries (more on that shortly).

Since this goes through OpenRouter, you can switch between GPT-4o, Claude, Llama, or Gemini by changing a single model string:


Same Pydantic model, same pattern, any model. You define your data model once and the instructor + OpenRouter combination handles the rest.

Schema-Driven Extraction

Structured output really shines when you need to extract specific entities and relationships from messy, unstructured text. Think of it as building a custom NER (Named Entity Recognition) system without training any models.

Extracting Entities

Here is a practical example: extracting people and organizations from a news article.

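A sketch of the schema side; the names in the example instance are invented for illustration:

```python
from typing import List

from pydantic import BaseModel, Field

class Person(BaseModel):
    name: str
    role: str = Field(description="Role or title as stated in the text")

class Organization(BaseModel):
    name: str
    industry: str

class ArticleEntities(BaseModel):
    people: List[Person]
    organizations: List[Organization]

# With instructor, extraction follows the same pattern as before:
#   entities = client.chat.completions.create(
#       model="openai/gpt-4o-mini",
#       response_model=ArticleEntities,
#       messages=[{"role": "user",
#                  "content": f"Extract all people and organizations:\n{article}"}],
#   )

# A hand-built instance shows the shape the model must fill in
entities = ArticleEntities(
    people=[{"name": "Jane Doe", "role": "CEO"}],
    organizations=[{"name": "Acme Corp", "industry": "software"}],
)
```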

Extracting Relationships

You can go further and extract not just entities, but the relationships between them.

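One way to model relationships is as subject-predicate-object triples (the example data here is invented for illustration):

```python
from typing import List

from pydantic import BaseModel, Field

class Relationship(BaseModel):
    subject: str = Field(description="Entity the statement is about")
    predicate: str = Field(description="Relationship type, e.g. 'acquired'")
    object: str = Field(description="Entity the subject relates to")

class RelationshipGraph(BaseModel):
    relationships: List[Relationship]

# For text like "Acme Corp acquired Widget Inc. Jane Doe is Acme's CEO.",
# a model given response_model=RelationshipGraph might fill in:
example = RelationshipGraph(relationships=[
    {"subject": "Acme Corp", "predicate": "acquired", "object": "Widget Inc"},
    {"subject": "Jane Doe", "predicate": "chief_executive_of", "object": "Acme Corp"},
])
```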

This pattern is powerful. You define the structure of what you want to extract, and the LLM does the hard work of reading the text and filling in the fields. The Pydantic model acts as a contract between your code and the LLM.

Error Handling and Retry Strategies

Things go wrong. Even with structured outputs and schema enforcement, you will hit edge cases. The LLM might misinterpret a field, hallucinate a value, or (in non-constrained modes) return malformed JSON. You need a strategy for handling these failures.

What Can Go Wrong

Here are the most common failure modes:

  1. Invalid JSON: The LLM returns text that is not valid JSON (mostly with JSON mode or prompt-based approaches)
  2. Missing required fields: A field exists in the schema but the LLM omits it
  3. Wrong types: A number comes back as a string, or a boolean as "yes"/"no"
  4. Failed validation rules: A rating of 11 when the schema says max is 5
  5. Hallucinated values: The LLM invents data that was not in the source text

Retry with Error Feedback

The most effective strategy is to catch the validation error and send it back to the LLM as feedback. The model sees what went wrong and corrects its output. This works because LLMs are good at following instructions, and a specific error message is a very clear instruction.

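A sketch of that loop. The call_llm parameter stands in for whatever client call you use, which also makes the retry logic easy to test in isolation:

```python
import json
from typing import Callable

from pydantic import BaseModel, Field, ValidationError

class ProductReview(BaseModel):
    product_name: str
    price: float
    rating: int = Field(ge=0, le=5)

def extract_with_retry(
    call_llm: Callable[[list], str],  # takes messages, returns raw model text
    prompt: str,
    max_retries: int = 2,
) -> ProductReview:
    """Feed validation errors back to the model until the output validates."""
    messages = [{"role": "user", "content": prompt}]
    for attempt in range(max_retries + 1):
        raw = call_llm(messages)
        try:
            return ProductReview.model_validate_json(raw)
        except (ValidationError, json.JSONDecodeError) as err:
            if attempt == max_retries:
                raise  # out of retries; surface the error to the caller
            # Show the model its own output and exactly what went wrong
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"Your JSON failed validation:\n{err}\n"
                           "Return corrected JSON only.",
            })
```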

Instructor's Built-In Retry

If you are using instructor, retry logic is built in. You just set max_retries:


When validation fails, instructor automatically appends the validation error to the conversation and retries. It does exactly what our manual implementation does, but in one line.

Choosing a Retry Strategy

| Strategy | When to Use | Tradeoff |
|---|---|---|
| No retry | Constrained decoding (OpenAI structured outputs) | Schema errors are impossible, but hallucination still possible |
| Retry with error feedback | JSON mode or tool-based extraction | Costs extra API calls, but usually fixes the problem in 1-2 retries |
| Retry with re-prompting | Complex extraction where the model misunderstands the task | More expensive, but handles semantic errors |
| Fallback to different model | When primary model consistently fails | Adds latency and complexity |

A reasonable default: use OpenAI structured outputs for strict schema compliance, layer Pydantic on top for business rule validation, and set max_retries=2 via instructor for the rare cases where something slips through.

Putting It All Together: Building a Resume Parser

Let's combine everything we have learned into a practical application. You will build a resume parser that takes raw resume text and extracts structured data into a validated Pydantic model.

Step 1: Define the Schema

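One possible schema, with optional fields for the details a resume may omit (the exact fields are a design choice, not a fixed standard):

```python
from typing import List, Optional

from pydantic import BaseModel, Field

class Experience(BaseModel):
    company: str
    title: str
    start_year: Optional[int] = None
    end_year: Optional[int] = None  # None for a current role

class Education(BaseModel):
    institution: str
    degree: Optional[str] = None
    graduation_year: Optional[int] = None

class Resume(BaseModel):
    full_name: str
    email: Optional[str] = None  # some resumes omit contact info
    phone: Optional[str] = None
    skills: List[str] = Field(default_factory=list)
    experience: List[Experience] = Field(default_factory=list)
    education: List[Education] = Field(default_factory=list)
```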

Step 2: Build the Extractor


Step 3: Test with Real Data

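A fabricated sample resume to exercise the parser (all names and contact details are invented; the parse_resume call assumes the Step 2 extractor is in scope):

```python
# Fabricated sample resume for testing
SAMPLE_RESUME = """\
Jane Doe
jane.doe@example.com | (555) 123-4567

EXPERIENCE
Acme Corp - Senior Software Engineer (2019-present)
Widget Inc - Software Engineer (2016-2019)

EDUCATION
State University - B.S. Computer Science, 2016

SKILLS
Python, SQL, Kubernetes
"""

# With the Step 2 extractor in scope, parsing is one call:
# resume = parse_resume(SAMPLE_RESUME)
# print(resume.full_name, resume.skills)
```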

Step 4: Handle Edge Cases

Real resumes are messy. Some have no email, some list skills in paragraph form, some have gaps in employment. Your Pydantic model already handles optional fields, but you should also think about what happens when extraction fails.

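One option is a wrapper that degrades gracefully instead of crashing. This is a sketch; adapt the logging and the fallback behavior to your application:

```python
from typing import Callable, Optional, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T", bound=BaseModel)

def safe_parse(parse_fn: Callable[[str], T], text: str) -> Optional[T]:
    """Run an extractor but return None when extraction ultimately fails."""
    try:
        return parse_fn(text)
    except ValidationError as err:
        # Retries are exhausted and the output still violates the schema
        print(f"Extraction failed validation: {err}")
        return None
    except Exception as err:
        # Network errors, rate limits, provider outages, and so on
        print(f"Extraction call failed: {err}")
        return None
```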