Last Updated: March 13, 2026
As AI systems grow in complexity, maintaining clear and reliable code becomes increasingly important. Type hints help make Python code easier to understand by explicitly specifying the types of variables, function arguments, and return values. This improves readability, enables better tooling support, and helps catch errors early through static analysis.
However, type hints alone do not enforce correctness at runtime. This is where Pydantic becomes useful. Pydantic builds on Python’s type hints to provide powerful data validation and parsing, ensuring that inputs conform to the expected structure and types.
In this chapter, you will learn how to use type hints effectively and how Pydantic turns them into robust, runtime-validated data models that make your AI applications safer and easier to maintain.
Type hints were introduced in Python 3.5 via PEP 484. They are completely optional, they do not affect runtime behavior, and Python itself ignores them. So why bother?
Because your tools do not ignore them. IDEs like VS Code and PyCharm use type hints for autocomplete, inline documentation, and error detection. Static analyzers like mypy and pyright catch type errors before you run the code. And libraries like Pydantic and FastAPI use them as the source of truth for data validation.
The syntax is straightforward. A colon after a variable or parameter name, followed by the type:
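For instance, a small prompt-builder function (a name we invent purely for illustration), annotated end to end:

```python
def build_prompt(topic: str, max_words: int = 50) -> str:
    """Build a short prompt string from a topic and a word limit."""
    return f"Explain {topic} in at most {max_words} words."

# The annotation on the variable is optional here, but shows the same syntax.
prompt: str = build_prompt("type hints")
```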
Notice the -> str after the parameter list. That is the return type annotation. It tells callers exactly what to expect back.
For lists, dicts, sets, and tuples, you annotate the contents too. Since Python 3.9, you can use the built-in types directly.
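A sketch of the built-in generic syntax (the variable names are our own):

```python
# Since Python 3.9, built-in types accept subscripts directly -- no typing.List needed.
tokens: list[str] = ["Hello", "world"]
token_counts: dict[str, int] = {"Hello": 1, "world": 1}
stop_words: set[str] = {"a", "the"}
span: tuple[int, int] = (0, 5)  # fixed-length tuple: start and end index
```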
The key difference from untyped code: list[str] tells your IDE that every element is a string, so it can autocomplete string methods when you iterate.
In real-world AI code, many values can be absent. An LLM response might not include a finish_reason field. A configuration parameter might be unset. You need a way to express "this is a string, or it might be None."
Optional[str] means "str or None." It is the most common type hint you will see in AI libraries.
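A minimal sketch, using a hypothetical accessor over a raw LLM response dict:

```python
from typing import Optional

def get_finish_reason(response: dict) -> Optional[str]:
    """Return the finish reason if the response includes one, else None."""
    return response.get("finish_reason")
```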
Optional[str] is actually shorthand for Union[str, None]. But Union is more general: it works with any combination of types:
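For example, a parameter that accepts a stop sequence as either a single string or a list of strings (a hypothetical helper):

```python
from typing import Union

def normalize_stop(stop: Union[str, list[str]]) -> list[str]:
    """Accept a single stop string or a list of them; always return a list."""
    return [stop] if isinstance(stop, str) else stop
```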
Starting with Python 3.10, you can use the pipe operator | instead of Union. It is cleaner and more intuitive:
Throughout this course, we will use the | syntax when targeting Python 3.10+ and Optional/Union when backward compatibility matters.
As your type annotations get more complex, they start to clutter the code. Type aliases and generics keep things readable.
A type alias is just a variable that holds a type. It gives a name to a complex type so you do not repeat it everywhere:
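A minimal example, with an alias name of our own choosing:

```python
# The alias is purely a readability aid; JSONDict *is* dict[str, object].
JSONDict = dict[str, object]

def parse_config(raw: JSONDict) -> JSONDict:
    """Normalize configuration keys to lowercase."""
    return {key.lower(): value for key, value in raw.items()}
```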
In AI code, you will often create aliases for common structures like chat messages, embedding vectors, or token sequences:
TypeVar lets you write functions that work with any type while maintaining type safety.
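A minimal sketch with a generic helper of our own:

```python
from typing import TypeVar

T = TypeVar("T")

def first_item(items: list[T]) -> T:
    # Whatever type the list holds, the return type matches it:
    # first_item([1, 2]) is typed as int, first_item(["a"]) as str.
    return items[0]
```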
You can constrain a TypeVar to specific types:
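For example, restricting a hypothetical clamp helper to numeric types:

```python
from typing import TypeVar

# Number can only ever be int or float -- calling clamp with strings is a type error.
Number = TypeVar("Number", int, float)

def clamp(value: Number, low: Number, high: Number) -> Number:
    """Constrain value to the inclusive range [low, high]."""
    return max(low, min(value, high))
```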
For more complex scenarios, you can create generic classes. This pattern shows up in AI libraries that need to handle different data types through the same interface:
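A sketch of a generic class, using a hypothetical cache wrapper:

```python
from typing import Generic, TypeVar

T = TypeVar("T")

class CacheEntry(Generic[T]):
    """A cache wrapper that works for any payload type through one interface."""

    def __init__(self, value: T, hits: int = 0) -> None:
        self.value = value
        self.hits = hits

    def get(self) -> T:
        # The return type tracks whatever T was at construction time.
        self.hits += 1
        return self.value

# CacheEntry[str] and CacheEntry[list[float]] share a single implementation.
entry: CacheEntry[str] = CacheEntry("cached completion")
```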
You do not need to master generics right away. But understanding the pattern helps when you read library source code, which uses generics heavily.
Type hints alone are advisory. Python ignores them at runtime. Pydantic changes that. It reads your type annotations and enforces them when data enters your system, validating types, coercing values, and raising clear errors when something does not match.
This is why Pydantic has become the backbone of modern Python AI tooling. FastAPI uses it for request/response validation. LangChain uses it for chain configuration. The OpenAI SDK uses it for structured output. The instructor library is built entirely on top of it. If you are doing AI engineering in Python, Pydantic is one of the most useful libraries to know.
A Pydantic model is a class that inherits from BaseModel. Each field is a class attribute with a type annotation:
So far, this looks like a regular dataclass. Here is where Pydantic earns its keep.
Pydantic validates every field against its type annotation. If the data does not match, you get a detailed error. If it can be safely converted (like a string "7" to an integer 7), Pydantic handles that automatically:
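A sketch of both behaviors, using a hypothetical Completion model (Pydantic v2 API):

```python
from pydantic import BaseModel, ValidationError

class Completion(BaseModel):
    text: str
    tokens_used: int

# The string "7" is safely coerced to the integer 7.
ok = Completion(text="hi", tokens_used="7")

# A value that cannot be coerced raises a detailed ValidationError.
try:
    Completion(text="hi", tokens_used="lots")
except ValidationError as error:
    print(error.error_count(), "validation error")
```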
This matters in AI engineering because data comes from many sources: user input, API responses, config files, environment variables. Each source has its own quirks. Pydantic normalizes everything into the types you declared.
Here is how the validation pipeline works:
Raw input enters from the left. Pydantic checks types and attempts coercion. If everything passes, you get a validated model instance with proper types. If not, you get a ValidationError that tells you exactly which field failed and why. Once you have a model instance, you can serialize it back to a dict or JSON string.
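That pipeline, sketched in code with a hypothetical Usage model:

```python
from pydantic import BaseModel, ValidationError

class Usage(BaseModel):
    prompt_tokens: int
    completion_tokens: int

raw = {"prompt_tokens": "12", "completion_tokens": 34}  # raw input, mixed types
usage = Usage.model_validate(raw)                       # check types, coerce "12" -> 12
print(usage.model_dump())                               # serialize back to a plain dict

try:
    Usage.model_validate({"prompt_tokens": 12})         # missing field
except ValidationError as error:
    # The error names the exact field that failed and why.
    print(error.errors()[0]["loc"])
```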
The Field() function gives you fine-grained control over each field: default values, validation constraints, descriptions, and more.
Common Field constraints:
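A sketch showing several of them on one hypothetical model (the list is not exhaustive):

```python
from pydantic import BaseModel, Field

class Constrained(BaseModel):
    score: float = Field(ge=0.0, le=1.0)                 # numeric bounds: gt, ge, lt, le
    name: str = Field(min_length=1, max_length=64)       # string length bounds
    slug: str = Field(pattern=r"^[a-z0-9-]+$")           # regex the string must match
    tags: list[str] = Field(default_factory=list,        # default built per instance
                            max_length=10)               # collection size bound
```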
The description field is especially important in AI work. When you use Pydantic models to define structured output schemas for LLMs, those descriptions become part of the prompt that guides the model's output.
Real-world data is rarely flat. An LLM response contains choices, which contain messages, which contain tool calls, which contain function arguments. Pydantic handles nesting naturally: you just use one model as a field type in another.
Notice that you can pass raw dictionaries for nested models. Pydantic automatically converts them into the appropriate model instances. This is especially useful when parsing JSON responses from APIs.
Sometimes type checking and constraints are not enough. You need custom logic: ensuring two fields are consistent, transforming values, or validating against an external source. Pydantic provides two kinds of validators.
A @field_validator runs on a single field. It receives the value and can transform it or raise an error:
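A sketch with a hypothetical Document model:

```python
from pydantic import BaseModel, field_validator

class Document(BaseModel):
    title: str

    @field_validator("title")
    @classmethod
    def strip_and_check(cls, value: str) -> str:
        value = value.strip()            # transform the value
        if not value:
            raise ValueError("title must not be blank")
        return value                     # return the (possibly transformed) value

doc = Document(title="  Hello  ")
```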
The validator returns the (possibly transformed) value. If it raises a ValueError, Pydantic includes that message in the validation error.
A @model_validator runs on the entire model, after all field validators have passed. This is useful for cross-field validation:
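A sketch with a hypothetical sampling configuration:

```python
from pydantic import BaseModel, model_validator

class SamplingConfig(BaseModel):
    min_tokens: int = 1
    max_tokens: int = 256

    @model_validator(mode="after")
    def check_bounds(self) -> "SamplingConfig":
        # Cross-field check: runs once all individual fields have validated.
        if self.min_tokens > self.max_tokens:
            raise ValueError("min_tokens must not exceed max_tokens")
        return self
```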
The mode="after" means this runs after all fields are validated. You can also use mode="before" to transform the raw input dict before field validation.
Getting data into Pydantic models is half the story. You also need to get data out, as dictionaries for database inserts or as JSON strings for API responses, and to build new model instances from raw data.
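Getting data out is a pair of method calls in Pydantic v2 (the model here is our own example):

```python
from pydantic import BaseModel

class Completion(BaseModel):
    text: str
    tokens_used: int

completion = Completion(text="Hi", tokens_used=3)
as_dict = completion.model_dump()       # plain dict, e.g. for a database insert
as_json = completion.model_dump_json()  # JSON string, e.g. for an API response
```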
Going the other direction, from raw data into a model:
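In Pydantic v2 that is model_validate for dicts and model_validate_json for JSON strings (illustrative model):

```python
from pydantic import BaseModel

class Completion(BaseModel):
    text: str
    tokens_used: int

from_dict = Completion.model_validate({"text": "Hi", "tokens_used": 3})
from_json = Completion.model_validate_json('{"text": "Hi", "tokens_used": 3}')
```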
This is the pattern you will use constantly: receive JSON from an API, validate it into a model, work with typed attributes, then serialize it back out when needed.
AI applications have a lot of configuration: API keys, model names, temperature defaults, chunk sizes, database URLs, vector store endpoints. Pydantic Settings loads these from environment variables and .env files with full validation.
This replaces the manual os.getenv() calls and dotenv.load_dotenv() pattern you see in many tutorials. With Pydantic Settings, your config is validated on startup. If a required API key is missing, you get a clear error immediately instead of a cryptic None failure ten minutes into a pipeline.
Your .env file looks like this:
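For example (the keys are illustrative):

```
OPENAI_API_KEY=sk-your-key-here
DEFAULT_MODEL=gpt-4o-mini
TEMPERATURE=0.7
```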
This is where everything comes together. One of the most common patterns in AI engineering is defining a Pydantic model that describes the structure you want an LLM to produce, then passing that schema to the model and validating the response.
Let's build a practical example: extracting structured information from a technical document.
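One possible version of that model; the field names, constraints, and descriptions are our choices for this example:

```python
from pydantic import BaseModel, Field

class DocumentAnalysis(BaseModel):
    """Structured information extracted from a technical document."""

    title: str = Field(description="The document's title",
                       min_length=1, max_length=200)
    summary: str = Field(description="A 2-3 sentence summary",
                         min_length=10, max_length=500)
    topics: list[str] = Field(description="Main topics covered", max_length=5)
    difficulty: str = Field(description="One of: beginner, intermediate, advanced")
```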
Now you can pass this schema to an LLM using any of the structured output approaches we will cover in Module 1. The Field descriptions guide the model on what to produce for each field. The constraints (min_length, max_length) get validated after the model responds.
Here is how this model looks when used with the OpenAI SDK (a preview of what is coming in the next module):
The LLM generates JSON that matches the DocumentAnalysis schema. Pydantic validates it. You get a fully typed object with autocomplete, attribute access, and serialization. No regex, no string parsing, no guessing.
This pattern, defining Pydantic models as output schemas, is foundational. You will use it in every module from here on out.