Last Updated: March 13, 2026
As Python projects grow larger, organizing code becomes just as important as writing it. This is where object-oriented programming (OOP) helps. By modeling real-world entities as classes and objects, you can structure code in a way that is modular, reusable, and easier to maintain.
Python supports all the core OOP concepts: classes, encapsulation, inheritance, and polymorphism. These ideas are widely used in AI codebases for organizing components like datasets, models, pipelines, and services.
Python also provides dataclasses, a feature that makes it much easier to define classes that primarily store data.
In this chapter, you will learn how object-oriented design works in Python and how dataclasses simplify many common programming patterns.
The first thing you will notice is how little ceremony Python requires. There are no access modifiers, no type declarations on fields (unless you want them), and no separate constructor keyword. Here is a minimal class:
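A sketch with illustrative names: a tiny Model class holding a name and a sampling temperature.

```python
class Model:
    def __init__(self, name, temperature=0.7):
        self.name = name                # instance attributes, created on the fly
        self.temperature = temperature

    def describe(self):
        return f"{self.name} (temperature={self.temperature})"

model = Model("gpt-4o-mini")
print(model.describe())  # gpt-4o-mini (temperature=0.7)
```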
Three things stand out compared to other languages.
Every instance method explicitly receives self as its first argument. In many languages, this is implicit. Python makes it explicit. You have to write it in the method signature, and you access instance attributes through it. Forget self in the signature and calling the method raises a TypeError complaining that it received more positional arguments than it takes.
There is no private, protected, or public. By convention, a leading underscore means "treat this as internal" (self._cache), and a double leading underscore triggers name mangling (self.__secret becomes self._Model__secret), but nothing actually prevents access.
Python does not care about an object's class. It cares about what the object can do. If it has a __len__ method, you can call len() on it. If it has __getitem__, you can index into it. This is why you can pass any object with the right methods to a framework function, even if it does not inherit from any particular base class.
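A minimal sketch of that idea (the class name is illustrative): the object inherits from nothing in particular, yet len() works because the right dunder method exists.

```python
class TokenBuffer:  # no special base class anywhere
    def __init__(self, tokens):
        self.tokens = tokens

    def __len__(self):
        return len(self.tokens)

buf = TokenBuffer(["hello", "world"])
print(len(buf))  # 2 -- len() only cares that __len__ exists
```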
Dunder methods (short for "double underscore") are Python's protocol system. They let your objects plug into built-in operations like print(), len(), ==, and for loops. Frameworks like PyTorch and Hugging Face define their contracts through these methods.
__init__, __repr__, and __str__

You have already seen __init__. It initializes an object after creation. But what happens when you print an object?
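Reusing the Model sketch from earlier, the default output looks something like this (the hex address will vary):

```python
class Model:
    def __init__(self, name, temperature=0.7):
        self.name = name
        self.temperature = temperature

print(Model("gpt-4o-mini"))
# <__main__.Model object at 0x7f3a2c1d5e80>
```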
Not helpful. That output tells you nothing about the object's state. This is where __repr__ and __str__ come in.
__repr__ is for developers. It should produce output that ideally could recreate the object. Python uses it in the REPL and in debuggers.

__str__ is for users. It produces a human-friendly description. print() calls __str__ first, falling back to __repr__ if __str__ is not defined.

The !r format specifier inside an f-string calls repr() on that value, which adds quotes around strings. It is a common pattern in __repr__ implementations, as the example below shows.
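A sketch of both methods on the Model class from earlier:

```python
class Model:
    def __init__(self, name, temperature=0.7):
        self.name = name
        self.temperature = temperature

    def __repr__(self):
        # Developer-facing: output that could recreate the object
        return f"Model(name={self.name!r}, temperature={self.temperature})"

    def __str__(self):
        # User-facing: a human-friendly description
        return f"{self.name} at temperature {self.temperature}"

m = Model("gpt-4o-mini")
print(repr(m))  # Model(name='gpt-4o-mini', temperature=0.7)
print(m)        # gpt-4o-mini at temperature 0.7
```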
__eq__ and __hash__

By default, Python compares objects by identity (memory address), not by value. Two objects with the same data are not considered equal unless you define __eq__:
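A sketch with a hypothetical GenerationConfig class:

```python
class GenerationConfig:
    def __init__(self, model, temperature):
        self.model = model
        self.temperature = temperature

    def __eq__(self, other):
        if not isinstance(other, GenerationConfig):
            return NotImplemented
        # Compare by value: two configs with the same fields are equal
        return (self.model, self.temperature) == (other.model, other.temperature)

a = GenerationConfig("gpt-4o-mini", 0.0)
b = GenerationConfig("gpt-4o-mini", 0.0)
print(a == b)  # True -- compared by value now, not by identity
```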
Why does __hash__ matter? If you want to use objects as dictionary keys or store them in sets, Python requires __hash__. The rule is simple: if two objects are equal (__eq__ returns True), they must have the same hash. If you define __eq__ without __hash__, Python sets __hash__ to None, making the object unhashable.
This comes up in practice when you want to cache LLM responses keyed by configuration, or when you want to deduplicate a list of configurations.
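Extending the GenerationConfig sketch with a matching __hash__ makes deduplication work:

```python
class GenerationConfig:
    def __init__(self, model, temperature):
        self.model = model
        self.temperature = temperature

    def __eq__(self, other):
        if not isinstance(other, GenerationConfig):
            return NotImplemented
        return (self.model, self.temperature) == (other.model, other.temperature)

    def __hash__(self):
        # Hash the same tuple of fields that __eq__ compares,
        # so equal objects always get equal hashes
        return hash((self.model, self.temperature))

configs = [GenerationConfig("gpt-4o-mini", 0.0),
           GenerationConfig("gpt-4o-mini", 0.0)]
print(len(set(configs)))  # 1 -- duplicates collapse in a set
```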
__len__ and __getitem__: Making Objects Behave Like Collections

This is where things get directly relevant to AI frameworks. PyTorch's DataLoader expects dataset objects to support two operations: "how many items?" and "give me item at index i." It does not check inheritance. It checks for __len__ and __getitem__.
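A minimal sketch of a dataset-like class (illustrative names; a real PyTorch dataset usually subclasses torch.utils.data.Dataset, but the two methods below are what the DataLoader actually calls):

```python
class TextDataset:
    def __init__(self, texts):
        self._texts = texts

    def __len__(self):
        return len(self._texts)

    def __getitem__(self, index):
        return self._texts[index]

ds = TextDataset(["a great movie", "terrible plot", "loved it"])
print(len(ds))   # 3
print(ds[1])     # terrible plot
for item in ds:  # iteration works too, via repeated __getitem__ calls
    print(item)
```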
When you define __getitem__, Python automatically makes the object iterable (it tries index 0, then 1, and so on until it gets an IndexError). This is duck typing at work. PyTorch's DataLoader never checks isinstance(dataset, SomeBaseClass). It just calls len(dataset) and dataset[i], and if those work, everything is fine.
Python supports multiple inheritance, which is more powerful but also more complex than single inheritance. For this course, you mostly need single inheritance and a basic understanding of how Python resolves methods.
When you call model.predict(), Python looks for the method in this order: the instance, then the class (SentimentModel), then the parent class (BaseModel), and so on up the chain. With multiple inheritance, this order follows the C3 linearization algorithm, which you can inspect with ClassName.__mro__ or ClassName.mro().
You will rarely need to think about MRO directly, but it explains why super() works the way it does in Python. When you call super().__init__(), Python does not simply call the parent class. It calls the next class in the MRO, which matters when multiple inheritance is involved.
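A minimal sketch of that lookup chain:

```python
class BaseModel:
    def predict(self, text):
        return "neutral"

class SentimentModel(BaseModel):
    pass

model = SentimentModel()
print(model.predict("great!"))  # found on BaseModel, one step up the chain
print([cls.__name__ for cls in SentimentModel.__mro__])
# ['SentimentModel', 'BaseModel', 'object']
```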
@property for Computed Attributes

The @property decorator lets you define methods that behave like attributes. This is useful when a value depends on other attributes and should always be up to date:
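A sketch with a hypothetical TrainingConfig class:

```python
class TrainingConfig:
    def __init__(self, total_samples, batch_size):
        self.total_samples = total_samples
        self.batch_size = batch_size

    @property
    def steps_per_epoch(self):
        # Computed on every access, so it stays in sync with the fields
        return self.total_samples // self.batch_size

config = TrainingConfig(total_samples=50_000, batch_size=32)
print(config.steps_per_epoch)  # 1562 -- accessed like an attribute, no ()
config.batch_size = 64
print(config.steps_per_epoch)  # 781 -- recomputed automatically
```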
The advantage over storing steps_per_epoch in __init__ is that it stays correct even if you change batch_size or total_samples later. You will see this pattern in training configuration classes throughout ML frameworks.
@classmethod and @staticmethod

A @classmethod receives the class itself as the first argument (conventionally called cls) instead of an instance. The most common use case is alternative constructors:
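A sketch of the alternative-constructor pattern (from_dict is an illustrative name, though many libraries use something similar):

```python
class ModelConfig:
    def __init__(self, name, temperature):
        self.name = name
        self.temperature = temperature

    @classmethod
    def from_dict(cls, data):
        # Alternative constructor: build an instance from a dict,
        # e.g. one loaded from a JSON or YAML config file
        return cls(name=data["name"], temperature=data.get("temperature", 0.7))

config = ModelConfig.from_dict({"name": "gpt-4o-mini"})
print(config.temperature)  # 0.7
```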
A @staticmethod is just a regular function that happens to live inside a class namespace. It does not receive self or cls. Use it when the function is logically related to the class but does not need access to instance or class state.
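A small sketch, with a hypothetical Tokenizer class:

```python
class Tokenizer:
    @staticmethod
    def normalize(text):
        # No self or cls: a plain function that lives in the class
        # namespace because it logically belongs with Tokenizer
        return text.strip().lower()

print(Tokenizer.normalize("  Hello World  "))  # hello world
```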
Writing __init__, __repr__, __eq__, and __hash__ by hand for every class gets tedious. When your class is primarily a container for data (configuration, API responses, model metadata), Python's dataclasses module generates all of that boilerplate for you.
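A sketch with illustrative field names:

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    name: str
    temperature: float = 0.7
    max_tokens: int = 256

a = ModelConfig("gpt-4o-mini")
b = ModelConfig("gpt-4o-mini")
print(a)       # ModelConfig(name='gpt-4o-mini', temperature=0.7, max_tokens=256)
print(a == b)  # True -- the generated __eq__ compares field values
```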
The @dataclass decorator reads the class annotations and generates __init__, __repr__, and __eq__ for you. Default values work exactly as you would expect. Notice how clean this is compared to writing everything manually.
field() for Complex Defaults

There is one gotcha with mutable defaults. This is a classic Python mistake:
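A sketch of the bug with a plain class (hypothetical names):

```python
class Experiment:
    def __init__(self, name, tags=[]):  # BUG: one list shared by every call
        self.name = name
        self.tags = tags

a = Experiment("run-a")
b = Experiment("run-b")
a.tags.append("baseline")
print(b.tags)  # ['baseline'] -- b sees a's tag, because the default
               # list was created once, at function definition time
```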
Python's dataclass decorator is smart enough to raise a ValueError if you try to use a mutable default. The solution is field() with a default_factory:
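The same Experiment sketch, fixed:

```python
from dataclasses import dataclass, field

@dataclass
class Experiment:
    name: str
    tags: list = field(default_factory=list)                # fresh list per instance
    _cache: dict = field(default_factory=dict, repr=False)  # hidden from the repr

a = Experiment("run-a")
b = Experiment("run-b")
a.tags.append("baseline")
print(b.tags)  # [] -- no longer shared
print(a)       # Experiment(name='run-a', tags=['baseline'])
```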
The default_factory parameter takes a callable (like list, dict, or a lambda) that creates a fresh default for each instance. The repr=False parameter excludes a field from the string representation, useful for internal caches or large data structures.
In AI applications, you often want configuration objects that cannot be accidentally modified after creation. A changed hyperparameter mid-training is a debugging nightmare. The frozen=True option makes all fields read-only:
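A sketch with hypothetical hyperparameter fields:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    learning_rate: float = 3e-4
    batch_size: int = 32

config = TrainingConfig()
# config.batch_size = 64  # raises dataclasses.FrozenInstanceError

results = {config: 0.91}          # frozen => hashable => usable as a key
print(results[TrainingConfig()])  # 0.91 -- an equal config finds the entry
```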
Because frozen dataclasses are immutable, Python can safely generate __hash__ for them. This means you can use them as dictionary keys, which is perfect for caching experiment results by configuration.
__post_init__

Sometimes you need to compute derived values or validate inputs after the auto-generated __init__ runs. That is what __post_init__ is for:
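A sketch that validates batch_size and derives steps_per_epoch (illustrative names):

```python
from dataclasses import dataclass, field

@dataclass
class TrainingConfig:
    total_samples: int
    batch_size: int
    steps_per_epoch: int = field(init=False)  # derived, not a constructor argument

    def __post_init__(self):
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.steps_per_epoch = self.total_samples // self.batch_size

config = TrainingConfig(total_samples=50_000, batch_size=32)
print(config.steps_per_epoch)  # 1562
```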
The field(init=False) tells the dataclass not to include that field in the generated __init__, since it will be set in __post_init__ instead.
For data containers, dataclasses are almost always the right choice. Use a plain class when the object is mostly behavior with little stored state, a dataclass when it is mostly fields, and a frozen dataclass when instances should be immutable or hashable.
The rule of thumb: if your class is more about storing data than about behavior, use a dataclass.
AI frameworks need a way to say "your class must implement these methods." Python's answer is abstract base classes (ABCs). You will encounter them constantly in LangChain, LlamaIndex, and similar libraries.
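A sketch of the pattern; the method names here are illustrative, not any specific library's API:

```python
from abc import ABC, abstractmethod

class BaseLLM(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Produce a completion for a single prompt."""

    @abstractmethod
    def count_tokens(self, text: str) -> int:
        """Return the number of tokens in the text."""

    def generate_batch(self, prompts):
        # Default implementation: one call per prompt. Subclasses
        # inherit this for free but can override it with a real
        # batched API call.
        return [self.generate(p) for p in prompts]
```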
If you try to instantiate BaseLLM directly, or create a subclass without implementing both abstract methods, Python raises a TypeError:
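Continuing the sketch above:

```python
# BaseLLM()  # TypeError: Can't instantiate abstract class BaseLLM ...

class EchoLLM(BaseLLM):
    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

    def count_tokens(self, text: str) -> int:
        return len(text.split())

llm = EchoLLM()                    # fine: both abstract methods implemented
print(llm.generate_batch(["hi"]))  # ['echo: hi'] -- inherited default
```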
Notice that generate_batch has a default implementation in the base class. Subclasses inherit it for free but can override it with a more efficient implementation (like batched API calls).
This is the pattern you will see in LangChain's BaseLanguageModel, LlamaIndex's BaseLLM, and many other AI libraries. Understanding it means you can read framework source code, extend existing components, and create your own pluggable architectures.
ABCs require explicit inheritance. Your class must say class MyLLM(BaseLLM) to satisfy the contract. But Python also supports structural typing through Protocol: if your class has the right methods, it satisfies the protocol, with no inheritance needed.
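A sketch with a hypothetical SupportsGenerate protocol:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class SupportsGenerate(Protocol):
    def generate(self, prompt: str) -> str: ...

class MyLLM:  # no inheritance from SupportsGenerate anywhere
    def generate(self, prompt: str) -> str:
        return f"completion for {prompt!r}"

print(isinstance(MyLLM(), SupportsGenerate))  # True -- structural match
```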
The @runtime_checkable decorator lets you use isinstance() with the protocol. Without it, the protocol only works with static type checkers like mypy.
When should you use each?
Use ABCs when you control the class hierarchy and want to provide shared behavior. Use Protocols when you want to define a contract that any class can satisfy without modification.