Object-Oriented Python and Dataclasses

Last Updated: March 13, 2026

Ashish Pratap Singh

As Python projects grow larger, organizing code becomes just as important as writing it. This is where object-oriented programming (OOP) helps. By modeling real-world entities as classes and objects, you can structure code in a way that is modular, reusable, and easier to maintain.

Python supports all the core OOP concepts such as classes, encapsulation, inheritance, and polymorphism. These ideas are widely used in AI codebases for organizing components like datasets, models, pipelines, and services.

Python also provides dataclasses, a feature that makes it much easier to define classes that primarily store data.

In this chapter, you will learn how object-oriented design works in Python and how dataclasses simplify many common programming patterns.

Python Classes vs Other Languages

The first thing you will notice is how little ceremony Python requires. There are no access modifiers, no type declarations on fields (unless you want them), and no separate constructor keyword. Here is a minimal class:
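
The sketch below is a minimal class of this kind. It uses a hypothetical Model with a name and a sampling temperature; the names and values are illustrative.

```python
class Model:
    def __init__(self, name, temperature=0.7):
        # Attributes come into existence simply by assigning to self
        self.name = name
        self.temperature = temperature

    def describe(self):
        # Instance attributes are always reached through self
        return f"{self.name} (temperature={self.temperature})"


model = Model("sentiment-v1")
print(model.describe())  # sentiment-v1 (temperature=0.7)
```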

Three things stand out compared to other languages.

First, the self parameter

Every instance method explicitly receives self as its first argument. In languages like Java or C++, the current instance (this) is implicit; Python makes it explicit. You write self in the method signature and access instance attributes through it. If you forget self in the signature, calling the method raises a TypeError: Python still passes the instance as the first argument, and the signature has no parameter to receive it.
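
You can see this for yourself with a hypothetical Broken class whose method leaves out self. Python still passes the instance when the method is called, so the call fails. The exact wording of the message varies slightly between Python versions.

```python
class Broken:
    def greet():  # self is missing from the signature
        return "hello"


error_message = ""
try:
    Broken().greet()  # Python effectively calls Broken.greet(instance)
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # mentions "takes 0 positional arguments but 1 was given"
```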

Second, no access modifiers

There is no private, protected, or public. By convention, a leading underscore means "treat this as internal" (self._cache). A double leading underscore triggers name mangling: inside a class named Model, self.__secret is stored as self._Model__secret. Neither convention actually prevents access from outside the class.
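
The sketch below shows both conventions on the hypothetical Model class. The attribute names _cache and __secret are illustrative.

```python
class Model:
    def __init__(self):
        self._cache = {}          # single underscore: internal by convention only
        self.__secret = "s3cr3t"  # double underscore: stored as _Model__secret

    def reveal(self):
        # Inside the class body, self.__secret is rewritten to self._Model__secret
        return self.__secret


m = Model()
print(m._cache)                # {}  (nothing stops outside access)
print(m._Model__secret)        # s3cr3t  (the mangled name is still reachable)
print(hasattr(m, "__secret"))  # False  (the unmangled name does not exist)
print(m.reveal())              # s3cr3t
```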

Third, duck typing

Python does not care about an object's class. It cares about what the object can do. If it has a __len__ method, you can call len() on it. If it has __getitem__, you can index into it. This is why you can pass any object with the right methods to a framework function, even if it does not inherit from any particular base class.

Dunder Methods: The Protocol System

Dunder methods (short for "double underscore") are Python's protocol system. They let your objects plug into built-in operations like print(), len(), ==, and for loops. Frameworks like PyTorch and Hugging Face define their contracts through these methods.

__init__, __repr__, and __str__

You have already seen __init__. It initializes an object after creation. But what happens when you print an object?
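
Here is a minimal sketch with a hypothetical Model class that defines only __init__. The memory address in the output changes from run to run.

```python
class Model:
    def __init__(self, name, temperature):
        self.name = name
        self.temperature = temperature


model = Model("sentiment-v1", 0.7)
print(model)  # prints something like <__main__.Model object at 0x7f3a2c1d4e50>
```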

Not helpful. That output tells you nothing about the object's state. This is where __repr__ and __str__ come in.

  • __repr__ is for developers. It should produce output that ideally could recreate the object. Python uses it in the REPL and in debuggers.
  • __str__ is for users. It produces a human-friendly description. print() calls __str__ first, falling back to __repr__ if __str__ is not defined.
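
The sketch below gives the same hypothetical Model both methods. The exact output formats are a matter of taste.

```python
class Model:
    def __init__(self, name, temperature):
        self.name = name
        self.temperature = temperature

    def __repr__(self):
        # Developer-facing: mirrors the constructor call that would rebuild the object
        return f"Model(name={self.name!r}, temperature={self.temperature!r})"

    def __str__(self):
        # User-facing: short and readable
        return f"{self.name} @ temperature {self.temperature}"


model = Model("sentiment-v1", 0.7)
print(model)        # sentiment-v1 @ temperature 0.7  (uses __str__)
print(repr(model))  # Model(name='sentiment-v1', temperature=0.7)
print([model])      # [Model(name='sentiment-v1', temperature=0.7)]  (containers use __repr__)
```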

The !r conversion inside the f-string calls repr() on the value, which wraps strings in quotes. It is a common pattern in __repr__ implementations.

__eq__ and __hash__

By default, Python compares objects by identity (memory address), not by value. Two objects with the same data are not considered equal unless you define __eq__:
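
The sketch below uses a hypothetical Config class. It shows the default behavior first and then adds a value-based __eq__. Returning NotImplemented for unrelated types lets Python fall back to its default handling instead of raising an error.

```python
class Config:
    def __init__(self, model, temperature):
        self.model = model
        self.temperature = temperature


a = Config("sentiment-v1", 0.7)
b = Config("sentiment-v1", 0.7)
print(a == b)  # False: the default == compares identity


class ComparableConfig:
    def __init__(self, model, temperature):
        self.model = model
        self.temperature = temperature

    def __eq__(self, other):
        # Unrelated types: let Python try the other operand, then fall back to identity
        if not isinstance(other, ComparableConfig):
            return NotImplemented
        return (self.model, self.temperature) == (other.model, other.temperature)


c = ComparableConfig("sentiment-v1", 0.7)
d = ComparableConfig("sentiment-v1", 0.7)
print(c == d)  # True: compared by value
```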

Why does __hash__ matter?

If you want to use objects as dictionary keys or store them in sets, Python requires __hash__. The rule is simple: if two objects are equal (__eq__ returns True), they must have the same hash. If you define __eq__ without __hash__, Python sets __hash__ to None, making the object unhashable.
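
The sketch below keeps __eq__ and __hash__ consistent by deriving both from the same tuple of fields. Config and _key are hypothetical names. The last lines show the rule in action: a class that defines only __eq__ ends up with __hash__ set to None.

```python
class Config:
    def __init__(self, model, temperature):
        self.model = model
        self.temperature = temperature

    def _key(self):
        # One tuple drives both equality and hashing, so they cannot disagree
        return (self.model, self.temperature)

    def __eq__(self, other):
        if not isinstance(other, Config):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


cache = {Config("sentiment-v1", 0.7): "cached response"}
print(cache[Config("sentiment-v1", 0.7)])  # cached response  (a different but equal object)

unique = {Config("a", 0.1), Config("a", 0.1), Config("b", 0.2)}
print(len(unique))  # 2  (duplicates collapse)


class EqOnly:
    def __eq__(self, other):
        return isinstance(other, EqOnly)


print(EqOnly.__hash__)  # None: defining __eq__ alone removes the inherited hash
```

There is one caveat. If you mutate a field after using the object as a key, its hash changes and the dictionary can no longer find it. This is one reason frozen dataclasses are a good fit for this job.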

This comes up in practice when you want to cache LLM responses keyed by configuration, or when you want to deduplicate a list of configurations.

__len__ and __getitem__: Making Objects Behave Like Collections

This is where things get directly relevant to AI frameworks. PyTorch's DataLoader expects a map-style dataset to support two operations: "how many items?" and "give me the item at index i." It does not require a particular base class; it relies on __len__ and __getitem__.
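
The sketch below builds such a dataset without importing PyTorch. TextDataset, texts, and labels are hypothetical names. An object like this should be usable as a map-style dataset by torch.utils.data.DataLoader, but the block only exercises the two methods (and plain iteration) directly.

```python
class TextDataset:
    """A map-style dataset: no base class, just __len__ and __getitem__."""

    def __init__(self, texts, labels):
        self.texts = texts
        self.labels = labels

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, index):
        # List indexing raises IndexError past the end, which also ends iteration
        return self.texts[index], self.labels[index]


dataset = TextDataset(["great movie", "terrible plot", "loved it"], [1, 0, 1])
print(len(dataset))  # 3
print(dataset[1])    # ('terrible plot', 0)

for text, label in dataset:  # works even though no __iter__ is defined
    print(label, text)
```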

When you define __getitem__ (and no __iter__), Python also makes the object iterable: it tries index 0, then 1, and so on until it gets an IndexError. This is duck typing at work. For a map-style dataset, PyTorch's DataLoader does not require your class to inherit from torch.utils.data.Dataset. It calls len(dataset) and dataset[i], and if those work, everything is fine.

Inheritance and Method Resolution Order

Python supports multiple inheritance, which is more powerful but also more complex than single inheritance. For this course, you mostly need single inheritance and a basic understanding of how Python resolves methods.
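
The sketch below uses hypothetical BaseModel and SentimentModel classes. The keyword-matching predict is a toy stand-in for a real model.

```python
class BaseModel:
    def __init__(self, name):
        self.name = name

    def predict(self, text):
        raise NotImplementedError("Subclasses must implement predict()")

    def describe(self):
        # Shared behavior that every subclass inherits
        return f"{type(self).__name__}({self.name!r})"


class SentimentModel(BaseModel):
    def __init__(self, name, threshold=0.5):
        super().__init__(name)  # let BaseModel set the shared attributes
        self.threshold = threshold

    def predict(self, text):
        # Toy rule for illustration: "good" anywhere in the text scores 1.0
        score = 1.0 if "good" in text.lower() else 0.0
        return "positive" if score >= self.threshold else "negative"


model = SentimentModel("sentiment-v1")
print(model.predict("A good film"))  # positive  (found on SentimentModel)
print(model.describe())              # SentimentModel('sentiment-v1')  (found on BaseModel)
print([cls.__name__ for cls in SentimentModel.__mro__])  # ['SentimentModel', 'BaseModel', 'object']
```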

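When you call model.predict(), Python looks for the method in this order: the instance, then the class (SentimentModel), then the parent class (BaseModel), and so on up the chain. Python computes this method resolution order (MRO) with the C3 linearization algorithm. With single inheritance, the result is simply the chain from child to parent. With multiple inheritance, the algorithm decides which parent comes first. You can inspect the order with ClassName.__mro__ or ClassName.mro().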

You will rarely need to think about MRO directly, but it explains why super() works the way it does in Python. When you call super().__init__(), Python does not simply call the parent class. It calls the next class in the MRO, which matters when multiple inheritance is involved.
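
The sketch below shows the classic diamond, using hypothetical mixin names. Notice that super().__init__() inside LoggingMixin ends up calling CachingMixin, not Base, because CachingMixin comes next in Pipeline's MRO.

```python
class Base:
    def __init__(self):
        self.calls = ["Base"]


class LoggingMixin(Base):
    def __init__(self):
        super().__init__()  # next class in the MRO of the actual object, not necessarily Base
        self.calls.append("LoggingMixin")


class CachingMixin(Base):
    def __init__(self):
        super().__init__()
        self.calls.append("CachingMixin")


class Pipeline(LoggingMixin, CachingMixin):
    def __init__(self):
        super().__init__()
        self.calls.append("Pipeline")


print([cls.__name__ for cls in Pipeline.__mro__])
# ['Pipeline', 'LoggingMixin', 'CachingMixin', 'Base', 'object']
print(Pipeline().calls)
# ['Base', 'CachingMixin', 'LoggingMixin', 'Pipeline']
```

Every __init__ in the chain runs exactly once, and Base.__init__ is not called twice. This only works because each class delegates with super() instead of naming its parent directly.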

Properties and Class/Static Methods

@property for Computed Attributes

The @property decorator lets you define methods that behave like attributes. This is useful when a value depends on other attributes and should always be up to date:
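
The sketch below uses a hypothetical TrainingConfig in which steps_per_epoch is derived from total_samples and batch_size. It assumes that a final partial batch still counts as a step.

```python
import math


class TrainingConfig:
    def __init__(self, total_samples, batch_size):
        self.total_samples = total_samples
        self.batch_size = batch_size

    @property
    def steps_per_epoch(self):
        # Recomputed on every access; ceil keeps the final partial batch
        return math.ceil(self.total_samples / self.batch_size)


config = TrainingConfig(total_samples=10_000, batch_size=32)
print(config.steps_per_epoch)  # 313  (read like an attribute, no parentheses)

config.batch_size = 64
print(config.steps_per_epoch)  # 157  (automatically up to date)
```

The property has no setter, so assigning to config.steps_per_epoch raises an AttributeError. That is usually what you want for a derived value.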

The advantage over storing steps_per_epoch in __init__ is that it stays correct even if you change batch_size or total_samples later. You will see this pattern in training configuration classes throughout ML frameworks.

@classmethod and @staticmethod

A @classmethod receives the class itself as the first argument (conventionally called cls) instead of an instance. The most common use case is alternative constructors:
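
The sketch below uses a hypothetical ModelConfig that offers two alternative constructors and one static helper. The 0.0 to 2.0 temperature range is an assumption for illustration, not a universal rule.

```python
import json


class ModelConfig:
    def __init__(self, model_name, temperature=0.7, max_tokens=256):
        self.model_name = model_name
        self.temperature = temperature
        self.max_tokens = max_tokens

    @classmethod
    def from_dict(cls, data):
        # cls is whatever class this was called on, so subclasses get their own type back
        return cls(**data)

    @classmethod
    def from_json(cls, text):
        # One alternative constructor built on top of another
        return cls.from_dict(json.loads(text))

    @staticmethod
    def is_valid_temperature(value):
        # No self or cls: a plain helper that lives in the class namespace
        return 0.0 <= value <= 2.0


config = ModelConfig.from_json('{"model_name": "sentiment-v1", "temperature": 0.2}')
print(config.model_name, config.temperature, config.max_tokens)  # sentiment-v1 0.2 256
print(ModelConfig.is_valid_temperature(3.5))  # False
```

Because from_dict builds the object through cls rather than naming ModelConfig, calling it on a subclass returns an instance of that subclass.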

A @staticmethod is just a regular function that happens to live inside a class namespace. It does not receive self or cls. Use it when the function is logically related to the class but does not need access to instance or class state.

Dataclasses: The Modern Way

Writing __init__, __repr__, __eq__, and __hash__ by hand for every class gets tedious. When your class is primarily a container for data (configuration, API responses, model metadata), Python's dataclasses module generates that boilerplate for you. It generates __hash__ only when you ask for it, for example with frozen=True.

The Basics
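
Here is a minimal sketch. ModelConfig and its fields are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    model_name: str            # no default: required argument
    temperature: float = 0.7   # defaults work like function defaults
    max_tokens: int = 256


config = ModelConfig("sentiment-v1", temperature=0.2)
print(config)
# ModelConfig(model_name='sentiment-v1', temperature=0.2, max_tokens=256)
print(config == ModelConfig("sentiment-v1", 0.2, 256))  # True: field-by-field comparison
```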

The @dataclass decorator reads the class annotations and generates __init__, __repr__, and __eq__ for you. Default values work exactly as you would expect. Notice how clean this is compared to writing everything manually.

field() for Complex Defaults

There is one gotcha with mutable defaults. This is a classic Python mistake:
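
The sketch below shows the mistake in a plain class, followed by what happens when you try the same thing in a dataclass. Experiment and BadExperiment are hypothetical names.

```python
from dataclasses import dataclass


class Experiment:
    def __init__(self, name, tags=[]):  # the default list is created once, when def runs
        self.name = name
        self.tags = tags


a = Experiment("run-a")
b = Experiment("run-b")
a.tags.append("baseline")
print(b.tags)  # ['baseline']  (both instances share one list)

dataclass_error = None
try:
    @dataclass
    class BadExperiment:
        name: str
        tags: list = []  # rejected as soon as the decorator runs
except ValueError as exc:
    dataclass_error = exc

print(dataclass_error)  # the message suggests using default_factory
```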

Python's dataclass decorator is smart enough to raise a ValueError if you use a mutable default such as a list, dict, or set. The solution is field() with a default_factory:
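
The sketch below is the fixed version; Experiment and its fields are hypothetical. Each default_factory is called once per instance, and repr=False keeps the internal cache out of the printed representation.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    name: str
    tags: list = field(default_factory=list)    # a new empty list for every instance
    params: dict = field(default_factory=dict)  # a new empty dict for every instance
    _cache: dict = field(default_factory=dict, repr=False)  # left out of repr


a = Experiment("run-a")
b = Experiment("run-b")
a.tags.append("baseline")
print(b.tags)  # []
print(a)       # Experiment(name='run-a', tags=['baseline'], params={})
```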

The default_factory parameter takes a callable (like list, dict, or a lambda) that creates a fresh default for each instance. The repr=False parameter excludes a field from the string representation, useful for internal caches or large data structures.

Frozen Dataclasses for Immutable Configs

In AI applications, you often want configuration objects that cannot be accidentally modified after creation. A changed hyperparameter mid-training is a debugging nightmare. The frozen=True option makes all fields read-only:
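
The sketch below uses a hypothetical TrainingConfig; the hyperparameter values and result strings are placeholders. The standard-library function dataclasses.replace() builds a modified copy of a frozen instance.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class TrainingConfig:
    learning_rate: float = 3e-4
    batch_size: int = 32
    epochs: int = 3


config = TrainingConfig(learning_rate=1e-4)

blocked = False
try:
    config.learning_rate = 1e-2  # any assignment after creation is rejected
except FrozenInstanceError:
    blocked = True
print(blocked)  # True

# To "change" a frozen config, build a modified copy instead
bigger_batch = replace(config, batch_size=64)

# frozen=True (with the default eq=True) generates __hash__, so configs work as dict keys
results = {config: "results for run 1", bigger_batch: "results for run 2"}
print(results[TrainingConfig(learning_rate=1e-4)])  # results for run 1
```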

Because frozen dataclasses are immutable, Python can safely generate __hash__ for them. This means you can use them as dictionary keys (as long as every field value is itself hashable), which is perfect for caching experiment results by configuration.

Post-Init Processing with __post_init__

Sometimes you need to compute derived values or validate inputs after the auto-generated __init__ runs. That is what __post_init__ is for:
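
The sketch below uses a hypothetical TrainingConfig that validates batch_size and then derives steps_per_epoch. It assumes a final partial batch counts as a step.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TrainingConfig:
    total_samples: int
    batch_size: int = 32
    steps_per_epoch: int = field(init=False)  # not an __init__ parameter

    def __post_init__(self):
        # Runs right after the generated __init__ has assigned the other fields
        if self.batch_size <= 0:
            raise ValueError(f"batch_size must be positive, got {self.batch_size}")
        self.steps_per_epoch = math.ceil(self.total_samples / self.batch_size)


config = TrainingConfig(total_samples=10_000)
print(config)  # TrainingConfig(total_samples=10000, batch_size=32, steps_per_epoch=313)

try:
    TrainingConfig(total_samples=100, batch_size=0)
except ValueError as exc:
    print("rejected:", exc)  # rejected: batch_size must be positive, got 0
```

Unlike a @property, this value is computed once, at construction time. It will go stale if you later change batch_size on a mutable instance.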

The field(init=False) tells the dataclass not to include that field in the generated __init__, since it will be set in __post_init__ instead.

Why Dataclasses Over Plain Classes

For data containers, dataclasses are almost always the right choice. Here is when to use which:

Use Case | Plain Class | Dataclass
Complex behavior with many methods | Yes | No
Configuration objects | No | Yes
API response containers | No | Yes
Data transfer objects | No | Yes
Objects with mostly logic, little data | Yes | No
Need __init__, __repr__, __eq__ (and optionally __hash__) generated for you | No | Yes

The rule of thumb: if your class is more about storing data than about behavior, use a dataclass.

Abstract Base Classes: Defining Interfaces

AI frameworks need a way to say "your class must implement these methods." Python's answer is abstract base classes (ABCs). You will encounter them constantly in LangChain, LlamaIndex, and similar libraries.
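
The sketch below defines such an interface. BaseLLM, generate, count_tokens, and the toy EchoLLM backend are hypothetical names chosen for illustration, not the API of any particular library.

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Every backend must implement generate and count_tokens."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return a completion for a single prompt."""

    @abstractmethod
    def count_tokens(self, text: str) -> int:
        """Return the number of tokens in text."""

    def generate_batch(self, prompts: list[str]) -> list[str]:
        # Default implementation: one call per prompt; subclasses may override
        return [self.generate(p) for p in prompts]


class EchoLLM(BaseLLM):
    """Toy backend that echoes the prompt back."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

    def count_tokens(self, text: str) -> int:
        return len(text.split())  # whitespace split as a stand-in for a real tokenizer


llm = EchoLLM()
print(llm.generate_batch(["hi", "there"]))  # ['echo: hi', 'echo: there']
```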

If you try to instantiate BaseLLM directly, or create a subclass without implementing both abstract methods, Python raises a TypeError:
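
The sketch below restates a trimmed copy of the hypothetical BaseLLM so it runs on its own. The exact wording of the error message differs between Python versions, but both attempts raise a TypeError.

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

    @abstractmethod
    def count_tokens(self, text: str) -> int: ...


class HalfDoneLLM(BaseLLM):
    def generate(self, prompt: str) -> str:
        return prompt  # count_tokens is never implemented


errors = {}
for cls in (BaseLLM, HalfDoneLLM):
    try:
        cls()
    except TypeError as exc:
        errors[cls.__name__] = str(exc)
        print(f"{cls.__name__}: {exc}")
```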

Notice that generate_batch has a default implementation in the base class. Subclasses inherit it for free but can override it with a more efficient implementation (like batched API calls).

This is the pattern you will see in LangChain's BaseLanguageModel, LlamaIndex's BaseLLM, and many other AI libraries. Understanding it means you can read framework source code, extend existing components, and create your own pluggable architectures.

Protocols: Structural Typing

ABCs require explicit inheritance. Your class must say class MyLLM(BaseLLM) to satisfy the contract. Python also supports structural typing through typing.Protocol: if your class has the right methods, it satisfies the protocol, with no inheritance needed.
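
The sketch below uses a hypothetical TextGenerator protocol. LocalModel never mentions TextGenerator, yet it satisfies it because it has a matching generate method. Tokenizer has no such method, so it does not.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TextGenerator(Protocol):
    def generate(self, prompt: str) -> str: ...


class LocalModel:
    # No base class at all: it just happens to have a matching generate method
    def generate(self, prompt: str) -> str:
        return prompt.upper()


class Tokenizer:
    def encode(self, text: str) -> list[int]:
        return [ord(ch) for ch in text]


def run(generator: TextGenerator, prompt: str) -> str:
    # A static checker such as mypy accepts LocalModel here by structure alone
    return generator.generate(prompt)


print(run(LocalModel(), "hello"))               # HELLO
print(isinstance(LocalModel(), TextGenerator))  # True
print(isinstance(Tokenizer(), TextGenerator))   # False
```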

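The @runtime_checkable decorator lets you use isinstance() with the protocol. Without it, the protocol only works with static type checkers like mypy, and isinstance() raises a TypeError. Even with it, the runtime check only verifies that the required methods exist, not their signatures or return types.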

When should you use each?

Feature | ABC | Protocol
Requires inheritance | Yes | No
Can provide default implementations | Yes | No
Works with existing classes you don't control | No | Yes
Runtime isinstance() checking | Yes | Only with @runtime_checkable
Common in AI frameworks | LangChain, LlamaIndex | Type hints, function signatures

Use ABCs when you control the class hierarchy and want to provide shared behavior. Use Protocols when you want to define a contract that any class can satisfy without modification.
