Last Updated: March 13, 2026
Before building AI systems, you need a solid foundation in the language that powers most of the modern AI ecosystem: Python.
Python has become the standard language for AI and machine learning because it is simple, expressive, and supported by a massive ecosystem of libraries such as NumPy, Pandas, PyTorch, and TensorFlow. From data processing and model training to deploying AI applications, Python sits at the center of the entire workflow.
In this chapter, we will cover the Python essentials you need for AI engineering. You will learn the key language features, data structures, and programming patterns that appear frequently in AI codebases.
Python has four core collection types, and each one maps to a specific use case in AI work. If you have used Java's HashMap or JavaScript's Map object, you will find Python's versions familiar but with cleaner syntax since they are built into the language itself.
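Here is how they compare: lists are ordered and mutable, dicts map keys to values, sets are unordered collections of unique elements, and tuples are ordered but immutable.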
Let's walk through each one with examples that mirror what you will actually see in AI projects.
Lists are ordered, mutable sequences. In AI code, lists hold everything from raw token sequences to batches of embeddings.
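A minimal sketch of the list operations that come up most often. The token values and variable names are illustrative and do not come from a real tokenizer.

```python
# Hypothetical token sequence; the values are made up for illustration.
tokens = ["the", "model", "predicts"]

tokens.append("tokens")           # add one item to the end
tokens.extend(["very", "fast"])   # add several items at once
tokens[0] = "The"                 # lists are mutable, so items can be replaced

first_token = tokens[0]
last_token = tokens[-1]           # negative indices count from the end
num_tokens = len(tokens)

print(tokens, num_tokens)
```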
Dicts are key-value stores. Python dicts are everywhere in AI: model configurations, API payloads, token vocabularies.
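As a rough illustration, here is a dict used as a model configuration and as a tiny vocabulary. The keys, the model name, and the values are assumptions made for the example, not settings of any real API.

```python
# Hypothetical model configuration; keys and values are illustrative only.
config = {"model": "example-model", "temperature": 0.7, "max_tokens": 256}

config["top_p"] = 0.9          # add a new key
config["temperature"] = 0.2    # overwrite an existing key

# Iterate over key-value pairs, for example to log a configuration.
for key, value in config.items():
    print(f"{key} = {value}")

# A toy token vocabulary mapping tokens to integer ids.
vocab = {"hello": 0, "world": 1}
hello_id = vocab["hello"]
has_unknown = "unknown" in vocab   # membership tests check keys, not values
```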
Sets are unordered collections of unique elements. Their superpower is membership testing in O(1) time on average, which matters when you are filtering against a vocabulary of 50,000 tokens.
If you need both uniqueness and order, a common pattern is list(dict.fromkeys(items)). This preserves insertion order while removing duplicates.
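The sketch below shows set membership filtering and the order-preserving deduplication pattern side by side. The vocabulary is a toy example; a real one would be far larger.

```python
# Toy vocabulary; a real one might hold tens of thousands of tokens.
vocabulary = {"the", "cat", "sat", "on", "mat"}
tokens = ["the", "dog", "sat", "on", "the", "mat"]

# Keep only tokens that appear in the vocabulary.
known = [token for token in tokens if token in vocabulary]

# Set difference: tokens that are missing from the vocabulary.
unknown = set(tokens) - vocabulary

# Remove duplicates while keeping first-seen order.
unique_in_order = list(dict.fromkeys(tokens))
```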
Tuples are like lists but immutable. You cannot change them after creation. This makes them useful as dict keys (lists cannot be dict keys because they are mutable and therefore unhashable) and as return values from functions.
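A short sketch of the three tuple behaviors described above. The cache contents and the `min_max` helper are hypothetical names introduced only for illustration.

```python
# Tuples of hashable items can be dict keys.
# The scores below are made-up values.
similarity_cache = {("doc_1", "doc_2"): 0.87}
similarity_cache[("doc_1", "doc_3")] = 0.42

point = (3, 4)
try:
    point[0] = 10   # tuples do not support item assignment
except TypeError as error:
    print(f"Cannot modify a tuple: {error}")


def min_max(values):
    """Return the smallest and largest value as a tuple."""
    return min(values), max(values)


# Returning multiple values and unpacking them on the other side.
lowest, highest = min_max([0.2, 0.9, 0.5])
```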
That last example, returning multiple values and unpacking them on the other side, is something Python developers do dozens of times a day. In Java you would need a custom class or a Pair. In Go, multiple return values work similarly. Python just makes it effortless.
Comprehensions are one of the features that make Python code feel distinctly different from Java or Go. They let you transform and filter collections in a single expression. In AI work, they show up constantly for preprocessing, feature extraction, and batch processing.
The basic pattern is [expression for item in iterable if condition]. If you are used to Java streams or JavaScript's .map().filter() chains, this is the Python equivalent, but more concise.
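To make the pattern concrete, here is a small sketch that filters and transforms a list in one expression, followed by the equivalent explicit loop. The input data is invented for the example.

```python
raw_tokens = ["Hello", "", "World", " ", "AI"]

# Drop blank entries and lowercase the rest in a single expression.
cleaned = [token.lower() for token in raw_tokens if token.strip()]

# The equivalent explicit loop, for comparison.
cleaned_loop = []
for token in raw_tokens:
    if token.strip():
        cleaned_loop.append(token.lower())
```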
Dict comprehensions follow the same idea but produce a dictionary. They are extremely useful for building lookup tables and inverting mappings.
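A minimal sketch of both uses, building a lookup table and then inverting it. The token list is a toy example.

```python
tokens = ["hello", "world", "ai"]

# Build a lookup table from token to integer id.
token_to_id = {token: index for index, token in enumerate(tokens)}

# Invert the mapping to go from id back to token.
id_to_token = {index: token for token, index in token_to_id.items()}
```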
Set comprehensions are less common, but handy for extracting unique values from a collection.
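For instance, collecting the distinct languages in a batch of documents might look like this. The document records are hypothetical.

```python
# Hypothetical document records.
documents = [
    {"text": "intro", "lang": "en"},
    {"text": "einleitung", "lang": "de"},
    {"text": "overview", "lang": "en"},
]

# Curly braces without key: value pairs produce a set.
languages = {doc["lang"] for doc in documents}
```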
A word of caution: comprehensions are powerful, but do not nest them more than two levels deep. A double-nested comprehension is already hard to read. If you find yourself nesting three levels, break it into a regular loop. Readability counts, especially in collaborative AI projects where others need to understand your preprocessing pipeline.
We touched on this briefly with tuples, but unpacking deserves its own section because it is so pervasive in Python AI code.
The * operator captures "the rest" into a list. This is like JavaScript's spread/rest syntax but for assignments.
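A brief sketch of starred unpacking, using made-up scores:

```python
scores = [0.95, 0.80, 0.75, 0.60]

# Take the first item and collect everything else into a list.
best, *others = scores

# The starred name can sit in the middle as well.
first, *middle, last = scores

# Unpacking also makes swapping two variables a one-liner.
x, y = 1, 2
x, y = y, x
```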
Python functions commonly return tuples, and callers unpack them directly. You will see this pattern in every ML training loop.
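The sketch below imitates the shape of a training loop. `train_step` is a hypothetical stand-in that computes fake metrics from the batch, not a real training function.

```python
def train_step(batch):
    """Hypothetical stand-in for a real training step."""
    loss = sum(batch) / len(batch)   # fake "loss": the batch mean
    accuracy = 1.0 - loss            # fake "accuracy" derived from it
    return loss, accuracy            # returned as a tuple


batches = [[0.5, 0.5], [0.25, 0.25]]
history = []
for batch in batches:
    # Unpack the returned tuple directly into two names.
    loss, accuracy = train_step(batch)
    history.append((loss, accuracy))
```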
F-strings (formatted string literals) are Python's answer to string interpolation. If you are coming from JavaScript, think template literals. From Java, think String.format() but shorter.
F-strings are not just convenient, they are the standard in modern Python. Older approaches like % formatting and .format() still work but are less readable. In this course, we use f-strings exclusively.
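A few f-string forms that tend to appear in logging and reporting code. The model name and metric values are invented for the example.

```python
model_name = "example-model"   # hypothetical name
loss = 0.034567
accuracy = 0.9234
num_tokens = 1234567

# :.2f rounds to two decimals, :.1% formats a fraction as a percentage.
summary = f"{model_name}: loss={loss:.2f}, accuracy={accuracy:.1%}"

# A comma in the format spec adds thousands separators.
with_commas = f"{num_tokens:,} tokens"

# The = specifier (Python 3.8+) prints the expression and its value.
debug_view = f"{loss=}"
```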
Slicing lets you extract portions of lists, strings, and (later) arrays and tensors. The syntax is sequence[start:stop:step], where start is inclusive and stop is exclusive. If you are coming from Go, this is similar to Go's slice syntax a[low:high].
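The most common slice forms, sketched on a small list:

```python
tokens = ["a", "b", "c", "d", "e", "f"]

first_three = tokens[:3]        # start defaults to 0
last_two = tokens[-2:]          # negative start counts from the end
middle = tokens[1:4]            # index 1 up to, but not including, 4
every_other = tokens[::2]       # step of 2
reversed_tokens = tokens[::-1]  # negative step walks backwards
```

Slicing always returns a new list, so `tokens` itself is unchanged.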
When you start working with NumPy arrays and PyTorch tensors, slicing becomes essential. The syntax is identical:
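A small sketch, assuming NumPy is installed. The array stands in for a batch of embeddings. Multi-dimensional arrays also accept a comma-separated slice per axis, which plain lists do not.

```python
import numpy as np

# A 4x3 array of the integers 0..11, standing in for 4 embeddings of size 3.
embeddings = np.arange(12).reshape(4, 3)

first_two_rows = embeddings[:2]    # same syntax as a list slice
last_row = embeddings[-1]          # negative indexing works too
first_column = embeddings[:, 0]    # one slice per axis, separated by commas
```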
You do not need to fully understand NumPy yet. The point is that the slicing syntax you learn here transfers directly to the numerical computing libraries you will use throughout this course.
Strings support the same slicing syntax. This is useful for truncating prompts, extracting prefixes, or working with fixed-format text.
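A short sketch of string slicing. The character limit and the log record layout are assumptions made for illustration.

```python
prompt = "Summarize the following article in three bullet points."
max_chars = 20   # hypothetical limit

truncated = prompt[:max_chars]   # keep only the first 20 characters
prefix = prompt[:9]              # the first word

# Fixed-format text: the first 10 characters hold the date.
record = "2026-03-13|INFO|model loaded"
date_part = record[:10]
```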
The walrus operator (:=), introduced in Python 3.8, assigns a value to a variable as part of an expression. It is called the walrus operator because := looks like a walrus on its side.
Where it really shines in AI work is streaming, where you are reading chunks of data in a while loop:
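The sketch below uses an in-memory stream in place of a real network response, then shows the operator inside a comprehension filter. The `similarity` function is a toy word-overlap measure invented for this example, and the 0.3 threshold is arbitrary.

```python
import io

# An in-memory stream stands in for a file or network response.
stream = io.StringIO("streamed response text")

chunks = []
# Read 8 characters at a time; an empty string ends the loop.
while (chunk := stream.read(8)):
    chunks.append(chunk)


def similarity(a, b):
    """Toy similarity: shared words divided by total distinct words."""
    words_a, words_b = set(a.split()), set(b.split())
    return len(words_a & words_b) / len(words_a | words_b)


query = "python for ai"
documents = ["python basics", "cooking for two", "ai with python"]

# Compute each score once, test it, and reuse it in the result.
matches = [
    (doc, score)
    for doc in documents
    if (score := similarity(query, doc)) > 0.3
]
```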
Without the walrus operator, that last example would require computing the similarity score separately, assigning it to a variable, checking the condition, and then using the variable. The walrus operator collapses those steps.
If this feels unfamiliar, do not worry about it yet. You will naturally start using it once you encounter streaming APIs in Module 1.
Python has a set of built-in functions and patterns that experienced developers reach for instead of writing manual loops. Learning these idioms is what makes your code look "Pythonic" rather than "Java translated to Python."
Instead of maintaining a counter variable (the for (int i = 0; ...) pattern from Java), use enumerate:
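A minimal sketch with a toy token list:

```python
tokens = ["hello", "world", "ai"]

lines = []
# enumerate yields (index, item) pairs, so no manual counter is needed.
for index, token in enumerate(tokens):
    lines.append(f"{index}: {token}")

# The start parameter changes the first index.
numbered_from_one = list(enumerate(tokens, start=1))
```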
zip pairs up elements from two or more iterables. In Java, there is no built-in equivalent. You would need a manual index loop. In Python, it is one word.
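For example, pairing texts with their labels might look like this. The data is made up.

```python
texts = ["great movie", "terrible plot", "fine"]
labels = ["positive", "negative", "neutral"]

# Pair each text with its label.
pairs = list(zip(texts, labels))

# zip combined with dict builds a lookup table in one step.
label_by_text = dict(zip(texts, labels))

# zip stops at the shortest iterable, silently dropping the extra item.
truncated_pairs = list(zip([1, 2, 3], ["a", "b"]))
```

Because of the last behavior, it is worth checking that the inputs have equal length when a mismatch would indicate a data bug.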
any and all short-circuit through an iterable and return a single boolean. Think of any as "does at least one item satisfy this?" and all as "do all items satisfy this?"
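A brief sketch using invented scores and an arbitrary threshold:

```python
scores = [0.91, 0.78, 0.85]

# True only if every score clears the threshold.
all_passing = all(score >= 0.75 for score in scores)

# True as soon as one score qualifies; later items are not checked.
any_excellent = any(score >= 0.9 for score in scores)

tokens = ["hello", "", "world"]
has_empty = any(token == "" for token in tokens)

# Edge case: on an empty iterable, all is True and any is False.
empty_all = all([])
empty_any = any([])
```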
The key parameter of sorted (also accepted by min and max) lets you sort by any criterion without defining a full comparator. In Java, this is like passing Comparator.comparing() with a lambda.
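A sketch of ranking search results by score. The result records are hypothetical.

```python
# Hypothetical retrieval results.
results = [
    {"doc": "a", "score": 0.72},
    {"doc": "b", "score": 0.91},
    {"doc": "c", "score": 0.65},
]

# Sort by score, highest first. sorted returns a new list.
ranked = sorted(results, key=lambda result: result["score"], reverse=True)
top_doc = ranked[0]["doc"]

# Any one-argument callable works as a key, including built-ins.
by_length = sorted(["transformer", "ai", "model"], key=len)

# max and min accept the same key parameter.
best = max(results, key=lambda result: result["score"])
```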
We mentioned this earlier, but it is worth emphasizing. Using .get() with a default value is the standard pattern for safe dictionary access.
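A minimal sketch, with an illustrative configuration dict and default values chosen only for the example:

```python
# Hypothetical configuration with some keys missing.
config = {"model": "example-model", "temperature": 0.7}

temperature = config.get("temperature", 1.0)   # key exists, default ignored
max_tokens = config.get("max_tokens", 256)     # key missing, default used
top_p = config.get("top_p")                    # no default given, so None

# A common counting pattern built on .get().
counts = {}
for token in ["a", "b", "a"]:
    counts[token] = counts.get(token, 0) + 1
```

By contrast, `config["max_tokens"]` would raise a `KeyError` here.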
When you work with text data in AI, whether that is preprocessing inputs for an LLM, parsing model outputs, or building NLP pipelines, you will lean heavily on a small set of string methods. These come up far more often than you might expect.
split breaks a string into a list. join does the reverse. Together, they form the simplest tokenization pipeline.
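A short sketch of the round trip, using invented text:

```python
text = "The quick  brown fox"

# With no argument, split breaks on any run of whitespace.
tokens = text.split()

# join is called on the separator and takes the list of strings.
rejoined = " ".join(tokens)

# An explicit separator splits on exactly that string.
fields = "id,text,label".split(",")
```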
Messy text is the norm in real data. Methods such as strip, lower, and replace handle the most common cleaning tasks.
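A minimal sketch, assuming the cleaning tasks in question are trimming whitespace, normalizing case, removing unwanted characters, and collapsing repeated spaces. The input string is invented.

```python
raw = "  Hello,   WORLD!\n"

stripped = raw.strip()                                  # trim both ends
lowered = stripped.lower()                              # normalize case
no_punct = lowered.replace(",", "").replace("!", "")    # drop chosen characters

# split followed by join collapses runs of internal whitespace.
normalized = " ".join(no_punct.split())
```

Strings are immutable, so each method returns a new string and leaves `raw` untouched.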
startswith and endswith are cleaner than slicing for checking prefixes and suffixes, and they accept tuples for checking multiple patterns at once.
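A quick sketch with a hypothetical filename and line of text:

```python
filename = "train_data.jsonl"   # hypothetical file name

is_training_file = filename.startswith("train_")

# A tuple checks several suffixes in one call.
is_json_like = filename.endswith((".json", ".jsonl"))

line = "# this is a comment"
is_comment = line.startswith(("#", "//"))

# The slicing alternative needs the prefix length spelled out.
is_training_file_slice = filename[:6] == "train_"
```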
Python's ternary expression is value_if_true if condition else value_if_false. It is the Python equivalent of Java's condition ? a : b or JavaScript's ternary. The syntax reads more like English, which some people find more readable and others find awkward.
Keep ternary expressions to one line. If the logic is more complex, use a regular if/else block. Nested ternaries are legal in Python but universally hated.
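Two one-line ternaries, sketched with an invented confidence value and an arbitrary 0.5 threshold:

```python
confidence = 0.82

# value_if_true if condition else value_if_false
label = "positive" if confidence >= 0.5 else "negative"

tokens = []
status = "empty" if not tokens else f"{len(tokens)} tokens"
```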
Python's None is like null in Java or nil in Go. But Python has a broader concept of "truthiness" that you need to understand to avoid subtle bugs.
These values all evaluate to False in a boolean context:
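The list below collects the most common falsy values, followed by a few values that look empty but are not.

```python
# None, False, numeric zeros, and empty containers are all falsy.
falsy_values = [None, False, 0, 0.0, "", [], {}, set(), ()]
none_are_truthy = not any(falsy_values)

# These look "empty-ish" but are truthy: each one has content.
truthy_examples = ["0", " ", [0], {"key": None}]
all_are_truthy = all(truthy_examples)
```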
Everything else is truthy. This is powerful but can bite you if you are not careful.
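Here is a sketch of how a truthiness check can misfire when 0 is a legitimate value. Both functions and the `max_tokens` parameter are hypothetical.

```python
def describe_limit(max_tokens=None):
    # Buggy: "not max_tokens" is also True for 0, a valid limit.
    if not max_tokens:
        return "using default limit"
    return f"limit = {max_tokens}"


def describe_limit_fixed(max_tokens=None):
    # Only None means "not provided"; 0 is passed through.
    if max_tokens is None:
        return "using default limit"
    return f"limit = {max_tokens}"


buggy_result = describe_limit(0)
fixed_result = describe_limit_fixed(0)
```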
This is a famous Python gotcha. Never use a mutable default argument:
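A minimal sketch of the bug and the usual fix. The function and parameter names are illustrative.

```python
def collect_bad(item, collection=[]):
    # The default list is created once and shared across calls.
    collection.append(item)
    return collection


def collect_good(item, collection=None):
    # Create a fresh list on each call that omits the argument.
    if collection is None:
        collection = []
    collection.append(item)
    return collection


first_bad = collect_bad("a")
second_bad = collect_bad("b")     # same list object as first_bad

first_good = collect_good("a")
second_good = collect_good("b")   # independent lists
```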
The problem with the first version is that the default [] is created once when the function is defined, not each time it is called. So every call without an explicit collection argument shares and mutates the same list. This is one of Python's most notorious footguns, and it comes up in AI code whenever you are accumulating results.
Here are the key takeaways from this chapter:
Use .get() with defaults for safe dictionary access.
Unpacking, combined with enumerate and zip, eliminates most index-based loops.
F-strings support format specifiers such as :.2f for floats and :.1% for percentages.
Slicing with [start:stop:step] works on lists, strings, and (later) arrays and tensors. The same syntax transfers to NumPy and PyTorch.
Built-ins such as enumerate, zip, any/all, sorted(key=...), and dict.get() replace patterns that require more boilerplate in Java, Go, and JavaScript.
Check with is None, not truthiness, when 0, "", or [] are valid values. And never use mutable default arguments.