Last Updated: March 18, 2026
There are so many LLMs that it's hard to keep track of them all.
Proprietary providers like OpenAI, Anthropic, Google, Meta, Mistral, and Cohere compete with a flood of open-source models, each with its own hosting providers, SDKs, and pricing models. There is a new "best" model every week. How do you even start?
The good news is that calling an LLM API is basically the same as calling any other REST API: you send a request and get a response. The core ideas apply to every provider, and once you understand one, it doesn't take long to switch between models.
This chapter will teach you how to use a single unified gateway to make API calls to multiple LLMs, how to securely handle authentication, how to understand the structure of requests and responses, and how to deal with the errors that might happen in production.
Before we write any code, let’s build a clear mental model for how we will work with LLM providers throughout this course.
In this course, we will use OpenRouter, an AI gateway that gives you access to 200+ models, including models from OpenAI, Anthropic, Google, Meta, Mistral, and more, through a single API key using the OpenAI SDK.
Instead of managing multiple SDKs, multiple API keys, and different response formats for each provider, you integrate with OpenRouter once, and it routes your request to whichever model you want to use.
Your application talks to OpenRouter through one consistent interface. OpenRouter takes care of routing, authentication, and response normalization across the underlying providers.
The best part is that you only need to write the API integration once. After that, switching from GPT to Claude to Llama is as simple as changing the model name string.
You can find the available free models in OpenRouter here, and explore the full list of supported models here.
OpenRouter gives you access to most of the popular models. If you want to go through this course without spending any money, here are a few free models you can use:
In this course, I have included a Run button wherever feasible so you can execute the Python code directly in your browser. Some heavier and more time-consuming scripts have been omitted, but you can still run them on your local system.
Every LLM provider uses API keys to authenticate requests. Before we write any model-calling code, it is important to set this up correctly.
Get your OpenRouter key at openrouter.ai. Create a free account and generate a key. This single key gives you access to 200+ models from major providers such as OpenAI, Anthropic, Google, Meta, and Mistral.
If you plan to use only the free models, you do not need to add payment details. OpenRouter offers 25+ free models with limited daily usage (50 requests per day). Check out their pricing page for more details.
This may sound obvious, but it is one of the most common mistakes beginners make.
Never do this:
Instead, store your key in an environment variable.
There are two common ways to do that.
.env file (recommended for projects)

Create a .env file in the root of your project and set your API key:
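A minimal .env needs just one line (the value shown is a placeholder):

```
OPENROUTER_API_KEY=sk-or-v1-your-key-here
```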
Then load it in Python using the python-dotenv package:
And always add .env to your .gitignore:
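For example:

```
# keep secrets out of version control
.env
```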
Let's install the client libraries we will need:
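Something like the following, assuming pip (python-dotenv is optional but convenient for the .env approach above):

```shell
pip install openai python-dotenv
```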
The openai package is the only dependency you strictly need (python-dotenv is just a convenience for loading .env files). OpenRouter uses the same API format as OpenAI, so the OpenAI SDK works out of the box, no matter which underlying model you are calling.
Every LLM API call follows the same fundamental pattern, regardless of provider.
At the heart of every LLM request is a messages array. This is a list of messages that represent a conversation. Each message has two fields: a role and content.
There are three roles: system sets the model's behavior and instructions, user carries your input, and assistant holds the model's replies.
Here is what a typical messages array looks like:
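For example, a short conversation using all three roles (the content is illustrative):

```python
messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "What is an API key?"},
    {"role": "assistant", "content": "A secret token that identifies your app to a service."},
    {"role": "user", "content": "Where should I store mine?"},
]
```

Note that the conversation history is just data you send with each request; the API itself is stateless.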
Here is what happens when you make an LLM API call:
The flow is straightforward: your application sends an HTTPS request containing the messages array, OpenRouter authenticates it and routes it to the chosen provider, the model generates a response, and the result comes back to you in a normalized format.
Here is the complete code to make your first LLM API call. The code is minimal, and you will recognize the pattern immediately if you have ever used the OpenAI SDK before.
Let's break down what is happening here.
The base_url parameter is the key. By setting it to https://openrouter.ai/api/v1, you redirect the OpenAI SDK to send all requests to OpenRouter instead of OpenAI's servers. The request format, response format, and SDK methods remain identical; only the destination changes.
Model names follow the format provider/model-name. So openai/gpt-4o-mini tells OpenRouter "route this to OpenAI's gpt-4o-mini model".
The chat.completions.create() method is the main endpoint for generating text. You specify:

- model: the model to route to, in provider/model-name format.
- messages: the conversation history as a list of role/content messages.
- Optional generation parameters such as max_tokens.

The response object is the standard OpenAI format regardless of which model you call. choices[0].message.content holds the generated text, and usage gives you token counts for billing.
The response looks like this (simplified):
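The field values here are illustrative:

```json
{
  "id": "gen-abc123",
  "model": "openai/gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "An LLM gateway is ..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 12, "total_tokens": 37}
}
```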
A few fields worth understanding:
- finish_reason: "stop" means the model finished naturally; "length" means it hit the token limit, which usually means your output was cut off.

Before we move to error handling, let's talk about tokens, because they determine both the cost of your API calls and the limits you will hit.
A token is roughly 3/4 of a word in English. The word "hamburger" is two tokens ("ham" and "burger"). Common words like "the" or "is" are single tokens. Code tends to use more tokens per line than English prose because of special characters and variable names.
Here is a rough rule of thumb: one token is about four characters of English text, so 100 tokens comes out to roughly 75 words.
Why does this matter? Two reasons.
Cost: Every provider charges per token. Input tokens (your prompt) are cheaper than output tokens (the model's response). If input tokens cost about $2.50 per million and output tokens cost about $10 per million, a request with a 500-token prompt and a 200-token response costs roughly $0.003. This might look cheap for a single call, but it adds up at scale.
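The arithmetic above can be checked with a few lines; the default prices are the hypothetical ones from the example:

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 input_price_per_m: float = 2.50,
                 output_price_per_m: float = 10.00) -> float:
    """Cost in dollars for one request, given per-million-token prices."""
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000

# 500 * 2.50/M + 200 * 10.00/M = 0.00125 + 0.00200 = 0.00325 dollars
cost = request_cost(500, 200)
```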
Context window: Every model has a maximum number of tokens it can process in a single request (input + output combined). GPT-5.1 supports 400k tokens. Claude Opus 4.6 supports 1M tokens. If your messages array exceeds this limit, the API will return an error.
When calling models through OpenRouter, you can estimate token counts using the tiktoken library before sending a request. It runs offline and returns results instantly, no API call needed.
For non-OpenAI models (Claude, Gemini, Llama), tiktoken gives a reasonable estimate. The exact count will differ slightly because each model uses its own tokenizer vocabulary, but the estimate is close enough for planning purposes. The actual token usage always comes back in response.usage after the call completes.
LLM APIs fail in ways that are easy to predict, and the difference between a prototype and a production application is how well they handle these failures.
Here are the errors you will encounter most often: rate limits (HTTP 429), authentication failures (HTTP 401), transient server errors (HTTP 5xx), and network timeouts. Rate limits and server errors are temporary and worth retrying with backoff; an authentication failure will not fix itself, so retrying it is pointless.
Here is a complete implementation:
Usage is simple: just pass the same arguments you would pass to chat.completions.create().
By default, the SDK will wait a long time for a response. For production use, you should set an explicit timeout:
For requests that generate long outputs, you might need longer timeouts. A good starting point is 30 seconds for short responses and 60 seconds for longer ones.
Now let's combine everything into a single script that sends the same prompt to multiple models and compares the results. This is the power of building on OpenRouter: one client, one API key, any model.
When you run this script, you will see responses to the same prompt from three different model families, with latency and token counts for each. A few things you will likely notice:
The key point is that none of these differences required you to write separate API calls. You changed a string, and the rest just worked.