
From Text Generation to Action

Last Updated: March 15, 2026


Ashish Pratap Singh

LLMs started as powerful text generators. You give them a prompt, and they produce coherent, useful text in response. But modern AI systems are expected to do more than just generate answers. They need to take actions.

In real-world applications, generating text is often only the first step. A useful AI system might need to search the web, query a database, call an API, run code, or interact with external tools to complete a task. Instead of simply responding with information, the model becomes part of a system that can observe, decide, and act.

This shift marks an important transition in AI engineering. We move from building systems that generate text to systems that use language models as reasoning engines that orchestrate actions.

In this chapter, we will explore how LLMs can be connected to tools, APIs, and external systems, enabling them to move beyond conversation and start getting real work done.

Why LLMs Cannot Act

In the previous module, you learned how RAG lets models access external knowledge by retrieving documents at query time. That solved the "stale knowledge" problem. But retrieval is read-only. What about when the user wants to write, create, update, or interact with external systems?

Consider what an LLM can and cannot do today:

Task | Can the LLM do it alone? | Why / Why not
Explain how weather APIs work | Yes | Text generation from training data
Return the current weather in Tokyo | No | Requires a live API call
Write a SQL query to find inactive users | Yes | Text generation
Execute that query against your database | No | Requires database access
Draft a Slack message | Yes | Text generation
Send that Slack message | No | Requires Slack API call
Describe how to create a calendar event | Yes | Text generation
Actually create the calendar event | No | Requires Google Calendar API

The pattern is clear. Anything that requires interacting with the world outside the model's context window is impossible with text generation alone. The model can describe the action perfectly, but it has no hands to carry it out.

Function calling gives the model hands. The model still only outputs text. But instead of outputting a natural language response, it outputs a structured JSON message that says "I want to call function X with these parameters." Your application code then executes that function, feeds the result back to the model, and the model incorporates the real data into its response.
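Concretely, the structured request might look like the following sketch, which follows the shape of an OpenAI Chat Completions assistant message; the id, function name, and arguments are illustrative:

```python
import json

# A sketch of the assistant message the model returns when it wants a
# tool call instead of a plain text answer (illustrative values).
assistant_message = {
    "role": "assistant",
    "content": None,  # no natural-language answer yet
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                # Arguments arrive as a JSON-encoded string, not a dict
                "arguments": '{"location": "Tokyo, Japan"}',
            },
        }
    ],
}

# Your application parses the arguments before executing anything
args = json.loads(assistant_message["tool_calls"][0]["function"]["arguments"])
print(args["location"])  # Tokyo, Japan
```

Note that the model emits `arguments` as a string of JSON, so your code must parse and validate it before acting on it.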

Key Insight: The model never executes code. It never makes HTTP requests. It never touches your database. It produces a structured request, and your code does the work. This distinction matters because it means you stay in full control of what actions are allowed, what parameters are valid, and what gets executed.

The diagram above shows the two paths a model can take when it receives a message. If the user asks something the model can answer from its training data, it responds directly with text (Option A). If the user asks something that requires external data or an action, the model outputs a tool call (Option B), your code executes it, the result goes back to the model, and then the model responds with real information.

How Function Calling Works

Now that you understand why function calling exists, let's walk through exactly how it works, step by step. The entire mechanism comes down to four phases.

Phase 1: You define the available tools

When you send a request to the LLM API, you include a list of tool definitions alongside the user's message. Each tool definition describes a function: its name, what it does, and what parameters it accepts. Think of this as handing the model a menu of capabilities.

Phase 2: The model decides whether to call a tool

Based on the user's message and the available tool definitions, the model makes a decision. If the user's question can be answered without any tools, the model just responds with text. If a tool is needed, the model outputs a special tool_calls object containing the function name and arguments as structured JSON.

Phase 3: Your code executes the function

You receive the model's tool call, extract the function name and arguments, and run the actual function in your application. This might mean calling a weather API, querying a database, or creating a record in an external system. The model has no involvement in this step.

Phase 4: You send the result back

You take the function's return value and send it back to the model as a new message with the role tool. The model reads this result and generates a natural language response that incorporates the real data.
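Put together, the conversation history across the four phases might look like this sketch (hypothetical values, using OpenAI-style message roles):

```python
# Hypothetical message history spanning the four phases.
messages = [
    # Phases 1-2: the user asks; the model answers with a tool call
    {"role": "user", "content": "What's the weather in Tokyo right now?"},
    {
        "role": "assistant",
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_current_weather",
                         "arguments": '{"location": "Tokyo, Japan"}'},
        }],
    },
    # Phases 3-4: your code ran the function; its result goes back
    # with role "tool" and the matching tool_call_id
    {
        "role": "tool",
        "tool_call_id": "call_1",
        "content": '{"temp": 22, "condition": "Sunny"}',
    },
    # The model's final, natural-language answer
    {"role": "assistant",
     "content": "It's currently 22 degrees and sunny in Tokyo."},
]

roles = [m["role"] for m in messages]
print(roles)  # ['user', 'assistant', 'tool', 'assistant']
```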

Here is a sequence diagram showing this full loop:

Notice the two round trips to the LLM API. The first sends the user's message and gets back a tool call. The second sends the tool's result and gets back the final response. This is different from a normal API call, which only requires one round trip. Every function call adds latency because of this extra round trip, something to keep in mind when designing your application.

Defining Tools with JSON Schema

The quality of your tool definitions directly determines how well the model uses them. A vague description leads to wrong tool calls. A precise description leads to accurate ones. The model reads your tool definitions the same way a developer reads API documentation, so write them with the same care.

Each tool definition has three parts:

  • name: A short, descriptive function name (e.g., get_current_weather, create_ticket)
  • description: A clear explanation of what the function does and when to use it
  • parameters: A JSON Schema object that defines the function's arguments, their types, and which ones are required

A complete tool definition for the OpenAI API combines all three parts into a single JSON object.
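As a sketch, such a definition might look like this, following the OpenAI Chat Completions tools format; the weather function and its parameters are illustrative:

```python
# Illustrative tool definition in the OpenAI Chat Completions format.
get_current_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": (
            "Get the current weather conditions for a specific location. "
            "Use this when the user asks about current weather, "
            "temperature, or conditions in a city or region."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country, e.g. 'Tokyo, Japan'",
                },
                "units": {
                    "type": "string",
                    # enum stops the model from inventing invalid values
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature units to use",
                },
            },
            "required": ["location"],  # units is optional
        },
    },
}
```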

Let's break down why each part matters.

The name should be a verb-noun pair that clearly describes the action: get_current_weather, search_users, create_ticket. The model uses this name to understand what the function does at a glance.

The description is the most important field. This is where you tell the model when to use this function. Be specific. Instead of "Gets weather data," write "Get the current weather conditions for a specific location. Use this when the user asks about current weather, temperature, or conditions in a city or region." The more context you provide, the better the model's decisions.

The parameters use standard JSON Schema. Each property needs a type (string, number, boolean, array, object) and a description. The enum field restricts values to a specific set, which prevents the model from inventing invalid options. The required array lists which parameters must be provided.

A real application usually exposes several related tools. Consider a weather assistant with three: get_current_weather, get_weather_forecast, and get_historical_weather.
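The three-tool set might be sketched like this, with a small helper to share the common location parameter; the names, descriptions, and parameters are illustrative:

```python
def weather_tool(name: str, description: str, extra_props: dict) -> dict:
    """Build an OpenAI-style tool definition with a shared location param."""
    props = {
        "location": {"type": "string",
                     "description": "City and country, e.g. 'Paris, France'"},
        **extra_props,
    }
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": props,
                           "required": ["location"]},
        },
    }

tools = [
    weather_tool(
        "get_current_weather",
        "Get current, real-time weather conditions for a location. "
        "Use ONLY for questions about the weather right now. "
        "Do NOT use for forecasts or past weather.",
        {},
    ),
    weather_tool(
        "get_weather_forecast",
        "Get predicted weather for a location up to 14 days ahead. "
        "Use when the user asks about future weather, e.g. 'tomorrow' "
        "or 'next week'.",
        {"days_ahead": {"type": "integer",
                        "description": "How many days ahead, 1-14"}},
    ),
    weather_tool(
        "get_historical_weather",
        "Get recorded weather for a location on a past date. Use when "
        "the user asks what the weather was like on a past date.",
        {"date": {"type": "string",
                  "description": "Past date in YYYY-MM-DD format"}},
    ),
]
```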

Notice how the three tools cover different time horizons: current, future, and past. The descriptions explicitly state when to use each one. This is critical because a user asking "What will the weather be like in Paris next week?" needs the forecast tool, not the current weather tool. Without clear descriptions, the model might pick the wrong one.

Key Insight: The most common cause of wrong tool calls is not a model limitation. It is a bad tool description. If you find the model calling the wrong tool, improve the description before blaming the model. Add more context about when to use it, when not to use it, and what distinguishes it from similar tools.

The Execute-Respond Loop

Now let's put it all together in real code. The core pattern for function calling is a loop: send a message, check if the model wants to call a tool, execute the tool if so, send the result back, and repeat until the model gives a final text response.

The core of the implementation is a short loop around the chat completions endpoint.
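One way to sketch that loop, assuming the openai Python SDK's client interface; the weather function is stubbed with fixed data and the model name is a placeholder:

```python
import json

def get_current_weather(location: str, units: str = "celsius") -> dict:
    """Stubbed lookup; a real version would call a live weather API."""
    return {"temp": 22, "condition": "Sunny", "humidity": 45,
            "location": location, "units": units}

# Maps the function names the model may request to real callables
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather conditions for a location. "
                       "Use when the user asks about the weather right now.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string",
                             "description": "City and country"},
            },
            "required": ["location"],
        },
    },
}]

def run_conversation(client, user_message: str, model: str = "gpt-4o") -> str:
    """Run the execute-respond loop. client is an OpenAI() instance."""
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.chat.completions.create(
            model=model, messages=messages, tools=TOOLS)
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content           # final text answer: done
        messages.append(message)             # keep the tool_calls in history
        for call in message.tool_calls:
            fn = AVAILABLE_FUNCTIONS[call.function.name]
            result = fn(**json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,     # ties the result to its call
                "content": json.dumps(result),
            })
```

To run it against the real API you would construct `client = OpenAI()` (which reads OPENAI_API_KEY from the environment) and call `run_conversation(client, "What's the weather like in Tokyo right now?")`.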

Let's walk through what happens when the loop runs.

  1. The user asks "What's the weather like in Tokyo right now?"
  2. We send this message to the API along with the three weather tool definitions.
  3. The model recognizes this as a "current weather" question and returns a tool_calls response with get_current_weather(location="Tokyo, Japan").
  4. Our code looks up get_current_weather in the AVAILABLE_FUNCTIONS dictionary and calls it with the model's arguments.
  5. The function returns {"temp": 22, "condition": "Sunny", "humidity": 45, ...}.
  6. We add this result to the messages array with role tool and the matching tool_call_id.
  7. We make a second API call with the updated messages.
  8. The model reads the weather data and generates a natural language response like "It's currently 22 degrees Celsius and sunny in Tokyo, with 45% humidity."

The while loop is important. In some cases, the model might need to call multiple tools in sequence, or call the same tool multiple times with different arguments. The loop continues until the model responds with text instead of tool calls.

Notice the tool_call_id field when sending results back. Each tool call has a unique ID, and the result must reference that ID so the model knows which call produced which result. If you are handling multiple parallel tool calls, getting the IDs wrong will confuse the model.

How Models Choose Which Tool to Call

When you send a message with tool definitions, the model goes through a decision process that resembles how a developer reads API documentation. It looks at the user's intent, scans the available tool descriptions, and picks the best match. If no tool matches, it just responds with text.

This decision process is not a simple keyword match. The model understands semantics. If a user asks "Is it going to rain tomorrow in Berlin?", the model understands that "tomorrow" means future weather, which maps to get_weather_forecast rather than get_current_weather, even though the user never said the word "forecast."

The tool_choice Parameter

You can influence the model's tool selection behavior using the tool_choice parameter:

Value | Behavior
"auto" | Model decides whether to use a tool or respond with text (default)
"none" | Model will never call a tool, even if one would be helpful
"required" | Model must call at least one tool (will not respond with text only)
{"type": "function", "function": {"name": "get_current_weather"}} | Model must call this specific tool

Each value is passed as the tool_choice argument in the API request.
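A sketch of the four variants as request keyword arguments; the tool definitions are elided for brevity, the model name is a placeholder, and get_current_weather is the illustrative function from earlier:

```python
tools = []  # stands in for the weather tool definitions shown earlier

common = {"model": "gpt-4o", "tools": tools}

# Default: the model decides between text and tool calls
auto_request = {**common, "tool_choice": "auto"}

# Summarize-only turn: never call a tool
none_request = {**common, "tool_choice": "none"}

# Data-lookup flow: force at least one tool call
required_request = {**common, "tool_choice": "required"}

# Argument extraction: force one specific function
forced_request = {**common, "tool_choice": {
    "type": "function",
    "function": {"name": "get_current_weather"},
}}

# Each would be sent as client.chat.completions.create(messages=..., **request)
```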

When do you need to override tool_choice?

Most of the time, "auto" works well. But there are cases where you want more control:

  • Use "none" when you want the model to summarize tool results without making additional calls.
  • Use "required" when you know the user's request definitely needs a tool (e.g., the first message in a data lookup flow).
  • Use a specific function when your application logic has already determined which function to call and you just want the model to extract the arguments.

When the Model Gets Confused

The model sometimes picks the wrong tool, especially in these situations:

  • Ambiguous requests: "Tell me about Tokyo weather" could be current, forecast, or general knowledge. Clear tool descriptions help, but ambiguity still trips up models.
  • Similar tools: If two tools have overlapping descriptions, the model may pick either one. Make your descriptions as distinct as possible.
  • Missing tools: If no tool matches the user's intent, the model should respond with text. But sometimes it forces a tool call anyway, trying to approximate what the user wants.

The fix for all of these is the same: improve your tool descriptions. Add negative examples ("Do NOT use this for forecast questions"), add explicit triggers ("Use this ONLY when the user asks about current, real-time conditions"), and test with edge cases.
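For instance, a description hardened along those lines might read as follows (illustrative wording):

```python
# A vague description invites wrong tool calls...
vague = "Gets weather data."

# ...while explicit triggers, negative examples, and pointers to
# sibling tools (names from the earlier example) narrow the choice.
specific = (
    "Get current, real-time weather conditions for a city or region. "
    "Use this ONLY when the user asks about the weather right now. "
    "Do NOT use this for forecast questions (use get_weather_forecast) "
    "or for past dates (use get_historical_weather)."
)
```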

Building a Complete Weather Assistant

Let's bring everything together into a complete, interactive weather assistant. This version supports multi-turn conversations, handles multiple tool calls, and includes proper error handling.

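One way to sketch such an assistant, assuming the openai SDK's client interface; the two weather functions are stubbed and the tool set, model name, and error messages are illustrative:

```python
import json

# ---- Stubbed weather functions (a real app would call live APIs) ----

def get_current_weather(location):
    return {"temp": 22, "condition": "Sunny", "humidity": 45,
            "location": location}

def get_weather_forecast(location, days_ahead=7):
    return {"location": location, "days_ahead": days_ahead,
            "summary": "Mild with occasional rain"}

AVAILABLE_FUNCTIONS = {
    "get_current_weather": get_current_weather,
    "get_weather_forecast": get_weather_forecast,
}

TOOLS = [
    {"type": "function", "function": {
        "name": "get_current_weather",
        "description": "Get current weather for a location. Use ONLY for "
                       "questions about conditions right now.",
        "parameters": {"type": "object",
                       "properties": {"location": {"type": "string"}},
                       "required": ["location"]}}},
    {"type": "function", "function": {
        "name": "get_weather_forecast",
        "description": "Get the forecast for a location up to 14 days "
                       "ahead. Use for questions about future weather.",
        "parameters": {"type": "object",
                       "properties": {"location": {"type": "string"},
                                      "days_ahead": {"type": "integer"}},
                       "required": ["location"]}}},
]

class WeatherAssistant:
    """Multi-turn assistant; client is an OpenAI() instance."""

    def __init__(self, client, model="gpt-4o"):
        self.client = client
        self.model = model
        self.messages = []  # persists across turns, so context carries over

    def ask(self, user_message):
        self.messages.append({"role": "user", "content": user_message})
        while True:
            response = self.client.chat.completions.create(
                model=self.model, messages=self.messages, tools=TOOLS)
            message = response.choices[0].message
            if not message.tool_calls:
                self.messages.append(
                    {"role": "assistant", "content": message.content})
                return message.content
            self.messages.append(message)
            for call in message.tool_calls:  # may be several in one turn
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": self._execute(call),
                })

    def _execute(self, call):
        fn = AVAILABLE_FUNCTIONS.get(call.function.name)
        if fn is None:
            return json.dumps({"error": f"Unknown tool: {call.function.name}"})
        try:
            args = json.loads(call.function.arguments)
            return json.dumps(fn(**args))
        except (json.JSONDecodeError, TypeError) as exc:
            # Feed the failure back so the model can recover gracefully
            return json.dumps({"error": str(exc)})
```

Returning errors as tool results, rather than raising, lets the model explain the failure or retry with corrected arguments instead of crashing the conversation.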


Try running this with a sequence of messages to see how multi-turn conversation works:

  1. "What's the weather in Tokyo?" (current weather call)
  2. "What about the forecast for next week?" (forecast call, model remembers Tokyo from context)
  3. "How does that compare to London right now?" (current weather call for London)
  4. "What is the capital of France?" (no tool call, model answers from training data)

The fourth message is interesting. The model recognizes this is a general knowledge question, not a weather question, so it skips the tools entirely and responds with text. This is the "auto" behavior of tool_choice at work.
