Have you ever asked a chatbot to send an email or update a database, only to get back “I can’t do that”? Frustrating, right? The reason is simple: a Large Language Model (LLM) on its own can only generate text. To act in the real world (write a record, call an API, send a notification) it needs tools. This guide shows you how, with working code and no abstract theory.
At Meteora Web, we work daily with AI for automation and agents. Tool calling is the game changer: it turns an LLM from a talker into a doer. If you run an e-commerce store or a small business, it means no more vague answers, just real actions.
What Is Tool Calling and Why Is It Critical for Automation?
Tool calling (or function calling) is a mechanism that lets an LLM declare its intention to call an external function. The model doesn’t execute the function itself: it generates a JSON object with the function name and its arguments. Your code executes the function and returns the result. In practice: the LLM decides what to do, you make it happen.
Why critical? Without tool calling, an AI agent is just a text generator — it can write “I updated the database” but doesn’t do it. With tool calling, that update is real. We see it in our clients’ projects: orders that go out, tickets that close, reports that generate. Revenue is measured in actions, not words.
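The split described above, where the model only emits a name and arguments and your code does the actual work, can be sketched as follows. This is a minimal illustration, not a specific provider's API: `get_order_status`, `TOOL_REGISTRY` and the shape of `model_output` are assumptions made for the example.

```python
import json


def get_order_status(order_id: str) -> dict:
    """Hypothetical business function; a real one would query your order system."""
    return {"order_id": order_id, "status": "shipped"}


# Registry mapping the tool names the model may emit to real Python callables.
TOOL_REGISTRY = {"get_order_status": get_order_status}


def execute_tool_call(tool_call: dict) -> str:
    """Run the function named in a model-emitted tool call and return JSON text."""
    name = tool_call["name"]
    # Arguments arrive as a JSON string, so they must be parsed before use.
    args = json.loads(tool_call["arguments"])
    func = TOOL_REGISTRY[name]
    return json.dumps(func(**args))


# What the model produces: only a name and arguments. Nothing has run yet.
model_output = {"name": "get_order_status", "arguments": '{"order_id": "A-1001"}'}

# What your code does: look up the function, execute it, serialize the result.
tool_result = execute_tool_call(model_output)
print(tool_result)
```

The registry lookup is the point where the model's intention becomes a real action, which is also why it is a natural place to add checks later.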
How Does Tool Calling Work in Modern LLMs?
The flow is simple and powerful:
- Define the available functions (name, description, parameters in JSON Schema).
- Include these definitions in the request to the LLM.
- The model analyzes the prompt and, if a tool is needed, returns a tool_calls object.
- Your code executes the function and returns the result as a new message.
- The model receives the result and generates the final response.
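The same pattern works with OpenAI, Anthropic, Google Gemini and many open-source models, although the field names differ between providers (tool_calls is the OpenAI name). The key is clear definitions: a poorly described tool produces wrong calls.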
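To make the five steps above concrete without needing an API key, here is a rough sketch of the loop with a scripted stand-in for the model. `fake_model`, `get_stock` and the message shapes are assumptions for illustration only; a real provider returns its own response objects.

```python
import json


def get_stock(sku: str) -> dict:
    """Hypothetical tool: returns a stock level from an in-memory table."""
    inventory = {"SKU-1": 12, "SKU-2": 0}
    return {"sku": sku, "units": inventory.get(sku, 0)}


TOOLS = {"get_stock": get_stock}


def fake_model(messages: list) -> dict:
    """Stand-in for an LLM API call, scripted so the loop can run offline."""
    last = messages[-1]
    if last["role"] == "tool":
        # Step 5: the model has the tool result and writes the final answer.
        units = json.loads(last["content"])["units"]
        return {"role": "assistant", "content": f"We have {units} units in stock.", "tool_calls": []}
    # Step 3: the model decides a tool is needed and describes the call.
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{"id": "call_1", "name": "get_stock", "arguments": '{"sku": "SKU-1"}'}],
    }


def run_agent(user_prompt: str, max_turns: int = 5) -> str:
    """Loop until the model stops asking for tools or the turn budget runs out."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = fake_model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:
            return reply["content"]
        # Step 4: execute each requested call and send the result back.
        for call in reply["tool_calls"]:
            result = TOOLS[call["name"]](**json.loads(call["arguments"]))
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)})
    raise RuntimeError("Model did not produce a final answer within max_turns")


answer = run_agent("How many units of SKU-1 do we have?")
print(answer)
```

The `max_turns` cap is a design choice rather than part of any API: it keeps a misbehaving model from looping on tool calls forever.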
Practical Example: Built-in Calculator
Here’s the code using the OpenAI API in Python. Make sure you have openai>=1.0 installed and your API key set in the OPENAI_API_KEY environment variable.
from openai import OpenAI
import json

# OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()


def calculate(operation, a, b):
    """Performs an arithmetic operation."""
    if operation == "add":
        return a + b
    elif operation == "multiply":
        return a * b
    else:
        return "Unsupported operation"


# Tool definition sent to the model: name, description, JSON Schema parameters.
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Performs an arithmetic operation on two numbers",
            "parameters": {
                "type": "object",
                "properties": {
                    "operation": {"type": "string", "enum": ["add", "multiply"]},
                    "a": {"type": "number"},
                    "b": {"type": "number"}
                },
                "required": ["operation", "a", "b"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What is 123 times 456?"}]

# First request: the model may answer directly or ask for a tool call.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools
)
message = response.choices[0].message

if message.tool_calls:
    # Keep the assistant message that requested the tool(s) in the history.
    messages.append(message)
    # Answer every requested call; each one needs its own "tool" message.
    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = calculate(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps({"result": result})
        })
    # Second request: the model turns the tool result into the final answer.
    final_response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )
    print(final_response.choices[0].message.content)
else:
    # No tool was requested, so the first reply is already the answer.
    print(message.content)
Try it: the LLM recognizes that a multiplication is needed, requests a call to calculate, and uses the result your code returns to write its answer. You can extend this pattern to any API.
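When you extend the pattern to more functions, writing every definition by hand gets repetitive. One possible convenience, sketched here under simple assumptions, is to derive the definition from a function's signature and docstring. `build_tool_definition` and `shipping_cost` are hypothetical helpers invented for this example, and the type mapping only covers basic scalar annotations.

```python
import inspect

# Mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def build_tool_definition(func) -> dict:
    """Build an OpenAI-style tool definition from a signature and docstring."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unknown types fall back to "string" in this sketch.
        properties[name] = {"type": JSON_TYPES.get(param.annotation, "string")}
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def shipping_cost(weight_kg: float, country: str, express: bool = False) -> float:
    """Returns the shipping cost in EUR for a parcel."""
    base = 5.0 + 1.5 * weight_kg
    return base * 2 if express else base


definition = build_tool_definition(shipping_cost)
print(definition["function"]["parameters"]["required"])
```

A generated definition is only as good as the docstring behind it, so the advice about detailed descriptions still applies.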
What Tools Should You Give Your LLM?
Not all tools are useful. Choose based on context. Here are the most common ones in projects we follow:
- Internal search (RAG): query a vector DB for company documents.
- External APIs: weather, shipping, payments.
- Database CRUD: create a lead, update an order.
- Email or notification sending: confirmations, alerts.
- Custom calculations: tax, discounts, installments.
Watch out for security: never expose a destructive operation such as DELETE FROM clients to an unverified LLM. We limit permissions, always validate the arguments the model produces, and run every call through an authorization layer before executing the function.
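An authorization layer like the one mentioned above could look roughly like this. It is a sketch under assumptions: the roles, the `ALLOWED_TOOLS` policy, the order functions and the status values are all invented for illustration, and a production system would load its policy from configuration.

```python
class ToolNotAllowed(Exception):
    """Raised when a caller role tries to use a tool outside its allowlist."""


# Hypothetical policy: which tools each caller role may trigger.
ALLOWED_TOOLS = {
    "support_agent": {"get_order", "update_order_status"},
    "public_chat": {"get_order"},
}
VALID_STATUSES = {"pending", "shipped", "cancelled"}

orders = {"A-1": "pending"}  # In-memory stand-in for a real database.


def get_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": orders[order_id]}


def update_order_status(order_id: str, status: str) -> dict:
    orders[order_id] = status
    return {"order_id": order_id, "status": status}


REGISTRY = {"get_order": get_order, "update_order_status": update_order_status}


def guarded_call(role: str, tool_name: str, args: dict) -> dict:
    """Check permissions and arguments first, then execute the tool."""
    if tool_name not in ALLOWED_TOOLS.get(role, set()):
        raise ToolNotAllowed(f"{role} may not call {tool_name}")
    # Validate model-produced values before they reach the database.
    if tool_name == "update_order_status" and args.get("status") not in VALID_STATUSES:
        raise ValueError(f"Invalid status: {args.get('status')!r}")
    return REGISTRY[tool_name](**args)


print(guarded_call("support_agent", "update_order_status", {"order_id": "A-1", "status": "shipped"}))
```

Note that no delete function is registered at all: the safest destructive tool is one the model cannot name.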
Common Mistakes and How to Avoid Them in Tool Calling
We’ve seen (and made) classic mistakes:
- Vague descriptions: the model calls the wrong tool. Use detailed descriptions: “Performs multiplication of two numbers a and b, returns the product”.
- Missing parameters: if a field isn’t required, the model may skip it. Mark everything needed as required.
- No error handling: if the function throws an exception, the loop breaks. Return a structured error message and let the model decide how to handle it.
- Too many tools: in our experience, more than 10-15 tools tends to confuse the model. Group them or use hierarchical routing.
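For the missing-parameter and error-handling points in the list above, one way to keep the loop alive is a wrapper that always returns JSON, whether the call succeeds or fails. The `safe_execute` helper and its error field names are assumptions for this sketch, not part of any SDK.

```python
import json


def safe_execute(func, raw_arguments: str, required: list) -> str:
    """Run a tool and always return JSON text, even when something fails."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return json.dumps({"ok": False, "error": "invalid_json", "detail": str(exc)})
    # Catch skipped fields before calling the function.
    missing = [name for name in required if name not in args]
    if missing:
        return json.dumps({"ok": False, "error": "missing_arguments", "detail": missing})
    try:
        return json.dumps({"ok": True, "result": func(**args)})
    except Exception as exc:
        # Report the failure to the model instead of crashing the loop.
        return json.dumps({"ok": False, "error": type(exc).__name__, "detail": str(exc)})


def divide(a, b):
    """Example tool that can fail at runtime."""
    return a / b


print(safe_execute(divide, '{"a": 6, "b": 3}', ["a", "b"]))
print(safe_execute(divide, '{"a": 6, "b": 0}', ["a", "b"]))
print(safe_execute(divide, '{"a": 6}', ["a", "b"]))
```

The returned string can go straight into the tool message, so the model sees the error and can retry with corrected arguments or explain the problem to the user.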
A real case: a client had a “send email” tool with no recipient limit. The LLM sent 500 emails at random. Now we add a recipient limit and a human confirmation step.
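A guard of the kind described in that case might be sketched like this. The cap of 5, the `confirm` and `deliver` callbacks and the return shape are all assumptions for illustration; in practice `confirm` would ask a person and `deliver` would call your mail provider.

```python
MAX_RECIPIENTS = 5  # Hypothetical cap; choose a value that fits your use case.


def send_email(recipients, subject, body, confirm, deliver) -> dict:
    """Send only if the recipient count is within the cap and a human approves."""
    if len(recipients) > MAX_RECIPIENTS:
        return {"sent": False, "reason": f"too many recipients ({len(recipients)} > {MAX_RECIPIENTS})"}
    # The confirm callback stands in for a human approval step.
    if not confirm(recipients, subject):
        return {"sent": False, "reason": "rejected by human reviewer"}
    deliver(recipients, subject, body)
    return {"sent": True, "count": len(recipients)}


outbox = []  # Records what would have been sent, instead of sending real mail.


def fake_deliver(recipients, subject, body):
    outbox.append((tuple(recipients), subject))


def auto_approve(recipients, subject):
    return True


def auto_reject(recipients, subject):
    return False


print(send_email(["a@example.com"], "Order confirmed", "Thanks!", auto_approve, fake_deliver))
print(send_email([f"user{i}@example.com" for i in range(500)], "Hi", "...", auto_approve, fake_deliver))
```

Because the refusal is returned as data, the model can tell the user why nothing was sent rather than claiming success.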
Tool Calling vs RAG: Differences and When to Use Them Together?
RAG (Retrieval-Augmented Generation) and tool calling are not competitors. RAG brings knowledge, tool calling brings action. An agent that answers catalog questions uses RAG; an agent that updates stock after a purchase uses tool calling. In many projects we combine them: the LLM searches documents (RAG) and then executes commands (tool calling).
The real question is: does the agent only need to answer, or does it also need to act? If it only needs to answer, RAG is enough. If it must act, you need tool calling.
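To illustrate the combination, the sketch below exposes retrieval and action as two separate functions an agent could be given as tools. Everything here is a simplification: `search_catalog` uses naive word overlap in place of a vector database, and the catalog text, SKUs and stock table are made up for the example.

```python
CATALOG_DOCS = [
    "SKU-1 Trail Backpack: 30 litre capacity, waterproof, 2 year warranty.",
    "SKU-2 Camping Stove: runs on butane, weighs 400 g.",
]
stock = {"SKU-1": 10, "SKU-2": 3}


def search_catalog(query: str) -> list:
    """Knowledge side: toy retrieval by shared words (a real setup would query a vector DB)."""
    words = set(query.lower().split())
    scored = []
    for doc in CATALOG_DOCS:
        doc_words = set(doc.lower().replace(":", " ").replace(",", " ").split())
        scored.append((len(words & doc_words), doc))
    # Highest overlap first; documents with no shared words are dropped.
    return [doc for score, doc in sorted(scored, reverse=True) if score > 0]


def update_stock(sku: str, sold: int) -> dict:
    """Action side: decrement stock after a purchase."""
    if sold > stock[sku]:
        raise ValueError(f"Only {stock[sku]} units of {sku} available")
    stock[sku] -= sold
    return {"sku": sku, "remaining": stock[sku]}


# An agent answering a question only needs the retrieval tool...
print(search_catalog("is the backpack waterproof"))
# ...while completing a purchase also needs the action tool.
print(update_stock("SKU-1", 2))
```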
What to Do Next
Here are three immediate actions to start with tool calling:
- Write a simple tool: a function that returns the current time or a dummy value. Integrate it with the OpenAI API as shown above.
- Add a real tool: connect to a test API (e.g. Open-Meteo for weather). Check that the LLM calls it correctly and uses the result.
- Check security: set a whitelist of allowed actions and a per-call limit. Never execute functions on unfiltered user input.
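As a starting point for the first and third steps above, here is a rough sketch of a current-time tool together with a small whitelist and a cap on calls. The names `ALLOWED_ACTIONS`, `MAX_CALLS_PER_REQUEST` and `run_tool` are hypothetical, and reading the limit as "tool calls per user request" is an assumption of this example.

```python
import json
from datetime import datetime, timezone


def get_current_time() -> dict:
    """A first, harmless tool: returns the current UTC time."""
    return {"utc_time": datetime.now(timezone.utc).isoformat()}


# Definition to pass in the `tools` list of the request.
get_current_time_tool = {
    "type": "function",
    "function": {
        "name": "get_current_time",
        "description": "Returns the current time in UTC as an ISO 8601 string. Takes no arguments.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}

# Whitelist: only functions listed here can ever be executed.
ALLOWED_ACTIONS = {"get_current_time": get_current_time}
MAX_CALLS_PER_REQUEST = 3  # Hypothetical budget of tool calls per user request.


def run_tool(name: str, raw_arguments: str, calls_so_far: int) -> str:
    """Execute a whitelisted tool if the call budget allows it."""
    if name not in ALLOWED_ACTIONS:
        return json.dumps({"error": f"tool '{name}' is not allowed"})
    if calls_so_far >= MAX_CALLS_PER_REQUEST:
        return json.dumps({"error": "tool call limit reached"})
    # Some models send an empty string when a tool has no arguments.
    args = json.loads(raw_arguments or "{}")
    return json.dumps(ALLOWED_ACTIONS[name](**args))


print(run_tool("get_current_time", "{}", calls_so_far=0))
print(run_tool("delete_all_clients", "{}", calls_so_far=0))
```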
At Meteora Web, we use tool calling daily to automate workflows — from order management to quote delivery. Next step? Multi-tool agents collaborating. If you want to dive deeper, check out our pillar on AI agentic systems.