Table of Contents
- Motivation
- Guardrails
- Guiding the agent through problem-solving
- Error handling
- Debugging your agents
- Conclusion
If you want to learn about context engineering, you can read my article on Context Engineering for Question Answering Systems, or Enhancing LLMs with Context Engineering.
Motivation
My motivation for this article is that AI agents have become remarkably capable. More and more LLMs are released that are specifically trained for agentic behaviour, such as Qwen 3, where improved agentic capabilities were an important highlight of Alibaba's release.
A lot of online tutorials highlight how simple it now is to set up an agent using frameworks such as LangGraph. The problem, however, is that these tutorials are designed for experimentation, not for running agents in production. Using AI agents effectively in production is much harder and requires solving challenges you don't face when experimenting locally. The focus of this article is therefore on how to make production-ready AI agents.
Guardrails
The first challenge you need to solve when deploying AI agents to production is to have guardrails. Guardrails are a vaguely defined term in the online space, so I’ll provide my own definition for this article.
LLM guardrails refer to mechanisms that ensure LLMs act within their assigned tasks, adhere to instructions, and don't perform unexpected actions.
The question now is: How do you set up guardrails for your AI agents? Here are some examples of how to set up guardrails:
- Limit the number of functions an agent has access to
- Limit the time an agent can work, or the number of tool calls it can make without human intervention
- Make the agent ask for human supervision when performing dangerous tasks, such as deleting objects
Such guardrails will ensure your agent acts within its designed responsibilities, and doesn’t cause issues such as:
- Excessive wait times for users
- Large cloud bills due to extreme token usage (can happen if an agent is stuck in a loop, for example)
Furthermore, guardrails are important for keeping the agent on course. If you give your AI agent too many options, it will likely fail at its task. This is why my next section is on minimizing the agent's options by using specific workflows.
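To make this concrete, here is a minimal sketch of two of these guardrails (a tool-call budget and human approval for dangerous actions) in plain Python, not tied to any framework. `DANGEROUS_TOOLS`, `require_human_approval`, and the budget value are illustrative assumptions, not a real library API:

```python
# Illustrative guardrail sketch: cap the number of tool calls and require
# human sign-off for dangerous tools before executing them.
DANGEROUS_TOOLS = {"delete_object", "drop_table"}  # assumed tool names
MAX_TOOL_CALLS = 10


class ToolBudgetExceeded(Exception):
    """Raised when the agent exhausts its tool-call budget."""


def require_human_approval(tool_name: str) -> bool:
    # In production this would surface the request to a human reviewer;
    # here we deny by default for safety.
    print(f"Human approval required for: {tool_name}")
    return False


def guarded_call(tool_name, tool_fn, state, *args, **kwargs):
    """Wrap every tool call with the guardrails described above."""
    if state["tool_calls"] >= MAX_TOOL_CALLS:
        raise ToolBudgetExceeded(f"Exceeded {MAX_TOOL_CALLS} tool calls")
    if tool_name in DANGEROUS_TOOLS and not require_human_approval(tool_name):
        return {"status": "blocked", "tool": tool_name}
    state["tool_calls"] += 1  # only successful, allowed calls count
    return tool_fn(*args, **kwargs)
```

In a real deployment, `require_human_approval` would pause the run and wait for a reviewer rather than denying outright, but the control flow stays the same.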
Guiding the agent through problem-solving
Another important point when using agents in production is to minimize the number of options the agent has access to. You might imagine that you can simply give an agent immediate access to all your tools and thereby create an effective AI agent.
Unfortunately, this rarely works in practice: Agents get stuck in loops, are unable to pick the correct function, and struggle to recover from previous errors. The solution for this is to guide the agent through its problem-solving. In Anthropic’s Building Effective AI Agents, this is referred to as prompt chaining and is applied to agentic workflows that you can decompose into different steps. In my experience, most workflows have this attribute, and this principle is thus relevant for most problems you can solve with agents.
I’ll enhance the explanation through an example:
Task: Fetch information about location, time, and contact person from each of a list of 100 contracts. Then present the five latest contracts in a table format.
Bad solution: Prompt one agent to perform the task in its entirety, so this agent attempts to read all of the contracts, fetch the relevant info, and present it in a table format. The most likely outcome here is that the agent will present you with incorrect information.
Proper solution: Decompose the problem into multiple steps.

- Information fetching (fetch all locations, times, and contact people)
- Information filtering (filter to only keep the five latest contracts)
- Information presentation (present the findings in a table)
Furthermore, between steps, you can add a validator to ensure task completion is on track (for example, checking that you fetched information from all documents).
For step one, you will likely have a dedicated information extraction subagent and apply it to all 100 contracts. This should give you a table with three columns and 100 rows, where each row contains one contract's location, time, and contact person.
In step two, an agent looks through the table and filters out any contract not among the five latest. The last step simply presents these findings in a nice markdown table.
The trick is to define this workflow beforehand to simplify the problem. Instead of an agent figuring out these three steps by itself, you create an information extraction and filtering workflow with the three predefined steps, add some validation between each step, and you have an effective information extraction and filtering agent. You then repeat this process for any other workflow you want to support.
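The three-step workflow, with validators between steps, can be sketched roughly like this. `extract_contract_info` and `render_table` are toy stand-ins for your extraction subagent and presentation step, and the validation checks are deliberately simple:

```python
# Sketch of the predefined extract -> filter -> present workflow,
# with a validation check after each step.

def extract_contract_info(contract: dict) -> dict:
    # Stand-in for an extraction subagent call; here we just copy fields.
    return {k: contract[k] for k in ("location", "time", "contact")}


def render_table(rows: list) -> str:
    """Step 3: present the findings as a markdown table."""
    header = "| location | time | contact |\n| --- | --- | --- |"
    body = "\n".join(
        f"| {r['location']} | {r['time']} | {r['contact']} |" for r in rows
    )
    return header + "\n" + body


def run_workflow(contracts: list) -> str:
    # Step 1: information fetching, one subagent call per contract
    rows = [extract_contract_info(c) for c in contracts]
    # Validator: every contract yielded exactly the three fields
    if not all(r.keys() == {"location", "time", "contact"} for r in rows):
        raise ValueError("extraction step produced incomplete rows")
    # Step 2: information filtering -- keep the five latest contracts
    latest = sorted(rows, key=lambda r: r["time"], reverse=True)[:5]
    # Step 3: information presentation
    return render_table(latest)
```

The point is that the agent never has to plan these steps itself; the structure is fixed and only the work inside each step is delegated.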
Error handling
Error handling is a critical part of maintaining effective agents in production. In the last example, imagine that the information extraction agent failed to fetch information from 3 of the 100 contracts. How do you deal with this?
Your first approach should be to add retry logic. If an agent fails to complete a task, it retries until it either successfully performs the task or reaches a max retry limit. However, you also need to know when to retry, since the agent might not experience a code failure, but rather fetch the incorrect information. For this, you need proper LLM output validation, which you can learn more about in my article on Large Scale LLM Validation.

Error handling, as defined in the last paragraph, can be handled with simple try/catch statements and a validation function. However, it becomes more complicated when some contracts are corrupt or don't contain the right information. Imagine, for example, that one of the contracts contains the contact person but is missing the time. This poses another problem, since you cannot perform the next step of the task (filtering) without the time. To handle such errors, you should predefine what happens with missing or incomplete information. One simple and effective heuristic is to ignore any contract from which you can't extract all three information points (location, time, and contact person) after two retries.
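A sketch of this retry-plus-validation heuristic: `extract` stands in for your extraction agent, `is_complete` is the output validator, and `MAX_RETRIES = 2` matches the two-retry rule above:

```python
# Retry extraction with validation; drop contracts that stay incomplete.
MAX_RETRIES = 2
REQUIRED_FIELDS = {"location", "time", "contact"}


def is_complete(record: dict) -> bool:
    """Validate that all three information points were extracted."""
    return REQUIRED_FIELDS <= record.keys() and all(
        record[f] for f in REQUIRED_FIELDS
    )


def extract_with_retries(contract, extract):
    """Return a validated record, or None after the retry budget is spent."""
    for _ in range(1 + MAX_RETRIES):  # one attempt plus two retries
        record = extract(contract)
        if is_complete(record):
            return record
    return None  # incomplete after two retries: ignore per the heuristic
```

Downstream, a `None` simply means the contract is excluded from the filtering step (and ideally logged for later review).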
Another important part of error handling is dealing with issues such as:
- Token limits
- Slow response times
When performing information extraction on hundreds of documents, you will inevitably face problems where you’re rate-limited or the LLM takes a long time to respond. I usually recommend the following solutions:
- Token limits: Increase limits as much as possible (LLM providers are usually quite strict here), and utilize exponential backoff
- Slow response times: Always await LLM calls if possible. This can make sequential processing slower; however, it will make building your agentic application a lot simpler. If you really need more speed, you can optimize for it later.
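Exponential backoff can be implemented in a few lines. `RateLimitError` below is a stand-in for whatever exception your LLM provider's client actually raises:

```python
import random
import time

# Minimal exponential backoff with jitter for rate-limited LLM calls.


class RateLimitError(Exception):
    """Stand-in for a provider's rate-limit exception."""


def call_with_backoff(llm_call, max_attempts=5, base_delay=1.0):
    """Retry llm_call, doubling the delay after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return llm_call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # budget exhausted: surface the error
            # Sleep base, 2*base, 4*base, ... plus jitter so that many
            # workers don't retry in lockstep.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

The jitter term matters when you run extraction over hundreds of documents in parallel: without it, all workers hit the rate limit and retry at the same moment.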
Another important aspect to consider is checkpointing. If your agent performs tasks that run for more than a minute, checkpointing is important: in case of failure, you don't want your model to restart from scratch, which usually leads to a bad user experience, since the user has to wait even longer.
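One simple way to checkpoint, assuming the per-item results are JSON-serializable: persist after every processed item and skip already-finished items on restart. The file-based store and the `process_one` callback are illustrative choices, not a specific framework feature:

```python
import json
from pathlib import Path

# Per-item checkpointing: results are written to disk after each item,
# so a crash mid-run resumes where it left off instead of restarting.


def run_with_checkpoints(items, process_one, path="checkpoint.json"):
    ckpt = Path(path)
    done = json.loads(ckpt.read_text()) if ckpt.exists() else {}
    for i, item in enumerate(items):
        key = str(i)
        if key in done:  # already processed before a previous failure
            continue
        done[key] = process_one(item)
        ckpt.write_text(json.dumps(done))  # persist after every item
    return [done[str(i)] for i in range(len(items))]
```

In production you would likely checkpoint to a database or object store instead of a local file, but the resume logic is the same.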
Debugging your agents
A final important step of building AI agents is debugging them. My main point on debugging ties back to a message I've shared in multiple articles, posted by Greg Brockman on X:
The tweet typically refers to a standard classification problem, where you inspect your data to understand how a machine-learning system can perform the classification. However, I find that the tweet also applies very well to debugging your agents:
You should manually inspect the input, thinking, and output tokens your agents use to complete a set of tasks.
This will help you understand how the agent approaches a given problem, the context it is given to solve the problem, and the solution it comes up with. The answer to most issues your agent faces is usually contained in one of these three sets of tokens (input, thinking, output). I've found numerous issues simply by setting aside 20 API calls, going through the entire context I provided the agent as well as the output tokens, and quickly realizing where I went wrong. For example:
- I fed duplicate context into my LLM, making it worse at following instructions
- The thinking tokens showed how the LLM was misunderstanding the task I was providing it, indicating my system prompt was unclear.
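A small trace logger makes this kind of inspection routine. The agent interface here (a callable returning thinking text and a final output) is an assumption about your setup; adapt it to whatever your client returns:

```python
import json

# Record the full input context, thinking text, and output of every call,
# so you can later read through a sample of ~20 traces by hand.


def traced_call(agent, prompt: str, trace_log: list) -> str:
    """Run the agent and append a full trace of the call to trace_log."""
    thinking, output = agent(prompt)
    trace_log.append({"input": prompt, "thinking": thinking, "output": output})
    return output


def dump_traces(trace_log: list, n: int = 20) -> str:
    """Pretty-print the first n traces for manual review."""
    return json.dumps(trace_log[:n], indent=2)
```

Reading these traces is exactly how you spot the two failure modes listed above: duplicated context shows up in the `input` field, and task misunderstandings show up in `thinking`.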
Overall, I also recommend creating several test tasks for your agents, with ground-truth answers set up. You can then tune your agents, ensure they pass all test cases, and then release them to production.
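Such a test set can be as simple as (task, expected answer) pairs and a pass-rate check before release. `run_agent` below stands in for your own agent entry point:

```python
# Minimal regression harness: run every test task through the agent and
# compare against the ground-truth answer.


def evaluate(run_agent, test_cases):
    """Return (pass_rate, failures) over (task, expected) pairs."""
    failures = [
        (task, expected, got)
        for task, expected in test_cases
        if (got := run_agent(task)) != expected
    ]
    return 1 - len(failures) / len(test_cases), failures
```

A release gate is then one line: require `pass_rate == 1.0` (or a threshold you choose) before the agent ships. For fuzzier outputs, you would swap exact equality for a similarity metric or an LLM-as-judge check.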
Conclusion
In this article, I've discussed how you can develop effective production-ready agents. A lot of online tutorials cover how to set up agents locally in just a few minutes; however, successfully deploying agents to production is usually a much greater challenge. I've discussed how you need guardrails, guided problem-solving, and effective error handling to run agents successfully in production. Lastly, I discussed how you can debug your agents by manually inspecting the input, thinking, and output tokens.
👉 My free eBook and Webinar:
📚 Get my free Vision Language Models ebook
💻 My webinar on Vision Language Models
👉 Find me on socials:
🧑💻 Get in touch
✍️ Medium