🚀 Able to supercharge your AI workflow? Attempt ElevenLabs for AI voice and speech era!
On this article, you’ll discover ways to remodel a fundamental tool-calling script right into a resilient agent that gracefully handles failures from misbehaving instruments, malformed mannequin outputs, and unavailable companies.
Subjects we are going to cowl embrace:
- Methods to construction an iterative agent loop with a security cap on iteration rely.
- The 4 distinct classes of failure an agent encounters when calling instruments, and how one can deal with every one.
- Methods to design device error messages that train the mannequin how one can get well, lowering wasted iterations.
Constructing a Multi-Software Gemma 4 Agent with Error Restoration
Introduction
In a earlier article, we wired up Gemma 4 to a handful of Python features utilizing Ollama’s tool-calling API. That gave us a working single-turn dispatcher: the mannequin picks a device, our code runs it, the mannequin solutions. It’s a helpful start line, however it’s a great distance from an agent.
One of many issues that turns a tool-calling demo into an precise agent is the way it handles issues going mistaken. Instruments fail. The mannequin hallucinates a perform title, or passes a string the place you needed a quantity, or asks a couple of metropolis your lookup desk has by no means heard of. An upstream API instances out. A required argument is lacking. Within the earlier tutorial, any of those would both crash the script or get swallowed by a attempt/besides that prints a message and provides up. That’s high-quality for a single path demo. It’s not high-quality for something you’d wish to depart operating.
This text rebuilds the agent across the assumption that issues will go mistaken, and reveals how one can get well gracefully after they do. The sample is easy: catch errors on the boundary, convert them into messages the mannequin can learn, ship them again to the mannequin, and let the mannequin determine whether or not to retry, route round the issue, or clarify the failure to the person. We’ll additionally wrap every little thing in a correct iterative agent loop with a security cap on iteration rely.
The full script will be discovered right here. This text walks via the elements that matter.
Rethinking the Software Loop
The unique dispatcher ran a single spherical: ship the person question, acquire device calls, run them, ship the outcomes again, print the mannequin’s reply. That’s a one-shot interplay. It really works high-quality when the mannequin’s first response appropriately solutions the person’s query, however it has nowhere to go when one thing goes mistaken. If a device fails, the mannequin will get one likelihood to react after which we’re accomplished. If the mannequin desires to name one other device after seeing the primary end result, too dangerous; we already exited.
A correct agent loop is iterative. The construction is simple:
- Ship the present message historical past to the mannequin.
- If the mannequin produces device calls, execute every one, append each end result to the historical past, and loop once more.
- If the mannequin produces a plain textual content response, that’s the ultimate reply. Return.
- Cap the loop at
MAX_ITERATIONSso a confused mannequin can’t burn via your CPU perpetually.
That final level is non-negotiable. Small fashions sometimes get caught calling the identical device repeatedly, or oscillating between two instruments, and there’s nothing extra demoralizing than strolling again to your terminal to seek out your laptop computer’s followers screaming as a result of Gemma determined to search for the climate in London thirty instances in a row.
Right here’s the loop:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
def run_agent(user_query):     messages = [{“role”: “user”, “content”: user_query}]      for iteration in vary(1, MAX_ITERATIONS + 1):         payload = {             “mannequin”: MODEL_NAME,             “messages”: messages,             “instruments”: available_tools,             “stream”: False,         }          print(f“[EXECUTION — iteration {iteration}]”)         print(”  ● Querying mannequin…n”)          attempt:             response_data = call_ollama(payload)         besides Exception as e:             print(f”  └─ [ERROR] Error calling Ollama API: {e}”)             print(f”  └─ Ensure Ollama is operating and {MODEL_NAME} is pulled.”)             return          message = response_data.get(“message”, {})         tool_calls = message.get(“tool_calls”) or []          # Department A: the mannequin desires to make use of instruments         if tool_calls:             print(f“[TOOL EXECUTION — {len(tool_calls)} call(s)]”)             messages.append(message)             tool_messages = print_tool_calls(tool_calls)             messages.prolong(tool_messages)             print()             proceed          # Department B: the mannequin produced a closing reply         print(“[RESPONSE]”)         print(message.get(“content material”, “”) + “n”)         return      # Security rail: we exhausted MAX_ITERATIONS with no closing reply     print(“[RESPONSE]”)     print(         f“Hit the {MAX_ITERATIONS}-iteration cap with no closing reply. “         “This normally means the mannequin is caught in a tool-calling loop. “         “Attempt simplifying the question.n”     ) |
The sample is price committing to reminiscence as a result of it reveals up in each agent framework you’ll ever learn: the message historical past is the state. For every iteration we ship the whole dialog (the unique person question, the mannequin’s tool-call request, our device outcomes, any follow-up mannequin messages) again to the mannequin. The mannequin is stateless; the checklist is the agent’s reminiscence.
This iterative construction can also be what makes error restoration attainable. When a device fails and we ship the error again as a device message, the mannequin will get to see that error and react to it on the following iteration. With out the loop, there’s nothing to react into.
Constructing the Software Registry
Right here we construct our 4 instruments, all deterministic, all offline. No API keys, no community calls, no flaky exterior companies to debug. The purpose of this text is the error-handling structure, not the instruments themselves, so we would like the instruments to behave predictably so we are able to concentrate on the framework round them, and so we are able to intentionally set off each failure mode at will.
The instruments are:
get_weather(metropolis): seems to be up a metropolis in a small dict of canned climate knowledgeget_local_time(metropolis): computes the true present time in that metropolis’s timezone utilizingzoneinfoconvert_currency(quantity, from_currency, to_currency): does the maths towards a hardcoded USD-anchored charge deskget_city_population(metropolis): one other lookup towards a small dict
The static knowledge lives on the prime of the file:
|
CITY_DATA = {     “london”:    {“timezone”: “Europe/London”,      “inhabitants”: 8_982_000},     “tokyo”:      {“timezone”: “Asia/Tokyo”,          “inhabitants”: 13_960_000},     “sao paulo”:  {“timezone”: “America/Sao_Paulo”,  “inhabitants”: 12_330_000},     “paris”:      {“timezone”: “Europe/Paris”,        “inhabitants”:  2_161_000},     “big apple”:  {“timezone”: “America/New_York”,    “inhabitants”:  8_336_000},     “sydney”:    {“timezone”: “Australia/Sydney”,    “inhabitants”:  5_312_000},     “mumbai”:    {“timezone”: “Asia/Kolkata”,        “inhabitants”: 20_410_000}, }  EXCHANGE_RATES = {     “USD”: 1.00,  “EUR”: 0.92,  “GBP”: 0.79,  “JPY”: 156.40,     “BRL”: 5.12,  “CAD”: 1.37,  “AUD”: 1.51,  “INR”: 83.20, } |
The features are intentionally easy, however they elevate on dangerous enter somewhat than returning error strings. Right here’s get_weather:
|
def get_weather(metropolis: str) -> str:     “”“Returns present climate situations for a recognized metropolis.”“”     key = metropolis.decrease().strip()     if key not in WEATHER_DATA:         elevate ValueError(             f“Unknown metropolis: ‘{metropolis}’. Recognized cities: {‘, ‘.be a part of(sorted(WEATHER_DATA.keys()))}.”         )     knowledge = WEATHER_DATA[key]     return f“The climate in {metropolis.title()} is {knowledge[‘conditions’]} with a temperature of {knowledge[‘temp_c’]}°C.” |
Two issues to name out about that error message. First, it’s particular: it tells the caller what went mistaken and what the legitimate choices are. Second, the device elevates a ValueError somewhat than returning the error as a string. Don’t catch and string-format errors contained in the device; as an alternative, allow them to propagate. We would like the dispatcher to deal with each sort of failure in a single place, and we would like the message the mannequin sees on a nasty enter to be informative sufficient that the mannequin can right itself.
get_local_time does the one actual work — precise timezone-aware datetime arithmetic — and that’s additionally the device we’ll later use to show sleek degradation towards a simulated upstream failure:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
def get_local_time(metropolis: str) -> str:     “”“Returns the present native time for a metropolis, with a cached fallback.”“”     key = metropolis.decrease().strip()      # Simulate an upstream geocoding service which will fail unpredictably     if SIMULATE_GEOCODING_OUTAGE and random.random() < 0.6:         if key in TIMEZONE_FALLBACK_CACHE:             tz_name = TIMEZONE_FALLBACK_CACHE[key]             now = datetime.datetime.now(ZoneInfo(tz_name))             return (                 f“[cached] The present native time in {metropolis.title()} is “                 f“{now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}). “                 “Observe: geocoding service is at the moment unavailable; this worth is from the native cache.”             )         elevate ToolUnavailableError(             f“Geocoding service is unavailable and ‘{metropolis}’ will not be within the native cache. “             “Please attempt once more later or use a metropolis from the cache: “             f“{‘, ‘.be a part of(sorted(TIMEZONE_FALLBACK_CACHE.keys()))}.”         )      if key not in CITY_DATA:         elevate ValueError(f“Unknown metropolis: ‘{metropolis}’. Recognized cities: {‘, ‘.be a part of(sorted(CITY_DATA.keys()))}.”)     tz_name = CITY_DATA[key][“timezone”]     now = datetime.datetime.now(ZoneInfo(tz_name))     return f“The present native time in {metropolis.title()} is {now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}).” That <code>SIMULATE_GEOCODING_OUTAGE</code> flag lets us reproduce a actual–world failure mode with out needing actual infrastructure to fail. We‘ll come again to it.  The device schemas are unchanged from the earlier tutorial’s</a> type: commonplace Ollama perform–calling format, with clear descriptions of what every device does and what arguments it expects.
 <h2>The 4 Error Restoration Patterns</h2> Time to get severe. There are 4 distinct failure modes you‘ll encounter when an agent talks to instruments, and every one wants its personal technique. They’re dealt with in a single dispatcher perform, however it‘s price understanding them as separate ideas. Â
Sample 1: Software Execution ErrorsThe primary protection is the dispatcher itself. It wraps each device name in a structured  def dispatch_tool_call(tool_call):     function_name = tool_call[“function”][“name”]     arguments = tool_call[“function”][“arguments”] or {}      # Protection 1: validate the device title towards the registry     if function_name not in TOOL_FUNCTIONS:         return “error”, (             f”Unknown device ‘{function_name}‘. “             f”Legitimate instruments are: {‘, ‘.be a part of(TOOL_FUNCTIONS.keys())}.“         )      func = TOOL_FUNCTIONS[function_name]      # Protection 2: catch argument errors (mistaken sorts, lacking or additional args)     attempt:         end result = func(**arguments)         return “okay“, str(end result)     besides TypeError as e:         return “error“, f”Unhealthy arguments for {function_name}: {e}“     besides ValueError as e:         return “error“, str(e)     besides ToolUnavailableError as e:         return “error“, f”Software briefly unavailable: {e}“     besides Exception as e:         return “error“, f”Sudden error in {function_name}: {sort(e).__name__}: {e}“ |
The important thing perception: return the error to the mannequin as a device end result as an alternative of elevating it again to the agent loop. The mannequin can learn the error, see that it requested for “Atlantis” and Atlantis isn’t a recognized metropolis, and pivot to a special metropolis, or apologize to the person. In the event you elevate as an alternative, you’ve stripped the mannequin of the power to get well.
Discover the 4 totally different exception sorts and the catch-all on the backside. Each corresponds to an actual class of failure: area errors (ValueError), signature mismatches (TypeError), infrastructure outages (ToolUnavailableError), and the Don Rumsfeld unknown unknowns (Exception). Separating them offers you cleaner error messages, which give the mannequin higher alerts for restoration.
The catch-all is necessary and maybe controversial. Some type guides will inform you by no means to catch a naked Exception. In an agent dispatcher, the choice — letting an surprising exception kill the loop — is worse. The mannequin loses the possibility to get well, the person loses the response, and also you lose the dialog historical past you could possibly have used to debug what occurred. Higher to catch, log, and hand the message to the mannequin.
Sample 2: Malformed Software Calls From the Mannequin
The mannequin sometimes hallucinates a device title that doesn’t exist, or sends arguments underneath the mistaken keys (city as an alternative of metropolis, for instance). The primary protection within the snippet above handles the primary case: earlier than we even attempt to dispatch, we verify the title towards the registry and return a corrective message itemizing the legitimate names.
The incorrect-argument case is dealt with by the second protection. Python’s **arguments unpacking raises TypeError if the mannequin sends a key phrase the perform doesn’t settle for, or omits a required one. We catch the TypeError, format it cleanly, and the mannequin will get a helpful error on the following iteration:
|
[ERROR]: Unhealthy arguments for get_weather: get_weather() bought an surprising key phrase argument ‘city’ |
That message accommodates every little thing the mannequin must right itself: the device title, the offending argument, and an implicit sign that the correct title is one thing else. In observe the mannequin normally fixes the decision on its subsequent flip.
There’s additionally a extra delicate argument-related failure: sort drift. The mannequin is aware of quantity must be a quantity, however in longer conversations it sometimes begins sending "100" as a string. Letting convert_currency elevate on that will pressure an additional flip for the mannequin to right itself. A greater method is defensive coercion within the device itself:
|
def convert_currency(quantity: float, from_currency: str, to_currency: str) -> str:     # Defensive sort coercion: the mannequin typically sends numbers as strings     attempt:         quantity = float(quantity)     besides (TypeError, ValueError):         elevate ValueError(f“‘quantity’ should be a quantity, bought: {quantity!r}”)     # … remainder of the perform |
This silently fixes the widespread case ("100" turns into 100.0) whereas nonetheless elevating a clear error for the genuinely damaged case ("fifty"). The precept: be liberal in what you settle for from the mannequin, and strict in what you complain about.
Sample 3: Area-Stage Errors
These are the errors the device itself raises when the inputs are well-formed however the request can’t be happy, equivalent to asking for the climate in Atlantis, or changing from a foreign money that isn’t within the charge desk. These ought to produce error messages that train the mannequin how one can get well, not simply say “failed.”
Evaluate these two error messages:
|
Good: “Unknown metropolis: ‘Atlantis’. Recognized cities: london, mumbai, big apple, paris, sao paulo, sydney, tokyo.” |
The great model offers the mannequin every little thing it must both retry with a legitimate enter or clarify the limitation to the person. The dangerous model forces the mannequin to guess. Each error message within the device features follows this sample: say what went mistaken, and the place attainable, checklist the legitimate alternate options.
This isn’t only a UX nicety. It instantly impacts what number of iterations the agent loop will burn earlier than attending to a very good reply. A obscure error can value you a full additional spherical journey whereas the mannequin gropes for a repair. A selected error normally will get corrected on the very subsequent flip or, when the enter is genuinely unrecoverable, lets the mannequin produce a clear rationalization with out making an attempt once more in any respect.
Sample 4: Swish Degradation for Unavailable Instruments
The final sample is for the state of affairs the place a device isn’t damaged, simply gone — a geocoding service is down, an API quota is exhausted, a database is having a nasty day. You may have three choices right here, roughly so as of how a lot you belief the mannequin to deal with the state of affairs:
- Return a cached or default worth and flag it within the end result. Finest when the device’s freshness isn’t essential.
- Skip the device solely and return a transparent message about what couldn’t be supplied. Let the mannequin determine whether or not to retry or work round it.
- Floor the outage to the person by having the agent cease and ask for steering.
get_local_time demonstrates choice 1. When SIMULATE_GEOCODING_OUTAGE is on and the random verify journeys, the device first tries the native cache:
|
if SIMULATE_GEOCODING_OUTAGE and random.random() < 0.6: Â Â Â Â if key in TIMEZONE_FALLBACK_CACHE: Â Â Â Â Â Â Â Â tz_name = TIMEZONE_FALLBACK_CACHE[key] Â Â Â Â Â Â Â Â now = datetime.datetime.now(ZoneInfo(tz_name)) Â Â Â Â Â Â Â Â return ( Â Â Â Â Â Â Â Â Â Â Â Â f“[cached] The present native time in {metropolis.title()} is “ Â Â Â Â Â Â Â Â Â Â Â Â f“{now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}). “ Â Â Â Â Â Â Â Â Â Â Â Â “Observe: geocoding service is at the moment unavailable; this worth is from the native cache.” Â Â Â Â Â Â Â Â ) Â Â Â Â elevate ToolUnavailableError( Â Â Â Â Â Â Â Â f“Geocoding service is unavailable and ‘{metropolis}’ will not be within the native cache. “ Â Â Â Â Â Â Â Â “Please attempt once more later or use a metropolis from the cache: “ Â Â Â Â Â Â Â Â f“{‘, ‘.be a part of(sorted(TIMEZONE_FALLBACK_CACHE.keys()))}.” Â Â Â Â ) |
If town is within the cache, the device returns a profitable end result tagged with [cached] and a notice explaining that the reside service is unavailable. The mannequin sees a superbly usable reply and a small caveat it could select to say to the person. If town isn’t within the cache, the device falls via to choice 2: it raises ToolUnavailableError with a message itemizing what is cached.
That ToolUnavailableError is deliberately a separate exception sort somewhat than a ValueError. The dispatcher offers it its personal catch arm with a definite error prefix (“Software briefly unavailable”) so the mannequin can inform the distinction between “you requested for one thing I don’t have” and “the service is down proper now.” These two failures have very totally different applicable responses — retry later versus choose a special enter — and giving the mannequin a transparent sign helps it choose the correct one.
In manufacturing, you’d prolong this sample with a retry-with-backoff coverage earlier than falling via to the fallback. The construction stays the identical: the dispatcher distinguishes recoverable from unrecoverable failures, and the mannequin is informed sufficient about every one to make a smart subsequent transfer.
Placing It All Collectively
Time to really run the factor. Right here’s a question that workouts every little thing — a number of cities, a number of instruments, and an intentional dangerous enter to set off error restoration in flight:
|
python most important.py “What is the climate in London, Tokyo, and Atlantis proper now? And convert 50 GBP to JPY.” |
The precise iteration rely and tool-call ordering will differ from run to run relying on how Gemma decides to sequence the work, however right here’s a consultant hint, barely trimmed:

Have a look at what occurred in iteration 3. The mannequin requested about Atlantis, the device raised ValueError, the dispatcher transformed it into an error message itemizing the legitimate cities, and the mannequin — on iteration 5 — folded that info right into a clear response. It didn’t retry Atlantis. It didn’t crash. It seen the failure, built-in it with the profitable outcomes, and produced a solution that acknowledged the limitation. That’s the whole payoff of the error-recovery structure in a single hint.
To see sleek degradation in motion, flip SIMULATE_GEOCODING_OUTAGE to True and run a question that asks for native time:
|
python most important.py “What is the native time in London and Paris?” |
About 60% of the time you’ll see the [cached] prefix within the device end result and the mannequin will point out the cached supply in its closing response. The remainder of the time the device will return efficiently and the cached path received’t set off. Both manner, the loop completes and the person will get a solution.
Conclusion
We constructed three issues on prime of the inspiration from the primary tutorial: an iterative agent loop with a tough iteration cap, a layered dispatcher that catches each class of device failure, and gear features whose error messages train the mannequin how one can get well. Collectively they’re the distinction between a tool-calling demo and an agent you’d really wish to depart operating unsupervised.
Just a few pure subsequent steps embrace:
- Persistent reminiscence throughout periods, so the agent can bear in mind what it discovered about you final week
- Retry-with-backoff insurance policies for transient upstream failures
- Reincorporating the exterior APIs rather than the static lookup tables, which principally simply means accepting that timeouts and charge limits turn into a part of the traditional failure floor
The full script is on GitHub. Clone it, run it, break it intentionally to look at the restoration in motion, and incorporate the following steps above.
🔥 Need the very best instruments for AI advertising and marketing? Take a look at GetResponse AI-powered automation to spice up your corporation!

