Training Agents with Patches
Improve your agent iteratively by evaluating results and patching its configuration — goal and tools.
The Training Loop
┌──────────────┐ ┌──────────┐ ┌──────────────┐ ┌──────────────┐
│ Run Agent │ ──▶ │ Evaluate │ ──▶ │ PATCH Agent │ ──▶ │ Run Again │
│ │ │ Output │ │ goal/tools │ │ │
│ │ │ │ │ │ │ │
└──────────────┘ └──────────┘ └──────────────┘ └──────────────┘
Unlike fine-tuning a model, training an agent is about refining its instructions and tool set based on observed behavior. Every improvement is a simple PATCH call.
What you can patch
| Field | Description | Use case |
|---|---|---|
| `goal` | Agent instructions / prompt | Refine behavior, add edge cases |
| `tools` | Attached MCP tools | Add/remove capabilities |
| `name` | Display name | Versioned naming |
| `status` | Agent lifecycle | Activate/archive |
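Beyond the goal-refinement patterns below, `name` and `status` patches are handy for versioning. As a sketch, the helper below bumps a ` vN` suffix before re-activating an agent; the `promote` function and the `"active"` status value are assumptions of this example, not confirmed SDK behavior:

```python
import re

def bump_version(name: str) -> str:
    """Return the next versioned name, e.g. 'Research Bot v1' -> 'Research Bot v2'.
    If no ' vN' suffix is present, append ' v2'."""
    m = re.search(r"^(.*) v(\d+)$", name)
    if m:
        return f"{m.group(1)} v{int(m.group(2)) + 1}"
    return f"{name} v2"

def promote(client, agent_id):
    # Hypothetical helper: bump the display name and mark the agent active.
    current = client.agents.get(agent_id)
    client.agents.update(agent_id, name=bump_version(current.name), status="active")
```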
Pattern 1: Refine instructions based on output
The simplest training loop — run, evaluate, patch the goal.
from flymyai import AgentClient
client = AgentClient(api_key="fly-***")
# Create tools first — each tool has an integer ID
web_search = client.tools.create(mcp_tool="tavily")
agent = client.agents.create(
name="Email Drafter",
goal="Write a professional email about {{ topic }}.",
tools=[web_search.id],
)
# Run and evaluate
run = client.runs.create(agent_id=agent.id)
result = client.runs.wait(run.id)
# Too formal? Patch the goal
client.agents.update(
    agent.id,
    goal="""Write a professional but friendly email about {{ topic }}.
Keep paragraphs short (2-3 sentences). Use a warm sign-off.
Avoid corporate jargon.""",
)
# Run again — agent now uses refined instructions
run2 = client.runs.create(agent_id=agent.id)
result2 = client.runs.wait(run2.id)
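The "too formal?" judgment above is manual. A quick heuristic can flag jargon before you decide to patch; the phrase list below is illustrative, not part of the SDK:

```python
# Illustrative phrase list — extend with whatever "too formal" means for your use case
JARGON = {"synergy", "leverage", "circle back", "touch base", "per my last email"}

def sounds_corporate(text: str) -> bool:
    """Return True if the draft contains any flagged jargon phrase."""
    lower = text.lower()
    return any(phrase in lower for phrase in JARGON)
```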
Pattern 2: Automated training loop
Evaluate output programmatically and patch until quality meets a threshold.
def evaluate_output(output: dict) -> dict:
    """Returns {"score": 0-1, "issues": [...]}"""
    issues = []
    if len(output.get("summary", "")) < 100:
        issues.append("Summary too short")
    if not output.get("sources"):
        issues.append("No sources cited")
    score = 1.0 - (len(issues) * 0.3)
    return {"score": max(0, score), "issues": issues}
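The rubric above hard-codes two checks. The same idea scales better as a table of (label, predicate) pairs; this is a sketch of that generalization, not an SDK feature:

```python
from typing import Callable

# Each check is (issue label, predicate that flags the issue)
Check = tuple[str, Callable[[dict], bool]]

CHECKS: list[Check] = [
    ("Summary too short", lambda o: len(o.get("summary", "")) < 100),
    ("No sources cited", lambda o: not o.get("sources")),
]

def evaluate(output: dict, checks: list[Check] = CHECKS, penalty: float = 0.3) -> dict:
    """Score an output by subtracting a fixed penalty per failed check."""
    issues = [label for label, flag in checks if flag(output)]
    return {"score": max(0.0, 1.0 - penalty * len(issues)), "issues": issues}
```

Adding a new quality bar is then a one-line append to `CHECKS` rather than another `if` branch.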
web_search = client.tools.create(mcp_tool="tavily") # tool.id is an integer
agent = client.agents.create(
name="Research Bot v1",
goal="Research {{ topic }} and return a summary with sources.",
tools=[web_search.id],
)
for iteration in range(5):
    run = client.runs.create(agent_id=agent.id)
    result = client.runs.wait(run.id, timeout=600)

    if result.status != "completed":
        print(f" Run failed: {result.error}")
        break

    eval_result = evaluate_output(result.output)
    print(f" Iteration {iteration + 1}: score={eval_result['score']:.1f}")

    if eval_result["score"] >= 0.8:
        print(" ✓ Quality threshold reached")
        break

    # Refine the goal based on issues
    current = client.agents.get(agent.id)
    corrections = "; ".join(eval_result["issues"])
    client.agents.update(
        agent.id,
        goal=current.goal + f"\n\nCorrection (iteration {iteration + 1}): {corrections}",
    )
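Appending a correction on every iteration grows the goal without bound. A small guard can keep the base goal plus only the last few corrections; the `MARKER` string below assumes corrections are appended with the `"\n\nCorrection"` prefix used in the loop above:

```python
# Must match the prefix used when corrections are appended to the goal
MARKER = "\n\nCorrection"

def append_correction(goal: str, correction: str, keep: int = 3) -> str:
    """Append a correction, retaining the base goal plus at most `keep` corrections."""
    base, *corrections = goal.split(MARKER)
    corrections.append(correction)
    return base + "".join(MARKER + c for c in corrections[-keep:])
```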
Pattern 3: Tool evolution
Start simple, add tools based on what the agent struggles with.
# Create tools — each returns an object with an integer .id
web_search = client.tools.create(mcp_tool="tavily")
financial_api = client.tools.create(mcp_tool="financial_datasets")
# Start with just web search
agent = client.agents.create(
name="Analyst",
goal="Analyze {{ company }} and produce a report.",
tools=[web_search.id],
)
run = client.runs.create(agent_id=agent.id)
result = client.runs.wait(run.id)
# Agent couldn't access financial data — add a specialized tool
if "financial data unavailable" in str(result.output):
    client.agents.update(
        agent.id,
        tools=[web_search.id, financial_api.id],
    )
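Matching a single string in the output is brittle. A slightly more robust sketch maps failure phrases to the tool that would fix them; both the phrases and the mapping here are illustrative:

```python
# Illustrative mapping from failure phrases to the MCP tool that addresses them
TOOL_HINTS = {
    "financial data unavailable": "financial_datasets",
    "could not fetch the page": "tavily",
}

def missing_tools(output_text: str, hints: dict[str, str] = TOOL_HINTS) -> set[str]:
    """Return the set of tool names whose failure phrases appear in the output."""
    lower = output_text.lower()
    return {tool for phrase, tool in hints.items() if phrase in lower}
```

The returned names can then be resolved to tool IDs and patched onto the agent in one `update` call.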
Pattern 4: Review run logs before patching
Use execution logs to understand what the agent did before making corrections.
run = client.runs.create(agent_id=agent.id)
# Stream live
for event in client.runs.stream_events(run.id, timeout=600):
    if event.type == "tool_called":
        print(f" Tool: {event.message}")
        print(f" Args: {event.data}")
    elif event.type == "tool_call_exception":
        print(f" ✗ Error: {event.message}")

# Or inspect after completion
result = client.runs.get(run.id)
for log in result.logs:
    print(f"[{log.type}] {log.message}")
    if log.data:
        print(f" data: {log.data}")
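A quick tally of event types makes it obvious when failures are tool errors rather than prompt problems. This helper is a sketch that assumes each log entry exposes a `.type` attribute as in the loop above (plain strings also work, for testing):

```python
from collections import Counter

def summarize_logs(logs) -> Counter:
    """Count run events by type, e.g. to spot repeated tool_call_exception entries."""
    return Counter(getattr(log, "type", log) for log in logs)
```

If `tool_call_exception` dominates the tally, fix or swap the tool before touching the goal.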
Pattern 5: A/B testing agents
Create variants, run them on the same input, compare.
variants = [
{"name": "v1-concise", "goal": "Give a brief 2-paragraph answer about {{ topic }}."},
{"name": "v2-detailed", "goal": "Give a thorough analysis of {{ topic }} with examples."},
{"name": "v3-structured", "goal": "Analyze {{ topic }}. Return JSON with: summary, pros, cons, recommendation."},
]
web_search = client.tools.create(mcp_tool="tavily")
results = {}
for v in variants:
    agent = client.agents.create(**v, tools=[web_search.id])
    run = client.runs.create(agent_id=agent.id)
    result = client.runs.wait(run.id)
    results[v["name"]] = result.output
    client.agents.delete(agent.id)
# Compare outputs
for name, output in results.items():
print(f"\n--- {name} ---")
print(str(output)[:200])
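Eyeballing truncated outputs works for three variants; with more, score each output and pick a winner programmatically. The length-based default below is a placeholder for a real rubric such as `evaluate_output` from Pattern 2:

```python
def pick_winner(results: dict, score=lambda out: len(str(out))) -> str:
    """Return the variant name whose output scores highest under `score`."""
    return max(results, key=lambda name: score(results[name]))
```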
Best practices
- Small patches — change one thing at a time so you can attribute improvement
- Log everything — use `stream_events()` to understand why the agent behaved a certain way
- Version your agents — use descriptive names (`v1-concise`, `v2-with-sources`) or clone agents for A/B tests
- Automate evaluation — write scoring functions for your use case and loop until threshold
- Review before patching — always check logs first; the issue might be a tool failure, not a prompt problem