Agents are workflows
An agent "loop" in production is really a durable workflow with:
- Explicit step boundaries
- Timeouts per step
- Compensation on failure
Cron jobs and ad hoc scripts tend to fail silently. A workflow engine such as Temporal (or similar) records a history for every run that you can audit.
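To make the three properties concrete, here is a minimal sketch in plain Python, not any engine's API. `Step`, `run_workflow`, and the history format are names invented for illustration. It keeps history in memory only, so it is not actually durable; a real engine persists that history and handles cancellation properly.

```python
import concurrent.futures as cf
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One explicit step boundary (hypothetical type, for illustration)."""
    name: str
    run: Callable[[dict], None]
    timeout_s: float
    compensate: Optional[Callable[[dict], None]] = None


def run_workflow(steps, state):
    """Run steps in order; on failure or timeout, undo completed steps in reverse.

    Returns an in-memory history of (step name, status) tuples.
    Caveat: a timed-out step is not killed here, its thread simply runs on.
    """
    history, done = [], []
    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        for step in steps:
            future = pool.submit(step.run, state)
            try:
                # Raises on timeout, or re-raises whatever the step raised.
                future.result(timeout=step.timeout_s)
            except Exception:
                history.append((step.name, "failed"))
                for prev in reversed(done):
                    if prev.compensate is not None:
                        prev.compensate(state)
                        history.append((prev.name, "compensated"))
                return history
            history.append((step.name, "ok"))
            done.append(step)
    return history
```

If a later step fails, the history shows which earlier steps were compensated, which is exactly the record a cron script never leaves behind.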
Tool gateway pattern
Never let the LLM call arbitrary URLs. Route every tool call through a gateway that:
- Validates payload against JSON Schema
- Applies per-tenant rate limits
- Attaches trace IDs and logs structured results
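A rough sketch of such a gateway, under some loud assumptions: `ToolGateway` and `validate` are hypothetical names, the validator handles only a tiny subset of JSON Schema (`required` plus `string`/`number` property types) as a stand-in for a real validator library, and the rate limiter is a simple in-memory sliding window.

```python
import json
import time
import uuid
from collections import defaultdict, deque

JSON_TYPES = {"string": str, "number": (int, float)}


def validate(payload, schema):
    """Toy stand-in for a JSON Schema validator. Returns an error string or None."""
    for key in schema.get("required", []):
        if key not in payload:
            return f"missing field: {key}"
    props = schema.get("properties", {})
    for key, value in payload.items():
        if key not in props:
            return f"unexpected field: {key}"
        expected = JSON_TYPES[props[key]["type"]]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return f"wrong type for: {key}"
    return None


class ToolGateway:
    def __init__(self, tools, max_calls, window_s=60.0, clock=time.monotonic):
        self.tools = tools                # tool name -> (schema, handler)
        self.max_calls = max_calls        # allowed calls per tenant per window
        self.window_s = window_s
        self.clock = clock
        self.calls = defaultdict(deque)   # tenant -> timestamps of recent calls
        self.log = []                     # structured (JSON) log lines

    def _rate_limited(self, tenant):
        now, recent = self.clock(), self.calls[tenant]
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return True
        recent.append(now)
        return False

    def call(self, tenant, tool, payload):
        trace_id = uuid.uuid4().hex
        if tool not in self.tools:
            result = {"ok": False, "error": "unknown tool"}
        elif self._rate_limited(tenant):
            result = {"ok": False, "error": "rate limited"}
        else:
            schema, handler = self.tools[tool]
            error = validate(payload, schema)
            if error:
                result = {"ok": False, "error": error}
            else:
                result = {"ok": True, "data": handler(payload)}
        record = {"trace_id": trace_id, "tenant": tenant, "tool": tool, **result}
        self.log.append(json.dumps(record))
        return record
```

Every call, accepted or rejected, produces one log record with a trace ID, so a rejected call is as visible as a successful one.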
Planner vs executor
Split responsibilities:
- Planner proposes steps (LLM)
- Executor validates and runs (code)
The runtime should reject plans that reference unknown tools or exceed step budgets.
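The executor-side check can be quite small. The sketch below assumes a plan is a list of tool calls and that tools live in a plain dict; `PlanStep`, `PlanRejected`, `validate_plan`, and `execute` are illustrative names, not from any framework.

```python
from dataclasses import dataclass


@dataclass
class PlanStep:
    """One step proposed by the planner (hypothetical shape)."""
    tool: str
    args: dict


class PlanRejected(Exception):
    """Raised before anything runs, so a bad plan has no side effects."""


def validate_plan(plan, tools, max_steps):
    if len(plan) > max_steps:
        raise PlanRejected(f"plan has {len(plan)} steps, budget is {max_steps}")
    unknown = sorted({step.tool for step in plan if step.tool not in tools})
    if unknown:
        raise PlanRejected(f"unknown tools: {', '.join(unknown)}")


def execute(plan, tools, max_steps=5):
    # Validate the whole plan up front, then run it step by step.
    validate_plan(plan, tools, max_steps)
    return [tools[step.tool](**step.args) for step in plan]
```

Validating the full plan before the first step runs means the LLM's output is treated as untrusted input, never as code.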
Failure modes I've seen
| Symptom | Root cause | Fix |
|---|---|---|
| Infinite retries | No max attempts on activity | Cap attempts + dead-letter queue |
| Nondeterministic replays | Random IDs in workflow code | Derive IDs deterministically from workflow inputs |
| Tool hallucination | Schema too loose | Tighten required fields |
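The first two fixes in the table can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: `run_with_cap`, `step_id`, the list used as a dead-letter queue, and the `agents.example.invalid` namespace. A real setup would also add backoff between attempts and use the engine's own retry policy.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "agents.example.invalid")


def step_id(workflow_id, step_index):
    """Same inputs always give the same ID, so a replay sees identical values."""
    return str(uuid.uuid5(NAMESPACE, f"{workflow_id}/{step_index}"))


def run_with_cap(activity, payload, max_attempts, dead_letters):
    """Try an activity at most max_attempts times, then park it for inspection."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return activity(payload)
        except Exception as exc:
            last_error = repr(exc)
    # Attempts exhausted: record the failure instead of retrying forever.
    dead_letters.append(
        {"payload": payload, "attempts": max_attempts, "error": last_error}
    )
    return None
```

The dead-letter entry keeps the payload and the last error, so someone can inspect and replay it instead of discovering the loop from a bill.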
Takeaway
Ship the smallest agent that solves one workflow end-to-end. General-purpose agents are a research problem, not a v1 product feature.