Sergey Orsik.dev

// case study

AI Agent Orchestration Platform

Teams needed multi-step agents with tool use, retries, and audit trails, without relying on a tangle of cron jobs.

Role

Systems engineer — agent runtime, tool gateway, observability

Stack

Python, Temporal, Redis, Postgres, OpenTelemetry


Highlights

  • Durable workflows for agent steps with automatic retry and compensation
  • Tool gateway with schema validation, rate limits, and per-tenant isolation
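The first highlight (retry plus compensation) follows the saga pattern. Each step pairs an action with an undo. A failing action is retried a bounded number of times with exponential backoff. If it still fails, the steps that already succeeded are compensated in reverse order. This is a minimal in-process sketch of that control flow, not Temporal's API. In the platform described here, Temporal's retry policies and durable workflow history would take over this role. The names `Step`, `SagaFailed`, `run_saga`, and the `base_delay` parameter are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One workflow step: an action plus the compensation that undoes it (hypothetical shape)."""
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


class SagaFailed(Exception):
    """Raised after a step exhausts its retries and compensation has run."""


def run_saga(steps: list[Step], max_attempts: int = 3, base_delay: float = 0.1) -> list[str]:
    """Run steps in order, retrying each failing action up to max_attempts times.

    If a step exhausts its retries, every completed step is compensated
    newest-first and SagaFailed is raised. Returns completed step names.
    """
    completed: list[Step] = []
    for step in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                step.action()
            except Exception:
                if attempt < max_attempts:
                    # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                    time.sleep(base_delay * 2 ** (attempt - 1))
                    continue
                # Retries exhausted: undo the steps that succeeded, newest first.
                for done in reversed(completed):
                    done.compensate()
                raise SagaFailed(f"step {step.name!r} failed after {max_attempts} attempts")
            completed.append(step)
            break
    return [s.name for s in completed]
```

The failed step itself is not compensated, because its action never completed. Only the steps that succeeded are undone.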

Architecture

Agents are state machines backed by Temporal. Each step is an activity with explicit timeouts. Tool calls are routed through a gateway that validates their payloads against JSON Schema before execution.
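A gateway of the kind described above could combine three checks before a tool runs: per-tenant rate limiting, schema validation, and dispatch to a registered handler. The sketch below uses only the standard library. Its `validate` function covers a tiny subset of JSON Schema: a top-level object, `required`, and primitive property `type`s. It stands in for a full validator such as the `jsonschema` package. The `ToolGateway` and `TokenBucket` classes, their parameters, and the injectable `clock` are hypothetical illustrations, not the project's code.

```python
import time
from typing import Any, Callable

# JSON Schema primitive type names supported by this sketch, mapped to Python types.
_TYPES = {"string": str, "integer": int, "number": (int, float),
          "boolean": bool, "object": dict, "array": list}


def validate(payload: Any, schema: dict) -> list[str]:
    """Return validation errors (empty list = valid). Supports only a small JSON Schema subset."""
    if not isinstance(payload, dict):
        return ["payload must be an object"]
    errors = [f"missing required property {k!r}"
              for k in schema.get("required", []) if k not in payload]
    for key, spec in schema.get("properties", {}).items():
        if key not in payload or "type" not in spec:
            continue
        value, expected = payload[key], spec["type"]
        # bool subclasses int in Python, but JSON Schema treats booleans as a separate type.
        bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if bool_as_number or not isinstance(value, _TYPES[expected]):
            errors.append(f"property {key!r} must be of type {expected}")
    return errors


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holding at most `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class GatewayError(Exception):
    pass


class ToolGateway:
    def __init__(self, rate: float = 5.0, burst: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self._tools: dict[str, tuple[dict, Callable[[dict], Any]]] = {}
        self._buckets: dict[str, TokenBucket] = {}
        self._rate, self._burst, self._clock = rate, burst, clock

    def register(self, name: str, schema: dict, handler: Callable[[dict], Any]) -> None:
        self._tools[name] = (schema, handler)

    def call(self, tenant: str, name: str, payload: Any) -> Any:
        if name not in self._tools:
            raise GatewayError(f"unknown tool {name!r}")
        # One bucket per tenant, so one noisy tenant cannot drain another's quota.
        if tenant not in self._buckets:
            self._buckets[tenant] = TokenBucket(self._rate, self._burst, self._clock)
        if not self._buckets[tenant].allow():
            raise GatewayError(f"rate limit exceeded for tenant {tenant!r}")
        schema, handler = self._tools[name]
        errors = validate(payload, schema)
        if errors:
            raise GatewayError("; ".join(errors))
        return handler(payload)
```

In this sketch, rate limiting runs before validation, so malformed calls still count against the tenant's quota. Checking the schema first would be an equally reasonable choice.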

Engineering details

  • Planner/executor split: the LLM proposes a plan, and the runtime validates it against the allowed tools.
  • Idempotency keys on every external side effect.
  • Trace context propagated from workflow → activity → HTTP tool call.
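The planner/executor split and idempotency keys from the list above fit together in one small executor. A proposed plan is rejected outright if any step names a tool outside the allow-list. Each side effect is keyed by a hash of run ID, step index, tool, and arguments, so a retried run returns the recorded result instead of executing the effect twice. This is a minimal sketch under stated assumptions. The plan shape (a list of `{"tool", "args"}` dicts), the key scheme, and the names `validate_plan`, `idempotency_key`, and `Executor` are hypothetical. The in-memory dict stands in for a durable store. Trace propagation is left out.

```python
import hashlib
import json
from typing import Any, Callable


def validate_plan(plan: list[dict], allowed: set[str]) -> None:
    """Reject the whole plan if any step names a tool outside the allow-list."""
    bad = [step["tool"] for step in plan if step["tool"] not in allowed]
    if bad:
        raise ValueError(f"plan uses disallowed tools: {bad}")


def idempotency_key(run_id: str, step_index: int, tool: str, args: dict) -> str:
    """Deterministic key: the same run, step, tool, and args always hash identically."""
    # sort_keys makes the JSON canonical, so dict ordering does not change the key.
    canonical = json.dumps(
        {"run": run_id, "step": step_index, "tool": tool, "args": args},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()


class Executor:
    def __init__(self, tools: dict[str, Callable[[dict], Any]], allowed: set[str]):
        self.tools = tools
        self.allowed = allowed
        self.results: dict[str, Any] = {}  # stand-in for a durable idempotency store

    def run(self, run_id: str, plan: list[dict]) -> list[Any]:
        validate_plan(plan, self.allowed)  # nothing executes if the plan is invalid
        outputs = []
        for i, step in enumerate(plan):
            # The step index is part of the key, so the same call twice in one plan runs twice.
            key = idempotency_key(run_id, i, step["tool"], step["args"])
            if key not in self.results:
                # First time this side effect is seen: execute it and record the result.
                self.results[key] = self.tools[step["tool"]](step["args"])
            outputs.append(self.results[key])
        return outputs
```

Here the check and the write are separate operations. A shared store would therefore need an atomic insert-if-absent, for example Redis `SET` with `NX` or a Postgres unique constraint on the key, so two concurrent retries cannot both execute.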

Outcomes

  • 40% reduction in failed runs after adding structured retry policies
  • Full audit log per agent run for compliance review