AI Run Audit Trail and Prompt Versioning

Every LLM call creates an ai_runs row with input hash, prompt version FK, token usage, and validation errors for replay and debugging.

Purpose

Because LLM outputs are non-deterministic, production debugging has to answer three questions: which prompt version, which model, and which input produced a bad clip boundary. The system therefore treats AI calls as metered, auditable batch jobs, not fire-and-forget HTTP requests.

`ai_runs` lifecycle

Each agent step in AnalyzeTranscriptWorkflowService follows the same pattern:

1. Resolve prompt: promptVersions.resolveActive(promptKey) → FK to prompt_versions.
2. Open run: aiRuns.startRun({ workflowName: 'analyze_transcript', agentName, inputHash, inputPreview, status: 'running' }).
3. Call model: ai.generateStructured(...) via OpenRouter / Mastra.
4. Close run: aiRuns.finishRun with status succeeded | failed | repaired, plus optional outputJson, validationErrorsJson, tokens, and latencyMs.
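The open-run, call-model, close-run steps above can be sketched as one wrapper. This is a minimal illustration under stated assumptions, not the service's actual code: `InMemoryRuns`, `runAgentStep`, the row shape, and the injected `callModel` function are all hypothetical. Only the `startRun` / `finishRun` field names come from the description above.

```typescript
// Hypothetical sketch of the open-run / call-model / close-run pattern.
// The row shape and in-memory store are illustrative stand-ins for the real table.
type RunStatus = "running" | "succeeded" | "failed" | "repaired";

interface AiRunRow {
  id: number;
  workflowName: string;
  agentName: string;
  promptVersionId: number;
  inputHash: string;
  inputPreview: string;
  status: RunStatus;
  outputJson?: unknown;
  validationErrorsJson?: unknown;
  latencyMs?: number;
}

class InMemoryRuns {
  rows: AiRunRow[] = [];

  // Inserts a row with status "running" and returns it.
  startRun(fields: Omit<AiRunRow, "id" | "status">): AiRunRow {
    const row: AiRunRow = { ...fields, id: this.rows.length + 1, status: "running" };
    this.rows.push(row);
    return row;
  }

  // Applies the final status and any optional result fields to an existing row.
  finishRun(id: number, patch: Partial<AiRunRow> & { status: RunStatus }): void {
    const row = this.rows.find((r) => r.id === id);
    if (!row) throw new Error(`unknown run ${id}`);
    Object.assign(row, patch);
  }
}

async function runAgentStep<T>(
  runs: InMemoryRuns,
  meta: { agentName: string; promptVersionId: number; inputHash: string; inputPreview: string },
  callModel: () => Promise<T>,
): Promise<T> {
  // The row is written before the model call, so a crash still leaves a "running" row behind.
  const run = runs.startRun({ workflowName: "analyze_transcript", ...meta });
  const startedAt = Date.now();
  try {
    const output = await callModel();
    runs.finishRun(run.id, {
      status: "succeeded",
      outputJson: output,
      latencyMs: Date.now() - startedAt,
    });
    return output;
  } catch (err) {
    runs.finishRun(run.id, {
      status: "failed",
      validationErrorsJson: { message: err instanceof Error ? err.message : String(err) },
      latencyMs: Date.now() - startedAt,
    });
    throw err; // the caller decides whether to retry or repair
  }
}
```

Wrapping the call this way keeps the bookkeeping in one place, so each agent step only supplies its metadata and its model call.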

Input fingerprint

inputHash = sha256(JSON.stringify(input)): used for dedup analysis and as a hook for future cache layers (no cache is implemented today).
inputPreview: the first 600 characters, shown in the admin UI; the full chunk text stays in transcript_chunks.
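A minimal sketch of the fingerprint, assuming Node's built-in `crypto` module, hex encoding for the digest, and a preview cut from the chunk text. The `fingerprintInput` helper and its parameters are hypothetical names introduced for illustration.

```typescript
import { createHash } from "node:crypto";

const PREVIEW_CHARS = 600;

// Hypothetical helper: returns the hash of the serialized input and a short preview of the text.
function fingerprintInput(
  input: Record<string, unknown>,
  chunkText: string,
): { inputHash: string; inputPreview: string } {
  // JSON.stringify follows property insertion order, so key order affects the hash.
  const serialized = JSON.stringify(input);
  const inputHash = createHash("sha256").update(serialized, "utf8").digest("hex");
  // slice counts UTF-16 code units, which is close enough for an admin preview.
  return { inputHash, inputPreview: chunkText.slice(0, PREVIEW_CHARS) };
}
```

Because `JSON.stringify` is sensitive to property order, two logically equal inputs built in a different key order hash differently. If the hash is ever used as a cache key, building inputs in a fixed order or using a canonical serializer would be worth considering.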

Prompt registry

Piece	Role
`PROMPT_DEFINITIONS` + `PROMPT_KEYS`	Central agent names and instructions
`PROMPT_VERSION`	Version bumps feed into idempotency keys (`analyzeTranscriptKey(..., promptVersion)`)
`ensureRegistered` on workflow start	Creates DB rows when keys are missing
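The registry behaviour in the table can be illustrated with an in-memory stand-in. Everything here is an assumption made for the sketch: the prompt keys and instructions are invented, the row shape is hypothetical, and the rule that a new version deactivates older rows for the same key is one plausible policy, not a confirmed one.

```typescript
// Hypothetical definitions; the real PROMPT_DEFINITIONS content is not shown in this document.
const EXAMPLE_PROMPT_DEFINITIONS: Record<string, string> = {
  chunk_scout: "Find candidate clip moments in this transcript chunk.",
  clip_planner: "Turn ranked candidates into a clip plan.",
};

interface PromptVersionRow {
  id: number;
  promptKey: string;
  version: string;
  instructions: string;
  active: boolean;
}

class InMemoryPromptVersions {
  private rows: PromptVersionRow[] = [];

  // Creates a row for every (key, version) pair that is missing.
  // Returns how many rows were created, so a repeat call returns 0.
  ensureRegistered(version: string): number {
    let created = 0;
    for (const [promptKey, instructions] of Object.entries(EXAMPLE_PROMPT_DEFINITIONS)) {
      const exists = this.rows.some((r) => r.promptKey === promptKey && r.version === version);
      if (exists) continue;
      // Assumed policy: the newest registered version becomes the only active one.
      for (const r of this.rows) {
        if (r.promptKey === promptKey) r.active = false;
      }
      this.rows.push({ id: this.rows.length + 1, promptKey, version, instructions, active: true });
      created += 1;
    }
    return created;
  }

  // Returns the active row for a key; its id is what an ai_runs row would reference.
  resolveActive(promptKey: string): PromptVersionRow {
    const row = this.rows.find((r) => r.promptKey === promptKey && r.active);
    if (!row) throw new Error(`no active prompt version for ${promptKey}`);
    return row;
  }
}
```

Calling `ensureRegistered` on every workflow start is safe in this sketch because the existence check makes it idempotent.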

Structured output

Zod schemas (chunkScoutOutputSchema, clipRankerOutputSchema, clipPlanOutputSchema, jsonRepairOutputSchema) validate every response before persisting candidates.

An invalid scout, ranker, or planner response results in a failed run and an AiAnalysisError that carries a stable error code for API consumers.
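The validate-then-fail flow might look like the sketch below. To stay dependency-free it swaps the Zod schema for a hand-written check that returns a similar success-or-issues result. The output shape (`startSec`, `endSec`), the error code string, and the fields on the error class are all assumptions, not the real `chunkScoutOutputSchema` or `AiAnalysisError`.

```typescript
// Hypothetical output shape; the real schema fields are not shown in this document.
interface ScoutOutput {
  candidates: { startSec: number; endSec: number }[];
}

type ParseResult<T> = { success: true; data: T } | { success: false; issues: string[] };

// Hand-written stand-in for a schema parse: collects every issue instead of stopping at the first.
function parseScoutOutput(raw: unknown): ParseResult<ScoutOutput> {
  const candidates = (raw as { candidates?: unknown } | null)?.candidates;
  if (!Array.isArray(candidates)) {
    return { success: false, issues: ["candidates: expected an array"] };
  }
  const issues: string[] = [];
  const data: ScoutOutput = { candidates: [] };
  candidates.forEach((c: unknown, i: number) => {
    const { startSec, endSec } = (c ?? {}) as { startSec?: unknown; endSec?: unknown };
    if (typeof startSec !== "number" || typeof endSec !== "number") {
      issues.push(`candidates[${i}]: startSec and endSec must be numbers`);
    } else if (endSec <= startSec) {
      issues.push(`candidates[${i}]: endSec must be greater than startSec`);
    } else {
      data.candidates.push({ startSec, endSec });
    }
  });
  return issues.length === 0 ? { success: true, data } : { success: false, issues };
}

// Illustrative error shape: a stable machine-readable code plus the raw issues.
class AiAnalysisError extends Error {
  readonly code: string;
  readonly issues: string[];

  constructor(code: string, issues: string[]) {
    super(`${code}: ${issues.join("; ")}`);
    this.name = "AiAnalysisError";
    this.code = code;
    this.issues = issues;
  }
}

// Returns validated data or throws; the caller would also mark the run as failed.
function validateScoutOrThrow(raw: unknown): ScoutOutput {
  const result = parseScoutOutput(raw);
  if (!result.success) {
    throw new AiAnalysisError("SCOUT_OUTPUT_INVALID", result.issues);
  }
  return result.data;
}
```

Keeping the code separate from the message lets API consumers branch on the code while the issue list goes into validationErrorsJson for debugging.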

JSON repair path

When the fix agent succeeds, it is recorded as a separate run with status repaired. This keeps the first-pass planner failure distinct from the recovery, which helps support triage.
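A sketch of how the two runs could be recorded, assuming the planner and the repair agent are injected as functions and that validation returns a list of problems. The agent names, the `RunRecord` shape, and `planWithRepair` itself are hypothetical, and run bookkeeping is collapsed to one record per run to keep the example short.

```typescript
type FinalStatus = "succeeded" | "failed" | "repaired";

interface RunRecord {
  agentName: string;
  status: FinalStatus;
  validationErrors?: string[];
}

// `validate` returns a list of problems; an empty list means the output is valid.
async function planWithRepair(
  runs: RunRecord[],
  callPlanner: () => Promise<string>,
  callRepair: (broken: string, errors: string[]) => Promise<string>,
  validate: (raw: string) => string[],
): Promise<string> {
  const firstPass = await callPlanner();
  const errors = validate(firstPass);
  if (errors.length === 0) {
    runs.push({ agentName: "planner", status: "succeeded" });
    return firstPass;
  }

  // The first-pass failure gets its own record, including what was wrong with it.
  runs.push({ agentName: "planner", status: "failed", validationErrors: errors });

  const repaired = await callRepair(firstPass, errors);
  const repairErrors = validate(repaired);
  if (repairErrors.length > 0) {
    runs.push({ agentName: "json_repair", status: "failed", validationErrors: repairErrors });
    throw new Error("planner output could not be repaired");
  }

  // Recovery is a second, separate record with its own status.
  runs.push({ agentName: "json_repair", status: "repaired" });
  return repaired;
}
```

With two records, a query for failed planner runs still counts the first-pass failure even when the job as a whole recovered.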

Product linkage

clip_candidates.sourceAiRunId points to the planner run, not to scout runs. Scout runs are many-to-one with a project's analyze job.

Operational invariant: every LLM invocation must leave an ai_runs row before side effects touch clip_candidates. If you cannot replay from the row, the step is not production-ready.

Failure modes

Risk	Impact
Partial scout success, then job failure	Orphaned succeeded `ai_runs` rows for early chunks; a rerun may duplicate scout spend unless the idempotency key blocks it
`inputPreview` carries PII	A chunk snippet is stored in the DB; acceptable for an internal admin UI, but consider redaction for multi-tenant deployments
`AI_USE_FIXTURE=true` in prod	Deterministic fake clips are produced; requires an env guard
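One possible shape for the env guard in the last row is a startup check like the one below. It assumes production is signalled by `NODE_ENV=production`, which this document does not confirm, and `assertFixtureGuard` is a hypothetical name.

```typescript
// Hypothetical startup check: refuse to boot with fixtures enabled in production.
// The env map is passed in so the check is easy to test; at startup it would receive process.env.
function assertFixtureGuard(env: Record<string, string | undefined>): void {
  const fixturesOn = env["AI_USE_FIXTURE"] === "true";
  const isProd = env["NODE_ENV"] === "production"; // assumption: NODE_ENV marks production
  if (fixturesOn && isProd) {
    throw new Error("AI_USE_FIXTURE=true is not allowed when NODE_ENV=production");
  }
}
```

Failing at startup is a deliberate choice in this sketch: a misconfigured deploy never serves traffic, so fake clips cannot be persisted.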

Tradeoff

Per-chunk scout runs improve quality and context fit but multiply cost compared with single-pass summarization. tokenEstimate on chunks is the hook for future budgeting dashboards.
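The cost side of the tradeoff can be made concrete with a rough input-token comparison. This is a sketch under simplifying assumptions: it counts input tokens only, treats the prompt instructions as a fixed per-call overhead, and ignores output tokens and context-window limits. The `ChunkLike` shape and both function names are hypothetical; only `tokenEstimate` comes from the text above.

```typescript
// Hypothetical chunk shape; only tokenEstimate is taken from the text above.
interface ChunkLike {
  id: number;
  tokenEstimate: number;
}

// Per-chunk scouting: every call resends the prompt instructions.
function perChunkInputTokens(chunks: ChunkLike[], promptTokens: number): number {
  return chunks.reduce((sum, c) => sum + c.tokenEstimate + promptTokens, 0);
}

// Single pass: all chunk text goes into one call, so the instructions are sent once.
function singlePassInputTokens(chunks: ChunkLike[], promptTokens: number): number {
  return promptTokens + chunks.reduce((sum, c) => sum + c.tokenEstimate, 0);
}
```

Under these assumptions the gap between the two is the prompt overhead multiplied by one less than the number of chunks, which is the kind of figure a budgeting dashboard could surface per job.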