Clips AI — Video-to-Shorts Pipeline

Architecture

The system is a monorepo (frontend, backend, packages/shared) that turns one long video into many vertical shorts. The API is a control plane only: it authenticates, issues presigned MinIO URLs, enqueues jobs, and serves read models. All heavy work runs in BullMQ workers backed by Redis, with PostgreSQL as the source of truth for projects, jobs, transcripts, clips, and an append-only job_events log.

Blobs never land in Postgres; assets records hold only bucket + objectKey. Workers download inputs to temp disk, call FFmpeg / Whisper / OpenRouter, upload the results, then append events that the UI can poll.

Deploy invariant: the HTTP request lifecycle must never run probe, transcribe, analyze, or render work. Docker Compose runs one API container and six worker containers so that CPU-heavy FFmpeg and GPU/CPU Whisper cannot starve the API.

Engineering details

Dual-layer jobs: Every enqueue first inserts a processing_jobs row (unique idempotencyKey), then calls queue.add with jobId = row.id. Workers no-op if the row's status is already succeeded. Retries get a new key suffix (:timestamp or reprocess:…).
Queue chaining: The happy path is media.probe → media.audio_extract → transcription.whisper → transcript.normalize → ai.analyze_transcript → optional N× render.clip → export.zip.
AI pipeline: Per-chunk scout (Mastra + OpenRouter) → deterministic dedupe → global ranker → render planner → optional JSON repair → BoundaryRefiner on word/silence timings → clip_candidates + render plans.
Storage split: Workers use an internal MinIO client; browser presigned PUT/GET URLs use a separate public endpoint (MINIO_PUBLIC_ENDPOINT).
Project FSM: projects.status transitions are explicitly whitelisted; illegal transitions return 400 before any worker runs.
Observability: job_events per stage, plus ai_runs (input hash, prompt version, tokens, validation errors). The admin jobs UI lists queue depth via the shared QUEUE_NAMES.
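
The dual-layer job and queue-chaining behaviour described above can be illustrated with a minimal in-memory sketch. It is not the project's code: JobStore, FakeQueue, enqueue, runWorker, nextStage, and the key format are hypothetical stand-ins for the processing_jobs table and a BullMQ queue. Only the stage names and the "row first, then queue.add with jobId = row.id" ordering come from the text.

```typescript
// Hypothetical in-memory stand-ins; a real setup would use Postgres and BullMQ.
type JobStatus = "queued" | "running" | "succeeded" | "failed";

interface JobRow {
  id: string;
  stage: string;
  idempotencyKey: string;
  status: JobStatus;
}

// Happy-path stage order (render.clip is optional and may fan out N times).
const CHAIN: readonly string[] = [
  "media.probe",
  "media.audio_extract",
  "transcription.whisper",
  "transcript.normalize",
  "ai.analyze_transcript",
  "render.clip",
  "export.zip",
];

class JobStore {
  private rows = new Map<string, JobRow>(); // keyed by idempotencyKey
  private nextId = 1;

  // Mimics an insert guarded by a unique constraint on idempotencyKey:
  // a repeated key returns the existing row instead of creating a new one.
  insertOnce(stage: string, idempotencyKey: string): { row: JobRow; created: boolean } {
    const existing = this.rows.get(idempotencyKey);
    if (existing) return { row: existing, created: false };
    const row: JobRow = { id: `job-${this.nextId++}`, stage, idempotencyKey, status: "queued" };
    this.rows.set(idempotencyKey, row);
    return { row, created: true };
  }
}

class FakeQueue {
  readonly added: { name: string; jobId: string }[] = [];

  // Ignores an add whose jobId is already present, so duplicates never pile up.
  add(name: string, jobId: string): void {
    if (!this.added.some((j) => j.jobId === jobId)) this.added.push({ name, jobId });
  }
}

// Layer 1: durable row. Layer 2: queue entry whose jobId equals the row id.
function enqueue(store: JobStore, queue: FakeQueue, stage: string, key: string): JobRow {
  const { row, created } = store.insertOnce(stage, key);
  if (created) queue.add(stage, row.id);
  return row;
}

// Worker wrapper: a redelivered job is a no-op once the row has succeeded.
function runWorker(row: JobRow, work: () => void): "skipped" | "done" {
  if (row.status === "succeeded") return "skipped";
  row.status = "running";
  work();
  row.status = "succeeded";
  return "done";
}

// Returns the stage to enqueue after `stage`, or undefined at the end of the chain.
function nextStage(stage: string): string | undefined {
  const i = CHAIN.indexOf(stage);
  return i >= 0 ? CHAIN[i + 1] : undefined;
}
```

In this sketch a retry would call enqueue with a fresh key, for example the original key plus a timestamp suffix, so that the unique constraint admits a new row while the old one stays in the log.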

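The whitelisted project FSM can be sketched as a lookup table plus one guard function. The status names and the allowed edges below are hypothetical, since the real projects.status values are not listed here; only the idea of an explicit whitelist and a 400 response for illegal transitions comes from the text.

```typescript
// Hypothetical status names; the actual projects.status values may differ.
type ProjectStatus = "draft" | "uploaded" | "processing" | "ready" | "failed";

// Explicit whitelist: any edge not listed here is illegal.
const ALLOWED: Record<ProjectStatus, readonly ProjectStatus[]> = {
  draft: ["uploaded"],
  uploaded: ["processing"],
  processing: ["ready", "failed"],
  ready: ["processing"], // assumed reprocess path
  failed: ["processing"], // assumed retry path
};

type TransitionResult =
  | { ok: true; status: ProjectStatus }
  | { ok: false; httpStatus: 400; message: string };

// Validates a transition before anything is enqueued.
function transition(from: ProjectStatus, to: ProjectStatus): TransitionResult {
  if (ALLOWED[from].indexOf(to) === -1) {
    return { ok: false, httpStatus: 400, message: `Illegal transition ${from} -> ${to}` };
  }
  return { ok: true, status: to };
}
```

An API handler would call transition first and only create the processing_jobs row when the result is ok, which keeps rejected requests away from the workers.
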
Subsystems

Layer	Responsibility
Next.js frontend	Auth, upload to MinIO, status polling, transcript/clips review, export download
NestJS API	RBAC, upload sessions, enqueue, cancel/reprocess, health (`DB` + `Redis` + `MinIO`)
Workers	FFmpeg probe/extract/render, faster-whisper, LLM workflow, ZIP export, cleanup sweeps
Postgres	Metadata, transcripts, processing_jobs, job_events, ai_runs, clip_candidates
Redis	BullMQ transport only (not pub/sub fan-out)
MinIO	Raw video, audio, transcripts, rendered MP4s, exports

Failure modes

Risk	Mitigation / gap
Workers stopped	Jobs stay `queued` in Postgres; no in-app alert
BullMQ enqueue fails after DB insert	Row is left without `bullmqJobId`; no reconciler in code yet
Analyzing long videos	About one LLM scout call per ~7 min chunk, so cost grows linearly with duration
Render retry	New render via clips API, not generic job replay
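
The linear analyze cost in the table above can be estimated with simple arithmetic. This is a rough sketch that assumes fixed, non-overlapping 7-minute chunks and exactly one scout call per chunk; the function name and the constant are hypothetical, and the real chunker may behave differently.

```typescript
// Assumed chunk length, taken from the "~7 min" figure in the table.
const CHUNK_SECONDS = 7 * 60;

// Estimated number of scout calls: one per started chunk.
function estimateScoutCalls(durationSeconds: number, chunkSeconds: number = CHUNK_SECONDS): number {
  if (durationSeconds <= 0) return 0;
  return Math.ceil(durationSeconds / chunkSeconds);
}
```

Under these assumptions a 60-minute video needs ceil(3600 / 420) = 9 scout calls and a 120-minute video needs 18, so doubling the duration doubles the call count.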
