Sergey Orsik.dev

// case study

Clips AI — Video-to-Shorts Pipeline

Long streams need automated transcription, LLM clip discovery, and vertical renders without blocking HTTP or duplicating work across retries.

Role

Full-stack pipeline — NestJS control plane, BullMQ workers, multi-agent AI, FFmpeg render engine

Stack

Node.js, NestJS, Next.js, PostgreSQL, Redis, BullMQ, MinIO, Mastra, OpenRouter, FFmpeg, faster-whisper


Highlights

  • Write path: API → processing_jobs (Postgres) → BullMQ → domain workers → job_events
  • Read path: the frontend polls GET /projects/:id/events every ~2.6 s; the timeline is durable, and there is no WebSocket layer yet
  • Six isolated worker processes: media, transcription, AI, render, export, maintenance
  • LLM proposes clips; BoundaryRefiner + Zod validate before FFmpeg touches source video
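The read path depends on the client merging each polled page into the timeline it already shows. The sketch below shows one way to do that. It assumes each job_events row carries a monotonically increasing numeric `id`. The `JobEvent` shape, `mergeEvents`, and `startPolling` are hypothetical names, because the real response payload is not documented here.

```typescript
// Hypothetical shape of one job_events row as returned by GET /projects/:id/events.
interface JobEvent {
  id: number; // assumed monotonically increasing (e.g. a serial primary key)
  jobId: string;
  stage: string; // e.g. "media.probe"
  type: string; // e.g. "started" | "succeeded" | "failed"
  createdAt: string; // ISO timestamp
}

// Merge a freshly polled page into the current timeline.
// Pages can repeat events between polls, so dedupe by id and keep id order.
function mergeEvents(timeline: JobEvent[], incoming: JobEvent[]): JobEvent[] {
  const byId = new Map<number, JobEvent>();
  for (const e of timeline) byId.set(e.id, e);
  for (const e of incoming) byId.set(e.id, e);
  return Array.from(byId.values()).sort((a, b) => a.id - b.id);
}

// Poll on a fixed interval (~2.6 s in this project). The fetcher is injected so the
// sketch does not depend on any HTTP client. Returns a function that stops polling.
function startPolling(
  fetchEvents: () => Promise<JobEvent[]>,
  onUpdate: (timeline: JobEvent[]) => void,
  intervalMs = 2600,
): () => void {
  let timeline: JobEvent[] = [];
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async () => {
    try {
      timeline = mergeEvents(timeline, await fetchEvents());
      if (!stopped) onUpdate(timeline);
    } catch {
      // Transient failure: keep the current timeline and retry on the next tick.
    }
    // Schedule the next poll only after this one finishes, so requests never overlap.
    if (!stopped) timer = setTimeout(tick, intervalMs);
  };

  void tick();
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Chaining `setTimeout` instead of using `setInterval` means a slow response delays the next poll rather than stacking a second request behind it.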

Architecture

The system is a monorepo (frontend, backend, packages/shared) that turns one long video into many vertical shorts. The API is a control plane only: it authenticates, issues presigned MinIO URLs, enqueues jobs, and serves read models. All heavy work runs in BullMQ workers backed by Redis, with PostgreSQL as the source of truth for projects, jobs, transcripts, clips, and an append-only job_events log.

Blobs never land in Postgres; asset rows store only the bucket and objectKey. Workers download to temp disk, call FFmpeg / Whisper / OpenRouter, upload results, then append events the UI can poll.
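Postgres stores only a pointer to each blob. That makes the object-key scheme the contract between the API, the workers, and MinIO. The sketch below assumes a hypothetical `projects/<projectId>/<kind>/<assetId>.<ext>` layout. `AssetRef` and `assetRef` are illustrative names, not the project's code.

```typescript
type AssetKind = "raw" | "audio" | "transcript" | "render" | "export";

// What an assets row is assumed to hold: a pointer into MinIO, never the bytes.
interface AssetRef {
  bucket: string;
  objectKey: string;
}

const SAFE_SEGMENT = /^[A-Za-z0-9_-]+$/;

// Deterministic key: the same (project, kind, asset) always maps to the same object,
// so a retried worker overwrites its own earlier upload instead of leaving a stray copy.
function assetRef(bucket: string, projectId: string, kind: AssetKind, assetId: string, ext: string): AssetRef {
  for (const part of [projectId, assetId, ext]) {
    // Reject slashes and dots so an id cannot escape the project prefix.
    if (!SAFE_SEGMENT.test(part)) throw new Error(`unsafe key segment: ${JSON.stringify(part)}`);
  }
  return { bucket, objectKey: `projects/${projectId}/${kind}/${assetId}.${ext}` };
}
```

A worker would download the object at `objectKey` to temp disk and process it. It would then upload the output under the key for the derived kind, for example `render`.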

Deploy invariant: no HTTP request may run probe, transcribe, analyze, or render work inline. Docker Compose runs one API container and six worker containers so that CPU-heavy FFmpeg and GPU- or CPU-bound Whisper do not starve the API.
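One way to enforce this invariant in code is to have each container declare a role and derive the queues it consumes from that role. The API role consumes none. The sketch below assumes one queue per worker domain. The names `EXAMPLE_QUEUES`, `PROCESS_ROLE`, `roleFromEnv`, and `queuesToConsume` are hypothetical stand-ins for the project's shared QUEUE_NAMES.

```typescript
// Hypothetical one-queue-per-domain layout mirroring the six worker containers.
// The project's real names live in its shared QUEUE_NAMES constant, not shown here.
const EXAMPLE_QUEUES = {
  media: "media",
  transcription: "transcription",
  ai: "ai",
  render: "render",
  export: "export",
  maintenance: "maintenance",
} as const;

type WorkerDomain = keyof typeof EXAMPLE_QUEUES;
type QueueName = (typeof EXAMPLE_QUEUES)[WorkerDomain];
type ProcessRole = "api" | WorkerDomain;

function isProcessRole(value: string): value is ProcessRole {
  return value === "api" || Object.prototype.hasOwnProperty.call(EXAMPLE_QUEUES, value);
}

// Read the role from the environment (PROCESS_ROLE is a hypothetical variable)
// and fail fast on typos instead of booting a process that consumes nothing.
function roleFromEnv(env: Record<string, string | undefined>): ProcessRole {
  const raw = env.PROCESS_ROLE ?? "api";
  if (!isProcessRole(raw)) throw new Error(`unknown PROCESS_ROLE: ${raw}`);
  return raw;
}

// The API only produces jobs; each worker container consumes exactly one queue.
function queuesToConsume(role: ProcessRole): QueueName[] {
  return role === "api" ? [] : [EXAMPLE_QUEUES[role]];
}
```

A worker entrypoint would create one BullMQ `new Worker(name, processor, { connection })` per returned name. The API entrypoint gets an empty list, so it never registers a processor.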

Engineering details

  1. Dual-layer jobs — Every enqueue creates a processing_jobs row (idempotencyKey unique), then queue.add with jobId = row.id. Workers no-op if status is already succeeded. Retries get a new key suffix (:timestamp or reprocess:…).
  2. Queue chaining — Happy path: media.probe → media.audio_extract → transcription.whisper → transcript.normalize → ai.analyze_transcript → optional N× render.clip → export.zip.
  3. AI pipeline — Per-chunk scout (Mastra + OpenRouter) → deterministic dedupe → global ranker → render planner → optional JSON repair → BoundaryRefiner on word/silence timings → clip_candidates + render plans.
  4. Storage split — MinIO internal client for workers; separate public endpoint for browser presigned PUT/GET (MINIO_PUBLIC_ENDPOINT).
  5. Project FSM — projects.status transitions are explicitly whitelisted; illegal transitions return 400 before workers run.
  6. Observability — job_events per stage, plus ai_runs (input hash, prompt version, token counts, validation errors). The admin jobs UI lists queue depth for each queue in the shared QUEUE_NAMES constant.
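The dual-layer job pattern can be sketched with in-memory stand-ins for the processing_jobs table and the BullMQ queue, so the example runs without Postgres or Redis. `JobStore`, `FakeQueue`, `enqueueOnce`, `runJob`, and the idempotency-key format are hypothetical. In production, `insertOrGet` would be an `INSERT ... ON CONFLICT (idempotency_key) DO NOTHING` followed by a lookup. BullMQ's `queue.add(name, data, { jobId })` does not add a second job whose id already exists.

```typescript
type JobStatus = "queued" | "running" | "succeeded" | "failed";

interface ProcessingJob {
  id: string;
  idempotencyKey: string;
  name: string;
  status: JobStatus;
  bullmqJobId?: string;
}

// In-memory stand-in for the processing_jobs table with a unique idempotencyKey.
class JobStore {
  private rows = new Map<string, ProcessingJob>();
  private byKey = new Map<string, string>();
  private seq = 0;

  // Insert a new row, or return the existing one if the key is already taken.
  insertOrGet(idempotencyKey: string, name: string): ProcessingJob {
    const existingId = this.byKey.get(idempotencyKey);
    if (existingId !== undefined) return this.rows.get(existingId)!;
    const row: ProcessingJob = { id: `job_${++this.seq}`, idempotencyKey, name, status: "queued" };
    this.rows.set(row.id, row);
    this.byKey.set(idempotencyKey, row.id);
    return row;
  }

  get(id: string): ProcessingJob | undefined {
    return this.rows.get(id);
  }
}

// Stand-in for a BullMQ queue: adding a jobId that already exists is a no-op.
class FakeQueue {
  readonly jobs = new Map<string, { name: string; data: unknown }>();
  add(name: string, data: unknown, opts: { jobId: string }): void {
    if (!this.jobs.has(opts.jobId)) this.jobs.set(opts.jobId, { name, data });
  }
}

// Producer side: the DB row comes first, and its id becomes the queue jobId.
function enqueueOnce(store: JobStore, queue: FakeQueue, key: string, name: string, data: unknown): ProcessingJob {
  const row = store.insertOrGet(key, name);
  queue.add(name, data, { jobId: row.id });
  row.bullmqJobId = row.id; // only recorded once the queue add has returned
  return row;
}

// Worker side: a redelivered job whose row already succeeded does nothing.
async function runJob(store: JobStore, jobId: string, work: () => Promise<void>): Promise<"skipped" | "done"> {
  const row = store.get(jobId);
  if (!row) throw new Error(`no processing_jobs row for ${jobId}`);
  if (row.status === "succeeded") return "skipped";
  row.status = "running";
  try {
    await work();
    row.status = "succeeded";
    return "done";
  } catch (err) {
    row.status = "failed";
    throw err;
  }
}
```

A retry that should do the work again uses a new key, such as a reprocess suffix, which creates a fresh row and a fresh jobId. Repeating the same request with the same key collapses into the existing job.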
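The happy path is a fixed chain, so a worker can decide what to enqueue next from the stage it just finished. The sketch below is hypothetical, including `nextJobs` and the key format. It fans out one render.clip per planned clip after analysis. It leaves export.zip to a separate trigger, because the write-up marks rendering as optional and does not say what starts the export.

```typescript
type Stage =
  | "media.probe"
  | "media.audio_extract"
  | "transcription.whisper"
  | "transcript.normalize"
  | "ai.analyze_transcript"
  | "render.clip"
  | "export.zip";

interface NextJob {
  stage: Stage;
  idempotencyKey: string;
}

// Linear part of the happy path: each stage hands off to exactly one successor.
const LINEAR_NEXT: Partial<Record<Stage, Stage>> = {
  "media.probe": "media.audio_extract",
  "media.audio_extract": "transcription.whisper",
  "transcription.whisper": "transcript.normalize",
  "transcript.normalize": "ai.analyze_transcript",
};

// Jobs to enqueue after `stage` succeeds for `projectId`.
// After analysis, fan out one render.clip per planned clip (possibly none).
function nextJobs(projectId: string, stage: Stage, plannedClipIds: string[] = []): NextJob[] {
  const next = LINEAR_NEXT[stage];
  if (next) return [{ stage: next, idempotencyKey: `${projectId}:${next}` }];
  if (stage === "ai.analyze_transcript") {
    return plannedClipIds.map((clipId) => ({
      stage: "render.clip" as const,
      idempotencyKey: `${projectId}:render.clip:${clipId}`,
    }));
  }
  return []; // render.clip and export.zip end the automatic chain in this sketch
}
```

Because each successor key depends only on the project and stage (or clip), a worker that crashes after enqueueing but before acking will re-derive the same keys on retry, and the dual-layer check absorbs the duplicate.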
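The pipeline runs a "deterministic dedupe" step between the per-chunk scouts and the global ranker. The write-up does not give the rule, so the sketch below swaps in a common choice: greedy suppression by time-range overlap (intersection over union). `Candidate`, `dedupeCandidates`, and the 0.5 threshold are hypothetical.

```typescript
interface Candidate {
  id: string;
  start: number; // seconds
  end: number; // seconds
  score: number;
}

// Intersection-over-union of two time ranges (0 when they do not overlap).
function overlapRatio(a: Candidate, b: Candidate): number {
  const inter = Math.max(0, Math.min(a.end, b.end) - Math.max(a.start, b.start));
  const union = Math.max(a.end, b.end) - Math.min(a.start, b.start);
  return union > 0 ? inter / union : 0;
}

// Greedy suppression: visit candidates best-first (ties broken by id so the output
// never depends on input order) and drop any that overlap a kept one too much.
function dedupeCandidates(cands: Candidate[], threshold = 0.5): Candidate[] {
  const sorted = [...cands].sort(
    (a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
  );
  const kept: Candidate[] = [];
  for (const c of sorted) {
    if (kept.every((k) => overlapRatio(k, c) < threshold)) kept.push(c);
  }
  return kept.sort((a, b) => a.start - b.start);
}
```

Adjacent chunks often propose the same moment with slightly different edges, which is the case this kind of overlap test collapses.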
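BoundaryRefiner snaps model-proposed timestamps onto real speech before FFmpeg cuts anything. The sketch below covers only the word-timing half, under hypothetical names (`Word`, `refineBoundaries`) and placeholder length limits. The project's refiner also uses silence timings, which this sketch omits. The guard at the top plays the role the write-up gives to Zod: it rejects malformed output rather than repairing it.

```typescript
interface Word {
  text: string;
  start: number; // seconds
  end: number; // seconds
}

interface ClipRange {
  start: number;
  end: number;
}

// Snap an LLM-proposed clip onto word boundaries so it never starts or ends mid-word.
// Assumes `words` is sorted by start time and non-overlapping.
function refineBoundaries(
  proposal: ClipRange,
  words: Word[],
  mediaDuration: number,
  minLen = 5,
  maxLen = 90,
): ClipRange | null {
  // Reject malformed model output instead of repairing it here.
  if (!Number.isFinite(proposal.start) || !Number.isFinite(proposal.end) || proposal.end <= proposal.start) {
    return null;
  }
  // Start: the word in progress at the proposed start, or the next word if it falls in silence.
  const first = words.find((w) => w.end > proposal.start);
  // End: the word in progress at the proposed end, or the previous word if it falls in silence.
  let last: Word | undefined;
  for (const w of words) {
    if (w.start < proposal.end) last = w;
  }
  if (!first || !last) return null;

  const start = Math.max(0, first.start);
  const end = Math.min(mediaDuration, last.end);
  if (end <= start) return null;
  const len = end - start;
  // Clips still out of bounds after snapping are dropped, not stretched.
  if (len < minLen || len > maxLen) return null;
  return { start, end };
}
```

Snapping only widens a range that starts or ends mid-word, and only narrows it at silent edges, so the refined clip stays close to what the model proposed.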
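A whitelisted FSM can be a single lookup table. The status names below are hypothetical, because the write-up does not list them. `IllegalTransitionError` carries the 400 that the API would return. In a NestJS service, the same check would more typically throw `BadRequestException` from `@nestjs/common`, which Nest maps to a 400 response.

```typescript
// Hypothetical status set; the write-up only says transitions are whitelisted.
type ProjectStatus = "created" | "uploaded" | "processing" | "ready" | "failed" | "cancelled";

// Whitelist: any transition not listed here is rejected before a job is enqueued.
const ALLOWED: Record<ProjectStatus, readonly ProjectStatus[]> = {
  created: ["uploaded", "cancelled"],
  uploaded: ["processing", "cancelled"],
  processing: ["ready", "failed", "cancelled"],
  ready: ["processing"], // reprocess
  failed: ["processing"], // reprocess after a failure
  cancelled: ["processing"], // reprocess after a cancel
};

// Carries the HTTP status the API layer should translate it into.
class IllegalTransitionError extends Error {
  readonly httpStatus = 400;
  constructor(from: ProjectStatus, to: ProjectStatus) {
    super(`illegal project transition: ${from} -> ${to}`);
    this.name = "IllegalTransitionError";
  }
}

function assertTransition(from: ProjectStatus, to: ProjectStatus): void {
  if (ALLOWED[from].indexOf(to) === -1) throw new IllegalTransitionError(from, to);
}
```

Because the check runs in the request handler before anything is enqueued, an illegal transition costs one table lookup instead of a failed worker job.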

Subsystems

| Layer | Responsibility |
|---|---|
| Next.js frontend | Auth, upload to MinIO, status polling, transcript/clip review, export download |
| NestJS API | RBAC, upload sessions, enqueue, cancel/reprocess, health checks (DB + Redis + MinIO) |
| Workers | FFmpeg probe/extract/render, faster-whisper, LLM workflow, ZIP export, cleanup sweeps |
| Postgres | Metadata, transcripts, processing_jobs, job_events, ai_runs, clip_candidates |
| Redis | BullMQ transport only (not pub/sub fan-out) |
| MinIO | Raw video, audio, transcripts, rendered MP4s, exports |

Failure modes

| Risk | Mitigation / gap |
|---|---|
| Workers stopped | Jobs stay queued in Postgres; no in-app alert |
| BullMQ enqueue fails after DB insert | Row is left without a bullmqJobId; no reconciler in code yet |
| Analyzing long videos | About one LLM scout call per ~7-minute chunk, so cost grows linearly with duration |
| Render retry | Retried as a new render through the clips API, not as a generic job replay |
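The missing reconciler could be a periodic sweep over processing_jobs. The sketch below is hypothetical (`JobRow`, `reconcileOrphans`, and the grace period) and describes what such a sweep might look like, not code the project has. It re-adds rows that are still queued, have no bullmqJobId, and are older than a grace period. Reusing `jobId = row.id` makes the sweep safe to repeat, since BullMQ will not add a second job with an existing id.

```typescript
interface JobRow {
  id: string;
  name: string;
  status: "queued" | "running" | "succeeded" | "failed";
  bullmqJobId: string | null;
  createdAt: number; // epoch ms
  data: unknown;
}

// Same shape as a BullMQ queue.add call with a custom jobId, injected for testability.
type AddFn = (name: string, data: unknown, opts: { jobId: string }) => Promise<void>;

// Re-enqueue rows that were inserted but never reached the queue.
// The grace period avoids racing a producer whose queue.add is still in flight.
async function reconcileOrphans(rows: JobRow[], add: AddFn, now: number, graceMs = 60_000): Promise<string[]> {
  const fixed: string[] = [];
  for (const row of rows) {
    const orphaned = row.status === "queued" && row.bullmqJobId === null && now - row.createdAt > graceMs;
    if (!orphaned) continue;
    await add(row.name, row.data, { jobId: row.id });
    row.bullmqJobId = row.id; // in SQL: UPDATE ... SET bullmq_job_id = id WHERE bullmq_job_id IS NULL
    fixed.push(row.id);
  }
  return fixed;
}
```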
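To make the linear analysis cost concrete, the sketch below splits a video's duration into fixed ~7-minute windows, one scout call each. `chunkWindows` is a hypothetical helper. The write-up does not say whether chunks overlap, so these windows are contiguous.

```typescript
interface ChunkWindow {
  start: number; // seconds
  end: number; // seconds
}

// Contiguous fixed-size windows covering [0, durationSec); the last one may be shorter.
function chunkWindows(durationSec: number, chunkSec = 7 * 60): ChunkWindow[] {
  if (!(durationSec > 0) || !(chunkSec > 0)) return [];
  const windows: ChunkWindow[] = [];
  for (let start = 0; start < durationSec; start += chunkSec) {
    windows.push({ start, end: Math.min(start + chunkSec, durationSec) });
  }
  return windows;
}
```

For a 3-hour stream (10,800 s), this yields ceil(10800 / 420) = 26 windows, so roughly 26 scout calls before the single global ranking pass. Doubling the stream length roughly doubles that number.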