Topology (inferred from worker-bootstrap.ts + Compose)
| Process | Queues | Typical bottleneck |
|---|---|---|
| media.worker | media.probe, media.audio_extract | FFmpeg, disk I/O |
| transcription.worker | transcription.whisper | Python faster-whisper subprocess |
| ai.worker | transcript.normalize, ai.analyze_transcript | OpenRouter latency, token cost |
| render.worker | render.clip | FFmpeg filter graphs, CPU |
| export.worker | export.zip | Zip + large downloads |
| maintenance.worker | maintenance.cleanup | Scheduled sweeps |
Each process boots WorkerAppModule (a NestJS application context without an HTTP server) and registers BullMQ Worker instances whose concurrency is configurable from the environment (workers.*Concurrency).
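As a minimal sketch, assuming a hypothetical WORKER_ROLE-style switch picks the process and hypothetical variables such as MEDIA_WORKER_CONCURRENCY back the workers.*Concurrency settings, the table above can be turned into a per-process registration plan. The real worker-bootstrap.ts may read these values through NestJS configuration instead. BullMQ is left out so the planning logic runs standalone.

```typescript
// Hypothetical sketch: role and queue names mirror the topology table;
// the env variable names are assumptions, not the project's actual keys.
type Role = "media" | "transcription" | "ai" | "render" | "export" | "maintenance";
type Env = Record<string, string | undefined>;

const QUEUES_BY_ROLE: Record<Role, readonly string[]> = {
  media: ["media.probe", "media.audio_extract"],
  transcription: ["transcription.whisper"],
  ai: ["transcript.normalize", "ai.analyze_transcript"],
  render: ["render.clip"],
  export: ["export.zip"],
  maintenance: ["maintenance.cleanup"],
};

interface WorkerSpec {
  queue: string;
  concurrency: number;
}

// Parse a positive integer from env, falling back to a default when unset.
function readConcurrency(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return n;
}

// Build the list of Workers one process should register.
function planWorkers(role: Role, env: Env): WorkerSpec[] {
  // Assumed naming scheme: MEDIA_WORKER_CONCURRENCY, RENDER_WORKER_CONCURRENCY, ...
  const key = `${role.toUpperCase()}_WORKER_CONCURRENCY`;
  // The maintenance sweep is pinned to 1 so sweeps cannot overlap.
  const concurrency = role === "maintenance" ? 1 : readConcurrency(env, key, 1);
  return QUEUES_BY_ROLE[role].map((queue) => ({ queue, concurrency }));
}
```

In the real process each spec would become `new Worker(spec.queue, processor, { connection, concurrency: spec.concurrency })` from bullmq, with the connection taken from the shared Redis client.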
Why not one mega-worker
- Resource isolation: render and probe both use FFmpeg; separate worker pools keep a backlog on one queue (e.g. slow probes) from delaying render and export jobs.
- Scaling: Compose can scale worker-render replicas independently (inferred ops pattern; the default Compose file runs a single replica).
- Failure blast radius: an OOM in the whisper worker does not take down the API.
- Dependency packaging: the transcription worker image includes Python + faster-whisper; the AI worker needs only Mastra/OpenRouter.
Maintenance scheduler
The maintenance worker owns a second BullMQ Queue that it uses as a ticker: a setInterval callback enqueues maintenance.cleanup jobs with time-based idempotency keys (maintenance.cleanup:${Date.now()}). Worker concurrency is set to 1 so sweeps never overlap within that worker.
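The ticker can be modelled without Redis. In this sketch a hypothetical InMemoryQueue stands in for the BullMQ Queue and, like BullMQ, ignores an add whose jobId already exists; the key format follows the text. The model makes one property visible: Date.now()-based keys only collapse enqueues that land in the same millisecond, so they guard against a double add within one tick rather than against a second scheduler replica. SerialRunner is a hypothetical stand-in for a concurrency-1 Worker.

```typescript
// Hypothetical in-memory stand-in for a BullMQ Queue: an add with a jobId
// that already exists is ignored, mirroring BullMQ's duplicate-id behaviour.
interface Job {
  id: string;
  name: string;
}

class InMemoryQueue {
  private readonly jobs = new Map<string, Job>();

  add(name: string, opts: { jobId: string }): boolean {
    if (this.jobs.has(opts.jobId)) return false; // duplicate id: not enqueued
    this.jobs.set(opts.jobId, { id: opts.jobId, name });
    return true;
  }

  size(): number {
    return this.jobs.size;
  }
}

// One scheduler tick: enqueue a sweep keyed by the given timestamp.
function enqueueCleanup(queue: InMemoryQueue, now: number): boolean {
  return queue.add("maintenance.cleanup", { jobId: `maintenance.cleanup:${now}` });
}

// Runs async tasks strictly one after another, modelling concurrency = 1.
class SerialRunner {
  private tail: Promise<void> = Promise.resolve();
  active = 0;
  maxActive = 0;

  run(task: () => Promise<void>): Promise<void> {
    const next = this.tail.then(async () => {
      this.active++;
      this.maxActive = Math.max(this.maxActive, this.active);
      try {
        await task();
      } finally {
        this.active--;
      }
    });
    // Keep the chain going even if a task rejects.
    this.tail = next.catch(() => {});
    return next;
  }
}
```

In production the tick would be something like `setInterval(() => void queue.add(...), intervalMs)` against the real Queue; the in-memory version only exists to make the deduplication and serialisation behaviour testable.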
Shared infrastructure
All workers share:
- DATABASE_URL (Prisma)
- REDIS_URL (BullMQ connection from RedisService.getBullMqClient())
- MinIO credentials
- WORKER_TEMP_DIR for per-job temp directories (createJobTempDir + cleanupTempDir)
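createJobTempDir and cleanupTempDir are named in this section, but their signatures are not. The sketch below assumes they take a job id and a directory path respectively, creates directories under WORKER_TEMP_DIR with Node's fs.promises mkdtemp, and adds a hypothetical withJobTempDir wrapper so cleanup runs in a finally block even when an FFmpeg step or subprocess fails.

```typescript
import { mkdtemp, rm } from "fs/promises";
import { tmpdir } from "os";
import { join } from "path";

// Base directory for per-job scratch space; falls back to the OS temp dir
// when WORKER_TEMP_DIR is unset or empty (assumed behaviour).
const baseDir = (): string => process.env.WORKER_TEMP_DIR || tmpdir();

// Assumed signature: creates a unique directory such as <base>/job-<id>-XXXXXX.
async function createJobTempDir(jobId: string): Promise<string> {
  // mkdtemp appends six random characters to the prefix.
  return mkdtemp(join(baseDir(), `job-${jobId}-`));
}

// Assumed signature: removes the directory tree; force ignores a missing path.
async function cleanupTempDir(dir: string): Promise<void> {
  await rm(dir, { recursive: true, force: true });
}

// Hypothetical wrapper: guarantees cleanup even if the job body throws.
async function withJobTempDir<T>(
  jobId: string,
  body: (dir: string) => Promise<T>,
): Promise<T> {
  const dir = await createJobTempDir(jobId);
  try {
    return await body(dir);
  } finally {
    await cleanupTempDir(dir);
  }
}
```

Wrapping each processor body this way keeps temp-file lifetime tied to the job, so a crashed FFmpeg run does not leave multi-gigabyte intermediates behind on the shared volume.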
Graceful shutdown
SIGINT / SIGTERM → close all Worker instances → close maintenance queue → app.close() → process.exit(0).
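The shutdown order above fits in one handler over a small interface. This sketch assumes every piece exposes `close(): Promise<void>` (true of BullMQ's Worker and Queue and of NestJS's application context) and injects `exit` so the sequence can be tested; the once-only guard is an addition that stops a second signal from starting a second shutdown.

```typescript
interface Closable {
  close(): Promise<void>;
}

interface ShutdownDeps {
  workers: Closable[]; // one per registered queue
  maintenanceQueue?: Closable; // only present in the maintenance process
  app: Closable; // the NestJS application context
  exit: (code: number) => void; // process.exit in production
}

// Returns a handler suitable for process.once("SIGTERM", handler).
function createShutdownHandler(deps: ShutdownDeps): () => Promise<void> {
  let started = false;
  return async () => {
    if (started) return; // ignore a repeated SIGINT/SIGTERM
    started = true;
    // Close workers first so no new jobs are picked up; BullMQ's
    // Worker.close() waits for in-flight jobs unless force is passed.
    await Promise.all(deps.workers.map((w) => w.close()));
    if (deps.maintenanceQueue) await deps.maintenanceQueue.close();
    await deps.app.close();
    deps.exit(0);
  };
}
```

The same handler would be registered for both SIGINT and SIGTERM, which is why the guard matters when Compose sends SIGTERM while an operator also presses Ctrl+C.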
Failure mode
If the API is up but the workers are down, jobs accumulate in Redis and the corresponding DB rows stay in the queued state. The UI shows a stalled pipeline, and nothing in the code raises an alert (inferred ops gap).
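Since no alert exists in code, the following is purely illustrative: a hypothetical check that flags rows left queued longer than a threshold, which is how the stalled-pipeline symptom looks from the database side. The row shape, status values and threshold are assumptions, not the project's Prisma schema.

```typescript
// Hypothetical row shape; the real Prisma model and status enum are not shown here.
interface PipelineRow {
  id: string;
  status: "queued" | "running" | "done" | "failed";
  enqueuedAt: Date;
}

// Return rows that have been "queued" for longer than maxWaitMs as of `now`.
function findStalled(rows: PipelineRow[], now: Date, maxWaitMs: number): PipelineRow[] {
  return rows.filter(
    (r) => r.status === "queued" && now.getTime() - r.enqueuedAt.getTime() > maxWaitMs,
  );
}
```

A non-empty result while the queues hold waiting jobs matches the "API up, workers down" case described above.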
See diagram: diagrams/system-overview.mmd.