Worker Process Isolation by Domain

Six separate Node worker entrypoints consume disjoint BullMQ queues so CPU-heavy FFmpeg, GPU transcription, and LLM calls do not starve each other.

Topology (inferred from `worker-bootstrap.ts` + Compose)

Process	Queues	Typical bottleneck
`media.worker`	`media.probe`, `media.audio_extract`	FFmpeg, disk I/O
`transcription.worker`	`transcription.whisper`	Python faster-whisper subprocess
`ai.worker`	`transcript.normalize`, `ai.analyze_transcript`	OpenRouter latency, token cost
`render.worker`	`render.clip`	FFmpeg filter graphs, CPU
`export.worker`	`export.zip`	Zip + large downloads
`maintenance.worker`	`maintenance.cleanup`	Scheduled sweeps
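
The routing in the table above can be written down as data, which makes the "disjoint queues" claim checkable. This is a minimal sketch: `TOPOLOGY`, `ownerOf`, and `findSharedQueues` are illustrative names, not identifiers taken from the codebase.

```typescript
// Process-to-queue routing copied from the table above.
// The constant and helper names are hypothetical, for illustration only.
const TOPOLOGY: Record<string, readonly string[]> = {
  "media.worker": ["media.probe", "media.audio_extract"],
  "transcription.worker": ["transcription.whisper"],
  "ai.worker": ["transcript.normalize", "ai.analyze_transcript"],
  "render.worker": ["render.clip"],
  "export.worker": ["export.zip"],
  "maintenance.worker": ["maintenance.cleanup"],
};

// Returns the process that consumes a queue, or undefined if none does.
function ownerOf(queue: string): string | undefined {
  for (const [proc, queues] of Object.entries(TOPOLOGY)) {
    if (queues.includes(queue)) return proc;
  }
  return undefined;
}

// Returns every queue that is listed more than once (empty when disjoint).
function findSharedQueues(topology: Record<string, readonly string[]>): string[] {
  const seen = new Set<string>();
  const shared = new Set<string>();
  for (const queues of Object.values(topology)) {
    for (const q of queues) {
      if (seen.has(q)) shared.add(q);
      seen.add(q);
    }
  }
  return Array.from(shared);
}
```

A check like `findSharedQueues(TOPOLOGY).length === 0` could run at boot or in a unit test to catch a queue accidentally registered in two processes.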

Each process boots `WorkerAppModule` (a NestJS application context without an HTTP server) and registers BullMQ `Worker` instances, with concurrency configurable from env (`workers.*Concurrency`).
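
Reading concurrency from the environment might look like the sketch below. The helper name `resolveConcurrency`, the fallback of 1, and any environment variable name passed to it are assumptions; the source only states that concurrency is configurable per worker.

```typescript
// Parses a worker concurrency setting from an env-like map.
// The helper name and the default fallback of 1 are assumptions.
type Env = Record<string, string | undefined>;

function resolveConcurrency(env: Env, name: string, fallback = 1): number {
  const raw = env[name];
  // Unset or blank values fall back to the default.
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  // Reject NaN, fractions, zero, and negatives instead of passing them on.
  if (!Number.isInteger(parsed) || parsed < 1) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return parsed;
}
```

The returned number would then be passed as the `concurrency` option when constructing a BullMQ `Worker`. Failing fast on a malformed value is a design choice in this sketch, not confirmed behavior of the project.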

Why not one mega-worker

Resource isolation: render and probe both run FFmpeg; separate pools keep a backlog in one from delaying the other.
Scaling: Compose can scale `worker-render` replicas independently (inferred ops pattern; the default Compose file runs a single replica).
Failure blast radius: an OOM in the whisper worker does not take down the API or the other workers.
Dependency packaging: the transcription worker image includes Python and faster-whisper; the AI worker needs only Mastra/OpenRouter.

Maintenance scheduler

The maintenance worker owns a second BullMQ `Queue` used as a ticker: `setInterval` enqueues `maintenance.cleanup` jobs with time-based idempotency keys (`maintenance.cleanup:${Date.now()}`). A concurrency of 1 prevents overlapping sweeps.
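
The ticker pattern can be sketched as follows. `Enqueue` stands in for whatever call adds a job to the maintenance queue, and `createCleanupTicker` is a hypothetical name; only the job name and the key format come from the description above.

```typescript
// `Enqueue` is a placeholder for the real "add job to queue" call;
// its shape is an assumption, not the project's actual signature.
type Enqueue = (jobName: string, key: string) => void;

// Builds the time-based key in the format described above.
function cleanupKey(nowMs: number): string {
  return `maintenance.cleanup:${nowMs}`;
}

// Returns a ticker whose tick() enqueues one cleanup job.
// The clock is injectable so the key can be tested deterministically.
function createCleanupTicker(enqueue: Enqueue, now: () => number = Date.now) {
  let handle: ReturnType<typeof setInterval> | undefined;
  const tick = (): void => enqueue("maintenance.cleanup", cleanupKey(now()));
  return {
    tick,
    // Starts the interval; calling start() twice does not create a second timer.
    start(intervalMs: number): void {
      if (handle === undefined) handle = setInterval(tick, intervalMs);
    },
    // Stops the interval so the process can exit cleanly.
    stop(): void {
      if (handle !== undefined) clearInterval(handle);
      handle = undefined;
    },
  };
}
```

Because the key embeds the tick time, each tick produces a distinct key. In this reading, overlapping sweeps are prevented by the worker concurrency of 1 rather than by the key itself.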

Shared infrastructure

All workers share:

DATABASE_URL (Prisma)
REDIS_URL (BullMQ connection from RedisService.getBullMqClient())
MinIO credentials
WORKER_TEMP_DIR for per-job temp directories (createJobTempDir + cleanupTempDir)
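
A per-job temp directory lifecycle could look roughly like this. Only the names `createJobTempDir`, `cleanupTempDir`, and `WORKER_TEMP_DIR` appear in the source; the signatures, the synchronous style, the `job-<id>-` prefix, the fallback root, and the `withJobTempDir` wrapper are assumptions made for the sketch.

```typescript
import { mkdirSync, mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Root for all job directories; the fallback path is an assumption.
const tempRoot = process.env.WORKER_TEMP_DIR ?? join(tmpdir(), "worker-tmp");

// Creates a unique directory for one job under the temp root.
// The signature is a guess; only the function name comes from the text above.
function createJobTempDir(root: string, jobId: string): string {
  mkdirSync(root, { recursive: true });
  // mkdtempSync appends six random characters to the given prefix.
  return mkdtempSync(join(root, `job-${jobId}-`));
}

// Removes the job directory and its contents; a missing directory is not an error.
function cleanupTempDir(dir: string): void {
  rmSync(dir, { recursive: true, force: true });
}

// Hypothetical wrapper: runs `work` in a fresh directory and always cleans up.
function withJobTempDir<T>(root: string, jobId: string, work: (dir: string) => T): T {
  const dir = createJobTempDir(root, jobId);
  try {
    return work(dir);
  } finally {
    cleanupTempDir(dir);
  }
}
```

Putting the cleanup in `finally` means a failed job does not leave files behind. A real worker would more likely use the async `fs/promises` variants so that long jobs do not block the event loop.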

Graceful shutdown

SIGINT / SIGTERM → close all Worker instances → close maintenance queue → app.close() → process.exit(0).
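
The shutdown order above can be sketched as a single function. `Closable` and `shutdown` are illustrative names, closing the workers in parallel is an assumption, and `exit` is injected so the sequence can be exercised without ending the process.

```typescript
// Anything with an async close(): a BullMQ Worker or Queue, or a Nest app context.
interface Closable {
  close(): Promise<unknown>;
}

// Closes resources in the documented order, then calls exit(0).
async function shutdown(
  workers: Closable[],
  maintenanceQueue: Closable | undefined,
  app: Closable,
  exit: (code: number) => void,
): Promise<void> {
  // Workers first: stop consuming jobs before tearing down shared services.
  // Closing them in parallel is an assumption; sequential would also fit the text.
  await Promise.all(workers.map((w) => w.close()));
  // Only the maintenance process owns a ticker queue, so this may be absent.
  if (maintenanceQueue) await maintenanceQueue.close();
  await app.close();
  exit(0);
}
```

In a worker entrypoint this would be registered for both signals, for example with `process.once("SIGINT", ...)` and `process.once("SIGTERM", ...)`, passing `process.exit` as `exit`. BullMQ's `Worker.close()` waits for in-flight jobs unless it is forced, so a container stop timeout shorter than the longest job can still cut work off.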

Failure mode

If only the API is up and the workers are down, jobs accumulate in Redis and their DB rows stay in the queued state. The UI shows a stalled pipeline, and nothing in the code raises an automatic alert (inferred ops gap).

See diagram: diagrams/system-overview.mmd.