Pipeline stages (numbered)
| Stage | Tier | Sync/async | Primary outputs |
|---|---|---|---|
| 0 | Mobile/web client | Sync | Reduced file on disk/memory |
| 1 | API initiate | Sync | uploadId, presigned part URLs |
| 2 | Client → object store | Sync (parallel parts) | ETags per part |
| 3 | API complete | Sync | DB row, Kafka request event |
| 4 | Media worker | Async | Variant objects in bucket |
| 5 | API Kafka consumer | Async | Updated URLs in DB |
| 6 | Client read + edge | Sync | Bytes from CDN |
[Client] → initiate/complete → [API + DB + Kafka]
↓
[Media worker + object store]
↓
[Kafka result] → [API updates DB]
[Client] ← profile/deck API ← [DB with optimized URLs]
[Client] → GET /media/{key} → [Edge] → 302 → [CDN]
Stage 0 — Client preprocessing
Goal: Cut upload time and storage ingress cost; catch obvious validation errors before network.
Image path (typical parameters)
| Step | Technique | Parameters (example) |
|---|---|---|
| Decode | Platform decoder / image package | Reject unknown magic bytes |
| Bound dimensions | Aspect-preserving resize | maxWidth=1920, maxHeight=1080 |
| Re-encode | JPEG Q≈92 or PNG | Enforce max file size (e.g. 10–15 MB cap) |
| Validation | Extension allowlist | .jpg, .jpeg, .png, .webp |
Output is a temporary file (native) or in-memory blob registry (web) consumed by the upload layer.
Video path (native vs web)
| Platform | Strategy |
|---|---|
| iOS/Android | On-device compress when fileSize > threshold (e.g. 50 MB); duration/max-size guards |
| Web | Often cannot rely on hardware encoders — optional sync transcode HTTP on worker (POST /transcode) before multipart upload, streaming body to avoid loading entire file into API memory |
Design note: Client preprocessing and server optimization are intentionally redundant. Client reduces p99 upload time; server produces the canonical ladder (WebP qualities, MP4 faststart) used by other users' clients in feed/deck.
Stage 1–2 — Multipart upload protocol
initiate (API)
Inputs: fileName, fileSize, optional description, authenticated userId.
Actions:
- Validate size against per-type caps (
maxImageBytes,maxVideoBytes). CreateMultipartUploadon object store with key pattern{userId}/{sanitizedFileName}.- Generate presigned
UploadPartURLs — part size commonly 5 MiB (balances request count vs memory). - Optionally stash
descriptionin Redis:media:init:desc:{uploadId}TTL ~1h (metadata not carried in multipart completion body).
Outputs: { uploadId, parts: [{ partNumber, partUrl }] }.
Presigned URLs may be rewritten to pass through the app domain (PRESIGNED_URL_BASE) so nginx can fix Host for signature validation against the storage endpoint.
Client upload
For each part i:
PUT {partUrl}
Content-Type: application/octet-stream
Body: bytes [i * partSize, min((i+1)*partSize, fileSize))
Collect { partNumber, eTag } from response headers.
Concurrency: Parts may upload sequentially or with bounded parallelism; completion requires ordered part list for CompleteMultipartUpload.
Stage 3 — complete synchronous gate (API)
This is the trust boundary before async work and before other users should see the asset.
Ordered steps
1. CompleteMultipartUpload(parts)
2. validateMedia(probe) → on fail: DeleteObject, 4xx to client
3. moderateContent(url) → policy fail: DeleteObject, 4xx
→ vendor outage: retryable exception, object retained
4. INSERT profile_media → media_url = API indirection URL
5. PUBLISH optimize.request → Kafka
6. RETURN { mediaId, mediaUrl }
media_url at this moment
Not the CDN URL. Pattern:
https://{api}/v1/media/{percentEncode(objectKey)}
The row is visible to the owner immediately with the original bytes reachable via indirection, but feed consumers may still prefer derivative URLs once stage 5 completes.
Gallery ordering (illustrative rules)
When position is supplied on complete:
- Non-negative integer slot in profile gallery.
- If profile already has video, slot
0may be reserved for video type. - Concurrent completes on same profile:
SELECT … FOR UPDATE+ shiftserial_numberto avoid unique constraint races.
Batch enqueue on registration
Profile creation flows that attach multiple media in one transaction often loop all rows and publish one Kafka event per mediaId, not only the multipart-complete code path. Avoids "onboarding photos stuck unoptimized" when upload APIs differ between registration steps.
Stage 4 — Media worker (Go or similar)
Consumption
- Consumer group:
media-worker(illustrative). - Payload parser: fail fast if any of
s3Key,s3Bucket,mediaId,mediaUrlmissing — prevents partial processing. ProcessMediaEventreturns quickly; work runs in goroutine + bounded channel semaphore (workerPoolSize).
Per-job filesystem discipline
/tmp/{jobUuid}/
input.{ext}
output/
{base}_high.webp
{base}_medium.webp
…
defer RemoveAll(/tmp/{jobUuid})
Never share temp dirs across jobs — concurrent FFmpeg/libvips corruption risk.
Image processing (libvips / bimg pattern)
| Quality key | Max edge (px) | Format | Typical Q |
|---|---|---|---|
high | 1920 | WebP | 90 |
medium | 1280 | WebP | 85 |
low | 854 | WebP | 80 |
Resize uses max edge with aspect ratio preserved (portrait-safe for mobile UGC).
Upload key layout (pattern):
{userId}/images/{quality}/{filename}.webp
Video processing (current-generation pattern)
Many codebases migrated from HLS ladders to single MP4 + poster for profile clips (simpler client, fewer manifest edge cases):
| Output | Encoder flags (conceptual) | Storage key pattern |
|---|---|---|
| MP4 | H.264 + AAC, +faststart (moov at front) | {userId}/videos/{mediaId}/mp4/{name}.mp4 |
| Thumbnail | FFmpeg frame @ ~1s, 1280×720 | {userId}/thumbnail/{mediaId}/{name}.jpg |
Thumbnail failure is often non-fatal (warn + continue) so video playback still ships.
CDN URL construction
Worker builds same URL shape the API uses, via configured CDN_BASE:
{CDN_BASE}/{userId}/{relativePathAfterUserId}
Mismatch between worker CDN_BASE and edge rewrite rules is a common production bug — treat as contract tested in staging.
Retry semantics (worker-internal)
for retry in 0..maxRetries:
acquire(poolSlot)
err = processJob()
release(poolSlot)
if err == nil: break
sleep(retry * 1s) // linear backoff
publish result (success or error message)
Distinct from Kafka consumer redelivery — worker publishes success: false instead of infinite loop.
Stage 5 — API result consumer
@KafkaListener on optimize response topic (transactional per message):
| Condition | Action |
|---|---|
success == false | Log + metric; do not clear original URL |
| image + success | media_url ← processed where quality=medium; set low/medium/high columns |
| video + success | media_url ← processed where quality=original (MP4); thumbnail_url ← quality=thumbnail |
No push to client in baseline design — UI refreshes on next profile/deck fetch or local optimistic state merge.
Virus side-channel
POST /scan (ClamAV) may run outside hot path. On detection:
- Worker deletes object(s) from bucket.
- Publishes to
{responseTopic}.virus. - API listener deletes DB row and best-effort deletes derivative keys if URLs were partially written.
Stage 6 — Delivery
Edge behavior
location /v1/media/ {
auth_request /internal/auth; # illustrative
rewrite → CDN origin;
return 302;
}
Separate location may proxy /v1/media-process/* to worker HTTP for web transcode and health.
Client playback
- Images: Deck cards request
mediumorlowURL from API model when available; fallback tomedia_url. - Video: ExoPlayer/AVPlayer on MP4 URL; thumbnail for placeholders. Legacy HLS clients may still exist in code comments — verify which path is live before documenting player behavior.
Latency budget (order-of-magnitude)
| Segment | Typical range | Notes |
|---|---|---|
| Client preprocess | 0.5–5 s | Video dominates |
| Multipart upload | Depends on bandwidth | Parallel parts help |
complete gate | 1–4 s | Moderation API p95 |
| Queue + worker | 5–60+ s | Video length, pool saturation |
| Profile refetch | Next user action | No websocket in baseline |
Idempotency and duplicate processing
| Layer | Behavior |
|---|---|
| Kafka | At-least-once delivery possible |
| Worker | Same mediaId + deterministic keys → overwrite variants (last write wins) |
| API consumer | Blind UPDATE by mediaId — generally safe to replay |
| Missing idempotency key | Re-published events after manual fix may re-run FFmpeg unnecessarily |
Hardening options (not always present in v1): dedupe table keyed by (mediaId, processingVersion), outbox for publish-after-commit, processing_status enum on row (pending|ready|failed).
Observability checklist (pattern)
| Signal | Emitter | Use |
|---|---|---|
media_processing_seconds histogram | Worker | SLO, capacity planning |
media_processing_errors_total | Worker | Alert on rate |
active_jobs gauge | Worker | Autoscaling signal |
| Trace spans per stage | OpenTelemetry | Debug slow transcodes |
| Kafka consumer lag | Broker metrics | Backlog incidents |
| Upload 4xx by reason | API | Moderation vs validation split |