Sergey Orsik.dev

2026-05-16

Async Profile Media Pipeline — Technical Reference

Anonymized end-to-end design: client preprocessing, multipart upload gate, Kafka handoff, Go worker transforms, DB URL swap, CDN delivery.

Diagram

Pipeline stages (numbered)

| Stage | Tier | Sync/async | Primary outputs |
| --- | --- | --- | --- |
| 0 | Mobile/web client | Sync | Reduced file on disk/memory |
| 1 | API initiate | Sync | uploadId, presigned part URLs |
| 2 | Client → object store | Sync (parallel parts) | ETags per part |
| 3 | API complete | Sync | DB row, Kafka request event |
| 4 | Media worker | Async | Variant objects in bucket |
| 5 | API Kafka consumer | Async | Updated URLs in DB |
| 6 | Client read + edge | Sync | Bytes from CDN |

[Client] → initiate/complete → [API + DB + Kafka]
                                    ↓
                              [Media worker + object store]
                                    ↓
                              [Kafka result] → [API updates DB]
[Client] ← profile/deck API ← [DB with optimized URLs]
[Client] → GET /media/{key} → [Edge] → 302 → [CDN]

Stage 0 — Client preprocessing

Goal: Cut upload time and storage ingress cost; catch obvious validation errors before anything goes over the network.

Image path (typical parameters)

| Step | Technique | Parameters (example) |
| --- | --- | --- |
| Decode | Platform decoder / image package | Reject unknown magic bytes |
| Bound dimensions | Aspect-preserving resize | maxWidth=1920, maxHeight=1080 |
| Re-encode | JPEG Q≈92 or PNG | Enforce max file size (e.g. 10–15 MB cap) |
| Validation | Extension allowlist | .jpg, .jpeg, .png, .webp |

Output is a temporary file (native) or in-memory blob registry (web) consumed by the upload layer.
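The decode and validation rows can be illustrated with a small check. This is a minimal sketch in Go for consistency with the worker examples in these notes; a real client would use its platform's decoder. checkImage is a hypothetical helper, and using http.DetectContentType for the magic-byte test is an assumption, not something the pipeline above prescribes.

```go
package main

import (
	"fmt"
	"net/http"
	"path/filepath"
	"strings"
)

// Allowlists taken from the example table; real products will tune both.
var (
	allowedExt  = map[string]bool{".jpg": true, ".jpeg": true, ".png": true, ".webp": true}
	allowedMIME = map[string]bool{"image/jpeg": true, "image/png": true, "image/webp": true}
)

// checkImage (hypothetical helper) rejects a file whose extension or leading
// bytes are not on the allowlist. head holds the first bytes of the file;
// http.DetectContentType inspects at most the first 512.
func checkImage(fileName string, head []byte) error {
	ext := strings.ToLower(filepath.Ext(fileName))
	if !allowedExt[ext] {
		return fmt.Errorf("extension %q is not allowed", ext)
	}
	if mime := http.DetectContentType(head); !allowedMIME[mime] {
		return fmt.Errorf("content sniffed as %q is not allowed", mime)
	}
	return nil
}

func main() {
	pngHeader := []byte("\x89PNG\r\n\x1a\n")
	fmt.Println(checkImage("avatar.png", pngHeader))
	fmt.Println(checkImage("avatar.png", []byte("not an image")))
}
```

The two checks are independent: the sketch does not verify that the extension and the sniffed type agree with each other.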

Video path (native vs web)

| Platform | Strategy |
| --- | --- |
| iOS/Android | On-device compress when fileSize > threshold (e.g. 50 MB); duration/max-size guards |
| Web | Often cannot rely on hardware encoders; optional sync transcode HTTP on worker (POST /transcode) before multipart upload, streaming body to avoid loading entire file into API memory |

Design note: Client preprocessing and server optimization are intentionally redundant. Client reduces p99 upload time; server produces the canonical ladder (WebP qualities, MP4 faststart) used by other users' clients in feed/deck.


Stage 1–2 — Multipart upload protocol

initiate (API)

Inputs: fileName, fileSize, optional description, authenticated userId.

Actions:

  1. Validate size against per-type caps (maxImageBytes, maxVideoBytes).
  2. CreateMultipartUpload on object store with key pattern {userId}/{sanitizedFileName}.
  3. Generate presigned UploadPart URLs. Part size is commonly 5 MiB (balances request count vs memory); on S3 that is also the minimum size for every part except the last.
  4. Optionally stash description in Redis: media:init:desc:{uploadId} TTL ~1h (metadata not carried in multipart completion body).

Outputs: { uploadId, parts: [{ partNumber, partUrl }] }.
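The size check and the number of presigned URLs to issue come down to a little arithmetic. A minimal sketch, assuming the 5 MiB part size from step 3; the caps struct, the partCount helper, and the cap values in main are placeholders, not figures from the design.

```go
package main

import (
	"errors"
	"fmt"
)

const partSize = 5 << 20 // 5 MiB, the example part size from step 3

// caps holds the per-type limits (maxImageBytes, maxVideoBytes in the text).
type caps struct {
	MaxImageBytes int64
	MaxVideoBytes int64
}

// partCount (hypothetical helper) validates fileSize against the cap for its
// media type and returns how many presigned part URLs initiate must issue.
func partCount(fileSize int64, isVideo bool, c caps) (int, error) {
	limit := c.MaxImageBytes
	if isVideo {
		limit = c.MaxVideoBytes
	}
	if fileSize <= 0 {
		return 0, errors.New("file is empty")
	}
	if fileSize > limit {
		return 0, fmt.Errorf("file of %d bytes exceeds cap of %d bytes", fileSize, limit)
	}
	// Ceiling division: the last part may be shorter than partSize.
	return int((fileSize + partSize - 1) / partSize), nil
}

func main() {
	// Placeholder caps for illustration only.
	c := caps{MaxImageBytes: 15 << 20, MaxVideoBytes: 200 << 20}
	n, err := partCount(12<<20, false, c)
	fmt.Println(n, err)
}
```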

Presigned URLs may be rewritten to pass through the app domain (PRESIGNED_URL_BASE), with nginx restoring the Host header that the storage endpoint expects so that signature validation still succeeds.

Client upload

For each part, with zero-based index i (so partNumber = i + 1 under S3-style numbering):

PUT {partUrl}
Content-Type: application/octet-stream
Body: bytes [i * partSize, min((i+1)*partSize, fileSize))

Collect { partNumber, eTag } for each part; the eTag value comes from the ETag response header.

Concurrency: Parts may upload sequentially or with bounded parallelism; CompleteMultipartUpload requires the part list in ascending partNumber order.
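The byte ranges and the ordering requirement can be sketched as follows. partRanges and orderForCompletion are hypothetical helper names; the 1-based part numbers follow S3-style multipart APIs, which is an assumption about the object store in use.

```go
package main

import (
	"fmt"
	"sort"
)

// partRange describes one slice of the file to PUT to a presigned URL.
type partRange struct {
	PartNumber int   // 1-based, as S3-style multipart APIs number parts
	Start, End int64 // half-open byte range [Start, End)
}

// completedPart is what the client keeps after each successful PUT.
type completedPart struct {
	PartNumber int
	ETag       string
}

// partRanges splits fileSize bytes into consecutive ranges of at most partSize.
func partRanges(fileSize, partSize int64) []partRange {
	if partSize <= 0 {
		return nil
	}
	var out []partRange
	for i := int64(0); i*partSize < fileSize; i++ {
		end := (i + 1) * partSize
		if end > fileSize {
			end = fileSize // last part may be short
		}
		out = append(out, partRange{PartNumber: int(i) + 1, Start: i * partSize, End: end})
	}
	return out
}

// orderForCompletion sorts parts in place by ascending part number, since
// parallel uploads can finish in any order.
func orderForCompletion(parts []completedPart) {
	sort.Slice(parts, func(a, b int) bool { return parts[a].PartNumber < parts[b].PartNumber })
}

func main() {
	for _, r := range partRanges(12<<20, 5<<20) {
		fmt.Printf("part %d: [%d, %d)\n", r.PartNumber, r.Start, r.End)
	}
	done := []completedPart{{3, "etag-c"}, {1, "etag-a"}, {2, "etag-b"}}
	orderForCompletion(done)
	fmt.Println(done)
}
```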


Stage 3 — complete synchronous gate (API)

This is the trust boundary before async work and before other users should see the asset.

Ordered steps

1. CompleteMultipartUpload(parts)
2. validateMedia(probe)     → on fail: DeleteObject, 4xx to client
3. moderateContent(url)     → policy fail: DeleteObject, 4xx
                          → vendor outage: retryable exception, object retained
4. INSERT profile_media     → media_url = API indirection URL
5. PUBLISH optimize.request → Kafka
6. RETURN { mediaId, mediaUrl }
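The ordering and the two different moderation failure modes can be sketched as below. It is written in Go for consistency with the other examples in these notes, although the API tier may well use another language. The deps struct, the error values, and the function names are all hypothetical, and the sketch returns only the mediaId for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors distinguishing the two moderation outcomes.
var (
	errPolicy       = errors.New("content violates policy")
	errVendorOutage = errors.New("moderation vendor unavailable")
)

// deps bundles the collaborators as plain functions so the ordering is easy
// to read and to fake in tests.
type deps struct {
	completeUpload func(uploadID string) (objectKey string, err error)
	validate       func(objectKey string) error
	moderate       func(objectKey string) error
	deleteObject   func(objectKey string)
	insertRow      func(objectKey string) (mediaID string, err error)
	publish        func(mediaID, objectKey string) error
}

// complete runs the gate in the documented order and returns the mediaId.
func complete(d deps, uploadID string) (string, error) {
	key, err := d.completeUpload(uploadID)
	if err != nil {
		return "", err
	}
	if err := d.validate(key); err != nil {
		d.deleteObject(key) // invalid media never stays in the bucket
		return "", fmt.Errorf("validation: %w", err)
	}
	if err := d.moderate(key); err != nil {
		if !errors.Is(err, errVendorOutage) {
			d.deleteObject(key) // policy failure: remove the bytes
		}
		// On a vendor outage the object is retained so the caller can retry.
		return "", fmt.Errorf("moderation: %w", err)
	}
	mediaID, err := d.insertRow(key)
	if err != nil {
		return "", err
	}
	if err := d.publish(mediaID, key); err != nil {
		return "", err
	}
	return mediaID, nil
}

func main() {
	d := deps{
		completeUpload: func(string) (string, error) { return "42/a.jpg", nil },
		validate:       func(string) error { return nil },
		moderate:       func(string) error { return errVendorOutage },
		deleteObject:   func(key string) { fmt.Println("deleted", key) },
		insertRow:      func(string) (string, error) { return "m1", nil },
		publish:        func(string, string) error { return nil },
	}
	_, err := complete(d, "upload-1")
	fmt.Println(err)
}
```

Publishing after the insert without an outbox can lose the event if the process dies between the two steps; the hardening options later in these notes address that.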

media_url at this moment

Not the CDN URL. Pattern:

https://{api}/v1/media/{percentEncode(objectKey)}

The row is visible to the owner immediately, with the original bytes reachable through the indirection URL. Feed consumers can switch to the derivative URLs once stage 5 completes.
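Building the indirection URL is a one-liner once the encoding rule is fixed. A minimal sketch, assuming percentEncode also encodes the slash so that the whole object key travels as a single path segment; mediaURL is a hypothetical helper and api.example.com is a placeholder host.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// mediaURL builds the API indirection URL for an object key.
// url.PathEscape encodes "/" as %2F and space as %20.
func mediaURL(apiBase, objectKey string) string {
	return strings.TrimRight(apiBase, "/") + "/v1/media/" + url.PathEscape(objectKey)
}

func main() {
	fmt.Println(mediaURL("https://api.example.com", "42/cat photo.jpg"))
	// prints https://api.example.com/v1/media/42%2Fcat%20photo.jpg
}
```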

Gallery ordering (illustrative rules)

When position is supplied on complete:

  • Non-negative integer slot in profile gallery.
  • If profile already has video, slot 0 may be reserved for video type.
  • Concurrent completes on same profile: SELECT … FOR UPDATE + shift serial_number to avoid unique constraint races.
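The shift itself can be shown in memory. This is only an illustration of the renumbering; in the database it would run inside the transaction that holds the SELECT … FOR UPDATE lock. The item type and insertAt are hypothetical, and the video-slot rule is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// item mirrors the two columns that matter here: the media id and its slot.
type item struct {
	MediaID string
	Serial  int // serial_number in the text
}

// insertAt makes room at position by shifting every item at or after it by
// one, then adds the new item. It modifies the slice it is given.
func insertAt(gallery []item, mediaID string, position int) ([]item, error) {
	if position < 0 {
		return nil, errors.New("position must be a non-negative integer")
	}
	for i := range gallery {
		if gallery[i].Serial >= position {
			gallery[i].Serial++
		}
	}
	return append(gallery, item{MediaID: mediaID, Serial: position}), nil
}

func main() {
	gallery := []item{{"a", 0}, {"b", 1}}
	gallery, err := insertAt(gallery, "c", 1)
	fmt.Println(gallery, err)
}
```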

Batch enqueue on registration

Profile creation flows that attach multiple media in one transaction often loop over all rows and publish one Kafka event per mediaId, so publishing is not limited to the multipart-complete code path. This avoids "onboarding photos stuck unoptimized" when upload APIs differ between registration steps.


Stage 4 — Media worker (Go or similar)

Consumption

  • Consumer group: media-worker (illustrative).
  • Payload parser: fail fast if any of s3Key, s3Bucket, mediaId, mediaUrl missing — prevents partial processing.
  • ProcessMediaEvent returns quickly; work runs in goroutine + bounded channel semaphore (workerPoolSize).
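The fail-fast parser from the list above might look like this. A minimal sketch, assuming the event is JSON and that mediaId is carried as a string; the struct and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// optimizeRequest holds the four fields the worker refuses to run without.
type optimizeRequest struct {
	S3Key    string `json:"s3Key"`
	S3Bucket string `json:"s3Bucket"`
	MediaID  string `json:"mediaId"`
	MediaURL string `json:"mediaUrl"`
}

// parseRequest decodes the payload and rejects it before any download or
// transcode starts if a required field is empty.
func parseRequest(raw []byte) (optimizeRequest, error) {
	var req optimizeRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		return req, fmt.Errorf("decode optimize request: %w", err)
	}
	required := []struct{ name, value string }{
		{"s3Key", req.S3Key},
		{"s3Bucket", req.S3Bucket},
		{"mediaId", req.MediaID},
		{"mediaUrl", req.MediaURL},
	}
	for _, f := range required {
		if f.value == "" {
			return req, fmt.Errorf("optimize request missing %s", f.name)
		}
	}
	return req, nil
}

func main() {
	_, err := parseRequest([]byte(`{"s3Key":"42/a.jpg","mediaId":"m1"}`))
	fmt.Println(err)
}
```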

Per-job filesystem discipline

/tmp/{jobUuid}/
  input.{ext}
  output/
    {base}_high.webp
    {base}_medium.webp
    …
defer RemoveAll(/tmp/{jobUuid})

Never share temp dirs across jobs — concurrent FFmpeg/libvips corruption risk.
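A small wrapper keeps the create-and-clean-up discipline in one place. A minimal sketch: it uses os.MkdirTemp, which appends a random suffix, in place of the {jobUuid} naming shown above; withJobDir and exists are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// withJobDir creates a private temp directory with an output/ subdirectory,
// runs fn inside it, and removes everything afterwards, even when fn fails.
func withJobDir(fn func(dir string) error) error {
	dir, err := os.MkdirTemp("", "media-job-*") // unique name per call
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	if err := os.Mkdir(filepath.Join(dir, "output"), 0o755); err != nil {
		return err
	}
	return fn(dir)
}

// exists reports whether path is present on disk.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	var jobDir string
	err := withJobDir(func(dir string) error {
		jobDir = dir
		return os.WriteFile(filepath.Join(dir, "input.jpg"), []byte("stub"), 0o600)
	})
	fmt.Println(err, exists(jobDir))
}
```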

Image processing (libvips / bimg pattern)

| Quality key | Max edge (px) | Format | Typical Q |
| --- | --- | --- | --- |
| high | 1920 | WebP | 90 |
| medium | 1280 | WebP | 85 |
| low | 854 | WebP | 80 |

Resize uses max edge with aspect ratio preserved (portrait-safe for mobile UGC).
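The max-edge rule is easiest to see as arithmetic on the dimensions alone. A minimal sketch; scaleToMaxEdge is a hypothetical helper, and skipping images that are already small enough (no upscaling) is an assumption.

```go
package main

import "fmt"

// scaleToMaxEdge returns dimensions whose longer side is at most maxEdge,
// preserving aspect ratio. Images already within the bound are unchanged.
func scaleToMaxEdge(w, h, maxEdge int) (int, int) {
	longest := w
	if h > longest {
		longest = h
	}
	if longest <= maxEdge {
		return w, h
	}
	// Integer scaling with rounding to the nearest pixel.
	nw := (w*maxEdge + longest/2) / longest
	nh := (h*maxEdge + longest/2) / longest
	if nw < 1 {
		nw = 1
	}
	if nh < 1 {
		nh = 1
	}
	return nw, nh
}

func main() {
	// A portrait phone photo keeps its orientation at the "medium" rung.
	fmt.Println(scaleToMaxEdge(1080, 1920, 1280)) // prints 720 1280
}
```

Because the bound applies to whichever side is longer, a portrait image gets the same pixel budget as a landscape one, which a fixed width-by-height box would not give it.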

Upload key layout (pattern):

{userId}/images/{quality}/{filename}.webp

Video processing (current-generation pattern)

Many codebases migrated from HLS ladders to single MP4 + poster for profile clips (simpler client, fewer manifest edge cases):

| Output | Encoder flags (conceptual) | Storage key pattern |
| --- | --- | --- |
| MP4 | H.264 + AAC, +faststart (moov at front) | {userId}/videos/{mediaId}/mp4/{name}.mp4 |
| Thumbnail | FFmpeg frame @ ~1s, 1280×720 | {userId}/thumbnail/{mediaId}/{name}.jpg |

Thumbnail failure is often non-fatal (warn + continue) so video playback still ships.

CDN URL construction

Worker builds same URL shape the API uses, via configured CDN_BASE:

{CDN_BASE}/{userId}/{relativePathAfterUserId}

A mismatch between the worker's CDN_BASE and the edge rewrite rules is a common production bug; treat the URL shape as a contract and test it in staging.
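Keeping key and URL construction in two small pure functions makes that contract easy to test. A minimal sketch using the image key pattern from earlier in this stage; it assumes {filename} means the base name without its original extension, and cdn.example.com is a placeholder.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// imageVariantKey follows {userId}/images/{quality}/{filename}.webp.
func imageVariantKey(userID, quality, fileName string) string {
	base := strings.TrimSuffix(fileName, path.Ext(fileName))
	return fmt.Sprintf("%s/images/%s/%s.webp", userID, quality, base)
}

// cdnURL joins the configured CDN base and an object key with exactly one
// slash between them, whatever trailing or leading slashes they carry.
func cdnURL(cdnBase, key string) string {
	return strings.TrimRight(cdnBase, "/") + "/" + strings.TrimLeft(key, "/")
}

func main() {
	key := imageVariantKey("42", "medium", "cat.jpg")
	fmt.Println(cdnURL("https://cdn.example.com/", key))
	// prints https://cdn.example.com/42/images/medium/cat.webp
}
```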

Retry semantics (worker-internal)

for retry in 0..maxRetries:
    if retry > 0: sleep(retry * 1s)  // linear backoff before each re-attempt
    acquire(poolSlot)
    err = processJob()
    release(poolSlot)
    if err == nil: break
publish result (success or error message)

This is distinct from Kafka consumer redelivery: once its attempts are exhausted, the worker publishes success: false instead of retrying indefinitely.


Stage 5 — API result consumer

@KafkaListener on optimize response topic (transactional per message):

| Condition | Action |
| --- | --- |
| success == false | Log + metric; do not clear original URL |
| image + success | media_url ← processed URL where quality=medium; set low/medium/high columns |
| video + success | media_url ← processed URL where quality=original (MP4); thumbnail_url ← processed URL where quality=thumbnail |

No push to client in baseline design — UI refreshes on next profile/deck fetch or local optimistic state merge.
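The table's mapping from a result event to column writes can be sketched as a pure function. It is written in Go for consistency with the other examples, although the listener annotation suggests the API tier is not Go. The event shape and the low_url, medium_url and high_url column names are hypothetical; media_url and thumbnail_url come from the table.

```go
package main

import "fmt"

// variant is one processed output reported by the worker.
type variant struct {
	Quality string // e.g. "low", "medium", "high", "original", "thumbnail"
	URL     string
}

// optimizeResult is an assumed shape for the response-topic event.
type optimizeResult struct {
	MediaID   string
	MediaType string // "image" or "video"
	Success   bool
	Variants  []variant
}

// columnUpdates returns column name to new URL. It returns nil on failure so
// the caller leaves the row, and the original URL, untouched.
func columnUpdates(r optimizeResult) map[string]string {
	if !r.Success {
		return nil
	}
	byQuality := make(map[string]string, len(r.Variants))
	for _, v := range r.Variants {
		byQuality[v.Quality] = v.URL
	}
	out := map[string]string{}
	set := func(column, quality string) {
		if u, ok := byQuality[quality]; ok {
			out[column] = u
		}
	}
	switch r.MediaType {
	case "image":
		set("media_url", "medium")
		set("low_url", "low")
		set("medium_url", "medium")
		set("high_url", "high")
	case "video":
		set("media_url", "original")
		set("thumbnail_url", "thumbnail")
	}
	return out
}

func main() {
	r := optimizeResult{MediaID: "m1", MediaType: "video", Success: true,
		Variants: []variant{{"original", "https://cdn.example.com/v.mp4"}}}
	fmt.Println(columnUpdates(r))
}
```

A video result without a thumbnail variant simply omits thumbnail_url, which matches the non-fatal thumbnail failure described for the worker.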

Virus side-channel

POST /scan (ClamAV) may run outside hot path. On detection:

  1. Worker deletes object(s) from bucket.
  2. Publishes to {responseTopic}.virus.
  3. API listener deletes DB row and best-effort deletes derivative keys if URLs were partially written.

Stage 6 — Delivery

Edge behavior

location /v1/media/ {
  auth_request /internal/auth;   # illustrative
  rewrite → CDN origin;
  return 302;
}

Separate location may proxy /v1/media-process/* to worker HTTP for web transcode and health.

Client playback

  • Images: Deck cards request medium or low URL from API model when available; fallback to media_url.
  • Video: ExoPlayer/AVPlayer on MP4 URL; thumbnail for placeholders. Legacy HLS clients may still exist in code comments — verify which path is live before documenting player behavior.

Latency budget (order-of-magnitude)

| Segment | Typical range | Notes |
| --- | --- | --- |
| Client preprocess | 0.5–5 s | Video dominates |
| Multipart upload | Depends on bandwidth | Parallel parts help |
| complete gate | 1–4 s | Moderation API p95 |
| Queue + worker | 5–60+ s | Video length, pool saturation |
| Profile refetch | Next user action | No websocket in baseline |

Idempotency and duplicate processing

| Layer | Behavior |
| --- | --- |
| Kafka | At-least-once delivery possible |
| Worker | Same mediaId + deterministic keys → overwrite variants (last write wins) |
| API consumer | Blind UPDATE by mediaId; generally safe to replay |
| Missing idempotency key | Re-published events after manual fix may re-run FFmpeg unnecessarily |

Hardening options (not always present in v1): dedupe table keyed by (mediaId, processingVersion), outbox for publish-after-commit, processing_status enum on row (pending|ready|failed).
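The dedupe option can be illustrated with an in-memory stand-in for the table. A minimal sketch: the deduper type and claim method are hypothetical, and a real version would rely on a unique key in the database rather than a process-local map.

```go
package main

import (
	"fmt"
	"sync"
)

// dedupeKey mirrors the (mediaId, processingVersion) key from the text.
type dedupeKey struct {
	MediaID           string
	ProcessingVersion int
}

// deduper is an in-memory stand-in for the dedupe table.
type deduper struct {
	mu   sync.Mutex
	seen map[dedupeKey]struct{}
}

func newDeduper() *deduper {
	return &deduper{seen: make(map[dedupeKey]struct{})}
}

// claim records the key and reports true only for the first caller, so a
// redelivered or re-published event can be skipped before FFmpeg runs.
func (d *deduper) claim(k dedupeKey) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, dup := d.seen[k]; dup {
		return false
	}
	d.seen[k] = struct{}{}
	return true
}

func main() {
	d := newDeduper()
	k := dedupeKey{MediaID: "m1", ProcessingVersion: 1}
	fmt.Println(d.claim(k), d.claim(k)) // prints true false
}
```

Bumping processingVersion gives a deliberate way to reprocess the same mediaId after a fix, while accidental replays of the old version are still dropped.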


Observability checklist (pattern)

| Signal | Emitter | Use |
| --- | --- | --- |
| media_processing_seconds histogram | Worker | SLO, capacity planning |
| media_processing_errors_total | Worker | Alert on rate |
| active_jobs gauge | Worker | Autoscaling signal |
| Trace spans per stage | OpenTelemetry | Debug slow transcodes |
| Kafka consumer lag | Broker metrics | Backlog incidents |
| Upload 4xx by reason | API | Moderation vs validation split |