Scope of this document
This note describes a recurring architecture pattern seen in mobile social products: presigned multipart upload, synchronous safety gates on the API, asynchronous transcoding on a dedicated worker fleet, and URL indirection at the edge.
Anonymization rule: Where the implementation uses a concrete hostname, bucket, or topic string, this document uses role names (api-service, media-worker, object-store) or pattern placeholders ({tenant}.media.optimize.request). Behavior is technically accurate; identifiers are not.
High-level topology
The system splits into four planes:
| Plane | Responsibility | Typical runtime |
|---|---|---|
| Client | Local resize/compress, multipart upload to object store via presigned URLs, poll/read APIs for finalized URLs | Flutter (iOS/Android/Web) |
| Control (API) | AuthZ, upload session orchestration, DB metadata, inline validation/moderation, Kafka produce/consume for async results | Spring Boot monolith or modular service |
| Async (media worker) | CPU-heavy transforms: WebP ladders, MP4 remux/faststart, thumbnails; optional virus scan HTTP API | Go service(s), horizontally scaled |
| Edge + data | TLS termination, auth-aware redirect to CDN, Postgres, Redis, Kafka, S3-compatible storage | nginx/Envoy + managed data stores |

    Client ──► Edge ──► API ──► Postgres / Redis / Object store
                        │ │
                        │ └──► Kafka ──► Media worker ──► Object store
                        │                           │
                        └◄────── Kafka (results) ◄──┘

Write path (media) — engineering contract
- Client preprocessing (best-effort): reduces bytes before network upload (max dimension caps, on-device video compression above a size threshold). This is not the canonical quality ladder; server output is authoritative for feed/deck rendering.
- Presigned multipart upload: API issues `uploadId` + N presigned `PUT` URLs; client uploads parts directly to object storage. API never streams large bodies.
- Synchronous gate on `complete`: after `CompleteMultipartUpload`, probe/validate media (FFmpeg or equivalent), call moderation API, then insert metadata row. Invalid or policy-violating objects are deleted from storage before any async job is enqueued.
- Async optimization: API publishes a compact event `{ objectKey, bucket, mediaId, canonicalUrl }`. Worker downloads, transforms, uploads variants, publishes result event.
- URL swap in DB: API consumer updates primary `media_url` and derivative columns (low/medium/high/thumbnail) from the result payload.
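
The presigned multipart step can be illustrated with a minimal sketch of how the API side might split an upload into part ranges before signing one `PUT` URL per part. The 5 MiB minimum part size and 10,000-part cap are the Amazon S3 limits; other S3-compatible stores may differ. The name `plan_parts`, the `PartRange` type, and the 8 MiB default are hypothetical, and URL signing itself is left out.

```python
import math
from dataclasses import dataclass

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 upper bound on parts per upload


@dataclass(frozen=True)
class PartRange:
    part_number: int  # 1-based, as in the S3 multipart API
    offset: int       # first byte of this part within the file
    length: int       # number of bytes in this part


def plan_parts(total_size: int, part_size: int = 8 * 1024 * 1024) -> list[PartRange]:
    """Split an upload of total_size bytes into contiguous part ranges."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Grow the part size if the file would otherwise need too many parts.
    part_size = max(part_size, math.ceil(total_size / MAX_PARTS))
    parts = []
    offset = 0
    number = 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append(PartRange(number, offset, length))
        offset += length
        number += 1
    return parts
```

The API would then sign one URL per `PartRange` and return the list to the client together with the `uploadId`; only the last part is allowed to be smaller than the minimum.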
Invariant: the HTTP request thread must never run a full transcode. The separation is enforced at the process/deployment boundary (API pods vs. worker pods), not merely by async method calls inside the JVM.
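
The ordering of the synchronous gate can be sketched as follows. This is an illustration under stated assumptions, not the actual handler: every collaborator (`probe`, `moderate`, `delete_object`, `insert_row`, `publish`) is a hypothetical callable standing in for the FFmpeg probe, the moderation API, the object store, the database, and the Kafka producer. The request path ends at `publish`; no transform runs here.

```python
from dataclasses import dataclass
from typing import Callable


class UploadRejected(Exception):
    """Raised when the object fails validation or moderation."""


@dataclass
class CompleteGate:
    probe: Callable[[str], bool]          # True if the media is well-formed
    moderate: Callable[[str], bool]       # True if the content is allowed
    delete_object: Callable[[str], None]  # removes the object from storage
    insert_row: Callable[[str], str]      # inserts metadata, returns mediaId
    publish: Callable[[dict], None]       # enqueues the async optimize event

    def on_complete(self, bucket: str, object_key: str, canonical_url: str) -> str:
        # Reject before anything is persisted or enqueued.
        if not self.probe(object_key):
            self.delete_object(object_key)
            raise UploadRejected("invalid media")
        # An exception raised by moderate() (for example an outage) propagates
        # without deleting the object, so the client can retry `complete`.
        if not self.moderate(object_key):
            self.delete_object(object_key)
            raise UploadRejected("policy violation")
        media_id = self.insert_row(object_key)
        self.publish({
            "objectKey": object_key,
            "bucket": bucket,
            "mediaId": media_id,
            "canonicalUrl": canonical_url,
        })
        return media_id
```

Note that the insert and the publish are two separate steps in this sketch, which is exactly where the "publish fails after DB insert" risk listed under failure modes comes from.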
Read path (media)
Public URLs exposed to clients often use API indirection:
GET https://{api-host}/v1/media/{url-encoded-object-key}
→ edge authenticates (session/JWT)
→ 302 to CDN/object-store URL
Benefits: revoke access without rotating CDN keys; hide raw storage layout; centralize logging. Tradeoff: extra redirect latency vs direct CDN URL in DB.
Presigned upload URLs are similarly rewritten so signatures match the Host header the storage provider expects when traffic passes through the app domain.
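
A minimal sketch of the indirection, assuming placeholder hosts (`api.example.invalid`, `cdn.example.invalid`) and hypothetical function names. It shows only the URL encoding and the redirect decision; real session/JWT checks and CDN URL signing are out of scope.

```python
from typing import Optional
from urllib.parse import quote, unquote

API_BASE = "https://api.example.invalid/v1/media/"  # hypothetical API host
CDN_BASE = "https://cdn.example.invalid/"           # hypothetical CDN host


def public_url(object_key: str) -> str:
    """Build the client-facing URL that hides the storage layout."""
    # safe="" also encodes "/", so the whole key stays one path segment.
    return API_BASE + quote(object_key, safe="")


def resolve(path_segment: str, authenticated: bool) -> tuple[int, Optional[str]]:
    """Return (status, Location) for a GET on the media endpoint."""
    if not authenticated:
        return 401, None
    object_key = unquote(path_segment)
    # Default safe="/" keeps the key's slashes as path separators on the CDN.
    return 302, CDN_BASE + quote(object_key)
```

For example, `public_url("profiles/42/a b.webp")` yields a path segment of `profiles%2F42%2Fa%20b.webp`, and `resolve` on that segment redirects an authenticated caller to the CDN path `profiles/42/a%20b.webp`.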
Event bus coupling (API ↔ worker)
| Direction | Semantic | Payload shape (conceptual) |
|---|---|---|
| API → worker | "Process this object" | s3Key, bucket, mediaId, mediaUrl (all required in strict parsers) |
| worker → API | "Here are variants" | mediaId, success, processed[] with { quality, format, url, size } |
| worker → API (side) | "Malware detected" | mediaId, virusName, actionTaken on a derived topic suffix .virus |
Topic names are environment-specific strings wired via config; the pattern is stable even when literal names differ between media.raw (dev compose) and app.media.optimize.v1 (prod).
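
A strict parser for the API → worker message might look like the sketch below. The field names come from the table above; the `OptimizeRequest` type, the `parse_request` function, and the choice of JSON as the wire format are assumptions made for illustration.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("s3Key", "bucket", "mediaId", "mediaUrl")


@dataclass(frozen=True)
class OptimizeRequest:
    s3_key: str
    bucket: str
    media_id: str
    media_url: str


def parse_request(raw: bytes) -> OptimizeRequest:
    """Parse an optimize request, rejecting any payload with missing fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    # A field counts as missing if it is absent, empty, or not a string.
    missing = [
        name for name in REQUIRED_FIELDS
        if not isinstance(data.get(name), str) or not data[name]
    ]
    if missing:
        raise ValueError("missing or empty fields: " + ", ".join(missing))
    return OptimizeRequest(
        s3_key=data["s3Key"],
        bucket=data["bucket"],
        media_id=data["mediaId"],
        media_url=data["mediaUrl"],
    )
```

Failing fast on a malformed message keeps bad input out of the transform code, but the caller still has to decide what to do with the rejected message (see the consumer handler row under failure modes).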
Adjacent domain services (pattern-level)
Products built on this pattern typically do not embed matching, chat, or push inside the media worker. A common decomposition, in which each service has its own deployable and schema:
- Matching / deck — Redis-backed candidate sets, Postgres for likes/matches; may call profile read APIs that return optimized media URLs.
- Chat — Separate service; media in chat may reuse the same upload pipeline or a lighter attachment path.
- Notifications — FCM/APNs; triggered by domain events, not by media worker directly.
- Realtime gateway — WebSockets for chat/presence; profile media readiness is usually poll-based unless a separate notification event is added.
The exact service count varies; the important design point is a clear bounded-context boundary between the "profile media pipeline" and the "social graph."
Data model (conceptual)
profile_media (name illustrative):
| Column | Role |
|---|---|
| id | UUID primary key |
| profile_id | FK |
| media_type | `image` or `video` |
| media_url | Canonical display URL (updated after optimization) |
| original_url | Optional retention of pre-optimize reference |
| `low_*`, `medium_*`, `high_*` | Image derivatives |
| thumbnail_url | Video poster frame |
| serial_number | Gallery ordering; video often pinned to slot 0 |
Blobs are never stored in the RDBMS; it holds only object keys and CDN-facing URLs.
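
The URL swap driven by a worker result can be sketched against this row shape. Several details here are assumptions, labeled as such: the concrete derivative column names (`low_url`, `medium_url`, `high_url`), the rule that the `high` variant becomes the new `media_url`, and the use of plain dicts in place of a real ORM entity.

```python
# Assumed mapping from the result's `quality` value to a column name.
QUALITY_COLUMNS = {
    "low": "low_url",
    "medium": "medium_url",
    "high": "high_url",
    "thumbnail": "thumbnail_url",
}


def apply_result(row: dict, result: dict) -> dict:
    """Return a copy of row with URLs swapped in from a worker result."""
    if result.get("mediaId") != row.get("id"):
        raise ValueError("result does not belong to this row")
    updated = dict(row)
    if not result.get("success"):
        return updated  # failed optimization: keep the original URL
    primary = None
    for variant in result.get("processed", []):
        column = QUALITY_COLUMNS.get(variant.get("quality"))
        if column is None:
            continue  # unknown quality label: ignore rather than fail
        updated[column] = variant["url"]
        if variant["quality"] == "high":
            primary = variant["url"]
    if primary is not None:
        # Retain the pre-optimize reference once, then swap the display URL.
        if not updated.get("original_url"):
            updated["original_url"] = row["media_url"]
        updated["media_url"] = primary
    return updated
```

Because the function returns a new dict and leaves the input untouched, replaying the same result twice produces the same row, which matters if the result topic is delivered at least once.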
Failure modes and operational gaps (honest)
| Risk | Typical mitigation | Common gap in v1 implementations |
|---|---|---|
| Kafka publish fails after DB insert | Outbox pattern or reconciler cron | Row exists with original URL only; no automatic retry |
| Worker fleet down | Kafka retention + alert on lag | Clients see unoptimized media indefinitely |
| Moderation API outage | Retry on complete without deleting object | Upload latency spikes; duplicate moderation calls |
| Policy violation | Delete object, fail upload to client | — |
| Optimization `success: false` | Keep original URL; log/metric | No user-visible "processing failed" state |
| Consumer handler error | DLQ, skip poison pill | Some Kafka clients exit consume loop on first handler error — partition stall until restart |
| Re-upload same content | New mediaId | No content-hash dedupe across users |
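
The outbox mitigation from the first row can be sketched with an in-process SQLite database standing in for Postgres. The table layout, function names, and the `publish` callable (a stand-in for a Kafka producer) are all hypothetical; the point is only that the metadata row and the pending event commit in one transaction, and a separate relay retries delivery.

```python
import json
import sqlite3
from typing import Callable


def init_schema(db: sqlite3.Connection) -> None:
    db.executescript("""
        CREATE TABLE profile_media (id TEXT PRIMARY KEY, media_url TEXT NOT NULL);
        CREATE TABLE outbox (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            sent INTEGER NOT NULL DEFAULT 0
        );
    """)


def insert_media(db: sqlite3.Connection, media_id: str, media_url: str) -> None:
    """Write the metadata row and its pending event atomically."""
    payload = json.dumps({"mediaId": media_id, "mediaUrl": media_url})
    with db:  # commits both inserts together, or rolls both back
        db.execute(
            "INSERT INTO profile_media (id, media_url) VALUES (?, ?)",
            (media_id, media_url),
        )
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def relay_once(db: sqlite3.Connection, publish: Callable[[dict], None]) -> int:
    """Publish unsent events in order; return how many were delivered."""
    rows = db.execute(
        "SELECT seq, payload FROM outbox WHERE sent = 0 ORDER BY seq"
    ).fetchall()
    delivered = 0
    for seq, payload in rows:
        try:
            publish(json.loads(payload))
        except Exception:
            break  # leave this and later events unsent for the next run
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE seq = ?", (seq,))
        delivered += 1
    return delivered
```

Delivery in this sketch is at-least-once: if the process dies after `publish` succeeds but before the row is marked sent, the event goes out again on the next run, so the worker side should treat `mediaId` as an idempotency key.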