Skip to content
Dev Dump

Design YouTube

YouTube is a massive video streaming platform supporting video uploads, playback, and various interactions. This chapter focuses on designing a scalable video streaming system with the following core features:

  • Fast video uploads
  • Smooth video streaming
  • Ability to change video quality
  • Low infrastructure cost
  • High availability and reliability
  • 2 billion monthly active users
  • 5 billion videos watched per day
  • 37% of mobile internet traffic comes from YouTube
  • Available in 80 languages
  • $15.1 billion ad revenue in 2019

  1. Upload videos
  2. Watch videos
  • Mobile apps, web browsers, and smart TVs
  • Daily Active Users (DAU): 5 million
  • Average Video Size: 300 MB
  • Upload Limits: Max 1 GB per video
  • Daily Storage Need: 150 TB
  • CDN Costs: 5 million * 5 videos * 0.3GB * $0.02 = $150,000/day (using Amazon CloudFront)

Figure

  1. Client: Devices like smartphones, computers, and TVs.
  2. CDN (Content Delivery Network): Stores and streams videos.
  3. API Servers: Handles all user interactions except video streaming (e.g., uploads, metadata updates).
  4. Metadata Database: Stores video metadata (e.g., title, description, size).
  5. Original Storage: Blob storage for uploaded videos.
  6. Transcoding Servers: Convert videos into multiple resolutions and formats.
  7. Transcoded Storage: Blob storage for transcoded videos.

  • Parallel Processes:

    1. Upload video to original storage.
    2. Update video metadata in the database.
  • Video Upload (Steps):

    Figure

    • [1] Videos are uploaded to blob storage.
    • [2] Transcoding servers convert videos to multiple formats.
    • [3] One trasncoding is complete, following two steps are exectued in parallel.
      • [3a] Transcoded videos are sent to transcoded storage.
      • [3b] Transcoding completion events are queued in the completion queue.
    • [3a.1] Videos are distributed to the CDN.
    • [3b.1] Completion handlers update metadata and inform users.
  • Metadata Upload (Steps):

    Figure

    • The client in parallel sends a request to update the video metadata
    • The request contains video metadata, including file name, size, format, etc.

Figure

  • Videos are streamed directly from the CDN using edge servers to minimize latency.
  • Some of te popular streaming protocols are MPEG_DASH, Apple HLS, Adobe HDS.
  • Different streaming protocols support different video encodings and playback players.

  1. Raw video consumes large amounts of storage space. It Reduces storage space.
  2. Ensures compatibility across devices and browsers.
  3. Adapts video quality to network conditions.
  • Container: Encapsulates video, audio, and metadata (e.g., MP4, AVI).
  • Codecs: Compression and Decompression algorithms (e.g., H.264, VP9).

Figure

  • Transcoding a video is computationally expensive and time-consuming.

  • DAG Model defines tasks like encoding, thumbnail generation, and watermarking.

  • Allows high parallelism in video processing.

  • The original video is split into video, audio, and metadata.

    • Video encodings: Videos are converted to support different resolutions, codec, bitrates.
    • Thumbnail: It can either be uploaded by a user or automatically generated bythe system.
    • Watermark: Image overlay on top of your video contains identifying information about the video.

Figure

  1. Preprocessor: Splits videos into smaller chunks (GOP alignment). It has 4 responsibilities.

    Figure

    • Video splitting: Video stream is split or further split into smaller Group of Pictures (GOP) alignment.
    • It split videos by GOP alignment for old clients.
    • It generates DAG based on configuration files client programmers write.
    • It stores GOPs and metadata in temporary storage in case the encoding fails, the system could use persisted data for retry operations.
  2. DAG Scheduler: Organizes tasks into sequential or parallel stages. Figure

    • It splits a DAG graph into stages of tasks and puts them in the task queue in the resource manager.
    • Stage 1: video, audio, and metadata.
    • The video file is further split into two tasks in stage 2: video encoding and thumbnail.
  3. Resource Manager: Responsible for managing the efficiency of resource allocation.It contains 3 queues and a task scheduler. Figure

    • Task queue: priority queue that contains tasks to be executed.
    • Worker queue: priority queue that contains worker utilization info.
    • Running queue: contains currently running tasks and workers running the tasks.
    • Task scheduler: picks the optimal task/worker, and instructs the chosen task worker to execute the job.
  4. Task Workers: Perform transcoding and other operations. Figure

    • Different task workers may run different tasks
  5. Temporary Storage: Stores intermediate data for retries.

    • The choice of storage system depends on factors like data type, data size, access frequency, data life span, etc.
  6. Output: Transcoded videos ready for distribution.


  1. Parallel Video Uploads: Split videos into smaller chunks for faster, resumable uploads.

    Figure

  2. Distributed Upload Centers: Use CDNs as upload hubs close to users.

  3. Parallel Processing: Decouple modules using message queues for high parallelism.

    Figure Figure

  1. Pre-Signed URLs: Restrict video uploads to authorized users.

    Figure

  2. Protect Videos:

    • DRM Systems (e.g., Apple FairPlay, Google Widevine).
    • AES Encryption.
    • Watermarking.
  1. Serve only popular videos via CDN; less popular ones from high-capacity servers.
  2. Encode on-demand for rarely accessed videos.
  3. Regionalize video distribution based on popularity.
  4. Build custom CDNs and partner with ISPs to reduce bandwidth costs.

  • Retry failed uploads, transcoding, or resource allocation tasks.
  • Stop malformed video processing and return error codes.

Study notes aligned with System Design Interview — An Insider’s Guide (Vol. 1). Source notes: liquidslr/system-design-notes.