Launching Soon in 2025!

Design Spotify Top Hit Songs

Introduction

Key Problem Statement

In this question, we need to design a system that keeps track of popular songs, typically based on how often it is played and show to the user a list of top K, typically top 10 or 100 over time

This is a general class of problem to detect Trending items based on stream of events happening on those items

Solution Discussion

At the core of this question are two things that we need to do very efficiently - time and memory wise

  1. How do we update popularity score for each of the millions or billions of item we want to rank
  2. How do we rank and show only the top K , typically less than 100, most important items among millions or billions of items

Key Bottlenecks

  1. Scoring events typically arrive as a large stream of individual events like upvote, song played or song listened for at least 5 seconds. For a company like Spotify, these would be around 10B/day. Yotube has about 70B/day views
  2. Number of items to update could be fairly large. For example, Spotify has about 100+ Million songs. Youtube has about 5B videos
  3. Number of Top K lists can grow pretty big when option to filter TopK list by dimensions like Genre, Artist, Language and country are added. For example Billboard 100 typically supports such lists

Common situations where this problem arises

Finding Top K heavy hitters is a problem solved in many different situations with a varying range of accuracy and delay requirements.

Some examples,

  1. Listing Trending Top 100 songs or videos - for different periods of time and filters
    • Ranking is more important and approximate count is usually good enough. 100% accurate count is not critical for the functionality or user experience.
    • Support for multiple different types of Top K lists is important. Think of Top 100 in Genre Pop and English language in last month. Or Just Top 100 all time etc.
    • Delay for few seconds to minutes is acceptable for user view
    • Delay for few seconds is acceptable for internal use to determine which next trending song to be cached or promoted. Too long latency may cause poor user experience.
  2. Listing Top 100 IP address in network packets to detect DoS attack
    • Ranking and approximate count is good enough. Low latency and low memory are critical requirements. DoS detection relies on other long term signals.
    • TopK over short term - minutes to hours is typically important to track.
    • Delay for few milliseconds to seconds is acceptable but not more as it may incur cost on the hardware/business.
  3. Leaderboard in an online Game (live or offline) - say top 50 players
    • Both Ranking and correct score is important. Players do not want to see incorrect score
    • Delay may be acceptable for few minutes depending on game type.
  4. Top 10 contestants in a live show or a competition where people are voting realtime
    • Both Ranking and correct score is important. Contestants do not want to be eliminated due to incorrect vote count
    • Delay may be acceptable for few minutes.

Technical Dive deep

        System.out.println("Hello world!")
        

Introduction

This is some content

Code

            System.out.println("Hello world!")
        
Tab content 3