Design Spotify Top Hit Songs
Introduction
Key Problem Statement
In this question, we need to design a system that keeps track of popular songs, typically based on how often it is played and show to the user a list of top K, typically top 10 or 100 over time
This is a general class of problem to detect Trending items based on stream of events happening on those items
Solution Discussion
At the core of this question are two things that we need to do very efficiently - time and memory wise
- How do we update popularity score for each of the millions or billions of item we want to rank
- How do we rank and show only the top K , typically less than 100, most important items among millions or billions of items
Key Bottlenecks
- Scoring events typically arrive as a large stream of individual events like upvote, song played or song listened for at least 5 seconds. For a company like Spotify, these would be around 10B/day. Yotube has about 70B/day views
- Number of items to update could be fairly large. For example, Spotify has about 100+ Million songs. Youtube has about 5B videos
- Number of Top K lists can grow pretty big when option to filter TopK list by dimensions like Genre, Artist, Language and country are added. For example Billboard 100 typically supports such lists
Common situations where this problem arises
Finding Top K heavy hitters is a problem solved in many different situations with a varying range of accuracy and delay requirements.
Some examples,
- Listing Trending Top 100 songs or videos - for different periods of time and filters
- Ranking is more important and approximate count is usually good enough. 100% accurate count is not critical for the functionality or user experience.
- Support for multiple different types of Top K lists is important. Think of Top 100 in Genre Pop and English language in last month. Or Just Top 100 all time etc.
- Delay for few seconds to minutes is acceptable for user view
- Delay for few seconds is acceptable for internal use to determine which next trending song to be cached or promoted. Too long latency may cause poor user experience.
- Listing Top 100 IP address in network packets to detect DoS attack
- Ranking and approximate count is good enough. Low latency and low memory are critical requirements. DoS detection relies on other long term signals.
- TopK over short term - minutes to hours is typically important to track.
- Delay for few milliseconds to seconds is acceptable but not more as it may incur cost on the hardware/business.
- Leaderboard in an online Game (live or offline) - say top 50 players
- Both Ranking and correct score is important. Players do not want to see incorrect score
- Delay may be acceptable for few minutes depending on game type.
- Top 10 contestants in a live show or a competition where people are voting realtime
- Both Ranking and correct score is important. Contestants do not want to be eliminated due to incorrect vote count
- Delay may be acceptable for few minutes.
Technical Dive deep
System.out.println("Hello world!")
Introduction
This is some content
Code
System.out.println("Hello world!")
Tab content 3