Hulkshare Recommendation Algorithm Challenge
Can you predict if a song on a streaming platform is liked by looking at viewing patterns?
Prize
$7 500
Time
Ended 2 months ago
Participants
70 active · 477 enrolled
Advanced
About

The data for this competition consists of listening patterns collected from songs played by different users.

The objective of this challenge is to create a machine learning model that predicts whether a song has a high ratio of likes to views (listens) since it was uploaded to the platform. The ratio of likes to views is a good proxy for whether the song will be enjoyed.

The frames given in this competition are from the most recent 6 months.

Some important things to note: when a user logs onto the platform, they enter a session. Within a session the user can listen to different songs, and can listen to the same song multiple times, either in the same way or differently.

The data provided covers the duration of each song, split into 1000 frames, together with the timeline of the user listening to the song, that is, how they moved from frame to frame. In addition, you are given the number of times the song was listened to in the last 6 months, along with a way to merge this information with the frames dataset.

From this data you can see whether the user listened to the song linearly, or whether they paused, fast forwarded or rewound. All of this could be useful for determining whether the song has a high likes-to-views ratio.

Files available for download:

  • Train.csv - contains the target. This is the dataset that you will use to train your model.
  • Test.csv - resembles Train.csv but without the target-related columns. This is the dataset to which you will apply your model.
  • SampleSubmission.csv - shows the submission format for this competition, with the ‘Vid’ column mirroring that of Test.csv and the ‘target’ column containing your predictions. The order of the rows does not matter, but the values in the ‘Vid’ column must be correct.
  • Frames.zip - this file contains the viewing pattern for each session.
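
The submission format described above can be sketched as follows. This is only an illustration under stated assumptions: the Vid values and predictions are made-up placeholders, and taking one row per Vid (via `drop_duplicates`) is an assumption rather than something the file descriptions confirm.

```python
import pandas as pd

# Toy stand-in for Test.csv; the real file has more columns.
test = pd.DataFrame({"Vid": ["v3", "v1", "v2", "v1"]})

# Hypothetical model output: one predicted value per Vid.
predictions = {"v1": 0.02, "v2": 0.05, "v3": 0.01}

# One row per Vid (assumption), with the two columns named in SampleSubmission.csv.
vids = test["Vid"].drop_duplicates().reset_index(drop=True)
submission = pd.DataFrame({"Vid": vids, "target": vids.map(predictions)})

# Row order does not matter, but every Vid must be present and correct.
missing = set(test["Vid"]) - set(submission["Vid"])
has_gaps = submission["target"].isna().any()

# Render as CSV text; pass a file path to to_csv to write the file instead.
csv_text = submission.to_csv(index=False)
```

Checking for missing Vid values and empty predictions before uploading is a cheap way to catch a failed merge early.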

Variable Definitions

Information in Test and Train

  • Vid - A unique ID for each song listened to in the last 6 months. Note that these songs are only a subset of all the songs listened to in the last 6 months.
  • Session_id - Unique ID of a user each time they log into the platform. During this time they can listen to one song multiple times and they can listen to different songs. For some songs you will notice the same session_id with different session_duration values. You may also find the same session_id for different songs.
  • Session_duration - How long a user listened to a song. The session duration is in seconds.
  • Num_sessions - The number of times the song was listened to in the last 6 months. Note: one user can listen to the same song multiple times, and each listen is counted as a new session.
  • Session_ids_and_duration - A list of dictionaries containing each session_id and its duration.

You need to use Vid and Session_ids_and_duration to merge the Frames files with Test and Train.
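
One possible way to prepare the merge keys is sketched below. It rests on several assumptions that the description does not confirm: that Session_ids_and_duration arrives as a string holding a list of dictionaries, that the dictionary keys are named `session_id` and `session_duration`, and that each frames table has already been tagged with `Vid` and `Session_id` columns. Check all three against the actual files.

```python
import ast
import pandas as pd

def explode_sessions(df):
    """Return one row per (Vid, session) from Session_ids_and_duration."""
    rows = []
    for vid, raw in zip(df["Vid"], df["Session_ids_and_duration"]):
        # In a CSV the list is read as a string, so parse it safely.
        sessions = ast.literal_eval(raw) if isinstance(raw, str) else raw
        for s in sessions:
            rows.append({
                "Vid": vid,
                "Session_id": s["session_id"],              # assumed key name
                "Session_duration": s["session_duration"],  # assumed key name
            })
    return pd.DataFrame(rows, columns=["Vid", "Session_id", "Session_duration"])

# Toy stand-in for Train.csv.
train = pd.DataFrame({
    "Vid": ["v1", "v2"],
    "Session_ids_and_duration": [
        "[{'session_id': 's1', 'session_duration': 30.0},"
        " {'session_id': 's2', 'session_duration': 12.5}]",
        "[{'session_id': 's1', 'session_duration': 45.0}]",
    ],
})
sessions = explode_sessions(train)

# Toy stand-in for frames data, assumed to carry Vid and Session_id columns.
frames = pd.DataFrame({
    "Vid": ["v1", "v1", "v2"],
    "Session_id": ["s1", "s2", "s1"],
    "VidTime": [0.03, 0.0125, 0.045],
})

# Join on both keys, since the same session_id can appear for different songs.
merged = frames.merge(sessions, on=["Vid", "Session_id"], how="left")
```

Merging on both Vid and Session_id matters here because a single session can cover several songs.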

Frames Files

  • frame - These are the frames of a song, calculated from how long the song is without being fast forwarded, paused or rewound. Each song has exactly 1000 frames. If the frames change linearly from one line to the next, the user is listening to the song normally, without fast forwarding or slowing down. If there is no difference between the frames, the user has paused the song. If the change between the frames is not linear, the user is either fast forwarding or going backwards.
  • VidTime - The current session time in seconds, ranging between 0 and session_duration. Each line represents information about 1/1000 of session_duration. If the values change linearly from one line to the next, the user is listening to the song normally, without fast forwarding or slowing down. If there is no difference between the lines, the user has paused the song. If the change between the lines is not linear, the user is either fast forwarding or going backwards.
  • TimelineRatio - VidTime divided by the video (song) duration.
  • Target - The ratio of likes to the number of views per song. This uses the number of likes and number of views the song has had since it was uploaded. Some songs were uploaded years ago.
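
The rules above for reading playback behaviour can be turned into a rough heuristic, sketched here. The thresholds are assumptions for illustration and are not part of the competition: "normal" speed is taken to be the median positive step, and any step more than `1 + tol` times that is labelled a fast forward. The function name and the `tol` parameter are hypothetical.

```python
from collections import Counter
import numpy as np

def classify_steps(values, tol=0.5):
    """Label each step between consecutive positions in one session."""
    values = np.asarray(values, dtype=float)
    steps = np.diff(values)

    # Assumed baseline: the median positive step stands in for normal speed.
    positive = steps[steps > 0]
    typical = np.median(positive) if positive.size else 0.0

    labels = np.full(steps.shape, "normal", dtype=object)
    labels[np.isclose(steps, 0.0)] = "paused"             # no change between lines
    labels[steps < 0] = "rewind"                          # position went backwards
    labels[steps > typical * (1 + tol)] = "fast_forward"  # jump larger than normal
    return labels

# Toy sequence: normal play, a pause, a jump forward, then a jump back.
positions = [0, 1, 2, 2, 2, 3, 8, 9, 5, 6]
labels = classify_steps(positions)
counts = Counter(labels.tolist())
```

Per-session counts such as `counts` could then be aggregated per Vid as model features, though whether they help predict the target is something to verify on the training data.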

The outcome of this challenge is to determine whether a song was enjoyed or not. To do so, you will predict the ratio of likes to views.
