Sea Turtle Rescue: Error Detection Challenge
Cash and prizes worth $1,950 USD
Help Local Ocean Conservation clean their sea turtle rescues database
30 November 2018–28 April 2019 23:59
281 data scientists enrolled, 53 on the leaderboard
Unbalanced Binary Classification
published 4 Feb 2019, 13:27
edited 31 minutes later

Hello everyone, this is my first time here and I am happy to participate in this competition. However, I am having some trouble building a model that works well because the data is unbalanced. I would like some ideas on how to handle this, for example oversampling, downsampling, or adding a weight in the cost function for error == 1. I thought about using anomaly detection, but it is an unsupervised algorithm, so I would not be using the clean dataset. What do you think?
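For reference, two of the options mentioned above (class weighting and oversampling) can be sketched in plain Python. This is only a minimal illustration on made-up toy data, not the competition dataset: the function names `balanced_class_weights` and `random_oversample` are hypothetical, and the weighting formula n_samples / (n_classes * class_count) is one common "balanced" heuristic, assumed here rather than taken from the competition.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get a larger weight, so their errors count more
    when the weights are plugged into a cost function.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, cnt in counts.items():
        # Rows belonging to this class, used as the sampling pool.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - cnt):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (assumption): 9 clean rows labelled 0 and 1 row labelled 1 (error).
labels = [0] * 9 + [1]
rows = [[i] for i in range(10)]

weights = balanced_class_weights(labels)
X_bal, y_bal = random_oversample(rows, labels)

print(weights)
print(Counter(y_bal))
```

If oversampling is used, it is usually safer to apply it only to the training split, so that duplicated rows do not leak into the validation data.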

edited 19 days later

Hi Seifeddin. It's my first competition as well. My first approach was to treat the problem like a spam classifier, using Naive Bayes. However, it does not score very well.

I'm not sure if anomaly detection will work. Which algorithm do you use for binary classification?