Hello everyone, this is my first time here and I am happy to participate in this competition. However, I am having some trouble building a model that works well because the data is imbalanced. I would like some ideas on how to handle this (oversampling, downsampling, adding a weight in the cost function for error == 1). I thought about using anomaly detection, but it is an unsupervised algorithm, so I would not be using the clean dataset. What do you think?
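To make the three options concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the function names are made up, the toy data stands in for the competition data, and label 1 plays the role of the rare error == 1 class. The weighting follows the common "balanced" heuristic, n_samples / (n_classes * class_count).

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, cnt in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - cnt)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        kept = rng.sample(pool, target)
        out_rows.extend(kept)
        out_labels.extend([cls] * len(kept))
    return out_rows, out_labels


# Toy data (hypothetical): 8 rows of class 0, 2 rows of class 1.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

weights = balanced_class_weights(labels)
over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
```

If you resample, it is usually safer to do it on the training split only, so the validation score still reflects the real class ratio.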
Hi @Seifeddine_Fezzani.
Hi Seifeddine. It's my first competition as well. My first approach was to treat the problem like a spam classifier, using Naive Bayes. However, it does not score very well.
I'm not sure if anomaly detection will work. Which algorithm do you use for binary classification?