Just trust your private LB. When I saw my public LB score, I was so fixated on it that I ignored my private LB. I wish I had relied on my CV. I'm excited to see what the top five did to get their scores. Congratulations to the winners.
Congratulations to the winners. I'd love to hear how those with top scores handled their cross-validation and avoided overfitting, as well as which feature engineering techniques they used.
Congratulations to all. Thanks to my teammate @Koleshjr. It was a really amazing competition and we learned a lot from it. Thanks to @zindi for hosting it.
Our approach:
Deal with missing values
Drop unnecessary columns
Drop outliers (rows with the same data but different categories)
Count encoding & new features
Merchant clustering into 10 categories
Aggregation and feature combination
Automatic clustering using KMeans on CountVectorizer outputs
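For readers who want to try the count encoding and aggregation steps listed above, here is a minimal sketch in pandas. It is not the team's actual code: the toy data, the `USER_ID` and `AMOUNT` columns, and all derived feature names are assumptions made up for illustration. Only `MERCHANT_NAME` comes from the thread.

```python
import pandas as pd

# Toy data. USER_ID and AMOUNT are hypothetical columns, not from the competition.
df = pd.DataFrame({
    "MERCHANT_NAME": ["SHOP A", "SHOP A", "CAFE B", "SHOP A", "CAFE B", "TAXI C"],
    "USER_ID": [1, 1, 1, 2, 2, 2],
    "AMOUNT": [100.0, 250.0, 40.0, 80.0, 60.0, 300.0],
})

# Count encoding: map each merchant name to how often it occurs in the data.
df["MERCHANT_COUNT"] = df["MERCHANT_NAME"].map(df["MERCHANT_NAME"].value_counts())

# Aggregation: per-user statistics, broadcast back onto every row of that user.
grouped = df.groupby("USER_ID")["AMOUNT"]
df["USER_AMOUNT_MEAN"] = grouped.transform("mean")
df["USER_AMOUNT_MAX"] = grouped.transform("max")

# Feature combination: how large a row's amount is relative to the user's mean.
df["AMOUNT_TO_USER_MEAN"] = df["AMOUNT"] / df["USER_AMOUNT_MEAN"]

print(df)
```

If you compute counts or aggregates like these, it is usually safer to fit them inside each CV fold so that validation rows do not leak into the training features.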
Did you decompose all features, or only the outputs from CountVectorizer? If you don't mind, please share the method as well.
No, we decomposed the aggregate features that were formed from MERCHANT_NAME. The outputs of CountVectorizer were clustered into groups using KMeans.
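Until the notebook can be shared, the general idea of clustering CountVectorizer outputs with KMeans might look roughly like the sketch below. This is an illustration under assumptions, not the validated solution: the merchant names are invented, and the cluster count, `n_init`, and `random_state` values are arbitrary choices for the toy data.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Invented merchant names standing in for the MERCHANT_NAME column.
names = pd.Series([
    "CITY CAFE DOWNTOWN", "CITY CAFE AIRPORT", "CORNER CAFE",
    "QUICK TAXI TRIP", "METRO TAXI TRIP", "NIGHT TAXI",
    "FRESH SUPERMARKET", "BUDGET SUPERMARKET", "FAMILY SUPERMARKET",
], name="MERCHANT_NAME")

# Bag-of-words token counts for each merchant name (sparse matrix).
vectorizer = CountVectorizer()
token_counts = vectorizer.fit_transform(names)

# Group the count vectors with KMeans. n_clusters=3 is an assumption
# chosen for this toy data, not the value used in the competition.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(token_counts)

# The cluster id can then be used as a categorical feature.
clustered = pd.DataFrame({"MERCHANT_NAME": names, "MERCHANT_CLUSTER": labels})
print(clustered)
```

On real data the number of clusters would need tuning, for example by checking how each setting affects the CV score.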
Can you share your solution please 🙏?
Sorry, but Zindi is still validating our solution. Until then we can't share the notebook.