
Zimnat Insurance Recommendation Challenge

Helping Zimbabwe
$5 000 USD
Completed (over 5 years ago)
Prediction
Collaborative Filtering
1777 joined
612 active
Start: Jul 01, 20
Close: Sep 13, 20
Reveal: Sep 13, 20
Train Data Formation
Notebooks · 27 Aug 2020, 12:58 · 6

Which is the best way to prepare the data?

1. Melt the product columns and turn it into a binary-classification problem.

2. Duplicate the rows for all the products the customer has bought, remove one product per row, and make that product the y_label, similar to how the organiser prepared the test data.

Please share your inputs. Thanks in advance.
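The two preparations being asked about can be sketched with a toy frame; the `prod_*` column names here are illustrative placeholders, not the competition's actual product codes:

```python
import pandas as pd

# Hypothetical toy frame: one row per customer, one 0/1 column per product.
train = pd.DataFrame({
    "ID": ["A", "B"],
    "prod_1": [1, 0],
    "prod_2": [0, 1],
    "prod_3": [1, 1],
})
product_cols = ["prod_1", "prod_2", "prod_3"]

# Option 1: melt the product columns into (ID, product, label) rows,
# so every (customer, product) pair becomes one binary example.
long = train.melt(id_vars="ID", value_vars=product_cols,
                  var_name="product", value_name="label")

# Option 2: for each product a customer owns, duplicate the row,
# zero that product out, and use it as the target (mirrors the test set).
rows = []
for _, r in train.iterrows():
    for p in product_cols:
        if r[p] == 1:
            dup = r.copy()
            dup[p] = 0            # hide the product being predicted
            dup["target"] = p     # the hidden product becomes the label
            rows.append(dup)
dup_train = pd.DataFrame(rows)
```

Option 1 yields one row per (customer, product) pair; option 2 yields one row per owned product, with the hidden product as a multiclass target.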

Discussion 6 answers

Hi Sanjay, I have tried both and have not had much luck. I treated it as a binary problem for each product and ran a logit model, which scored about 0.06X. I then tried a multinomial approach (splitting it into more rows, as you described) and got nearly the same score! I don't think I'm doing as well as some :) since my best score is around 0.06X. I'm amazed some got 0.03X. I've tried different models (using some of the variables, all of the variables, or a forward-selection process), but I'm just not having the same luck. I must be missing something.

Thanks for your comment

For me, my CV scores are better with the binary-classification framing.

I've got 0.04 Log Loss using a single model, here are some tips:

1. Encoding your categorical variables ---- Most models can only use numeric data, so how you encode categorical variables affects your results. Beware of encoding techniques that cause a large expansion of the feature space, which hurts most models (see: curse of dimensionality).
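Two common encodings, sketched on a hypothetical `occupation` column (not a column from the competition data):

```python
import pandas as pd

df = pd.DataFrame({"occupation": ["farmer", "teacher", "farmer", "nurse"]})

# One-hot encoding: one new column per category. Fine for low-cardinality
# features, but a high-cardinality column blows up the feature space.
onehot = pd.get_dummies(df, columns=["occupation"])

# Ordinal/label encoding: a single integer column. It imposes an artificial
# order, which tree models tolerate better than linear models do.
df["occupation_code"] = df["occupation"].astype("category").cat.codes
```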

2. Feature engineering ---- Creating additional features from the dataset.
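As one illustrative engineered feature (an assumption on my part, not something the poster names): the number of products a customer already holds.

```python
import pandas as pd

# Hypothetical frame with illustrative product columns.
train = pd.DataFrame({
    "ID": ["A", "B"],
    "prod_1": [1, 0],
    "prod_2": [0, 1],
    "prod_3": [1, 1],
})
product_cols = ["prod_1", "prod_2", "prod_3"]

# Count of products each customer holds -- a simple derived feature.
train["n_products"] = train[product_cols].sum(axis=1)
```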

3. Hyperparameter tuning ---- It's not always sufficient to use your model's default parameters; optimize them based on what gets the best results on your cross-validation or development set. K-fold cross-validation is useful here, along with RandomizedSearchCV, for example.

4. Bias and variance ---- If your model performs poorly on both the training and test sets, it is not complex enough: try a more expressive model or lower the regularization parameter. If it does well on the training set but not the test set, increase regularization.

5. Imbalanced classes ---- Some models perform poorly when classes are imbalanced. Try oversampling or undersampling techniques, and score your models with metrics that handle imbalance well; accuracy, for instance, is a poor metric when classes are imbalanced.
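Besides resampling, many scikit-learn models can reweight errors on the rare class directly; a sketch on toy imbalanced labels (the 90/10 split is illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Toy imbalanced labels: 90% negative, 10% positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# Weight errors on the rare class more heavily instead of resampling.
clf = LogisticRegression(class_weight="balanced").fit(X, y)

# Inspect the weights "balanced" implies: the rare class counts 9x as much.
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(weights)
```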

6. Error analysis ---- Split into training and test sets and look at the precision and recall for the different predicted products. Is your model doing better on some products than others? Use this as a base to decide next steps as well.
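Per-class precision and recall come straight from `classification_report`; a sketch on synthetic multiclass data, where each class stands in for a product:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic 3-class stand-in for a per-product target.
X, y = make_classification(n_samples=300, n_classes=3,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# One precision/recall row per class shows which "products" the model
# struggles on -- the basis for the error analysis described above.
report = classification_report(y_te, clf.predict(X_te))
print(report)
```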

Good tips... Thanks

Please, can anyone give me a clue on how to achieve this? I have been having problems with this part, and Google is not giving me the best answers.

27 Aug 2020, 15:31
Upvotes 0
Prospect33

awesome tips @darrel

5 Sep 2020, 07:21
Upvotes 0