Every time I try to submit, I get the following error:
Missing entries for IDs Order_No_768, Order_No_15332, Order_No_21373, Order_No_14573, Order_No_18436 and more
However, I checked my submission CSV and it contains these IDs. What am I doing wrong?
Do this: your_submit_file.to_csv('Path_To_File', index=False)
+1 for Blenz's suggestion. If you're having trouble, I'd be more than happy to connect with you via Google Hangouts and see if we can hash out the error together. Are you using Python or R?
As Blenz said, make sure you set index=False when saving your file to .csv.
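To illustrate why index=False matters, here is a minimal sketch (the column names are made up; your real submission columns may differ). With the default index=True, pandas writes its row index as an extra unnamed first column, so the file no longer starts with the ID column and the grader's parser can fail to match the IDs:

```python
import pandas as pd

# Hypothetical submission frame; real column names may differ.
sub = pd.DataFrame({"ID": ["Order_No_768", "Order_No_15332"],
                    "Prediction": [12.5, 7.3]})

# Default behaviour prepends an unnamed column of row numbers.
with_index = sub.to_csv()
print(with_index.splitlines()[0])     # header: ',ID,Prediction'

# index=False writes the file starting cleanly with the ID column.
without_index = sub.to_csv(index=False)
print(without_index.splitlines()[0])  # header: 'ID,Prediction'
```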
Thanks to all of you for responding timeously; one of the guys on here shared his GitHub repo, which had a really neat function that helped out a lot. The problem is solved. @jonathansp_datathonsa I am using Python 3.

Don't know if I should start another thread for this or not, but here goes: another issue I'm having lies in outlier removal. I usually start from simple models and scale up, so I began my modelling with a linear regressor, and as you know, this model is not so robust to outliers. So I decided to remove outliers (anything more than 3 standard deviations from the mean) by subsetting the data to exclude anything that met that criterion and keep everything that didn't (for both the train and test sets). This automatically means I lose some IDs, which will conflict with the expected submission file. How does one navigate around this problem?
You don't remove IDs from the test set. (You could remove them and predict for the rest using your model, but then you'd have to assign values to the removed IDs anyway, either manually or with some other model, or the submission won't work.)
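A minimal sketch of that advice (column names here are hypothetical): apply the 3-standard-deviation filter to the training target only, and never touch the test set, so every submission ID still gets a prediction:

```python
import pandas as pd

def drop_target_outliers(train: pd.DataFrame, target: str = "Target",
                         n_std: float = 3.0) -> pd.DataFrame:
    """Keep training rows whose target lies within n_std standard
    deviations of the mean. The test set is never filtered, so all
    submission IDs remain intact."""
    mean, std = train[target].mean(), train[target].std()
    mask = (train[target] - mean).abs() <= n_std * std
    return train[mask]

# Synthetic example: twenty ordinary values and one extreme outlier.
train = pd.DataFrame({"Target": [10.0] * 20 + [500.0]})
filtered = drop_target_outliers(train)
print(len(train), "->", len(filtered))  # the extreme row is dropped
```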
Also, a tip from experience working on this data: the outliers (in the target) exist in both train and test more or less equally, so training only on "normal" samples will result in poor performance (this has been pointed out in earlier discussions), since you'll miss the mark on many outliers in the test set.
Keep the outliers and try to build a robust model is the best advice I can give you.
@Blenz, thanks for the heads-up, I will be moving on to more robust modelling. Any recommendations in terms of models to use?
XGBoost! :D I have a model that is looking spicy. I think XGBoost with grid search will be fire.
@jonathansp_datathonsa Whoa! Now I'm wondering about the specs of your machine. What does your RAM look like? What's your CPU generation, and more! XGBoost is very computationally expensive, not to mention its GridSearchCV; you must have a beast! Do you think a Kaggle Kernel will suffice?
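For anyone trying this, a cheap sketch of the grid-search pattern being discussed. Everything here is synthetic, and I've used scikit-learn's GradientBoostingRegressor as a stand-in so the snippet runs without xgboost installed; with xgboost available you would pass xgboost.XGBRegressor() as the estimator and the GridSearchCV call stays the same:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data purely for demonstration.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0,
                       random_state=0)

# Keep the grid small: cost grows with (grid size x cv folds).
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      param_grid, cv=3,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```

Shrinking the grid and fold count like this is also the easiest way to keep the search feasible on a Kaggle Kernel or Colab.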
Haha, I am fortunate, yes :D But if you're looking for free computational power and have internet access, hit up Google Colab.
Wow, that tool had completely slipped my mind. Thanks for the nudge, and happy modelling! Let's collaborate sometime soon!