First of all, I am extremely sorry for uploading my solution so late. I was busy with some work, and the code needed a lot of cleaning.
Secondly, I decided not to post my link on Akeelah's thread because it was crowded with comments, and I thought some people would have trouble finding my link there. Kudos to Akeelah and everyone who has shared their solutions.
Congratulations to everyone who participated and learnt new things in this competition; it's a win for everyone.
And lastly, I am thankful to Zindi for organizing such a wonderful competition. I hope there are many more ZindiWeekendz to come.
Here is my solution:
https://github.com/nikhilmishradevelop/zindi-winning-solutions
Congratulations, Nikhil! Thanks for sharing your solution.
Congratulations, and thanks for the solution.
Hi Nikhil,
Congratulations on winning. May I ask a favour of you and the other top two finishers? If you don't mind, could you comment your code, or at least state the objective of each block? This would help most of us follow your code more closely and easily.
I appreciate it, and thank you for sharing with us.
I agree. If you can even go ahead and explain your thought process through a video, that will be much appreciated.
msamwelmollel and Ogyao, sure I will try to make my notebook more readable, and add more comments. Thank you.
I have some questions, please: did you use a grid search first?
And can you explain your feature engineering?
I added some comments and my thought process about feature engineering to the repo. Please check it out. I did not use any grid search; I tuned the hyperparameters manually.
ok thanks
Big thanks to you, Mishra. Now I think I have a better understanding of your solution. If I may ask, how long did it take to train on a Kaggle kernel, considering that you had over 3400 features?
Hi, it took 2-3 hours to run on Kaggle for 10 folds.
Please, why do you use the train data in valid_sets with the simple test data?
I did not understand your question.
Hello @devnikmishra, in your code I noticed you did this:
for i in range(1, 20):
    df[f'prev_target_{i}'] = df.sort_values(by='Date')[TARGET_COL].fillna(method='ffill').shift(i).sort_index()
    df[f'next_target_{i}'] = df.sort_values(by='Date')[TARGET_COL].fillna(method='bfill').shift(-i).sort_index()
So this is to get the previous and next targets, but the test set does not have a target column. How did you use those features when making predictions? Please explain what you did here. Thanks.
And this one also:
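For readers following the question, here is a minimal sketch of what the quoted loop appears to compute for i = 1. It assumes (this is not confirmed in the thread) that the train and test rows are stacked into one df, with the test rows carrying NaN in the target column. The toy data and the column name "target" are hypothetical, and .ffill() / .bfill() are used as the equivalents of fillna(method='ffill') / fillna(method='bfill').

```python
import numpy as np
import pandas as pd

TARGET_COL = "target"  # hypothetical name, for illustration only

# Toy frame: rows are not in date order; the row at index 2 plays the
# role of a test row, so its target is NaN (an assumption of this sketch).
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-01-03", "2020-01-01", "2020-01-04", "2020-01-02", "2020-01-05"]
    ),
    TARGET_COL: [30.0, 10.0, np.nan, 20.0, 50.0],
})

# Target values in chronological order (the original index is kept).
by_date = df.sort_values(by="Date")[TARGET_COL]

# Fill the missing target from the nearest earlier known value, then
# shift by one step so each row sees the value from the previous date.
df["prev_target_1"] = by_date.ffill().shift(1).sort_index()

# Mirror image: fill from the nearest later known value, then shift back.
df["next_target_1"] = by_date.bfill().shift(-1).sort_index()

print(df.sort_values(by="Date"))
```

In this toy frame the row with the missing target still receives non-null prev_target_1 and next_target_1 values, because both are taken from neighbouring dates whose targets are known. Whether that matches the data layout in the actual notebook is for the author to confirm.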
for i in tqdm_notebook(range(1, 15)):
    df[f'magic_{i}'] = df.sort_values(by='Date')[TARGET_COL].shift(i).expanding().mean().fillna(method='ffill').sort_index()
    df[f'magic2_{i}'] = df.sort_values(by='Date')[TARGET_COL].shift(-i).expanding().mean().fillna(method='bfill').sort_index()
Please, I'd like your explanation.
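While waiting for the author's answer, here is a small sketch of what the magic_ and magic2_ expressions seem to evaluate to for i = 1. As before, it assumes train and test rows share one df and that test targets are NaN; the toy data and the column name "target" are hypothetical.

```python
import numpy as np
import pandas as pd

TARGET_COL = "target"  # hypothetical name, for illustration only

# Toy frame; the row at index 2 stands in for a test row (NaN target).
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-01-03", "2020-01-01", "2020-01-04", "2020-01-02", "2020-01-05"]
    ),
    TARGET_COL: [30.0, 10.0, np.nan, 20.0, 50.0],
})

by_date = df.sort_values(by="Date")[TARGET_COL]

# Running mean of all targets seen strictly before each date.
# expanding().mean() skips NaN values, so a missing target does not
# break the average; ffill() only covers positions that are still NaN.
df["magic_1"] = by_date.shift(1).expanding().mean().ffill().sort_index()

# Same running mean, but over the series shifted one step backward.
# Note the expanding window still accumulates from the earliest date.
df["magic2_1"] = by_date.shift(-1).expanding().mean().bfill().sort_index()

print(df.sort_values(by="Date"))
```

Under these assumptions, magic_1 for the row with the missing target is the mean of every known target dated before it, so the feature can be computed for test rows without knowing their own target.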