Hi guys, I trust you are all doing great. I'm quite late to this competition but wanted to share some of my findings and assumptions, especially for those whose scores are still far from the benchmark.
1) If your score is really far from the benchmark, it most likely means that you didn't set the missing values in the target variable to zero.
2) I don't think point 1 is a good idea in real life.
I also think Zindi has missing data for the true targets of the corresponding test set and has set them to zero for the purpose of scoring, hence the high benchmark RMSE. The train set alone is about 30% missing.
3) If you set predicted values below approximately 4000 to zero, you should move slightly up the leaderboard.
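
For anyone who wants to try points 1 and 3, here is a rough sketch in pandas/NumPy. The column names (`feature`, `target`), the toy numbers, and the helper `zero_small_predictions` are all made up for illustration and are not from the competition data. The 4000 cutoff is only the approximate value mentioned above, so treat it as a starting point rather than a fixed rule.

```python
import numpy as np
import pandas as pd

# Hypothetical toy training data; "feature" and "target" are assumed column names.
train = pd.DataFrame({
    "feature": [1.0, 2.0, 3.0, 4.0],
    "target": [12000.0, np.nan, 3500.0, np.nan],
})

# Point 1: treat missing targets as zero before training.
train["target"] = train["target"].fillna(0)


def zero_small_predictions(preds, threshold=4000):
    """Point 3: set predictions below `threshold` to zero, keep the rest as-is."""
    preds = np.asarray(preds, dtype=float)
    return np.where(preds < threshold, 0.0, preds)


# Hypothetical model outputs for the test set.
preds = np.array([150.0, 3999.0, 4000.0, 25000.0])
clipped = zero_small_predictions(preds)
```

You would apply `zero_small_predictions` to your model's test predictions just before writing the submission file. It is probably worth checking a few thresholds around 4000 on a validation split to see which one helps your RMSE.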