Hello, everyone. I hope you're all enjoying this competition, which feels like a lottery game.
I've been using basic models with some tweaks to score high on the public leaderboard, and surprisingly, these models are doing well on the private leaderboard, too (17th place with an RMSE of 120).
My approach involved using the DBSCAN method to identify outliers. I then adjusted these outliers by either multiplying or dividing them by 10 to fit the main regression line better. For the predictions, I used three simple models - Extra Trees, Catboost, and LightGBM - without any special adjustments to their settings. These models worked fast, giving results in less than 20 seconds.
Initially, this method gave me an RMSE score above 400 without any manual changes, which wasn't very good. However, by manually adjusting two outliers that I found from the public leaderboard, I saw a big improvement. I changed the value of ID_PMSOXFT4FYDW, which many people discussed, to 8000. The second one, ID_BI4VNVU7JAXF, was harder to figure out, but I estimated it to be 3200. These changes helped me climb to 3rd place on the public leaderboard.
I know that these manual changes won't help on the private leaderboard. So, I tried another method to make the model work better for the entire dataset, but unfortunately, it didn't succeed. The original simple model has better performance for the private dataset.
Among my simple models, I found that Extra Trees gave the best results. But I don't think the choice of model is the main issue. After looking at other top solutions, I realized this competition's unpredictability makes it feel like a lottery. Different models and settings (or even seed selection) can lead to different outcomes. Additionally, many competitors will discover that some 'poor' results they previously submitted might actually achieve better final scores due to the unreliable nature of public scores and local cross-validation.
I've shared my basic models on GitHub at (https://github.com/cliff003/Digital-Green-Crop-Yield-Estimate-Challenge ). I hope they are helpful and explain how I did well on the public leaderboard.
It’s fascinating how your approach mirrors the unpredictability of Bat Smash—just when you think you have a winning strategy, the game throws a curveball! Your use of DBSCAN to tackle outliers is clever. Just like in bat smash, sometimes a few tweaks can make all the difference. Keep up the great work!