Happy New Year everyone,
For those who may be struggling to get started, I am sharing a Kaggle-based notebook that you can copy, edit, and run end-to-end to generate a submission that beats the baseline on the public leaderboard, with a runtime of under 10 minutes.
Notebook: https://www.kaggle.com/code/juliusmwangi/ey-urban-heat-island-challenge
I’m currently facing a large performance gap compared to top competitors (PL scores in the 0.8–0.9 range), while my models oscillate around PL ≈ 0.39 despite a CV ≈ 0.55, which suggests a domain shift.
I’d appreciate any pointers on improving generalization, e.g. a CV scheme that worked or features that turned out to be really important. Any insights would be greatly appreciated.
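For anyone who wants to check whether a train/test shift is really there, one common diagnostic is adversarial validation: train a classifier to tell training rows from test rows and look at its ROC AUC. This is only a minimal sketch on synthetic data; the `adversarial_auc` helper and the random arrays are illustrative assumptions, not taken from the notebook or the challenge data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def adversarial_auc(X_train, X_test, seed=0):
    """Cross-validated ROC AUC of a classifier separating train rows from test rows.

    Hypothetical helper: an AUC near 0.5 means the two sets look alike,
    an AUC near 1.0 means they are easy to tell apart (strong shift).
    """
    X = np.vstack([X_train, X_test])
    # Label 0 = row came from the training set, 1 = row came from the test set.
    y = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()


# Synthetic stand-ins for real feature matrices (assumption, not challenge data).
rng = np.random.default_rng(0)
train_like = rng.normal(0.0, 1.0, size=(300, 5))
test_same = rng.normal(0.0, 1.0, size=(300, 5))     # same distribution as train
test_shifted = rng.normal(1.5, 1.0, size=(300, 5))  # mean-shifted features

auc_same = adversarial_auc(train_like, test_same)
auc_shifted = adversarial_auc(train_like, test_shifted)
print(f"AUC without shift: {auc_same:.2f}")
print(f"AUC with shift:    {auc_shifted:.2f}")
```

If the AUC on the real features comes out high, the classifier's feature importances can hint at which features differ most between the two sets.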
All the best to everyone
Thanks @Juliuss for sharing!
Of course there is a domain shift, because the training data are from South American countries (Chile, Brazil) while the test data come from Sierra Leone (Africa). We know that it is hotter in Africa, so that is surely where the domain shift lies, and also in the Sentinel-2 features. I was able to address it with some techniques that I can't share at this stage of the challenge. For instance, once I applied them I had CV = 0.62 and LB = 0.64.
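As a generic illustration only (not the undisclosed approach mentioned above), a CV scheme that holds out one whole region per fold tends to track a geographic shift better than random folds. The sketch below uses synthetic data; the region labels, the per-region offset, and the model choice are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 4))

# Hypothetical region labels; in practice this would be a city or country column.
groups = np.repeat(["region_a", "region_b", "region_c"], n // 3)

# Each region gets its own target offset that the features cannot explain,
# which mimics a location-specific shift in the target.
offset = np.repeat([0.0, 1.0, 2.0], n // 3)
y = X[:, 0] + offset + rng.normal(scale=0.1, size=n)

model = RandomForestRegressor(n_estimators=100, random_state=0)

# Random folds: every region appears in both the training and validation parts.
random_scores = cross_val_score(
    model, X, y, cv=KFold(n_splits=3, shuffle=True, random_state=0), scoring="r2"
)

# Grouped folds: each fold validates on a region the model has never seen.
grouped_scores = cross_val_score(
    model, X, y, groups=groups, cv=LeaveOneGroupOut(), scoring="r2"
)

print("random KFold R2:     ", random_scores.round(2))
print("leave-one-region-out:", grouped_scores.round(2))
```

With only two training countries, this kind of split gives very few folds, so finer groups (for example individual cities or spatial blocks) may be a more practical choice.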
my guy! always doing marvelous. thanks for the insights.
check your inbox man