
Inundata: Mapping Floods in South Africa

Helping South Africa
$10 000 USD
Completed (~1 year ago)
Classification
1340 joined
315 active
Start
Nov 22, 2024
Close
Feb 16, 2025
Reveal
Feb 17, 2025
marching_learning
Nostalgic Mathematics
How to break below 0.003?
Help · 5 Feb 2025, 11:05 · 10

Hello guys, I hope you're enjoying this challenge. I have been scratching my head with both neural nets and boosting models, but I still can't break below 0.003. I would appreciate it if you could share some tips.

Happy Zinding!!!

Discussion 10 answers
crossentropy
Federal university of Technology, Akure

Same here, I struggled to get below this range too, regardless of the approach.

I'm here for the tips too 👍

5 Feb 2025, 11:10
Upvotes 1
Semaka_Mathunyane
University of South Africa

At least you got 0.003; mine is a disaster.

5 Feb 2025, 11:27
Upvotes 1

It looks like boosting models are the successful approach here.

I've got 0.0027 on the leaderboard with a LightGBM model, averaging the predictions of the 10 models from a 10-fold CV, with the time series stratified by whether or not they contain a flood.
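A minimal sketch of the fold-averaging scheme described above, assuming the training data is already a per-day feature matrix `X` with 0/1 flood labels `y`, and that every series has been assigned to one of K folds beforehand. The function name `cv_average_predict` and its arguments are hypothetical; `make_model` can be any factory returning an object with `fit` and `predict_proba`, for example `lightgbm.LGBMClassifier` (hyperparameters left unspecified here).

```python
import numpy as np


def cv_average_predict(X, y, row_series, fold_of_series, X_test, make_model):
    """Train one model per fold and average the models' test-set probabilities.

    X, y           : per-day training features and 0/1 flood labels
    row_series     : integer series index of each training row
    fold_of_series : fold number (0..K-1) of each series (series-level split)
    X_test         : per-day test features
    make_model     : hypothetical factory returning an object with
                     fit(X, y) and predict_proba(X), e.g. LGBMClassifier
    """
    fold_of_series = np.asarray(fold_of_series)
    row_fold = fold_of_series[np.asarray(row_series)]  # each day inherits its series' fold
    n_folds = int(fold_of_series.max()) + 1            # assumes folds are numbered 0..K-1
    oof = np.zeros(len(y))                             # out-of-fold predictions for validation
    test_pred = np.zeros(len(X_test))
    for k in range(n_folds):
        train_mask = row_fold != k
        valid_mask = ~train_mask
        model = make_model()
        model.fit(X[train_mask], y[train_mask])
        oof[valid_mask] = model.predict_proba(X[valid_mask])[:, 1]
        # each fold model contributes 1/K of the final test prediction
        test_pred += model.predict_proba(X_test)[:, 1] / n_folds
    return oof, test_pred
```

Because all days of a series share its fold, no series is split between training and validation, and the out-of-fold predictions give a local estimate to compare with the leaderboard.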

For each day, I simply used as features the day number (0-729), that day's precipitation value, and the precipitation values of all the other days (729 lags).

I'm sure the score can improve with some parameter tuning and better featurization.

I'm curious whether any numeric feature calculated from the images can help; in all my experiments, the images were of no help.

6 Feb 2025, 09:47
Upvotes 7
marching_learning
Nostalgic Mathematics

Thank you for sharing. So for a given day, let's say day t, you are using lags from day t-1, day t-2, ..., down to day 1?

I also wrap around, so for day t I use the previous t-1 precipitation values and the following 730-t ones, as well as, obviously, the precipitation of that day itself.
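The wrap-around features can be sketched with NumPy as below. This assumes one series is a 1-D array of 730 daily precipitation values and that the 729 other days are ordered by their offset from day t (t+1, t+2, ..., wrapping back to t-1); the poster did not say how the columns were ordered, and `cyclic_lag_features` is a hypothetical name.

```python
import numpy as np


def cyclic_lag_features(precip):
    """Build one feature row per day for a single precipitation series.

    Columns: [day number, precipitation of that day, then the n-1 other days].
    The other days wrap around: for day t they are days t+1, t+2, ...,
    back to 0, ..., t-1, so every row has the same number of lags.
    """
    precip = np.asarray(precip, dtype=float)
    n = len(precip)                          # 730 days per series in this competition
    days = np.arange(n)
    # idx[t, k] = (t + k) % n, so column k of row t is the day k steps after t, wrapped
    idx = (days[:, None] + days[None, :]) % n
    shifted = precip[idx]                    # shifted[t, 0] is day t itself
    return np.column_stack([days, shifted])  # shape (n, n + 1)
```

For a 730-day series this yields a 730 x 731 block: the day number, the day's own precipitation, and 729 wrapped lags. Stacking these blocks for all series (keeping the series id aside for the split) gives the training matrix.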

Very interesting. Mine also works very well with LightGBM; in general, I see classifiers working better than regressors. So far I am still trying to combine LightGBM and MLPClassifier, since both models seem to work quite well.
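One simple way to combine two classifiers is a weighted average of their predicted flood probabilities, with the weight chosen on out-of-fold predictions. This is a sketch, not the poster's method: it assumes log loss as the validation metric purely for illustration, and the names `best_blend_weight`, `p_gbm` and `p_mlp` are hypothetical.

```python
import numpy as np


def log_loss(y, p, eps=1e-15):
    """Binary cross-entropy, clipping probabilities away from 0 and 1."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def best_blend_weight(y, p_gbm, p_mlp, grid=np.linspace(0, 1, 21)):
    """Pick w so that w * p_gbm + (1 - w) * p_mlp minimises out-of-fold log loss."""
    p_gbm = np.asarray(p_gbm, dtype=float)
    p_mlp = np.asarray(p_mlp, dtype=float)
    scores = [log_loss(y, w * p_gbm + (1 - w) * p_mlp) for w in grid]
    best = int(np.argmin(scores))
    return float(grid[best]), scores[best]
```

The chosen weight is then applied to the two models' test predictions; a coarse grid keeps the blend from overfitting the validation folds.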

Zambia_Kuchalo
Typaflow Software Systems Limited

Sounds great. Are you applying anything to the data? Any tips?

CodeJoe

I am using boosting models and I have tried adding lags as you described. I have tried winsorization, removing outliers, GroupKFold, StratifiedKFold and extensive feature engineering, and still no significant boost. Am I missing something here?

Sorry to hear that, bro, but the basic setup is really simple: just create lags for each day of each time series (I used cycling, but padding yields the same results) and binary-classify each day. No particular preprocessing is needed, as tree models are not sensitive to the range of the data.
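For comparison with cycling, here is one possible reading of the padding variant: offsets that fall outside the series are filled with NaN instead of wrapping around. LightGBM handles missing values natively, so NaN padding can be fed in directly. `padded_lag_features` and its `max_offset` parameter are hypothetical; with the default `max_offset`, each row gets 2n-1 precipitation columns rather than n.

```python
import numpy as np


def padded_lag_features(precip, max_offset=None):
    """Per-day features with NaN padding instead of wrap-around.

    For day t, the column for offset k holds precip[t + k] when 0 <= t + k < n,
    and NaN otherwise, for k = -max_offset, ..., +max_offset.
    """
    precip = np.asarray(precip, dtype=float)
    n = len(precip)
    if max_offset is None:
        max_offset = n - 1                          # cover the whole series in both directions
    offsets = np.arange(-max_offset, max_offset + 1)
    days = np.arange(n)
    idx = days[:, None] + offsets[None, :]          # absolute day index for each offset
    valid = (idx >= 0) & (idx < n)
    feats = np.full(idx.shape, np.nan)              # out-of-range offsets stay NaN
    feats[valid] = precip[idx[valid]]
    return np.column_stack([days, feats])
```

Each row is then labelled with that day's flood flag and fed to a binary classifier, exactly as with the cyclic features.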

As for the split: for each time series id, assign 1 if it contains a flood and 0 otherwise, then split the dataset so that training and validation both have the same percentage of time series with floods. Nothing else.
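The split described above can be sketched as follows, assuming a 0/1 flag per series id (1 if any of its days is a flood). Shuffling each class separately and dealing its series round-robin gives every fold roughly the same share of flood series; `stratified_series_folds` is a hypothetical helper. Running scikit-learn's `StratifiedKFold` on the per-series flags achieves the same thing.

```python
import numpy as np


def stratified_series_folds(series_has_flood, n_folds=10, seed=0):
    """Assign each time series to a fold so every fold has a similar share of flood series.

    series_has_flood: 0/1 array with one entry per series id
    (1 if any day of that series is labelled as a flood).
    Returns an int array with the fold number of each series.
    """
    flags = np.asarray(series_has_flood).astype(int)
    rng = np.random.default_rng(seed)
    folds = np.empty(len(flags), dtype=int)
    for label in (0, 1):
        members = np.flatnonzero(flags == label)
        rng.shuffle(members)
        # deal the shuffled series round-robin so each fold gets ~1/n_folds of this class
        folds[members] = np.arange(len(members)) % n_folds
    return folds
```

Each day then takes the fold of its series (for example `folds[row_series]`), so the per-day rows of one series never end up on both sides of the split.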

CodeJoe

Worked like magic! I'm really grateful.