
IBM SkillsBuild Hydropower Climate Optimisation Challenge

Helping the World
$3,000 USD
Completed (12 months ago)
Prediction
Forecast
1231 joined
466 active
Start: Mar 03, 25
Close: Apr 13, 25
Reveal: Apr 14, 25
RareGem
Please, Share your Solutions
Platform · 14 Apr 2025, 10:08 · 8

Please, could those topping the private leaderboard share your solutions? We want to learn from you 🙏

Discussion 8 answers
isaacOluwafemiOg
Kwame Nkrumah University of Science and Technology

I think they'd have to wait until the end of the code review.

14 Apr 2025, 10:44
Upvotes 0
RareGem

Okay. Thanks

data_style_bender

I will gladly share my solutions once everyone has sent their code for review.

14 Apr 2025, 12:28
Upvotes 2
RareGem

Thank you data_style_bender, and congratulations on the win 🏆. Well done on your dedication and hard work, and to the rest of the winners. You guys really tried.

100i
Ghana Health Service

Not topping private LB, but here's my approach.

My Solution:

I am also curious to learn about the winning ideas, especially how they approached cross validation and how they aggregated their data/labels.

Data

I took a slightly different approach from the starter notebook.

I rounded the 5-minute interval readings to the nearest hour (made no difference?), then aggregated the data by Datetime (or Date, made no difference?) and Source.
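A minimal sketch of that rounding-and-aggregation step with pandas (the toy data, and summing as the aggregate, are my assumptions; the post doesn't state the aggregation function):

```python
import pandas as pd

# Toy 5-minute readings; Datetime/Source/kwh column names taken from the post
df = pd.DataFrame({
    'Datetime': pd.to_datetime(['2024-01-01 00:05',
                                '2024-01-01 00:55',
                                '2024-01-01 01:10']),
    'Source': ['A', 'A', 'A'],
    'kwh': [1.0, 2.0, 3.0],
})

# Round each timestamp to the nearest hour, then aggregate per (Datetime, Source)
df['Datetime'] = df['Datetime'].dt.round('h')
hourly = df.groupby(['Datetime', 'Source'], as_index=False)['kwh'].sum()
```

Here 00:05 rounds down to 00:00 while 00:55 and 01:10 both round to 01:00, so the three readings collapse into two hourly rows.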

I dropped all duplicates in the full train set, keeping only the first/last data points, then merged with the weather data. I experimented with both first and last and decided to stick with first because it gave my best public LB score.
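A sketch of the dedup-then-merge step, keeping the first duplicate as described (toy frames and the join key are my assumptions):

```python
import pandas as pd

# Hypothetical hourly train and weather frames
train = pd.DataFrame({
    'Datetime': pd.to_datetime(['2024-01-01 00:00',
                                '2024-01-01 00:00',
                                '2024-01-01 01:00']),
    'Source': ['A', 'A', 'A'],
    'kwh': [1.0, 9.0, 3.0],
})
weather = pd.DataFrame({
    'Datetime': pd.to_datetime(['2024-01-01 00:00', '2024-01-01 01:00']),
    'temp': [20.1, 19.8],
})

# Keep only the first row per (Datetime, Source), then left-join weather
train = train.drop_duplicates(subset=['Datetime', 'Source'], keep='first')
merged = train.merge(weather, on='Datetime', how='left')
```

Swapping `keep='first'` for `keep='last'` reproduces the other variant the post compares.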

I engineered a new set of features from both the weather data and the full train data: cyclic time features, statistical features, lag features (dropped due to overfitting), season, wind speed, etc.
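As an illustration of the cyclic time features, hour and month can be projected onto sine/cosine pairs so that 23:00 sits next to 00:00 and December next to January (the exact encoding is my assumption, not shown in the post):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Datetime': pd.to_datetime(['2024-01-01 00:00', '2024-07-01 12:00'])
})

# Map each periodic quantity onto the unit circle
df['hour_sin'] = np.sin(2 * np.pi * df['Datetime'].dt.hour / 24)
df['hour_cos'] = np.cos(2 * np.pi * df['Datetime'].dt.hour / 24)
df['month_sin'] = np.sin(2 * np.pi * (df['Datetime'].dt.month - 1) / 12)
df['month_cos'] = np.cos(2 * np.pi * (df['Datetime'].dt.month - 1) / 12)
```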

I experimented with 2 parallel ideas - 2 stream of experiments on data with and without capped weather variables.

The categorical features, Season and Source, were label encoded.

Total # of features: 69
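The label-encoding step can be sketched with scikit-learn (toy values are my assumption):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'Source': ['A', 'B', 'A'],
                   'season': ['wet', 'dry', 'wet']})

# Replace each category with an integer code, one encoder per column
for col in ['Source', 'season']:
    df[col] = LabelEncoder().fit_transform(df[col])
```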

Cross Validation Strategy

import pandas as pd
from sklearn.model_selection import StratifiedKFold

all_data['season_month_group'] = all_data['season'].astype(str) + '_' + all_data['month'].astype(str)
all_data['bins'] = pd.cut(all_data['kwh'], bins=num_bins, labels=False)
all_data['bins'].hist()
all_data['fold'] = -1

stratify = all_data['season_month_group'].astype(str) + all_data['bins'].astype(str)
strat_kfold = StratifiedKFold(n_splits=10, random_state=42, shuffle=True)
for i, (_, val_index) in enumerate(strat_kfold.split(all_data, stratify)):
    all_data.iloc[val_index, -1] = i

Modeling

Models:

- XGB (dropped categoricals: Source and Season)
- LGBM (categorical features = ['Source', 'season', 'month', 'data_user', 'mday', 'consumer_device'])
- Ensemble model on all features: 12 models in total using random forest and extra trees regressors, 6 each, where each model was built with a different number of trees, from 50 to 175 at intervals of 25
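A minimal sketch of that 12-model random forest / extra trees ensemble (toy data and equal-weight averaging are my assumptions):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + rng.normal(scale=0.1, size=200)

# 6 RF + 6 ET models, tree counts 50..175 in steps of 25
tree_counts = range(50, 176, 25)
models = [cls(n_estimators=n, random_state=42)
          for cls in (RandomForestRegressor, ExtraTreesRegressor)
          for n in tree_counts]
for m in models:
    m.fit(X, y)

# Average the individual predictions into one ensemble prediction
pred = np.mean([m.predict(X) for m in models], axis=0)
```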

Experiments

I will just describe my selected private score submission.

I eventually experimented only with the data without capped weather variables, because all experiments that used the raw data scored better on the public LB compared with the capped data.

Final submission = 2 XGB (dropped source and season) + ENSEMBLE MODEL (0.5*RANDOM FOREST + 0.5*EXTRA TREE)

Each model was trained on the same data to generate out-of-fold predictions.
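A sketch of how out-of-fold predictions can be generated from a precomputed fold column (synthetic data and RandomForestRegressor stand in for the actual models; the modulo fold assignment replaces the StratifiedKFold column built earlier):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 100
X = pd.DataFrame({'f0': rng.normal(size=n), 'f1': rng.normal(size=n)})
y = X['f0'] * 2 + rng.normal(scale=0.1, size=n)
fold = np.arange(n) % 5  # stand-in for the 'fold' column

# For each fold, fit on the other folds and predict the held-out rows
oof = np.zeros(n)
for f in np.unique(fold):
    tr, va = fold != f, fold == f
    model = RandomForestRegressor(n_estimators=50, random_state=42)
    model.fit(X[tr], y[tr])
    oof[va] = model.predict(X[va])
```

Every row ends up with a prediction from a model that never saw it, which is what makes the OOF vector usable for weight search.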

Ensemble

Used scipy.optimize to search for the best weights from the OOF predictions:

import scipy.optimize

res = scipy.optimize.minimize(min_func, [1/3]*3, method='Nelder-Mead', tol=1e-6)
ypredtest = res.x[0]*modelA['kwh'] + res.x[1]*modelB['kwh'] + res.x[2]*modelC['kwh']
forecast['kwh'] = ypredtest * 0.1

(I didn't play around with the 0.1 factor because I sensed the predictions were just too large, perhaps due to the unnormalised weights suggested by scipy.optimize.)
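The post calls `min_func` but doesn't show it; a plausible version that scores a weighted blend of the three OOF prediction vectors against the targets by RMSE might look like this (the synthetic targets and predictions are my assumptions):

```python
import numpy as np
import scipy.optimize

rng = np.random.default_rng(1)
y_true = rng.normal(size=200)
# Three imperfect OOF prediction vectors standing in for modelA/B/C
preds = [y_true + rng.normal(scale=s, size=200) for s in (0.2, 0.4, 0.6)]

def min_func(w):
    # RMSE of the weighted blend against the OOF targets
    blend = sum(wi * p for wi, p in zip(w, preds))
    return np.sqrt(np.mean((y_true - blend) ** 2))

res = scipy.optimize.minimize(min_func, [1/3] * 3,
                              method='Nelder-Mead', tol=1e-6)
```

Note that nothing constrains the weights to sum to 1, which is consistent with the remark above about the blended predictions coming out too large.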

Selected submissions

Best manual ensemble: 6.34/5.36

Scipy optimize ensemble: 6.68/5.16

Key learnings:

- I spent a lot of my time thinking about cross-validation and target representation, so much so that I even forgot what I did to get my best LB score. (Always find a way to log/keep track of good turnarounds.)

- A few hours before the end of the competition, I had to dump every other experiment and dig through my Colab version history to find my best LB notebook.

- I regret not adding more of the models I had worked on (CatBoost, LightGBM and a NN) to the final ensemble.

- Build a grounded intuition and trust it

- You never know if something works until you try it

14 Apr 2025, 18:28
Upvotes 4
RareGem

I appreciate this. "You never know if something works until you try it", I love this statement. Thank you for sharing.

CodeJoe

100i never disappoints. Well done big man.

100i
Ghana Health Service

Thank you CodeJoe. Congrats on your win! I learnt a lot from your notebooks. Keep doing more bro!