I think they'd have to wait till the end of code review
Okay. Thanks
I will gladly share my solutions once all have sent their code for review
Thank you data_style_bender and congratulations on the win 🏆. Well done on your dedication and hard work, and to the rest of the winners. You guys really tried.
I didn't top the private LB, but here's my approach.
My Solution:
I am also curious to learn about the winning ideas, especially how they approached cross-validation and how they aggregated their data/labels.
Data
I took a slightly different approach from the starter notebook.
I rounded the 5-minute interval readings to the nearest hour (made no difference?), then aggregated the data by Datetime (or Date - made no difference?) and Source.
I dropped all duplicates in the full train set, keeping only the first/last data points, then merged with the weather data. I experimented with both first and last and stuck with first because it gave my best public LB score.
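As a rough sketch of that aggregation step - toy readings and a hypothetical weather frame; only the Datetime/Source/kwh column names come from the post:

```python
import pandas as pd

# Toy 5-minute readings (values made up for illustration)
train = pd.DataFrame({
    'Datetime': pd.to_datetime(['2021-01-01 09:02', '2021-01-01 09:07',
                                '2021-01-01 09:57', '2021-01-01 10:02']),
    'Source': ['A', 'A', 'A', 'A'],
    'kwh': [1.0, 2.0, 3.0, 4.0],
})

# Round each 5-minute reading to the nearest hour...
train['Datetime'] = train['Datetime'].dt.round('h')

# ...then keep only the first reading per (Datetime, Source) pair
train = train.drop_duplicates(subset=['Datetime', 'Source'], keep='first')

# Merge with (hypothetical) hourly weather data on the rounded timestamp
weather = pd.DataFrame({
    'Datetime': pd.to_datetime(['2021-01-01 09:00', '2021-01-01 10:00']),
    'temp': [20.0, 21.0],
})
merged = train.merge(weather, on='Datetime', how='left')
```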
I engineered a new set of features from both the weather data and the full train data - cyclic time features, statistical features, lag features (dropped due to overfitting), season, wind speed, etc.
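The cyclic time features can be sketched like this - encoding hour-of-day as sine/cosine on the unit circle so 23:00 and 00:00 end up close together (the column names here are illustrative, not the post's):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'hour': [0, 6, 12, 18]})

# Map hour-of-day onto the unit circle: 24h period
df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)
```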
I ran two parallel streams of experiments: one on data with capped weather variables and one without.
The categorical features - Season and Source - were label encoded.
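A minimal sketch of the label encoding, with made-up category values:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical category values; only the column names come from the post
df = pd.DataFrame({'Source': ['grid', 'solar', 'grid'],
                   'season': ['dry', 'wet', 'dry']})

# Replace each category with an integer code (sorted alphabetically by sklearn)
for col in ['Source', 'season']:
    df[col] = LabelEncoder().fit_transform(df[col])
```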
Total # of features: 69
Cross Validation Strategy
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Stratify on the season-month group combined with the binned target
all_data['season_month_group'] = all_data['season'].astype(str) + '_' + all_data['month'].astype(str)
all_data['bins'] = pd.cut(all_data['kwh'], bins=num_bins, labels=False)
all_data['bins'].hist()
all_data['fold'] = -1
stratify = all_data['season_month_group'].astype(str) + all_data['bins'].astype(str)
strat_kfold = StratifiedKFold(n_splits=10, random_state=42, shuffle=True)
for i, (_, val_index) in enumerate(strat_kfold.split(all_data, stratify)):
    all_data.iloc[val_index, -1] = i
Modeling
Models:
- XGB (dropped the categoricals Source and Season)
- LGBM (categorical features = ['Source', 'season', 'month', 'data_user', 'mday', 'consumer_device'])
- ENSEMBLE MODEL on all features: 12 models in total using random forest and extra-trees regressors, 6 each, where each model was built with a different number of trees, from 50 to 175 in steps of 25
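The ensemble model could be built roughly like this - synthetic data below; the post only specifies the two regressor families and the tree counts:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Synthetic regression data as a stand-in for the real features/target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] * 2 + rng.normal(scale=0.1, size=200)

# 6 random forests + 6 extra-trees, one per tree count 50, 75, ..., 175
tree_counts = range(50, 200, 25)
models = [cls(n_estimators=n, random_state=0)
          for cls in (RandomForestRegressor, ExtraTreesRegressor)
          for n in tree_counts]
for m in models:
    m.fit(X, y)

# With 6 models per family, a plain mean equals 0.5*RF + 0.5*ET
preds = np.mean([m.predict(X) for m in models], axis=0)
```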
Experiments
I will just describe my selected private-score submission.
I eventually experimented only with data where the weather variables were not capped, because all experiments that used the raw data scored better on the public LB than those with capped data.
Final submission = 2 XGB (dropped Source and Season) + ENSEMBLE MODEL (0.5*RANDOM FOREST + 0.5*EXTRA TREES)
Each model was trained on the same data to generate out-of-fold predictions.
Ensemble
Used scipy.optimize to search for the best blending weights from the OOF predictions:
import scipy.optimize

# min_func evaluates a candidate weight vector against the OOF predictions
res = scipy.optimize.minimize(min_func, [1/3]*3, method='Nelder-Mead', tol=1e-6)
ypredtest = res.x[0]*modelA['kwh'] + res.x[1]*modelB['kwh'] + res.x[2]*modelC['kwh']
forecast['kwh'] = ypredtest*0.1 (I didn't tune this scaling factor; I sensed the predictions were just too large, perhaps due to the un-normalised weights suggested by scipy.optimize)
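min_func isn't shown above; a plausible version, assuming it returns the RMSE of the weighted OOF blend against the true target (stand-in data below - the real OOF arrays come from the trained models):

```python
import numpy as np
import scipy.optimize

# Stand-in target and three noisy OOF prediction vectors
rng = np.random.default_rng(42)
y_true = rng.normal(size=100)
oof = [y_true + rng.normal(scale=s, size=100) for s in (0.1, 0.2, 0.3)]

def min_func(weights):
    """RMSE of the weighted blend of the OOF predictions."""
    blend = sum(w * p for w, p in zip(weights, oof))
    return np.sqrt(np.mean((y_true - blend) ** 2))

res = scipy.optimize.minimize(min_func, [1/3]*3, method='Nelder-Mead', tol=1e-6)
```

Note that Nelder-Mead places no constraint on the weights, so they need not sum to 1 - consistent with the un-normalised weights mentioned above.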
Selected submissions:
- Best manual ensemble: 6.34/5.36
- Scipy optimize ensemble: 6.68/5.16
Key learnings:
- I spent so much of my time thinking about cross-validation and target representation that I even forgot what I did to get my best LB score (always find a way to log/keep track of good runs)
- A few hours before the end of the comp, I had to dump every other experiment and dig through my Colab version history to find my best LB notebook
- I regret not adding more models that I had worked on - CatBoost, LightGBM and an NN - to the final ensemble
- Build a grounded intuition and trust it
- You never know if something works until you try it
I appreciate this. "You never know if something works until you try it" - I love this statement. Thank you for sharing.
100i never disappoints. Well done big man.
Thank you CodeJoe. Congrats on your win! I learnt a lot from your notebooks. Keep doing more bro!