
IBM SkillsBuild Hydropower Climate Optimisation Challenge

Helping the World
$3 000 USD
Completed (12 months ago)
Prediction
Forecast
1231 joined
466 active
Start: Mar 03, 25
Close: Apr 13, 25
Reveal: Apr 14, 25
skaak
Ferra Solutions
Struggle with kwh - how to submit?
Help · 12 Apr 2025, 10:45 · 14

I'm really struggling ... I either get a great score or a very bad one ... it led me to this ... how to aggregate ... what is it exactly that we are predicting and how do we aggregate to it?

The column is labelled 'kwh' so it seems we need to convert to a 'per hour' unit, presumably the average kwh produced during the day. This makes sense, as 'kwh' really is an hourly unit. Almost like reporting what our average speed was in km/h for the day.

However, if I look at the starter, the kwh is aggregated using sum!?!?!?! Thus we need to report how far we travelled for the day rather than how fast we travelled if I continue to use the speed analogy. So we need to report total production for the day, not the kw*h* really, more the kw for the day?

But then, the data file we're given ticks over every 5 minutes, and it is labelled kwh. So presumably we should average that value per hour and then sum those averages to get the value we need to submit?

Power is measured in kwh - this also confuses me. You typically report the total kw that a given usage would consume in an h, so kwh right. But this is just a mess ... columns labelled as kwh but actually, I guess, meaning kw per day ...

Apologies, yes, I am really frustrated, but can somebody please shed some light on this for me. I will really really appreciate it.

Discussion 14 answers
the_specialist
Optinum Solutions Pty

Hi brother,

I see your point, but I would advise that you don't overthink it.

Use the aggregated sum as shown in the starter workbook. So, you will have total kwh per consumer_device_x_data_user_y per day. Build your models around that.
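For anyone who wants to see the "sum per source per day" idea spelled out, here is a minimal Python sketch. It is not the starter notebook's code: the rows, the ID strings, and the function name `daily_kwh` are all made up for illustration, and it only assumes each row holds the kWh recorded in one 5-minute interval.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical 5-minute readings: (source id, timestamp, kWh in that interval).
# The ids and values are invented; they are not from the competition files.
readings = [
    ("device_1_user_1", "2024-06-01 00:00:00", 0.5),
    ("device_1_user_1", "2024-06-01 00:05:00", 0.25),
    ("device_1_user_1", "2024-06-02 00:00:00", 1.0),
    ("device_2_user_7", "2024-06-01 00:00:00", 2.0),
]


def daily_kwh(rows):
    """Sum the interval kWh values per (source, calendar day)."""
    totals = defaultdict(float)
    for source, stamp, kwh in rows:
        day = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").date()
        totals[(source, day)] += kwh
    return dict(totals)


daily = daily_kwh(readings)
for (source, day), total in sorted(daily.items()):
    print(source, day, total)
```

Each printed line is one modelling target: total kWh for one source on one day.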

12 Apr 2025, 11:06
Upvotes 2
skaak
Ferra Solutions

Thanks special ... good point actually ... one I guess I should have figured out myself, but I've worked so hard on all the aggregations and conversions to keep it kwh but that just destroys my score.

Wow you're doing well. <6 score! You using matlab?

the_specialist
Optinum Solutions Pty

Matlab is an excellent tool but Zindi only allows open source tools.

For this competition, I am using R. I used R because of some statistical packages that are readily available BUT the trick to a good score is the post-processing.

kWh is not a ratio like km/h. It is a measure of total energy: what you get when you multiply power (energy per unit time, 1 watt = 1 joule per second) by the time during which that power was sustained. One kWh is 1000 joules per second (1 kW) times 1 hour. You essentially "cancel" the time units to get back an energy unit. Confusing, I know, but the good part is that the kWh of a day is simply the sum of all kWh registered across that day, no need for fancy aggregations. If you want to work with voltage, current and power factor, then you need to work that out, but not if you're just predicting kWh directly.
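To make that arithmetic concrete, here is a small Python sketch. The 2 kW load and the helper name `energy_kwh` are assumptions chosen for illustration; nothing here comes from the competition data.

```python
JOULES_PER_KWH = 1000 * 3600  # 1 kW = 1000 J/s, sustained for 3600 s


def energy_kwh(power_kw, hours):
    """Energy is power multiplied by the time it was sustained."""
    return power_kw * hours


# 1 kW sustained for 1 hour is exactly 1 kWh.
one_kwh = energy_kwh(1.0, 1.0)

# A hypothetical 2 kW load running through a single 5-minute interval.
interval_hours = 5 / 60
per_interval = energy_kwh(2.0, interval_hours)

# Twelve such intervals make up the hour; energy simply adds up.
hour_total = sum(per_interval for _ in range(12))

print(one_kwh, per_interval, hour_total, JOULES_PER_KWH)
```

The point of the sketch is that each 5-minute value is already an amount of energy, so adding the twelve values gives the hour's energy (about 2 kWh for the assumed load) with no averaging step.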

12 Apr 2025, 12:50
Upvotes 1
skaak
Ferra Solutions

Slow down, you're breaking my brain, just when I had the solution ...

I was thinking of an appliance that, e.g., has a 2000W rating. That means if it is on for an hour, it will use 2kwh, right? Same with a metered connection. If it shows 1000W usage right now, and sustains that for an hour, you'd use 1 unit. So I assumed the kwh in the 5m intervals was what you had to sustain to get that amount after an hour.

See - I'm starting to go in circles already ... bleh, just going to follow special's advice and use sum to aggregate per day and then predict that. But that would be same as using `mean` to aggregate 5m intervals and then load your final prediction x 12 x 24 into sub. x 12 to make it hour and x 24 to make it day. My brain is hurting again ... just gonna take sum and hope for the best ...
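A quick check of the "mean x 12 x 24" claim, as a hedged Python sketch on synthetic numbers (the values are invented, and 288 intervals per day is just 12 x 24). The two routes agree when a day has all 288 readings; if some intervals are missing, scaling the mean effectively fills the gaps, so the results differ.

```python
INTERVALS_PER_DAY = 12 * 24  # 12 five-minute intervals per hour, 24 hours

# Synthetic kWh readings for one complete day (made-up values).
full_day = [0.01 * (i % 7) for i in range(INTERVALS_PER_DAY)]

total = sum(full_day)
via_mean = (sum(full_day) / len(full_day)) * INTERVALS_PER_DAY

# Same day, but pretend only the first 200 intervals were recorded.
partial = full_day[:200]
partial_sum = sum(partial)
partial_scaled = (sum(partial) / len(partial)) * INTERVALS_PER_DAY

print(total, via_mean)
print(partial_sum, partial_scaled)
```

Which behaviour is wanted for incomplete days depends on how the target was built, which this sketch does not try to settle.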

skaak
Ferra Solutions

lol ... you know, this feels just like those ... uhhhhh ... robust discussions I (sometimes) have with my wife. I'm not sure if we agree or disagree, I'm so confused and we are arguing and I'm getting deeper into trouble, but I really can't even explain why ...

skaak
Ferra Solutions

special is like the pastor ... or counsellor ... just saying "Just use sum" and I'm like: Yessssss!!!! Finally!!!! Problem solved, let's get on with our lives ....

skaak
Ferra Solutions

Back to the road, how on earth did we get so lost in the woods here?

You're correct up until the assumption about the 5 minute intervals. The unit itself is confusing, lemme try to give you an example: imagine one of the rows of the dataset has a value of 1 kWh for one of the 5 minute intervals. That means the amount of energy consumed during that period was the same as if you had a 1kW appliance turned on for 1 full hour.

skaak
Ferra Solutions

Ok honey, you win. I was wrong ...

Jokes aside - thanks. I really do appreciate it and ok, that is starting to make sense.

So that is why I have to sum: all the 5m intervals in an hour sum into the total usage for that hour, and then all the hours sum into the usage for the day. Then yes, the unit is very confusing, especially the *h* at the end.

the_specialist
Optinum Solutions Pty

Lol. Life is already complicated, no need to make it more complex.

Knowledge_Seeker101
Freelance

🤣🤣🤣 @skaak do you do comedy shows?

skaak
Ferra Solutions

Gee KS, my score is a joke ... I'm now using sum and if anything my score is worse.

At least you are doing really well - hope you can get into the top 10 when it all ends.

Knowledge_Seeker101
Freelance

Post-processing did it for me, but you know, with post-processing you might end up in the 100s on the private board.