Forecasting future energy consumption is no small feat—especially when the task spans dozens of devices, sources, and 31 days into the future. For the IBM SkillsBuild Hydropower Climate Optimisation Challenge, the goal was clear: predict kilowatt-hour (kWh) consumption for different users and devices, 31 days ahead.
Belal Emad is a data science enthusiast and engineering student at Cairo University with a strong passion for solving real-world problems using machine learning.
Belal approached this challenge by viewing it through the lens of multistep time series prediction. Instead of using the full dataset as-is, he cleverly broke it down into overlapping 10-day windows: 5 days for inputs and 5 days for predictions. This structure allowed him to generate a much larger training dataset from limited raw records and train a model that could capture short-term energy usage patterns.
The first step in Belal’s pipeline was to group the data by date and energy source, summing up daily kilowatt-hour consumption. He then constructed sliding windows of 5-day sequences to use as inputs, and 5 subsequent days as labels.
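A minimal sketch of that aggregation step could look like the following. The file name and column names ('Train.csv', 'date', 'source', 'kwh') are assumptions for illustration, not taken from Belal's notebook:

import pandas as pd

# Assumed file and column names; the real competition files may differ.
df = pd.read_csv('Train.csv', parse_dates=['date'])

# One row per (source, day) holding the total kilowatt-hours consumed that day
daily = (
    df.groupby(['source', df['date'].dt.date])['kwh']
      .sum()
      .reset_index()
)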
Here’s how he created these sequences:
import numpy as np

def create_sequences(data, seq_length):
    # Slide a window over the daily series: `seq_length` days as input, the next 5 days as labels
    X, y = [], []
    for i in range(len(data) - seq_length):
        if i + seq_length + 5 < len(data):  # only keep windows with a full 5-day label block
            X.append(data[i:i + seq_length])
            y.append(data[i + seq_length:i + seq_length + 5])
    return np.array(X).reshape((-1, seq_length)), np.array(y).reshape((-1, 5))
In addition to consumption patterns, Belal also included a static feature representing the consumer’s device ID—this helped the model distinguish between different usage patterns across device types.
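One simple way to attach that static feature is to append a numeric device index to each input window, which matches the (seq_length + 1) shape of the model's second input further below. The helper and the key_to_num mapping shown here are illustrative, not Belal's exact code:

import numpy as np

def add_device_feature(X, sources, key_to_num):
    # X: (n_windows, seq_length) consumption windows
    # sources: the source/device ID string each window was cut from
    # key_to_num: assumed mapping from source ID string to an integer index
    device_ids = np.array([key_to_num[s] for s in sources]).reshape(-1, 1)
    # Result has shape (n_windows, seq_length + 1), matching the model's second input
    return np.concatenate([X, device_ids], axis=1)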
Belal’s architecture embraced the complexity of the task with a dual-branch neural network.
One branch used an LSTM (Long Short-Term Memory) layer to learn sequential patterns in the energy data. The second branch—a simpler dense layer—processed the static consumer/device features.
The two branches were then merged and passed through a final dense layer to generate the 5-day forecast. This combination allowed the model to stabilize predictions by blending temporal trends with contextual device information.
Here’s a simplified version of the model architecture:
from tensorflow.keras.layers import Input, LSTM, Dense, Concatenate
from tensorflow.keras.models import Model

# Temporal input: 5 days of consumption; static input: the same window plus the device ID
input_1 = Input(shape=(seq_length, 1), name="input1")
input_2 = Input(shape=(seq_length + 1,), name="input2")

# LSTM branch: learns short-term sequential patterns
x_1 = LSTM(50, activation="relu", return_sequences=True)(input_1)
x_1 = LSTM(50, activation="relu")(x_1)
x_1 = Dense(5, activation="relu")(x_1)

# Dense branch: processes the static consumer/device features
x_2 = Dense(100, activation="relu")(input_2)
x_2 = Dense(5, activation="relu")(x_2)

# Merge the two branches and produce the 5-day forecast
merged = Concatenate()([x_1, x_2])
output = Dense(5, activation="relu")(merged)

model = Model(inputs=[input_1, input_2], outputs=output)
model.compile(optimizer="adam", loss="mse")
The model was trained using early stopping to avoid overfitting, and performed well even without heavy feature engineering—proof that smart architecture can sometimes beat brute force.
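The training call itself is not shown in the write-up; a minimal sketch using Keras's EarlyStopping callback might look like this. The patience, validation split, epoch count, and the input names X_seq, X_static, and y are assumptions, not Belal's exact settings:

from tensorflow.keras.callbacks import EarlyStopping

# Stop once the validation loss stops improving and keep the best weights seen so far
early_stop = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)

# X_seq: (n, seq_length, 1) sequence input; X_static: (n, seq_length + 1) static input
model.fit(
    [X_seq, X_static], y,
    validation_split=0.2,
    epochs=200,
    batch_size=64,
    callbacks=[early_stop],
)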
Belal’s best-performing model achieved a mean squared error (MSE) of 5.22 on the validation set. It ran efficiently too—training took about 20 to 40 minutes, and loading the pre-trained weights took just 3 minutes.
Rather than trying to predict all 31 days in one go, Belal used a recursive forecasting approach. Starting with the last known 5 days of consumption for each device, the model predicted the next 5 days. These predictions were then fed back into the model to predict the next 5, and so on, until he had forecasts for 35 days (from which the first 31 were used).
def recursive_forecast(model, initial_input, initial_input2, n_steps=35):
    # One row of predictions per forecasted source (585 in this challenge)
    predictions = np.zeros((585, n_steps))
    input_seq, input_seq2 = initial_input.copy(), initial_input2.copy()
    for step in range(0, n_steps, 5):
        # Predict the next 5 days for every source at once
        next_step = model.predict([input_seq, input_seq2], verbose=0)
        predictions[:, step:step + 5] = next_step
        # Feed the new 5-day block back in, reshaped to the LSTM's (samples, 5, 1) input
        input_seq = next_step.reshape((-1, 5, 1))
        # Replace the consumption part of the static input, keep the device-ID column
        input_seq2 = np.concatenate([next_step, input_seq2[:, 5:]], axis=1)
    return predictions
This method helped the model remain consistent over time and reflect changing trends as the forecast progressed.
To submit the predictions, Belal mapped them back to the unique source IDs provided in the sample submission:
import pandas as pd

submission = pd.read_csv('SampleSubmission.csv')

# The source identifier is the part of each ID after the first underscore
submission['source'] = submission['ID'].apply(lambda x: x.split('_', 1)[1])

# Write the first 31 forecast days for each source into its submission rows
for i in pd.unique(submission.source):
    submission.loc[submission.source == i, 'kwh'] = predicted_values[key_to_num[i], :31]

submission = submission.drop(['source'], axis=1)
submission.to_csv('submission.csv', index=False)
Belal’s approach was efficient and clever, especially given its minimal reliance on complex features. But like any great data scientist, he’s already thinking ahead to ways the solution could be improved.
Belal’s solution is a great example of simplicity meeting strategy. It also shows how thoughtful modeling—rather than just feature engineering—can carry a solution far in a real-world forecasting challenge.
I'm Belal Emad, a data science enthusiast and engineering student at Cairo University with a strong passion for solving real-world problems using machine learning. I enjoy exploring time series forecasting, deep learning, and data-driven optimization—and applying these techniques to domains like energy, education, and climate.
I’m also active on Kaggle, where I ranked in the top 10 on the “efficiency” leaderboard for the “Predict Student Performance” challenge. There, I focused on building a highly optimized model that balanced performance and runtime—earning me strong scores in my first solo ML competition.
Outside of competitions, I’m constantly upskilling through hands-on projects and online certifications.
Congratulations @Belal_Emad! I'm now getting the feeling I should start using deep learning for tabular competitions.
Thank you! 😁