
AI/ML for 5G-Energy Consumption Modelling by ITU AI/ML in 5G Challenge

20 000 CHF
Completed (over 2 years ago)
Prediction
1027 joined
277 active
Start: Jul 26, 23
Close: Oct 13, 23
Reveal: Oct 17, 23
Koleshjr
Multimedia university of kenya
Rules Clarification:
Platform · 1 Sep 2023, 10:52 · 10

I know not many people have read the Model Integrity discussion, but the organizer has just clarified that the model MUST NOT take 'future' info as input. Just letting you all know.

https://zindi.africa/competitions/aiml-for-5g-energy-consumption-modelling/discussions/18079

Discussion 10 answers

This is a weird rule, because for some base stations in the test set we need to predict the target in the past. Data is missing from the beginning of the time series, so the model in fact has to recover 'Energy' using future data.

1 Sep 2023, 11:05
Upvotes 0
Koleshjr
Multimedia university of kenya

The problem is that at runtime, or in the real world, you won't have a sneak peek at future data.

I understand the data leakage problem, but as I understand the task, the goal is to recover missing data and predict the target for new base stations that are not present in train. If we follow the mentioned rule, we should use only calendar features from the time column....
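The "calendar features from the time column" mentioned above could look roughly like the sketch below. The function name, the feature names, and the timestamp format are assumptions for illustration, not the competition's actual schema. The point is that these features depend only on the timestamp of the row being predicted, so they cannot leak past or future measurements.

```python
from datetime import datetime


def calendar_features(timestamp: str) -> dict:
    """Derive features that depend only on the row's own timestamp.

    The "%Y-%m-%d %H:%M:%S" format is an assumption about the data.
    """
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),       # Monday = 0, Sunday = 6
        "is_weekend": ts.weekday() >= 5,   # Saturday or Sunday
    }


# Hypothetical example row: 1 January 2023 was a Sunday.
features = calendar_features("2023-01-01 13:00:00")
```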

skaak
Ferra Solutions

I was wondering about this, because otherwise it becomes an interpolation problem and you can get a very good score. So I suppose Zindi will check to make sure you do not use future values.

This makes sense of course, you want to predict at a given time without having seen the future, thanks for also clarifying it here.

1 Sep 2023, 11:20
Upvotes 0

I assume this only applies to the energy values?

The model is not intended to be used for live forecasting - the host even stated that it is not a time series problem. The goal is to be able to estimate how a particular BS configuration, especially new ones, will behave in terms of energy consumption.

Therefore, an application scenario could be that at test time (at a static point in time), a day or a week of measurements needs to be evaluated in terms of energy consumption.

Since energy consumption depends on the time of day, temporal information is obviously required. Since this is not a time series forecasting problem, it should not be a problem to use, for example, future load values as input. These would be given at the test time anyway. If I understand the scope correctly, only the energy values should be considered problematic.
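To make the distinction concrete, here is a minimal sketch of a past-only (lag) feature versus a future (lead) feature on the load, computed per base station. The field names `bs`, `time`, and `load` are hypothetical, and the shift is by row position within a station, which only equals a fixed time offset if the series has no gaps.

```python
from collections import defaultdict


def add_shifted_load(rows, steps):
    """Attach the load found `steps` positions away within the same station.

    steps < 0 looks backwards (a lag, past-only information);
    steps > 0 looks forwards (a lead, i.e. 'future' information).
    Rows without such a neighbour get None.
    """
    by_station = defaultdict(list)
    for row in rows:
        by_station[row["bs"]].append(row)

    name = f"load_lag_{-steps}" if steps < 0 else f"load_lead_{steps}"
    for series in by_station.values():
        series.sort(key=lambda r: r["time"])  # chronological order per station
        for i, row in enumerate(series):
            j = i + steps
            row[name] = series[j]["load"] if 0 <= j < len(series) else None
    return rows


# Hypothetical toy data: two base stations, integer time steps.
rows = [
    {"bs": "B_0", "time": 0, "load": 0.1},
    {"bs": "B_0", "time": 1, "load": 0.2},
    {"bs": "B_0", "time": 2, "load": 0.3},
    {"bs": "B_1", "time": 0, "load": 0.9},
]
add_shifted_load(rows, -1)  # allowed under a strict "no future info" reading
add_shifted_load(rows, 1)   # the kind of input this thread is arguing about
```

Under the strict reading of the rule only the lag column would be allowed. Under the reading argued in this post, the lead column on load would also be acceptable, while a lead on the energy target would not.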

2 Sep 2023, 08:38
Upvotes 2
skaak
Ferra Solutions

Hmmmm, interesting argument. I do not agree with you entirely; I understand the rule to state that you can only look backwards. Even if you have a new configuration, you could perhaps train on forward values, but once trained the model is not useful, because in a live environment you do not have access to those values, even if you might have a set containing them after a week. But perhaps the host can clarify? @Zindi? @Koleshjr? Does this restriction apply to just energy or to all data? And how will this be verified eventually?

Koleshjr
Multimedia university of kenya

Okay I'm not the best person to answer this but @nicolapiovesan

Your point would be applicable in a live environment. However, the model is not meant to be used in a live environment, as the three stated objectives are more focused on accurate static prediction and generalization. Hence, I assume the model will be used to assess different base station configurations under otherwise equal conditions. This is also why the test set does not strictly contain future samples. In fact, using time information might even be important to accurately disentangle configuration effects from random time effects.

In general, I understand the first intuition that using future values is somehow wrong.

But if this were a competition about predicting future values, then the whole train/test split and framing of the challenge would be wrong and would ideally have to be redefined. On the other hand, if it's not about seeing into the future, there is no need to prohibit using future values - it is even unclear whether this helps generalization.

I observed that when using future values, the CV validation score of a simple LightGBM model is ~.75. When not using them, the CV validation score is ~1.15. On the leaderboard, both perform similarly. Hence, using future values leads to strong overfitting on the train set. But take this with a grain of salt, as this was only a small experiment with a simple baseline model.
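The CV setup behind these numbers is not described in the post. One possible way to make a validation score reflect unseen base stations is to hold out whole stations per fold, sketched below. The function `group_folds` and the station labels are hypothetical, and this is not necessarily the scheme used in the experiment above.

```python
def group_folds(groups, n_folds):
    """Split row indices so that each group lands in exactly one validation fold.

    Groups are assigned to folds round-robin in sorted order, so no group
    ever appears in both the training and validation part of a fold.
    """
    fold_of = {g: i % n_folds for i, g in enumerate(sorted(set(groups)))}
    folds = []
    for k in range(n_folds):
        val_idx = [i for i, g in enumerate(groups) if fold_of[g] == k]
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != k]
        folds.append((train_idx, val_idx))
    return folds


# Hypothetical base station label for each training row.
stations = ["B_0", "B_0", "B_1", "B_1", "B_2", "B_2"]
folds = group_folds(stations, n_folds=3)
```

Comparing a feature set with and without future values under such a split gives a CV score for stations the model has never seen, which may sit closer to the leaderboard than a random row-level split.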

Totally agree; the data provided in test proves it. We have base stations that are not present in train. For instance, for B_828 all the data with a timeline (cell info file) available for feature generation covers only the predicted period, so in this case the mentioned rule cannot be followed. So the business value of this competition is exactly as described by @atschalz, or the goal is the recovery of missing data. Otherwise, the organizers of this competition provided "incorrect" data (if the goal is a time series problem).