AI/ML for 5G-Energy Consumption Modelling by ITU AI/ML in 5G Challenge

20 000 CHF
Challenge completed ~2 years ago
Prediction
989 joined
278 active
Start: Jul 26, 23
Close: Oct 13, 23
Reveal: Oct 17, 23
LROUZZ
Ecole polytechnique de tunisie
Leaderboard and future values
Platform · 9 Oct 2023, 18:09 · 24

Hello,

I've just seen a lot of ridiculous scores on the leaderboard, so I tried using future values, and my score dropped from 1.32 (without future values) to 0.98. For my final selection I will choose the predictions made without future values, but I want to ask @nicolapiovesan to check the top 20 solutions on the leaderboard.

Best regards,

Discussion 24 answers

Good point! But why 20 and not 30, or even 50?

9 Oct 2023, 18:15

Best solution in my opinion: it is to allow using future data and extend the deadline by one month. @nicolapiovesan

9 Oct 2023, 18:23
ff
University of Yaoundé I

What do you mean by "it is to allow using future data"?

Rajat_Ranjan
Allstate

I guess it should be based on the rules of the competition. We know there is a gray area, but the hosts should comment on this and clear up the doubt so that we can share the correct solution.

9 Oct 2023, 18:35
Koleshjr
Multimedia university of kenya

The host has done this several times, though. Their stance is: do not use future values in your final submissions, as such solutions will be disqualified.

Hi, thanks for pointing out this problem.

As stated in many discussions, the goal of the challenge is to model how multiple instantaneous features collected in each hour affect the energy consumption in that hour. It must be clear that using future values as input does not make sense, as in the real world such values will not be available.

To answer your question, could the current top 10 participants on the leaderboard please confirm whether or not they are using future values in their solutions? @Yisakberhanu, @rafael_zimmermann, @Krishna_Priya, @NxGTR, @LROUZZ, @heyyou, @tomy4reel, @imakarov, @Koleshjr, @Hakim04

Finally, I'd like to remind everyone that, at the end of the competition, the top participants will be required to submit a report and the code to train/test the model, which will be used to provide the final score. Solutions in which future values are taken as inputs of the model will not be considered.

11 Oct 2023, 13:44
Koleshjr
Multimedia university of kenya

Our current score uses future values, but we won't select that submission since, as you have already clarified many times, such solutions won't be considered. Thank you for confirming that again.

rafael_zimmermann

Thank you for bringing up this important issue. To be fully transparent, my best score on the leaderboard does indeed involve the use of future values. However, as you clearly outlined in the competition guidelines, only models that do not use future data will be considered for final submissions and validation. The focus is truly on creating a model that is applicable in the real world, where such future data would not be available.

Krishna_Priya

My team's current best score on LB does NOT use future value features as input to the model. As this rule was already established a month back, I stopped creating features using future values.

Ecole polytechnique de tunisie

I'm completely confident that no model can achieve a score below 1.2 without using future data, let alone get down to 0.8. Just take a look at the feature importance plot to see for yourself.

Koleshjr
Multimedia university of kenya

What feature engineering are they doing that we aren't? This is so demotivating, haha.

tomy4reel
Nexford University

Please, does this include aggregate base station features like mean, median, std...?

Krishna_Priya

If you decide to calculate aggregate features, ideally any measure of central tendency should be calculated using only values from the past. You should not just calculate the mean without filtering the data.

PS: This is my opinion, otherwise it would just be an alternate way to leak the future data.
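
To illustrate the point about past-only aggregates, here is a minimal pandas sketch. The column names (`bs`, `time`, `load`) and the toy values are assumptions made for illustration, not the competition schema, and this is one possible way to do it rather than anyone's actual solution.

```python
import pandas as pd

# Hypothetical toy data: column names and values are illustrative assumptions.
df = pd.DataFrame({
    "bs": ["B_0"] * 4 + ["B_1"] * 3,
    "time": pd.to_datetime([
        "2023-01-01 00:00", "2023-01-01 01:00",
        "2023-01-01 02:00", "2023-01-01 03:00",
        "2023-01-01 00:00", "2023-01-01 01:00", "2023-01-01 02:00",
    ]),
    "load": [0.2, 0.4, 0.6, 0.8, 0.1, 0.3, 0.5],
})
df = df.sort_values(["bs", "time"]).reset_index(drop=True)

# Unfiltered version: the mean over ALL hours of a station, future hours included.
df["load_mean_all"] = df.groupby("bs")["load"].transform("mean")

# Past-only version: shift by one row first, then take an expanding mean,
# so the value at hour t only summarises hours strictly before t.
df["load_mean_past"] = df.groupby("bs")["load"].transform(
    lambda s: s.shift(1).expanding().mean()
)

print(df)
```

The first hour of each station has no history, so `load_mean_past` is NaN there and has to be handled by the model or filled with something that is itself computed from past data only.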

tomy4reel
Nexford University

I absolutely agree

Koleshjr
Multimedia university of kenya

So @Krishna_Priya, for your current score, the aggregates are from the previous hours?

Krishna_Priya

Hey @Koleshjr, for now I cannot comment on whether I am using agg features, but yes, any feature being used only has data from the previous hours.

Koleshjr
Multimedia university of kenya

Damn you are good👏👏 but we will get there with time.

Krishna_Priya

All the best, bro. Let's keep learning from each other. Anyway, we will see a lot of shuffling in the private leaderboard in this one. Fingers crossed, may the best approach win.

yanteixeira

@tomy4reel you have to use .shift(1) to ensure no data leakage.
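
For anyone unsure what that looks like, a small sketch of a lag feature versus a lead feature in pandas. The column names (`bs`, `hour`, `load`) are made up for the example and the data is a toy frame, not the competition data.

```python
import pandas as pd

# Hypothetical toy data: column names and values are illustrative assumptions.
df = pd.DataFrame({
    "bs": ["B_0", "B_0", "B_0", "B_1", "B_1"],
    "hour": [0, 1, 2, 0, 1],
    "load": [0.2, 0.4, 0.6, 0.1, 0.3],
})
df = df.sort_values(["bs", "hour"]).reset_index(drop=True)

# Lag: previous row's load of the SAME station (NaN on each station's first hour).
df["load_lag1"] = df.groupby("bs")["load"].shift(1)

# Lead: next row's load of the same station. This is a future value,
# shown only to make the contrast with the lag explicit.
df["load_lead1"] = df.groupby("bs")["load"].shift(-1)

print(df)
```

Two caveats, both assumptions about how the data might look: `shift` moves rows, not timestamps, so the frame must be sorted and the hours must be contiguous for `shift(1)` to mean "one hour ago"; and shifting without the `groupby` would carry the last value of one station into the first row of the next.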

Yisakberhanu
wachemo university

Yes, I used future values, but there is not much difference.

ff
University of Yaoundé I

Looking forward to seeing the best solutions and/or approaches. On this one, I'm completely lost 🙌🏿

Koleshjr
Multimedia university of kenya

Me too, @ff 😂😂 I have given up. Seeing people get 0.82 with no future values, whatttt!!! That's freaking impressive tbh, and I don't think I could get there even if I were given 30 more days 😂

ff
University of Yaoundé I

😂😂 The day after tomorrow there will be a terrible shake-up in the ranking!

rafael_zimmermann

The use of future data is a problem that can be subjective if the rules aren't clear, such as issues related to aggregation or how to handle null values. It's not necessarily just about using lead functions; training on the complete dataset is also a form of using future data. It would be interesting and fair to have an objective rule to justly choose the top 10.
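
One candidate for such an objective rule, sketched here as an assumption and not as anything the hosts have announced, is a truncation test: build the features on the full data and again on the data cut off at some hour, and flag every column whose values up to the cutoff differ. The helper names (`build_features`, `leaky_columns`) and the columns are hypothetical, and the example uses a single station for brevity.

```python
import pandas as pd

def build_features(df):
    """Hypothetical feature builder mixing safe and leaky features."""
    out = df.sort_values("hour").reset_index(drop=True).copy()
    out["load_lag1"] = out["load"].shift(1)    # past only
    out["load_lead1"] = out["load"].shift(-1)  # explicit future value
    # Null handling with a mean taken over ALL hours, future ones included.
    out["load_filled"] = out["load"].fillna(out["load"].mean())
    return out

def leaky_columns(df, build, cutoff):
    """Return columns whose values at hours <= cutoff change
    when the rows after the cutoff are removed before building."""
    full = build(df)
    full = full[full["hour"] <= cutoff].reset_index(drop=True)
    past = build(df[df["hour"] <= cutoff]).reset_index(drop=True)
    return [c for c in full.columns if not full[c].equals(past[c])]

# Toy single-station data with one missing load value.
df = pd.DataFrame({
    "hour": [0, 1, 2, 3, 4],
    "load": [0.2, None, 0.6, 0.8, 1.0],
})

result = leaky_columns(df, build_features, cutoff=2)
print(result)  # ['load_lead1', 'load_filled']
```

The lag passes, while both the lead feature and the "fill nulls with the overall mean" feature are flagged, which matches the point that leakage is not only about lead functions. A single cutoff is only a spot check; running it at several cutoffs would make it stricter.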