I followed all the required steps: I signed up at "challenge.aiforgood.itu", after which I created a team and finally downloaded the dataset. However, I have a few questions:
1) Where is the test set?
2) How did you guys submit, given that some of the information here is vague?
3) Is there any starter notebook that can guide/help us? Thank you.
I agree. The problem definition is quite different.
1) You can download the submission file and test set from the data section on Zindi.
2) Yeah... To be honest, the context is very well written, but the actual ML problem and the description of the data are a little confusing. It's also confusing that the validation set changed in the middle of the competition and the text wasn't updated to match. I'm pretty sure a lot of people here are working with both test sets (the old and the new one). Zindi should state whether both are allowed, and if so, the old one should be made available to everyone again.
3) What you need to know is that the "target" you need to predict is not given in the training set. You will need to engineer it yourself.
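A minimal sketch of what "engineering the target" could look like, assuming (as suggested elsewhere in this thread) that "target" is derived from "data_rate". The column names, the median baseline, and the 50% degradation threshold are all illustrative assumptions, not the competition's actual rule:

```python
import pandas as pd

# Toy training frame; "ne_id" identifies a network element (NE).
df = pd.DataFrame({
    "ne_id":     [1, 1, 1, 2, 2, 2],
    "data_rate": [10.0, 9.5, 3.0, 8.0, 7.8, 7.9],
})

# Hypothetical rule: flag a record as faulty ("target" = 1) when its
# data_rate drops below half of the NE's median data_rate.
baseline = df.groupby("ne_id")["data_rate"].transform("median")
df["target"] = (df["data_rate"] < 0.5 * baseline).astype(int)
```

Whatever rule you settle on, check it against the competition description before training on it.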
Guys, we would never give you the target KPI at the moment the fault occurs in the test set. Otherwise, you would not need any model to predict the target label.
To get the test set, go to the link provided, create an account, join the competition, and you'll find the "download data" section.
Okay, thank you.
I started the challenge the day before yesterday, and everything was clear to me thanks to the "discussion" section. Read all the threads; you will find answers there.
This is not a problem like the others because we must pay particular attention to the variables we use.
In the test data, note that there is no record at the last timestep, and you must predict the response variable, noted "target" (you can also predict the "data_rate" variable and then deduce "target" from it; it all depends on your approach).
In the training data, if you use certain variables you will get a perfect score, so you must eliminate or handle with caution the variables from which "data_rate" can be predicted directly. Train your model so that it never sees the last timestep of each NE, because at inference time the model won't see the last timestep of the NEs (which is why all values are NaN at the last timestep of each NE).
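To make the training setup mimic the test setup, you can hold out the last timestep of every NE. A minimal sketch, assuming hypothetical column names "ne_id" (network element) and "timestep" (record order); adapt them to the actual dataset schema:

```python
import pandas as pd

# Toy frame with two NEs, three timesteps each.
train = pd.DataFrame({
    "ne_id":     [1, 1, 1, 2, 2, 2],
    "timestep":  [0, 1, 2, 0, 1, 2],
    "data_rate": [5.0, 4.8, 1.2, 6.1, 6.0, 5.9],
})

# Index of the last timestep per NE.
last_idx = train.groupby("ne_id")["timestep"].idxmax()

# Training features exclude those rows, mimicking the test set,
# where the last timestep of each NE is all NaN.
train_features = train.drop(index=last_idx)

# The held-out last rows play the role of the rows to predict.
held_out = train.loc[last_idx]
```

This way the model is validated on exactly the kind of row it will face at inference.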
In my opinion, here are the two most important threads in the discussion section (but remember to read all of them):
https://zindi.africa/competitions/fault-impact-analysis-towards-service-oriented-network-operation-maintenance/discussions/17877
https://zindi.africa/competitions/fault-impact-analysis-towards-service-oriented-network-operation-maintenance/discussions/17921
Thank you @ff for the clarification.