Hi, Amy.
The test data is not there. Now there is only the SampleSubmission CSV.
And now there is no test data. Or am I missing something?
Could someone please tell me where I can find the data? I have already followed all the steps to register on this website: https://challenge.aiforgood.itu.int/match/matchitem/78, but I am unable to locate the data.
Could you please tell us where the test data is?
Hello, I am checking whether there is a problem with data availability. Sorry for this delay.
Antonio
Is there any timeline for when the test data will be uploaded?
Hello, the validation data is now available.
@amyflorida626
Has the data changed, or is it the same as before?
Is there a reason why the team deleted the validation data?
I would recommend downloading the new data in case one or two features changed.
@amyflorida626, many of the new validation files are missing the "access_success_rate" values.
To be specific, 572 files have all access_success_rate values as NaN (e.g. B0017-25_27.csv.csv).
This was not the case for the old validation data provided. Is this intentional?
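For anyone who wants to reproduce a count like this, here is a minimal sketch with pandas. The column name access_success_rate comes from the posts above; the function names and the idea that all files sit in a single folder are my own assumptions, and the demo values are made up.

```python
from pathlib import Path

import pandas as pd


def column_all_nan(df: pd.DataFrame, column: str = "access_success_rate") -> bool:
    """Return True when the column exists and every value in it is NaN."""
    return column in df.columns and bool(df[column].isna().all())


def files_with_all_nan(data_dir: str, column: str = "access_success_rate") -> list[str]:
    """List CSV file names in data_dir (hypothetical folder) whose column is all NaN."""
    return sorted(
        path.name
        for path in Path(data_dir).glob("*.csv")
        if column_all_nan(pd.read_csv(path), column)
    )


# Tiny in-memory demonstration with made-up values, not challenge data.
ok = pd.DataFrame({"access_success_rate": [0.98, 0.97, float("nan")]})
empty = pd.DataFrame({"access_success_rate": [float("nan")] * 3})
print(column_all_nan(ok), column_all_nan(empty))  # prints: False True
```

Calling len(files_with_all_nan(...)) on the folder holding the validation files would give the number of affected files.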
The data during the fault is not reported; otherwise you would not need a model and could simply compute the data rate trend by comparing the rate before and during the fault. The previously provided validation data is a processed version of the data currently available. It still does not include the data measured during the fault, only the data collected before the fault, plus the fault duration and the relation between the node where the fault occurs and the node where the rate is measured.
I am referring to the access_success_rate for the data collected before the fault.
I see. In this case, handling anomalies and missing values is part of the challenge.
In the previously provided validation data, the values are provided for the instance just before the fault occurs. This is now missing for some files.
I wish there were an option for posting figures. I would have added a pictorial example of our observation.
You can point me to the specific files. In general, as I said, managing missing data is part of the challenge.
Examples include
'B0017-25_27.csv.csv'
'B0017-32_16.csv.csv'
'B0017-33_16.csv.csv'
For instance, B0017-25_27 below
compared to the old processed validation data
One can clearly see that all the access_success_rate values are NaN, even before the fault, in the new validation data for this particular ID.
My understanding is that these lines with NaN values are your target. You want to know whether the data rate goes up or down.
No, the targets are actually not any of the provided columns.
For the lines before the last one, there is no fault, so there are no NaN values.
For the last line (when there is fault), all other columns are intentionally turned to NaN except the fault_duration and relation columns.
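The layout described here can be sketched as follows. This is only an illustration: fault_duration and relation are the column names mentioned in this thread, while the function name and all demo values are made up. How those two columns are filled on the pre-fault rows is not stated here, so the placeholders below are an assumption.

```python
import numpy as np
import pandas as pd


def split_history_and_fault(df: pd.DataFrame):
    """Split one file into the pre-fault rows and the final (fault) row."""
    history = df.iloc[:-1]   # rows measured before the fault
    fault_row = df.iloc[-1]  # last row: only fault metadata is filled in
    return history, fault_row


# Made-up example; the pre-fault values of the two fault columns are placeholders.
demo = pd.DataFrame({
    "access_success_rate": [0.98, 0.97, np.nan],
    "fault_duration": [np.nan, np.nan, 120.0],
    "relation": [np.nan, np.nan, 1.0],
})
history, fault_row = split_history_and_fault(demo)
known = fault_row.dropna().index.tolist()
print(known)  # prints: ['fault_duration', 'relation']
```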
As I said, it is part of the job: you can try to infer the data or neglect it.
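One possible reading of "infer or neglect", as a rough sketch rather than a recommended recipe: drop columns that have no observed value at all in a file, and fill the remaining gaps from neighbouring rows. The function name and the data_rate column are hypothetical, and forward/backward filling is just one simple choice of imputation.

```python
import numpy as np
import pandas as pd


def drop_or_fill(history: pd.DataFrame) -> pd.DataFrame:
    """Drop all-NaN columns, then fill remaining gaps from neighbouring rows."""
    # Neglect: a column with no observed value carries no information here.
    kept = history.dropna(axis=1, how="all")
    # Infer: forward fill, then backward fill for gaps at the start.
    return kept.ffill().bfill()


# Made-up pre-fault rows; "data_rate" is a hypothetical column name.
demo = pd.DataFrame({
    "access_success_rate": [np.nan, np.nan, np.nan],
    "data_rate": [10.0, np.nan, 12.0],
})
cleaned = drop_or_fill(demo)
print(cleaned.columns.tolist())       # prints: ['data_rate']
print(cleaned["data_rate"].tolist())  # prints: [10.0, 10.0, 12.0]
```

If a model needs a fixed set of features, an all-NaN column could instead be filled with a statistic computed over the other files, at the cost of adding noise.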