I don't know if it's just me, or has anyone else noticed that the problem definition suggests it does not require a machine learning solution? I have experimented with a rule-based approach and it is better than guessing. Can someone with a different opinion explain further? Someone who understands it.
Exactly, I don't think this needs a machine learning model; a rule-based approach is better than an ML model. Instead of classifying this, they could write rule-based code. I don't know what they are expecting from this competition.
Honestly, I think there are no restrictions. You are free to choose whatever method works for you. If a rule-based approach works, use that; if ML works for you, use that.
You know that many different factors increase and decrease the data rate, so how can you label the data? I don't believe this is the correct way, and it is a wasted effort.
This is exactly why an ML model is useful.
How do you label the data?
Hi Antonio,
If simple if-else statements are used to label the data, we don't need ML models here. Simple rule-based code will do.
To use ML here, the labeling should be based on many factors, not simply if-else statements on the values of two columns.
Please provide us with the labeled dataset so we can apply ML.
What people took from your presentation is that you label a node as 1 if the data rate decreases when a fault occurs. That's why I personally think this is a hack, or the problem requirement is not properly defined.
Hi, this is the labeling process for the train set. The data_rate when the fault occurs is not known in the validation data. How do you label the fault impact using if-else in this case?
This is what I was not getting. Now I understand. But the validation data is not available. I had it when I first downloaded the data, but it is no longer there.
Can you please tell me which files you can currently download? There should be 1932 input files (where the data_rate when the fault occurs is not known) and a sample submission file (with 1932 values).
We can currently download 7256 CSV training files (`imgs/2023050915314323740.rar`) from the ITU platform and the sample submission file from Zindi.
We are not able to access the test data.
And regarding the data_rate values: they are known in the test data that was previously shared.
Here is the test sample:
| | ID | access_success_rate | resource_utilition_rate | TA | bler | cqi | mcs | data_rate | fault_duration | relation |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B0017-25_24 | 99.357688 | 84.004 | 2.923368 | 14.209819 | 5.582824 | 5.667775 | 1.175289 | 301 | 0.654162 |
| 1 | B0017-25_25 | 99.642289 | 92.242 | 2.877206 | 15.083843 | 5.628569 | 5.051611 | 0.966620 | 145 | 0.654162 |
| 2 | B0017-25_26 | 99.546228 | 80.028 | 3.151677 | 13.437244 | 5.226969 | 4.896700 | 1.561278 | 250 | 0.654162 |
| 3 | B0017-25_27 | 100.000000 | 8.616 | 3.728730 | 8.817188 | 5.947785 | 7.884572 | 10.963935 | 1971 | 0.654162 |
| 4 | B0017-32_1 | 99.597616 | 70.445 | 2.732496 | 12.644968 | 6.445368 | 7.136024 | 4.471131 | 3461 | 0.654162 |
| 5 | B0017-32_10 | 99.781591 | 60.941 | 2.727843 | 12.841164 | 6.161731 | 6.602028 | 3.161234 | 64 | 0.654162 |
| 6 | B0017-32_11 | 99.389205 | 74.666 | 2.750890 | 13.120919 | 6.302626 | 6.807933 | 2.339437 | 93 | 0.654162 |
| 7 | B0017-32_12 | 99.773719 | 62.808 | 2.721176 | 12.314137 | 6.191431 | 7.228925 | 2.901728 | 43 | 0.654162 |
| 8 | B0017-32_13 | 99.832905 | 64.025 | 2.859035 | 12.810227 | 6.218475 | 6.776292 | 4.002578 | 37 | 0.654162 |
| 9 | B0017-32_14 | 99.709197 | 79.159 | 2.648477 | 13.542462 | 6.265120 | 6.653105 | 3.521976 | 76 | 0.654162 |

Sorry for the format, I don't know how to upload an image here.
In the files we can download the data_rate and when the fault occurs are all
.
The test set is missing. Or do you have it? I was only asking this because I thought the data was complete. I now understand the problem.
Oh, sorry about that. Yes, the test set had been posted earlier; I don't know why it's missing. @amyflorida626, can you kindly solve this?
But you have a score on the leaderboard. What did you use?
I was experimenting with the data to better understand what I was missing. I just used the sample submission files. That is how I noticed it was missing.
Hello everyone, the required data is now available. Antonio
Thanks. Correct me if I'm wrong, but I guess the folder given to us contains the different test CSV files (without the row in which the fault occurred). Based on that, I'm not really sure what to predict anymore. Can you please clarify this? Thanks :)
For each of these files, you need to predict whether the data_rate in the row where the fault occurs is larger or smaller than the data_rate measured just before the fault occurs. One file, one output, as you can see in the SampleSubmission.
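To make the "one file, one output" requirement concrete, here is a minimal sketch of assembling one prediction per file into submission text. Everything beyond `data_rate` is an assumption for illustration: the `ID` and `Target` header names, the file identifiers, and the `last_row_baseline` placeholder rule are hypothetical, and the convention that 1 means a decrease follows a participant's description earlier in this thread rather than a confirmed spec. Check the real SampleSubmission for the actual format.

```python
import csv
import io
from typing import Callable, Dict, List

Row = Dict[str, float]


def last_row_baseline(rows: List[Row]) -> int:
    """Placeholder predictor (not the organizers' method).

    Guesses a decrease (1) when the last pre-fault data_rate is above
    the file's mean data_rate, and 0 otherwise.
    """
    rates = [r["data_rate"] for r in rows]
    return 1 if rates[-1] > sum(rates) / len(rates) else 0


def build_submission(files: Dict[str, List[Row]],
                     predict: Callable[[List[Row]], int]) -> str:
    """Return CSV text with exactly one prediction row per input file."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", "Target"])  # hypothetical header names
    for file_id in sorted(files):
        writer.writerow([file_id, predict(files[file_id])])
    return out.getvalue()


# Two made-up files, each holding only rows observed before the fault.
files = {
    "file_a": [{"data_rate": 2.0}, {"data_rate": 4.0}],
    "file_b": [{"data_rate": 5.0}, {"data_rate": 1.0}],
}
print(build_submission(files, last_row_baseline))
```

Any model can be passed in place of `last_row_baseline`, as long as it maps the rows of one file to a single output.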
Ok makes more sense, thanks
Hello @AntonioDeDomenico, @amyflorida626
I have an ask!
- Can I choose which validation set to use at my discretion? If I decide to stick with the original validation set and disregard the new one, would that pose any issues?
- Alternatively, if I opt to use both validation sets, would there be any problems with that approach as well?
Regards
Hello Antonio,
Regarding the training files: do we label them at observation level or at file level (because we will be predicting one value per file in the validation set)?
- If we label at observation level, then the validation set needs the KPI values (except data_rate) at the time the fault occurs in order to make a prediction.
- If we label at file level, how should we handle multiple 1 states in a single file?
Thanks
Hi Julius, the first validation dataset is just a pre-processed version of the new one, which I transferred to Amy by mistake. Using both of them would just add redundant information. The new one is in line with the training data and includes more info, and your solution will be evaluated based on it. We expect 1932 predictions, one per file in the dataset.
Hi Rakesh, I hope I understand your question well. In the training set you need to label the data_rate change when the fault occurs, with at most one label per file, by comparing the data_rate in the row prior to the fault with the data_rate measured when the fault appears.
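The training-set labeling described above can be sketched as follows. This is only an illustration under stated assumptions: the `fault_row` index is assumed to be known to the caller (how the fault row is identified in the real files is not specified here), and the convention that 1 means the data_rate decreased follows a participant's summary earlier in this thread, not a confirmed specification.

```python
from typing import Optional, Sequence


def label_fault_impact(data_rates: Sequence[float],
                       fault_row: int) -> Optional[int]:
    """Label one training file by comparing two data_rate values.

    Returns 1 if the data_rate at the fault row is lower than in the row
    just before it, 0 otherwise, and None when there is no prior row or
    the index is out of range (so no label can be derived).
    """
    if fault_row <= 0 or fault_row >= len(data_rates):
        return None
    before = data_rates[fault_row - 1]
    at_fault = data_rates[fault_row]
    return 1 if at_fault < before else 0


# Made-up data_rate values for a single file.
rates = [3.2, 3.1, 1.4, 2.9]
print(label_fault_impact(rates, 2))  # 1.4 < 3.1, so the label is 1
print(label_fault_impact(rates, 3))  # 2.9 > 1.4, so the label is 0
```

In the validation files the row at the fault is withheld, so this comparison cannot be computed there, which is why a model has to predict it from the earlier rows.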
Noted and thanks for the prompt reply @AntonioDeDomenico?
Hi Julius, if I gave you that information, you would not need a model; you would just compare the two measured data rates to provide the output I asked for. Is that clear?
Yeah, got it now. Thanks a lot.
One label per file. Got it
Sorry, I completely forgot to ask: what does relation<0 mean? As I understand it, relation is the distance to the other NE where the fault occurred.
Thank you.