
Fault Impact Analysis: Towards Service-Oriented Network Operation & Maintenance by ITU

8 000 CHF
Completed (over 2 years ago)
Classification
273 joined
89 active
Start: Jul 26, 23
Close: Aug 18, 23
Reveal: Aug 18, 23
Rakesh_Jarupula
National Institute of Technology Silchar
Challenges with Competition design.
Help · 6 Aug 2023, 10:53 · 13

Dear @AntonioDeDomenico ,

I have a genuine concern regarding the test data. We are supposed to predict the impact of a fault on the data rate, but I have a few concerns about this:

1. All the KPI values for that row are NaN. We may use different techniques to fill those values (e.g., mean, ffill), but then we are not providing the model with KPI values specific to that row. In addition, if we impute those rows using the mean, for example, we are implicitly saying that there is NO impact on the specific KPI when the fault occurred, as there are many rows where fault < 0.

2. There are some files where the sample period (1 hr) is not uniform, e.g., Test - 'B0017-32_1.csv.csv', where there is an approx. 9 hr gap between the non-fault and fault instances. This again suggests we can't simply use ffill or the mean, as circumstances might have changed.
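One way to make the gap concern in point 2 concrete is to forward-fill only when the last observed row is recent, and to expose the age of the filled value as a feature. This is a minimal sketch, not the organizers' method: the column names `time` and `kpi_a`, the helper name `ffill_with_staleness`, and the 1-hour threshold are all assumptions for illustration.

```python
import numpy as np
import pandas as pd


def ffill_with_staleness(df, kpi_cols, time_col="time", max_gap_hours=1.0):
    """Forward-fill KPI columns, but only across gaps of at most max_gap_hours.

    Column names and the threshold are hypothetical; adapt them to the real files.
    """
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    out = out.sort_values(time_col).reset_index(drop=True)

    # A row counts as "observed" if at least one KPI is present.
    observed = out[kpi_cols].notna().any(axis=1)
    # Timestamp of the most recent observed row (the row itself if observed).
    last_seen = out[time_col].where(observed).ffill()
    age_hours = (out[time_col] - last_seen).dt.total_seconds() / 3600.0

    filled = out[kpi_cols].ffill()
    # Discard forward-filled values that are older than the allowed gap.
    filled.loc[age_hours > max_gap_hours, :] = np.nan
    out[kpi_cols] = filled
    out["kpi_age_hours"] = age_hours
    return out


# Toy file with a 9-hour gap before the fault row, as in the example above.
demo = pd.DataFrame(
    {
        "time": ["2023-01-01 00:00", "2023-01-01 01:00", "2023-01-01 10:00"],
        "kpi_a": [1.0, 2.0, None],
    }
)
result = ffill_with_staleness(demo, ["kpi_a"])
print(result)
```

In this toy case the fault row stays NaN because the last observation is 9 hours old, while `kpi_age_hours` records that staleness so a model could use it directly.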

I think that, due to the above issues, I am currently getting different f1_score values on local validation and on the leaderboard (CV: 0.8 and LB: 0.4 - 0.6). Without addressing the NaN values, I think the trained model may not be useful.

Possible solution:

Keep all the KPI values in the test set and REMOVE the data_rate column (this also helps address any leaks). This way we know the characteristics of the NE when the fault occurred and will be able to generalize well.

I may have misunderstood the objective. Please correct me if I am wrong.

Thank you for your patience.

Discussion 13 answers
Koleshjr
Multimedia university of kenya

I will give my thoughts:

In the data page they say:

Input:

  1. Data rate and other features collected before the fault occurs, where fault duration is 0;
  2. Fault duration and relation in the first hour during which the fault appears, i.e., the fault duration is larger than 0;

Output:

  1. Status of data rate in the first hour during which the fault appears, i.e., the fault duration is larger than 0.

So, based on the information in Input 1, the KPIs we are supposed to use are not from when the fault occurred but from before it occurred.

Or am I the one interpreting this wrong?
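Under that reading, each file would yield one example: features summarised from the rows before the fault, plus the fault duration from the first fault row. The sketch below is only an illustration of that interpretation. The column names `fault_duration`, `data_rate`, and `kpi_a` and the helper `build_example` are assumptions, rows are assumed to be in chronological order, and the "relation" field from the data page is left out because its format is not described here.

```python
import pandas as pd


def build_example(df, kpi_cols, fault_col="fault_duration", rate_col="data_rate"):
    """Summarise pre-fault rows into one feature dict; return None if unusable.

    Assumes rows are already sorted by time. All column names are hypothetical.
    """
    is_fault = (df[fault_col] > 0).to_numpy()
    if not is_fault.any():
        return None  # no fault row, nothing to predict
    first_fault_pos = int(is_fault.argmax())
    pre = df.iloc[:first_fault_pos]  # rows strictly before the first fault
    if pre.empty:
        return None  # no history before the fault

    feats = {}
    for col in list(kpi_cols) + [rate_col]:
        feats[f"{col}_last"] = pre[col].iloc[-1]
        feats[f"{col}_mean"] = pre[col].mean()
    # Input 2: the fault duration in the first fault hour is known at prediction time.
    feats[fault_col] = df[fault_col].iloc[first_fault_pos]
    return feats


demo = pd.DataFrame(
    {
        "fault_duration": [0, 0, 5],
        "data_rate": [10.0, 20.0, None],
        "kpi_a": [1.0, 3.0, None],
    }
)
example = build_example(demo, ["kpi_a"])
print(example)
```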

6 Aug 2023, 11:00
Upvotes 1
Rakesh_Jarupula
National Institute of Technology Silchar

1. How do we take these values when the sampling is non-uniform? In Test - 'B0017-32_1.csv.csv', there is a 9 hr gap.

2. The data_rate will then be the same even during the fault in the test set, so there is nothing left to predict.

Koleshjr
Multimedia university of kenya

Now that's a problem. I also don't know how to handle that case.

yanteixeira

@Koleshjr The information on the data page is accurate for the old validation test. However, I don't think it is still applicable to the current one.

You are completely right.

Dear Rakesh, this is a real-world problem and you are working on real network data. This means that:

1) I am not giving you data measured when the fault appears. This is a prediction problem, so we do not know in advance what is going to happen.

2) As we are using real data, there may be missing information. It is then up to you to try to infer it, e.g., using data from the same file or other files in the test set.
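As one possible reading of "using data from the same file or other files", missing KPIs could be filled with the median of the same file, falling back to the median across all files when a file has no observed value for that KPI. This is only a sketch of that idea with a hypothetical helper `impute_kpis` and hypothetical column names; it does not address the earlier objection that such imputation assumes the fault has no effect on the KPI.

```python
import pandas as pd


def impute_kpis(frames, kpi_cols):
    """Fill missing KPIs per file, falling back to the across-file median.

    frames: dict mapping a file name to its DataFrame (names are hypothetical).
    """
    all_rows = pd.concat(frames.values(), ignore_index=True)
    global_median = all_rows[kpi_cols].median()

    imputed = {}
    for name, df in frames.items():
        # The per-file median is NaN for a KPI that is missing in every row.
        fill_values = df[kpi_cols].median().fillna(global_median)
        filled = df.copy()
        filled[kpi_cols] = df[kpi_cols].fillna(fill_values)
        imputed[name] = filled
    return imputed


frames = {
    "file_a": pd.DataFrame({"kpi_a": [1.0, 3.0, None]}),
    "file_b": pd.DataFrame({"kpi_a": [None, None]}),
    "file_c": pd.DataFrame({"kpi_a": [10.0, 20.0]}),
}
imputed = impute_kpis(frames, ["kpi_a"])
print(imputed["file_a"])
print(imputed["file_b"])
```

Here `file_a` is filled from its own rows, while `file_b` has no observed values and takes the median computed over all files.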

6 Aug 2023, 16:17
Upvotes 2
Rakesh_Jarupula
National Institute of Technology Silchar

Thank you for your response.

Rakesh_Jarupula
National Institute of Technology Silchar

Hey @AntonioDeDomenico,

I have a quick question: how do we label the files that have fault > 0 for all rows? Ex: `pd.read_csv(os.path.join(FILES_DIR, 'Train', 'B0652-23_1.csv.csv'))`. Simply `0`?

Thanks.

Hi Rakesh, you can try to filter out any file like this in the training set.

Rakesh_Jarupula
National Institute of Technology Silchar

You mean simply exclude them?

Yes, you should try, as they are not related to the target of the challenge. However, you cannot filter them out if any file like this appears in the test set :(
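A minimal sketch of that training-side filter, assuming the fault column is called `fault_duration` (a hypothetical name) and that a file is kept only if it contains at least one row without a fault. The helper names are illustrative, and the same filter should not be applied to test files.

```python
from pathlib import Path

import pandas as pd


def has_pre_fault_rows(df, fault_col="fault_duration"):
    """True if at least one row has no fault (fault value <= 0)."""
    return bool((df[fault_col] <= 0).any())


def usable_training_files(train_dir, fault_col="fault_duration"):
    """Return paths of training CSVs that contain at least one non-fault row."""
    kept = []
    for path in sorted(Path(train_dir).glob("*.csv")):
        df = pd.read_csv(path)
        if has_pre_fault_rows(df, fault_col):
            kept.append(path)
    return kept


# In-memory check of the rule itself.
all_fault = pd.DataFrame({"fault_duration": [3, 7, 2]})
mixed = pd.DataFrame({"fault_duration": [0, 0, 5]})
print(has_pre_fault_rows(all_fault), has_pre_fault_rows(mixed))
```

For test files where every row has fault > 0, a separate fallback is still needed, since a prediction must be submitted for them.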

Rakesh_Jarupula
National Institute of Technology Silchar

Of course 👍