
Fault Impact Analysis: Towards Service-Oriented Network Operation & Maintenance by ITU

8 000 CHF

Completed (almost 3 years ago)

Skills you will learn

Classification

277 joined

88 active


Start

Jul 26, 23

Aug 18, 23

Reveal

Aug 18, 23

Rakesh_Jarupula

National Institute of Technology Silchar

Challenges with Competition design.

Help · 6 Aug 2023, 10:53 · 13

Dear @AntonioDeDomenico ,

I have a genuine concern regarding the test data. We are supposed to predict the impact of a fault on the data rate, but I have a few concerns about this:

1. All the KPI values for the fault row are NaN. We may use different techniques to fill those values (e.g., mean, ffill, etc.), but then we are not providing the model with KPI values specific to that row. In addition, if we impute those rows using the mean, for example, we are implicitly saying that there is NO impact on the specific KPI when the fault occurred, as there are many rows where fault < 0.

2. There are some files where the sample period (1 hr) is not uniform, e.g., Test - 'B0017-32_1.csv.csv', where there is an approx. 9 hr gap between the non-fault and fault instances. This again suggests we can't simply use ffill or the mean, as circumstances might have changed.

I think that, due to the above issues, I am currently getting different f1_score values on local validation and on the leaderboard (CV: 0.8 and LB: 0.4 - 0.6). Without addressing the NaN values, I think the trained model may not be useful.

Possible solution:

Keep all the KPI values in the test set and REMOVE the data_rate column (this also helps address any leaks). This way we know the characteristics of the NE when the fault occurred and will be able to generalize well.

I may have misunderstood the objective. Please correct me if I am wrong.

Thank you for your patience.
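The second concern above (forward-filling across a long sampling gap) can be illustrated with a minimal pandas sketch. It is not the organizers' method. The column names `timestamp` and `kpi_a`, the helper `ffill_within_gap`, and the 1-hour limit are all assumptions made for illustration; the real files may use different names.

```python
import numpy as np
import pandas as pd


def ffill_within_gap(df, time_col="timestamp", max_gap=pd.Timedelta(hours=1)):
    """Forward-fill NaNs, but never across a sampling gap larger than max_gap."""
    df = df.sort_values(time_col).reset_index(drop=True)
    gap = df[time_col].diff()
    # A new segment starts at every row whose predecessor is too far away.
    segment = (gap > max_gap).cumsum().to_numpy()
    value_cols = [c for c in df.columns if c != time_col]
    out = df.copy()
    # Fill only inside each segment, so stale values do not cross a gap.
    out[value_cols] = df.groupby(segment)[value_cols].ffill()
    # Keep the gap size as a feature the model could use.
    out["gap_hours"] = gap.dt.total_seconds() / 3600
    return out


# Hypothetical file with a 9-hour hole between the second and third rows.
demo = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-08-01 00:00", "2023-08-01 01:00",
        "2023-08-01 10:00", "2023-08-01 11:00",
    ]),
    "kpi_a": [1.0, np.nan, np.nan, np.nan],
})
filled = ffill_within_gap(demo)
print(filled)
```

Rows after the gap stay NaN instead of inheriting a 9-hour-old value, and the gap length is exposed as a column so it can be handled explicitly downstream.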

Discussion 13 answers

Koleshjr

Multimedia university of kenya

I will give my thoughts:

In the data page they say:

Input:

Data rate and other features collected before the fault occurs, where fault duration is 0;
Fault duration and relation in the first hour during which the fault appears, i.e., the fault duration is larger than 0;

Output:

Status of data rate in the first hour during which the fault appears, i.e., the fault duration is larger than 0.

So, based on item 1 under Input, the KPIs we are supposed to use are not from when the fault occurred but from before it occurred.

Or am I the one interpreting this wrong?

6 Aug 2023, 11:00

Upvotes 1

Rakesh_Jarupula

National Institute of Technology Silchar

1. How do we take these values when the sampling is non-uniform? In Test - 'B0017-32_1.csv.csv', there is a 9 hr gap.

2. Then the data_rate will be the same even during the fault in the test set, so there is nothing left to predict.

replied to Koleshjr · 6 Aug 2023, 11:14

Upvotes 0

Koleshjr

Multimedia university of kenya

Now that's a problem; I also don't know how to handle that case.

replied to Rakesh_Jarupula · 6 Aug 2023, 11:18

Upvotes 0

yanteixeira

@Koleshjr The information on the data page is accurate for the old validation test. However, I don't think it is still applicable to the current one.

replied to Koleshjr · 6 Aug 2023, 13:58

Upvotes 1

AntonioDeDomenico

You are completely right.

replied to Koleshjr · 6 Aug 2023, 16:02

Upvotes 0

AntonioDeDomenico

Which information?

replied to yanteixeira · 6 Aug 2023, 16:18

Upvotes 0

AntonioDeDomenico

Dear Rakesh, this is a real-world problem and you are working on real network data. This means that:

1) I am not giving you data measured when the fault appears. This is a prediction problem, so we do not know in advance what is going to happen.

2) As we are using real data, there may be missing information. It is then up to you to try to infer it, e.g., using data from the same file or other files in the test set.
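One possible reading of point 2 is sketched below: fill a missing KPI from the pre-fault rows of the same file, and fall back to a statistic computed over the other files when the file itself has nothing usable. This is only an illustration under assumptions. The column names `fault_duration` and `kpi_a`, the helper `impute_from_file_or_pool`, and the choice of "last pre-fault value, else pool median" are not taken from the challenge.

```python
import numpy as np
import pandas as pd


def impute_from_file_or_pool(file_df, pool_df, kpi_cols, fault_col="fault_duration"):
    """Fill NaN KPIs from the same file first, then from a pool of other files."""
    out = file_df.copy()
    pre_fault = out[out[fault_col] == 0]
    for col in kpi_cols:
        local = pre_fault[col].dropna()
        if not local.empty:
            # Prefer the most recent pre-fault observation in this file.
            fallback = local.iloc[-1]
        else:
            # Otherwise use the median over all files (NaNs are skipped).
            fallback = pool_df[col].median()
        out[col] = out[col].fillna(fallback)
    return out


# Hypothetical files: file_a has pre-fault rows, file_b has none.
file_a = pd.DataFrame({"fault_duration": [0, 0, 30], "kpi_a": [2.0, 4.0, np.nan]})
file_b = pd.DataFrame({"fault_duration": [15], "kpi_a": [np.nan]})
pool = pd.concat([file_a, file_b], ignore_index=True)

filled_a = impute_from_file_or_pool(file_a, pool, ["kpi_a"])
filled_b = impute_from_file_or_pool(file_b, pool, ["kpi_a"])
```

Whether the last value, a mean, or a model-based estimate works best is an empirical question; the point here is only the same-file-first, other-files-second order.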

6 Aug 2023, 16:17

Upvotes 2

Rakesh_Jarupula

National Institute of Technology Silchar

Thank you for your response.

replied to AntonioDeDomenico · 6 Aug 2023, 16:53

Upvotes 1

Rakesh_Jarupula

National Institute of Technology Silchar

Hey @AntonioDeDomenico,

I have a quick question: how do we label the files that have fault > 0 for all rows, e.g., pd.read_csv(os.path.join(FILES_DIR, 'Train', 'B0652-23_1.csv.csv'))? Simply `0`?

Thanks.

replied to AntonioDeDomenico · 7 Aug 2023, 14:05

Upvotes 0

AntonioDeDomenico

Hi Rakesh, you can try to filter out any file like this in the training set.

replied to Rakesh_Jarupula · 7 Aug 2023, 14:10

Upvotes 0

Rakesh_Jarupula

National Institute of Technology Silchar

You mean simply exclude them?

replied to AntonioDeDomenico · 7 Aug 2023, 14:16

Upvotes 0

AntonioDeDomenico

Yes, you should try, as they are not related to the target of the challenge. However, you cannot filter them out if any file like this appears in the test set :(
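A minimal sketch of this filtering step, assuming each training file is already loaded into a DataFrame and that the fault column is called `fault_duration` (a hypothetical name, as are the file keys and the helper `has_pre_fault_rows`):

```python
import pandas as pd


def has_pre_fault_rows(df, fault_col="fault_duration"):
    """True if the file contains at least one row recorded before the fault."""
    return bool((df[fault_col] == 0).any())


# Hypothetical in-memory stand-ins for files read with pd.read_csv.
train_files = {
    "normal.csv": pd.DataFrame({"fault_duration": [0, 0, 12], "data_rate": [5.0, 5.2, 3.1]}),
    "all_fault.csv": pd.DataFrame({"fault_duration": [7, 9, 4], "data_rate": [2.0, 2.1, 1.9]}),
}

# Keep only training files that have some pre-fault history.
usable_train = {name: df for name, df in train_files.items() if has_pre_fault_rows(df)}
skipped = sorted(set(train_files) - set(usable_train))
print("kept:", list(usable_train), "skipped:", skipped)
```

The same check can be run on test files, but only to flag them for a fallback strategy, since every test file still needs a prediction.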

replied to Rakesh_Jarupula · 7 Aug 2023, 14:19

Upvotes 0

Rakesh_Jarupula

National Institute of Technology Silchar

Of course 👍

replied to AntonioDeDomenico · 7 Aug 2023, 14:20

Upvotes 0
