
Fault Impact Analysis: Towards Service-Oriented Network Operation & Maintenance by ITU

8 000 CHF
Completed (over 2 years ago)
Classification
273 joined
89 active
Start: Jul 26, 23
Close: Aug 18, 23
Reveal: Aug 18, 23
yanteixeira
Fault Competition 101
Help · 6 Aug 2023, 17:12 · 2

Hello everyone,

I've decided to offer a quick recap of the competition for those who may be feeling lost. I also hope to encourage more participants to join because I genuinely believe this is an exciting challenge.

The first thing to note is that this competition is distinct from traditional ML competitions. The data isn't directly ready for predictions. Both the training and test sets are spread across several CSV files, so you'll need to merge them before use. Furthermore, the target variable we're supposed to predict isn't immediately available; you'll need to engineer it. The host's decision to present the data in this manner is commendable, as it mirrors real-world situations where data is rarely perfectly structured for modeling.
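To make the merging step concrete, here is a minimal sketch that stacks several CSV files into one table using only the standard library. The file names and the columns (hour, data_rate, fault) are assumptions made for illustration, not the competition's actual schema, and whether the real files should be stacked or joined on a key depends on their layout.

```python
import csv
import io

# Hypothetical in-memory stand-ins for competition files; the file names and
# the columns (hour, data_rate, fault) are assumptions made for this sketch.
FILES = {
    "ne_001.csv": "hour,data_rate,fault\n0,10.5,0\n1,9.8,1\n",
    "ne_002.csv": "hour,data_rate,fault\n0,7.2,0\n",
}

def stack_csv(files):
    """Concatenate rows from several CSV texts, tagging each row with its source."""
    rows = []
    for name, text in sorted(files.items()):
        for row in csv.DictReader(io.StringIO(text)):
            row["source_file"] = name  # keep provenance so rows stay traceable
            rows.append(row)
    return rows

table = stack_csv(FILES)
print(len(table), "rows from", len(FILES), "files")
```

Keeping a source column is a small design choice that pays off later: it lets you group rows per file when you build the target or look for history around a fault.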

So, what exactly is the target? It's best described as the "status of the data rate when a fault occurs." In the telecom O&M context, it's vital to ascertain whether a fault will directly affect the end-user. How can we determine this? A fault is likely to impact the end-user when the data rate at the moment of the fault is lower than the rate just before it. If this occurs, the network's service quality deteriorates, likely frustrating the user and perhaps even prompting them to switch to a competitor.
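The rule described above can be sketched as a small labelling function. This is only one possible reading: it assumes "before the fault" means the immediately preceding sample, it skips faults that have no usable previous value, and the series below is made up for illustration.

```python
def label_faults(rates, faults):
    """Return {index: label} for every fault sample.

    The label is 1 when the data rate at the fault is lower than the previous
    sample's rate, and 0 otherwise. Faults with no previous sample, or with a
    missing rate on either side, are skipped (one possible rule among many).
    """
    labels = {}
    for i, is_fault in enumerate(faults):
        if not is_fault or i == 0:
            continue
        before, during = rates[i - 1], rates[i]
        if before is None or during is None:
            continue
        labels[i] = int(during < before)
    return labels

# Hypothetical hourly series for a single network element.
rates = [10.0, 8.0, 8.5, None, 9.0, 9.5]
faults = [0, 1, 1, 0, 1, 1]
labels = label_faults(rates, faults)
print(labels)
```

Other choices are equally defensible, such as comparing against an average of several earlier samples or keeping the skipped rows with an imputed value, and each one changes the training set you end up with.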

For this competition, our primary concern is the kind of fault that results in a decreased data rate. This is deemed an urgent fault that the company must promptly address.

Take note of the discrepancy in row numbers between the submission file and the test data. This is another unique aspect of this competition; not all test data needs to be used.
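One way to handle the mismatch is to predict only for the rows listed in the submission file and treat the remaining test rows as context. The sketch below assumes a shared identifier column named id, which is a hypothetical name, and uses made-up rows.

```python
# Hypothetical test rows and submission IDs; the "id" column name is an assumption.
test_rows = [
    {"id": "t1", "data_rate": 9.1},
    {"id": "t2", "data_rate": None},
    {"id": "t3", "data_rate": 8.7},
    {"id": "t4", "data_rate": None},
]
submission_ids = ["t4", "t2"]

def rows_to_predict(test_rows, submission_ids):
    """Return the test rows named in the submission file, in submission order."""
    by_id = {row["id"]: row for row in test_rows}
    return [by_id[i] for i in submission_ids]

selected = rows_to_predict(test_rows, submission_ids)
print([row["id"] for row in selected])
```

Returning the rows in submission order keeps predictions aligned with the file you upload, while the unselected rows stay available for building features.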

Why is this competition engaging?

  • Preparing the training data is a challenge in itself. Because you need to engineer the target, there's ample room for creativity. You can establish rules for dropping rows, retain all rows, focus solely on fault instances, and so on. Your creativity is what drives a good cross-validation (CV) score.
  • The training data mirrors real-life datasets, replete with NaNs, outliers, and sometimes nonsensical data. As the host mentioned in the discussion section, managing these challenges is part of the job.
  • The test data is also demanding. Since the "rows we need to submit" consist solely of NaNs, your approach to imputing this data greatly affects your final score. You can opt to impute it yourself or choose a model that handles NaNs automatically. The choice is yours!
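For the NaN point in the last bullet, here are two simple imputation strategies as a sketch: filling with the median of the observed values, and carrying the last observed value forward. Both are generic techniques rather than anything the host prescribes, and the series is made up.

```python
import math
from statistics import median

def impute_median(values):
    """Replace NaN entries with the median of the observed values."""
    observed = [v for v in values if not math.isnan(v)]
    fill = median(observed)  # raises StatisticsError if nothing was observed
    return [fill if math.isnan(v) else v for v in values]

def forward_fill(values):
    """Replace each NaN with the most recent observed value, if there is one."""
    filled, last = [], math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)  # leading NaNs stay NaN
    return filled

# Hypothetical data-rate series with gaps.
rates = [10.0, math.nan, 8.0, math.nan, 9.0]
print(impute_median(rates))
print(forward_fill(rates))
```

Forward fill respects the time order of a series, whereas the median ignores it. The alternative mentioned in the bullet is to skip imputation and use a model that accepts missing values directly, as several gradient-boosting libraries do.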

Final thoughts: Given that data gathering is integral to this challenge and there are various ways to create the target, producing a starter notebook that doesn't bias newcomers toward one approach can be tricky. Nevertheless, preparing the data yourself is an essential skill for a data scientist.

I'm eager to hear thoughts from other competitors. And, if the host spots any inaccuracies in my recap, kindly point them out! :D

Discussion 2 answers
ff
University of Yaoundé I

Hi @yanteixeira. I agree with you.

6 Aug 2023, 19:36
Upvotes 1

OK. THANK YOU. I APPRECIATE.

7 Aug 2023, 06:25
Upvotes 1