
Fault Impact Analysis: Towards Service-Oriented Network Operation & Maintenance by ITU

8 000 CHF

Completed (almost 3 years ago)

Skills you will learn: Classification

277 joined

88 active


Start

Jul 26, 23

Aug 18, 23

Reveal

Aug 18, 23

yanteixeira

Fault Competition 101

Help · 6 Aug 2023, 17:12 · 2

Hello everyone,

I've decided to offer a quick recap of the competition for those who may be feeling lost. I also hope to encourage more participants to join because I genuinely believe this is an exciting challenge.

The first thing to note is that this competition differs from traditional ML competitions: the data isn't ready for modeling out of the box. Both the training and test sets are split across several CSV files, so you'll need to merge them before use. Furthermore, the target variable isn't provided; you'll need to engineer it yourself. The host's decision to present the data this way is commendable, as it mirrors real-world situations where data is rarely perfectly structured for modeling.
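As a rough illustration of the merging step, here is a minimal standard-library sketch of a left join between two CSV sources on shared key columns. The file contents and column names (`cell_id`, `timestamp`, `data_rate`, `fault_code`) are hypothetical placeholders, since the post does not describe the real files; in practice pandas `DataFrame.merge(..., how="left")` does the same job.

```python
import csv
import io

# Hypothetical file contents: the real competition files and their column
# names are not described in the post, so these are placeholders.
KPI_CSV = """cell_id,timestamp,data_rate
A,1,50.0
A,2,48.5
B,1,30.0
"""

FAULT_CSV = """cell_id,timestamp,fault_code
A,2,LINK_DOWN
"""


def read_rows(text):
    """Parse CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(left, right, keys):
    """Attach columns from `right` to each row of `left` with the same key values.

    Rows of `left` without a match get None for the right-only columns,
    mimicking a SQL/pandas left join. Assumes at most one match per key.
    """
    right_cols = [c for c in (right[0].keys() if right else []) if c not in keys]
    index = {tuple(r[k] for k in keys): r for r in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        out = dict(row)
        for col in right_cols:
            out[col] = match[col] if match else None
        merged.append(out)
    return merged


merged = left_join(read_rows(KPI_CSV), read_rows(FAULT_CSV), ["cell_id", "timestamp"])
for row in merged:
    print(row)
```

A left join keeps every measurement row, so non-fault periods remain available later (for example, as the "before the fault" reference).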

So, what exactly is the target? It's best described as the "status of the data rate when a fault occurs." In telecom O&M (operations and maintenance), it's vital to determine whether a fault will directly affect the end user. How can we tell? A fault is likely to impact the end user when the data rate at the moment of the fault is lower than the rate before the fault. When this happens, the network's service quality deteriorates, which is likely to frustrate the user and may even prompt them to switch to a competitor.
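The labelling rule above can be sketched as follows. This is only one way to engineer the target, under loudly stated assumptions: the `Sample` layout is hypothetical, and the "rate before the fault" is taken to be the most recent non-fault sample with a known rate (a window average would be an equally valid choice).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    # Hypothetical record layout: one measurement per time step for a single cell.
    timestamp: int
    data_rate: Optional[float]  # None when the rate was not recorded
    is_fault: bool


def label_faults(samples: List[Sample]) -> List[Optional[int]]:
    """Return one label per sample, in timestamp order.

    1    -> fault whose data rate dropped below the pre-fault baseline
    0    -> fault without such a drop
    None -> not a fault, or the comparison is impossible (missing rates)
    Assumption: the baseline is the latest non-fault sample with a known rate.
    """
    samples = sorted(samples, key=lambda s: s.timestamp)
    labels: List[Optional[int]] = []
    baseline: Optional[float] = None
    for s in samples:
        if s.is_fault:
            if baseline is None or s.data_rate is None:
                labels.append(None)
            else:
                labels.append(1 if s.data_rate < baseline else 0)
        else:
            labels.append(None)
            if s.data_rate is not None:
                baseline = s.data_rate  # fault samples never update the baseline
    return labels


history = [
    Sample(1, 50.0, False),
    Sample(2, 20.0, True),   # rate fell during the fault -> impacting
    Sample(3, 49.0, False),
    Sample(4, 55.0, True),   # rate did not fall -> not impacting
]
print(label_faults(history))  # [None, 1, None, 0]
```

Returning `None` for undecidable cases keeps the choice of dropping or keeping those rows explicit rather than silently labelling them 0.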

For this competition, our primary concern is faults that decrease the data rate. These are treated as urgent faults that the company must address promptly.

Note that the submission file has fewer rows than the test data. This is another unusual aspect of this competition: not all of the test data needs to be used.
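A small sketch of restricting the test data to the rows listed in the submission file, assuming both share a hypothetical `ID` column (the post does not name the real identifier). The remaining test rows are not discarded from the data itself; they may still be useful as context, for example to compute pre-fault rates.

```python
# Hypothetical IDs: the real ID column name and format are not given in the post.
test_rows = [
    {"ID": "t1", "data_rate": 40.0},
    {"ID": "t2", "data_rate": None},
    {"ID": "t3", "data_rate": 35.0},
]
submission_ids = ["t2", "t3"]


def rows_to_predict(test_rows, submission_ids):
    """Keep only test rows whose ID appears in the submission file, in submission order."""
    by_id = {row["ID"]: row for row in test_rows}
    missing = [i for i in submission_ids if i not in by_id]
    if missing:
        # Fail loudly rather than silently producing a short submission.
        raise KeyError(f"IDs in submission but not in test data: {missing}")
    return [by_id[i] for i in submission_ids]


selected = rows_to_predict(test_rows, submission_ids)
print([r["ID"] for r in selected])  # ['t2', 't3']
```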

Why is this competition engaging?

Preparing the training data is a challenge in itself. Because you need to engineer the target, there's ample room for creativity: you can set rules for dropping rows, keep all rows, focus solely on fault instances, and so on. Your creativity is what drives a good CV (cross-validation) score.
The training data mirrors real-life datasets, replete with NaNs, outliers, and sometimes nonsensical data. As the host mentioned in the discussion section, managing these challenges is part of the job.
The test data is also demanding. Since the rows we need to submit contain only NaNs, your approach to imputing this data greatly affects your final score. You can impute the missing values yourself or choose a model that handles NaNs natively. The choice is yours!
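For the "impute it yourself" option, here is a minimal sketch of one common technique: median imputation plus a per-column missing-value indicator, fitted on training rows and reused on test rows. The feature names (`rate_before`, `latency`) are hypothetical. The alternative mentioned above, a model that handles NaNs natively, includes gradient-boosting implementations such as scikit-learn's `HistGradientBoostingClassifier`, LightGBM, or XGBoost.

```python
import math
from statistics import median
from typing import Dict, List


def fit_medians(train_rows: List[Dict[str, float]], columns: List[str]) -> Dict[str, float]:
    """Compute per-column medians on the training rows only, ignoring NaNs."""
    medians = {}
    for col in columns:
        observed = [r[col] for r in train_rows if not math.isnan(r[col])]
        medians[col] = median(observed) if observed else 0.0  # arbitrary fallback for all-NaN columns
    return medians


def transform(rows: List[Dict[str, float]], medians: Dict[str, float]) -> List[Dict[str, float]]:
    """Replace NaNs with the fitted medians and add a `<col>_was_missing` flag (1.0/0.0)."""
    out = []
    for r in rows:
        new = dict(r)
        for col, med in medians.items():
            missing = math.isnan(r[col])
            new[col] = med if missing else r[col]
            new[f"{col}_was_missing"] = 1.0 if missing else 0.0
        out.append(new)
    return out


NAN = float("nan")
train = [
    {"rate_before": 50.0, "latency": 12.0},
    {"rate_before": NAN, "latency": 15.0},
    {"rate_before": 30.0, "latency": NAN},
]
test = [{"rate_before": NAN, "latency": NAN}]

medians = fit_medians(train, ["rate_before", "latency"])
print(medians)  # {'rate_before': 40.0, 'latency': 13.5}
print(transform(test, medians))
```

The indicator columns let a model learn whether "missing" itself carries signal, which plain median filling would otherwise hide.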

Final thoughts: since data gathering is integral to this challenge and there are many ways to create the target, it's tricky to produce a starter notebook that doesn't steer newcomers toward one approach. Nevertheless, this kind of data preparation is an essential skill for a data scientist.

I'm eager to hear thoughts from other competitors. And, if the host spots any inaccuracies in my recap, kindly point them out! :D

Discussion 2 answers

University of Yaoundé I

Hi @yanteixeira. I agree with you.

6 Aug 2023, 19:36


kenyor

OK. Thank you, I appreciate it.

7 Aug 2023, 06:25

