Unifi Value Frameworks PDF Lifting Competition

Helping South Africa

$5 000 USD

Completed (over 2 years ago)

HackP

National School Of Computer Science (ENSI) - Tunisia

Train.csv might have some wrong labeled values

Data · 10 Mar 2024, 16:48 · 6

Hello all, I hope you are having fun with this awesome competition. I would like to raise a critical point that has left me unsure whether to trust Train.csv or not.

When I started on EDA to see how the values for year 2021 had been collected, I noticed that the labeling might have some issues. For example, for the Impala company, the PDF is ESG-spreads.pdf. I started by selecting the Train.csv rows that have this Group to see which metrics it has (as the photo below shows).

Focusing on metric 128: Total Direct CO2

I went back to the PDFs and found that these are not the actual values for the metric; they differ from those given in Train.csv. Shall we rely on Train.csv in that case?
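For anyone who wants to reproduce the check, here is a minimal pandas sketch of the row selection described above. The column names (`Group`, `metric_id`, `metric_name`, `2021_Value`) and the helper `select_rows` are assumptions made for illustration, not the confirmed Train.csv layout, and the small frame uses placeholder values rather than real competition data.

```python
import pandas as pd

# Hypothetical column names; adjust them to the real Train.csv header.
GROUP_COL = "Group"
METRIC_COL = "metric_id"


def select_rows(train, group, metric_id=None):
    """Rows for one company group, optionally narrowed to a single metric."""
    # Case-insensitive match on the group name, ignoring stray whitespace.
    mask = train[GROUP_COL].astype(str).str.strip().str.lower() == group.strip().lower()
    if metric_id is not None:
        mask &= train[METRIC_COL] == metric_id
    return train.loc[mask]


# Tiny made-up frame standing in for pd.read_csv("Train.csv");
# the values are placeholders, not real competition data.
train = pd.DataFrame({
    "Group": ["Impala", "Impala", "OtherCo"],
    "metric_id": [128, 5, 128],
    "metric_name": ["Total Direct CO2", "Some other metric", "Total Direct CO2"],
    "2021_Value": [1.0, 2.0, 3.0],
})

# All rows for the group, to list which metrics it has.
impala = select_rows(train, "Impala")
print(sorted(impala[METRIC_COL].unique()))

# Only metric 128, to compare by hand against the PDF.
co2 = select_rows(train, "Impala", metric_id=128)
print(co2)
```

With the real file, replacing the made-up frame with `pd.read_csv("Train.csv")` should be enough, provided the column names are adjusted to match.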

Photo Link:

https://drive.google.com/file/d/1wG60luQtKMb_fZymvCLMBGH-wFy9OQt1/view?usp=sharing

Discussion 6 answers

Juliuss

Freelance

I was about to start this thread. Yes, the Train.csv is terrible, and it's not only for Impala. Data entry issues? If the file we are scored against also has these issues, that is an even bigger problem. @Zindi?

10 Mar 2024, 16:51

Upvotes 0

HackP

National School Of Computer Science (ENSI) - Tunisia

Sorry, there was an error with the picture; it is now uploaded correctly.

We need clarification on this issue, as it might affect our approaches.

replied to Juliuss · 10 Mar 2024, 16:56

Upvotes 0

Koleshjr

Multimedia university of kenya

Actually, it's not an error. Dive into the data and understand how they got that value, because it is actually a correct value. I had the same assumption when I started, but after more analysis I found that it's not a data entry issue.

10 Mar 2024, 17:07

Upvotes 2