🌊 Let's Talk About: Challenges During Data Process...

Inundata: Mapping Floods in South Africa


mihir

Challenges During Data Processing

Help · 27 Nov 2024, 03:19 · 3

Mismatch in Event IDs

The event_id formats differ across datasets: train.csv includes a time-step suffix, while processed_chirps_data.csv does not, so merges on event_id fail.

Limited Overlap in Event IDs

train.csv contains 492,020 unique entries, while processed_chirps_data.csv has only 898, which severely restricts the merged data.

Please let me know if there is a solution or if anyone can help.

Discussion 3 answers

isaacOluwafemiOg

Kwame Nkrumah University of Science and Technology

The 492,020 unique entries you are referring to come about because an '_X_xxx' suffix is appended to each of the 898 event IDs, where xxx denotes a day in the 730-day period being considered.

674 events are provided for training, hence 674 * 730 = 492,020 unique train entries.

898 * 730 = 655,540 gives you the total number of rows across the train and test data.
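The arithmetic above can be checked with a few lines of Python. This is only a sanity check on the counts quoted in this thread; the test-set figures are derived by subtraction and assume that every test event also spans 730 days.

```python
# Counts quoted in the thread.
DAYS_PER_EVENT = 730
TRAIN_EVENTS = 674
TOTAL_EVENTS = 898

# Each event contributes one row per day, so rows = events * days.
train_rows = TRAIN_EVENTS * DAYS_PER_EVENT
total_rows = TOTAL_EVENTS * DAYS_PER_EVENT

# Assumption: the remaining events form the test set, 730 days each.
test_events = TOTAL_EVENTS - TRAIN_EVENTS
test_rows = test_events * DAYS_PER_EVENT

print(train_rows)   # 492020
print(total_rows)   # 655540
print(test_events)  # 224
print(test_rows)    # 163520
```

So the row count in train.csv is not a sign of missing events; it is the number of training events multiplied by the number of days.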

27 Nov 2024, 05:40


Hookslaw

That is correct, though I'm not sure it answers his question.

replied to isaacOluwafemiOg · 3 Dec 2024, 21:51


moverlan

Hi, the starter Colab notebook contains code to extract just the event IDs from the longer '_X_xxx' IDs.
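For anyone without the notebook open, here is a minimal pandas sketch of the idea. It is not the notebook's code. The ID values, the `label` and `mean_precip` columns, and the `base_event_id` and `day` column names are all made up for illustration; only the `event_id` column and the '_X_xxx' suffix pattern come from this thread.

```python
import pandas as pd

# Toy stand-in for train.csv (hypothetical IDs and label column).
train = pd.DataFrame({
    "event_id": ["id_aaa_X_0", "id_aaa_X_1", "id_bbb_X_0", "id_bbb_X_1"],
    "label": [0, 1, 0, 0],
})

# Toy stand-in for processed_chirps_data.csv (hypothetical feature column).
chirps = pd.DataFrame({
    "event_id": ["id_aaa", "id_bbb", "id_ccc"],
    "mean_precip": [1.5, 0.2, 3.1],
})

# Split each ID once, from the right, on the "_X_" separator:
# the left part is the bare event ID, the right part is the day index.
parts = train["event_id"].str.rsplit("_X_", n=1, expand=True)
train["base_event_id"] = parts[0]
train["day"] = parts[1].astype(int)

# Many daily rows map to one event-level row, so merge many-to-one.
# validate= raises an error if the event-level keys are not unique.
merged = train.merge(
    chirps,
    left_on="base_event_id",
    right_on="event_id",
    how="left",
    suffixes=("", "_chirps"),
    validate="many_to_one",
)

print(merged[["event_id", "base_event_id", "day", "mean_precip"]])
```

With a left merge, every daily row in train is kept and picks up the event-level features, so the 898 vs 492,020 difference no longer restricts the merged data.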

4 Dec 2024, 13:54

