
Inundata: Mapping Floods in South Africa

Helping South Africa
$10 000 USD
Completed (~1 year ago)
Classification
1340 joined
315 active
Start: Nov 22, 24
Close: Feb 16, 25
Reveal: Feb 17, 25
Challenges During Data Processing
Help · 27 Nov 2024, 03:19 · 3

Mismatch in Event IDs

event_id formats differ across datasets (train.csv includes time-step suffix; processed_chirps_data.csv does not), causing failed merges.

Limited Overlap in Event IDs

train.csv contains 492,020 unique entries, while processed_chirps_data.csv has only 898, severely restricting merged data.

Please let me know if there is a solution, or if anyone can help.

Discussion 3 answers
isaacOluwafemiOg
Kwame Nkrumah University of Science and Technology

The 492,020 unique entries you are referring to arise because an '_X_xxx' suffix is appended to each of the 898 event IDs, where xxx denotes a day within the 730 days being considered.

674 events are provided for training, hence 674 * 730 = 492,020 unique train entries.

898 * 730 gives you the total number of rows across the train and test data combined.
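
As a quick sanity check on the arithmetic above, here is a minimal sketch. The constant names are my own, and the test-event count is only inferred as 898 - 674; it is not stated anywhere in the data description.

```python
# Counts taken from this thread; the names are illustrative only.
DAYS_PER_EVENT = 730   # days covered for each event
TOTAL_EVENTS = 898     # unique event IDs in processed_chirps_data.csv
TRAIN_EVENTS = 674     # events provided for training

train_rows = TRAIN_EVENTS * DAYS_PER_EVENT
total_rows = TOTAL_EVENTS * DAYS_PER_EVENT

# Assumption: every event that is not in train belongs to test.
test_events = TOTAL_EVENTS - TRAIN_EVENTS
test_rows = test_events * DAYS_PER_EVENT

print(train_rows)               # 492020, matches the unique entries in train.csv
print(total_rows)               # 655540, train and test rows combined
print(train_rows + test_rows)   # 655540 again, so the split is consistent
```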

27 Nov 2024, 05:40
Upvotes 0

That is correct, though I am not sure it answers his question.

Hi, the starter Colab notebook contains code to extract just the event IDs from the longer '_X_xxx' IDs.
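
For anyone who does not have the notebook open, a rough sketch of the idea follows. This is not the notebook's code: the helper name `split_event_id` and the example IDs are made up, and it assumes the suffix always looks like '_X_' followed by a day number.

```python
def split_event_id(row_id, sep="_X_"):
    """Split a row ID such as 'id_abc_X_12' into ('id_abc', 12).

    Assumes the time-step suffix has the form '_X_<day>'. IDs without
    such a suffix are returned unchanged with a day of None.
    """
    base, found, day = row_id.rpartition(sep)
    if not found or not day.isdigit():
        return row_id, None
    return base, int(day)


# Made-up IDs for illustration; the real ones come from train.csv.
row_ids = ["id_aaa_X_0", "id_aaa_X_1", "id_bbb_X_729"]

base_ids = sorted({split_event_id(r)[0] for r in row_ids})
print(base_ids)  # ['id_aaa', 'id_bbb']
```

Once the base ID is in its own column, it should line up with the event_id values in processed_chirps_data.csv, so the merge can be done on that column instead of the full row ID.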

4 Dec 2024, 13:54
Upvotes 0