Mismatch in Event IDs:
event_id formats differ across datasets (train.csv includes time-step suffix; processed_chirps_data.csv does not), causing failed merges.
Limited Overlap in Event IDs:
train.csv contains 492,020 unique entries, while processed_chirps_data.csv has only 898, severely restricting merged data.
Please let me know if there is a solution or if anyone can help.
The 492,020 unique entries you are referring to arise because an '_X_xxx' suffix is appended to each event ID, where xxx denotes a day in the 730 days being considered.
Of the 898 events, 674 are provided for training, hence 674 * 730 = 492,020 unique train entries.
898 * 730 = 655,540 gives you the total number of rows in the train and test data combined.
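To make the row counts above easy to check, here is a small sketch of the arithmetic. The variable names are illustrative, and the number of test events is derived here by assuming every one of the 898 events falls in exactly one of train or test; it is not stated in the thread.

```python
# Figures taken from the thread.
DAYS_PER_EVENT = 730
TOTAL_EVENTS = 898
TRAIN_EVENTS = 674

# Assumption: each event belongs to exactly one of train/test.
test_events = TOTAL_EVENTS - TRAIN_EVENTS

train_rows = TRAIN_EVENTS * DAYS_PER_EVENT
test_rows = test_events * DAYS_PER_EVENT
total_rows = TOTAL_EVENTS * DAYS_PER_EVENT

print(f"train rows: {train_rows:,}")
print(f"test rows:  {test_rows:,}")
print(f"total rows: {total_rows:,}")
```

So the 492,020 figure counts event-day rows, not distinct events, which is why it looks so much larger than 898.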
That is correct, though I'm not sure it answers his question.
Hi, the starter colab notebook contains code to extract just the event IDs from the longer '_X_xxx' IDs.
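For anyone without the notebook to hand, a minimal sketch of the idea follows. It is not the notebook's code. It assumes the suffix is the literal text '_X_' followed by an integer day index, and the function name and the sample ID are made up for illustration.

```python
from typing import Tuple


def split_event_id(full_id: str) -> Tuple[str, int]:
    """Split a long ID into (base event ID, day index).

    Assumes the long ID ends with '_X_' followed by an integer day,
    as described in this thread.
    """
    # rpartition splits on the LAST '_X_', so a base ID that happens
    # to contain '_X_' itself is still handled.
    base, sep, day = full_id.rpartition("_X_")
    if not sep:
        raise ValueError(f"no '_X_' suffix found in {full_id!r}")
    return base, int(day)


# Hypothetical ID, for illustration only.
base_id, day = split_event_id("id_example123_X_42")
print(base_id, day)
```

With pandas, the same base ID can be put in a new column with something like `train["event_id"].str.split("_X_").str[0]`, and the merge with processed_chirps_data.csv then done on that column (assuming the ID column is called event_id in both files).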