Hello @Zindi @amyflorida626
We are supposed to forecast 2 weeks ahead, but for some clients the IDs in the sample submission do not fall within the 2 weeks after their last training date, which leaves us with NaNs in our submissions.
For example:
Client: ID_608a6897d96a507cd36c935d
The max date for this client in the training data is 21 December 2023, but in the sample submission for this same client, the first date we are supposed to submit forecasts for is 2024-02-21. That is clearly not 1 week ahead; it is two months ahead. Was this intentional, or was it a data labelling mistake?
Other clients with this discrepancy are:
{'608a6897d96a507cd36c935d', '618a50e0500d9413097ccd75', '619b545521ae9a187f0f0d97', '61efa4189f0e4645b652296a', '62123b0e0c4f78309f3d8ac6', '6255767a2a9dd71a6713a095', '6298b6310def91542a5752f4', '62a03c6d638c217d405a9155', '6361f9ffedd8353ed336f745', '63b3b2bf9cd0314f465f8744', '645c94036caf5c77fe3a6151', '649011d41e41763cc27f09b6', '6492ce879e3baf373a55aab5', '649964e26db1286c56156c66', '64aaf747729a5508256653e5', '64b6a2e5ec8e2640b008dad3', '64be5c51a5db0060046fa166', '64f83d674494f609796268e8', '6554ec6993170f75bd752ec8'}
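A quick way to reproduce this check is sketched below, assuming the training data and the sample submission have already been parsed into (client_id, date) pairs. The pair layout, the function names, and the 7-day threshold are assumptions for illustration; the real files' column names are not given in this thread, and the toy rows only mirror the example client above.

```python
from datetime import date

def last_dates(rows):
    """Map each client to the latest date it appears in `rows`.
    rows: iterable of (client_id, date) pairs."""
    latest = {}
    for client, d in rows:
        if client not in latest or d > latest[client]:
            latest[client] = d
    return latest

def find_gap_clients(train_rows, submission_rows, max_gap_days=7):
    """Return {client_id: gap_in_days} for clients whose first submission
    date is more than max_gap_days after their last training date."""
    train_last = last_dates(train_rows)
    first_needed = {}
    for client, d in submission_rows:
        if client not in first_needed or d < first_needed[client]:
            first_needed[client] = d
    gaps = {}
    for client, first in first_needed.items():
        last_seen = train_last.get(client)
        if last_seen is None:
            continue  # client has no training rows at all
        gap = (first - last_seen).days
        if gap > max_gap_days:
            gaps[client] = gap
    return gaps

# Toy rows mirroring the client above (not the real competition files)
train_rows = [("608a6897d96a507cd36c935d", date(2023, 12, 20)),
              ("608a6897d96a507cd36c935d", date(2023, 12, 21))]
submission_rows = [("608a6897d96a507cd36c935d", date(2024, 2, 21)),
                   ("608a6897d96a507cd36c935d", date(2024, 2, 22))]
print(find_gap_clients(train_rows, submission_rows))  # {'608a6897d96a507cd36c935d': 62}
```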
Hi @Koleshjr,
As the data is in a time-series format, some clients have missing entries on some dates, resulting from factors such as inactivity and churn. The test set was prepared using the final dates of record for each client, i.e. the final two weeks of activity, and it also includes the entries missing from within that date range.
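The splitting rule described above can be sketched per client as follows. This is one reading of the description, not Zindi's actual preparation code; the function name, the dict-of-dates layout, and the toy data (activity in March is made up) are hypothetical. It shows how a client who went quiet before the final window ends up with a last training date long before the first test date.

```python
from datetime import date, timedelta

def split_client_series(rows, window_days=14):
    """Split one client's observed rows (dict: date -> clicks) into
    training rows and the list of test dates.

    The test window is every calendar date in the last `window_days`
    days of the client's record, including dates with no observed entry;
    training keeps only the observed rows before that window."""
    last = max(rows)
    start = last - timedelta(days=window_days - 1)
    test_dates = [start + timedelta(days=i) for i in range(window_days)]
    train = {d: c for d, c in rows.items() if d < start}
    return train, test_dates

# Made-up client: active in December, inactive for two months, then active again
rows = {
    date(2023, 12, 20): 4,
    date(2023, 12, 21): 6,
    date(2024, 3, 3): 2,
    date(2024, 3, 5): 3,
}
train, test_dates = split_client_series(rows)
print(max(train), test_dates[0], test_dates[-1])  # 2023-12-21 2024-02-21 2024-03-05
```

Here the training data ends on 2023-12-21 while the test window starts on 2024-02-21, and dates such as 2024-02-21 to 2024-03-02 appear in the test window even though the client has no rows for them.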
Some clients were inactive for a brief period before the final month of record, so a number of entries are missing for the final dates in the training set. It's therefore part of the challenge to build models that account for this kind of activity and forecast the future number of clicks as accurately as possible.
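One possible way to make a model aware of this inactivity, offered as a sketch rather than anything the organisers prescribe: give each (client, date) row a feature for how many days have passed since the client was last seen in training, alongside recent click statistics. The function name, feature names, and the 7-row lookback are hypothetical choices.

```python
from datetime import date
from statistics import mean

def gap_features(train, target_date, lookback=7):
    """Build simple features for forecasting one client's clicks on
    target_date from its observed training rows (dict: date -> clicks)."""
    dates = sorted(train)
    last_seen = dates[-1]
    recent = [train[d] for d in dates[-lookback:]]  # up to `lookback` latest observed rows
    return {
        # how far ahead we are forecasting; large for clients that went inactive
        "days_since_last_seen": (target_date - last_seen).days,
        "last_clicks": train[last_seen],
        "recent_mean_clicks": mean(recent),
    }

train = {date(2023, 12, 20): 4, date(2023, 12, 21): 6}
print(gap_features(train, date(2024, 2, 21)))
```

A model trained on such features can learn how click counts typically change after long gaps, instead of assuming every forecast starts the day after the last training row.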
Hope this clarifies the data, and helps with the challenge!
Yes it does, thanks @Nelly43