Yeah, I've just been thinking of features to engineer... for way too long. The questions are vaguely related to my FE trials ^0^
The questions are about the columns 'MERCHANT_CATEGORIZED_AT' and 'PURCHASED_AT'.
1) What is the difference between the two columns (merchant_categorized_at and purchased_at)? The purchased_at column is pretty self-explanatory, but merchant_categorized_at is pretty vague. Even after referencing "VariableDefinitions.csv" I still do not understand what it is for.
2) Are all the dates correct (in train and test)? Why do certain rows have a "merchant_categorized_at" value that is earlier than the "purchased_at" date? For example, in the 3rd row of the train dataset "merchant_categorized_at" is 2022-05-20 but "purchased_at" is 2022-05-27. In other rows I see the complete opposite: a "purchased_at" date (2020-05-29) that is well over a year before the "merchant_categorized_at" date (2022-05-31), so the item was only categorized about two years after it was purchased. Not sure if that was a production delay, something to do with covid (maybe a hard lockdown preventing delivery), or something intentional (it's a conspiracy, I say).
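For anyone who wants to see how common each case is, here is a minimal pandas sketch. It uses only the two dates quoted above as toy rows; the derived column names (CATEGORIZATION_LAG_DAYS, CATEGORIZED_BEFORE_PURCHASE) are made up for illustration, and on the real data you would load the train file instead of building the frame by hand.

```python
import pandas as pd

# Toy rows built from the dates quoted in the post (not the real dataset).
df = pd.DataFrame({
    "PURCHASED_AT": ["2022-05-27", "2020-05-29"],
    "MERCHANT_CATEGORIZED_AT": ["2022-05-20", "2022-05-31"],
})

# Parse both columns as datetimes so they can be subtracted.
for col in ["PURCHASED_AT", "MERCHANT_CATEGORIZED_AT"]:
    df[col] = pd.to_datetime(df[col])

# Positive lag: categorized after the purchase. Negative lag: categorized before it.
# Both derived column names are hypothetical.
lag = df["MERCHANT_CATEGORIZED_AT"] - df["PURCHASED_AT"]
df["CATEGORIZATION_LAG_DAYS"] = lag.dt.days
df["CATEGORIZED_BEFORE_PURCHASE"] = df["CATEGORIZATION_LAG_DAYS"] < 0

print(df[["CATEGORIZATION_LAG_DAYS", "CATEGORIZED_BEFORE_PURCHASE"]])
print("share categorized before purchase:", df["CATEGORIZED_BEFORE_PURCHASE"].mean())
```

The lag itself could also be tried as a feature, though whether it carries any signal depends on what the categorization timestamp actually means.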
Lastly, just curious:
1) Why so many target variables?
Are we trying to predict which transaction a user will make next, or which transaction we can recommend to them next?
The competition info page says: "The client is interested in EDA, features and a strong classification model."
Does that mean the "client" wants visualizations, or do they want strong features that maybe they did not think of? One can get strong features without visualization (not saying visualization is not important), so I just wanted a bit more clarity on that part.
If you missed the webinar you can watch it over here.
There was a question about duplicates across train and test. We have checked and there are no exact duplicates. However, there could be instances where, say, I buy coffee on the way to work and again on the way home, from the same place and for the same amount, so that might pop up twice.
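If you want to run the same kind of check yourself, a rough sketch with pandas could look like this. The frames are toy data, and every column name except PURCHASED_AT (MERCHANT_NAME, PURCHASE_VALUE, PURCHASE_DAY) is an assumption for illustration, not the real schema.

```python
import pandas as pd

# Toy frames; column names other than PURCHASED_AT are hypothetical.
train = pd.DataFrame({
    "PURCHASED_AT": ["2022-05-27 07:45", "2022-05-27 17:30", "2022-05-28 09:00"],
    "MERCHANT_NAME": ["Coffee Shop", "Coffee Shop", "Grocer"],
    "PURCHASE_VALUE": [250, 250, 1200],
})
test = pd.DataFrame({
    "PURCHASED_AT": ["2022-05-29 07:45"],
    "MERCHANT_NAME": ["Coffee Shop"],
    "PURCHASE_VALUE": [250],
})

# Exact duplicates across train and test: inner join on every shared column.
shared = [c for c in train.columns if c in test.columns]
exact_overlap = train.merge(test, on=shared, how="inner")
print("exact duplicates across train/test:", len(exact_overlap))

# Near-duplicates inside train: same merchant and amount on the same day,
# like the morning and evening coffee described above.
train["PURCHASE_DAY"] = pd.to_datetime(train["PURCHASED_AT"]).dt.date
key = ["PURCHASE_DAY", "MERCHANT_NAME", "PURCHASE_VALUE"]
same_day_repeats = train[train.duplicated(key, keep=False)]
print(same_day_repeats)
```

On the real data you would probably add the user identifier to the key as well, so two different people buying the same coffee are not flagged as repeats.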