Alvin Smart Money Management Classification Challenge
Can you classify purchases recorded on Alvin into different categories?
$3 000 USD
Financial Services
Q&A Webinar - Wednesday 13 July at 6PM EAT
Connect · 11 Jul 2022, 12:14 · 4

Join us on Wednesday, 13 July at 6PM EAT for a webinar. Sign up here -->

Drop your questions in the comments or send me a personal message so we can answer them in depth during the webinar.

Discussion 4 answers

Yeah, I've been thinking about features to engineer for way too long. The questions are vaguely related to my feature engineering (FE) trials ^0^

The questions are related to columns 'MERCHANT_CATEGORIZED_AT' and 'PURCHASED_AT'

1) What is the difference between the two columns (merchant_categorized_at and purchased_at)? The purchased_at column is pretty self-explanatory, but merchant_categorized_at is pretty vague. Even after referencing "VariableDefinitions.csv" I still do not understand what it is for.

2) Are all the dates correctly added (in both train and test)? Why do certain rows have values in "merchant_categorized_at" that are earlier than the "purchased_at" date? For example, in the 3rd row of the train dataset "merchant_categorized_at" is 2022-05-20 but "purchased_at" is 2022-05-27. In other rows I see the complete opposite: a "purchased_at" date (2020-05-29) that is about two years earlier than the "merchant_categorized_at" date (2022-05-31), so the item was only categorized long after it was purchased. Not sure if that was a production delay, something to do with covid (maybe a hard lockdown preventing delivery), or something intentional (it's a conspiracy, I say).
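
For anyone who wants to see how often this happens, here is a rough sketch of how the gap between the two columns could be measured with pandas. The first two date pairs are the ones quoted above; the third row is made up for comparison. I am assuming the columns are named as in the question and parse as plain dates; the real files may carry full timestamps or time zones.

```python
import pandas as pd

# Toy rows: the first two date pairs are quoted in the question above,
# the third is an invented "normal" row for comparison.
df = pd.DataFrame({
    "PURCHASED_AT": ["2022-05-27", "2020-05-29", "2022-03-01"],
    "MERCHANT_CATEGORIZED_AT": ["2022-05-20", "2022-05-31", "2022-03-02"],
})

# Parse both columns so they can be subtracted.
for col in ["PURCHASED_AT", "MERCHANT_CATEGORIZED_AT"]:
    df[col] = pd.to_datetime(df[col])

# Positive gap: categorized after the purchase. Negative: categorized before it.
df["gap_days"] = (df["MERCHANT_CATEGORIZED_AT"] - df["PURCHASED_AT"]).dt.days
df["categorized_before_purchase"] = df["gap_days"] < 0

print(df[["gap_days", "categorized_before_purchase"]])
```

Running `df["categorized_before_purchase"].mean()` on the full train set would give the share of rows where categorization precedes the purchase, which might help decide whether the gap is worth using as a feature.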

Lastly just curious

1) Why so many target variables?

Are we trying to predict what transaction a user will make next, or what transaction we can recommend to them next?

On the competition info page it has "The client is interested in EDA, features and a strong classification model."

Does that mean the "client" wants visualizations, or do they want strong features that maybe they did not think of? One can get strong features without visualization (not saying visualization is not important), so I just wanted a bit more clarity on that part.

If you missed the webinar you can watch it over here.

There was a question about duplicates across train and test. We have checked and there are no exact duplicates. However, there could be cases where I buy coffee on the way to work and again on the way home, from the same place and for the same amount, so that might pop up twice.
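
A minimal sketch of how such a check might look with pandas, assuming train and test share the same feature columns. The column names and values below are invented for illustration, and this is not necessarily how the check described above was run.

```python
import pandas as pd

# Hypothetical columns and values, only to illustrate the check.
train = pd.DataFrame({
    "MERCHANT_NAME": ["Coffee Shop", "Coffee Shop", "Supermarket"],
    "PURCHASE_VALUE": [300, 300, 1500],
    "PURCHASED_AT": ["2022-05-27 07:45", "2022-05-27 17:30", "2022-05-28 12:00"],
})
test = pd.DataFrame({
    "MERCHANT_NAME": ["Coffee Shop", "Pharmacy"],
    "PURCHASE_VALUE": [300, 800],
    "PURCHASED_AT": ["2022-05-30 07:50", "2022-05-30 09:10"],
})

# Exact duplicates: rows that match on every shared column.
exact = train.merge(test, how="inner", on=list(train.columns))

# Near duplicates: same merchant and amount, ignoring the timestamp.
near = train.merge(
    test,
    how="inner",
    on=["MERCHANT_NAME", "PURCHASE_VALUE"],
    suffixes=("_train", "_test"),
)

print(len(exact), "exact duplicates")
print(len(near), "row pairs matching on merchant and amount only")
```

In this toy data the two coffee purchases in train each pair up with the one in test, which is the "same place, same amount" situation: they look alike but are separate transactions.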