
Alvin Smart Money Management Classification Challenge

Helping Kenya
$3 000 USD
Challenge completed ~3 years ago
Classification
497 joined
220 active
Start
Jun 22, 22
Close
Jul 24, 22
Reveal
Jul 24, 22
Amy_Bray
Zindi
Q&A Webinar - Wednesday 13 July at 6PM EAT
Connect · 11 Jul 2022, 12:14 · 4

Join us on Wednesday, 13 July at 6PM EAT for a webinar. Sign up here --> https://bit.ly/3umHKWP

Drop your questions in the comments or send me a personal message so we can answer them in depth during the webinar.

Discussion 4 answers
wuuthraad

Yeah, I've just been thinking about features to engineer... for way too long. My questions are loosely related to my FE trials ^0^

The questions are about the columns 'MERCHANT_CATEGORIZED_AT' and 'PURCHASED_AT'.

1) What is the difference between the two columns ("merchant_categorized_at" and "purchased_at")? The "purchased_at" column is pretty self-explanatory, but "merchant_categorized_at" is vague. Even after referencing "VariableDefinitions.csv", I still do not understand what it is for.

2) Were all the dates recorded correctly (in both train and test)? Why do certain rows have a "merchant_categorized_at" value that is earlier than the "purchased_at" date? For example, in the 3rd row of the train dataset "merchant_categorized_at" is 2022-05-20 but "purchased_at" is 2022-05-27. In other rows I see the complete opposite: a "purchased_at" date (2020-05-29) that is well over a year before the "merchant_categorized_at" date (2022-05-31), i.e. the item was purchased in 2020 and only categorized in 2022. I'm not sure if that was a production delay, something to do with COVID (maybe a hard lockdown preventing delivery), or something intentional (it's a conspiracy, I say).
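One way to quantify the date anomalies described above is to compute, per row, the lag between purchase and merchant categorization; the lag could also serve as an engineered feature. The sketch below uses only the Python standard library and assumes the timestamps parse with `datetime.fromisoformat`. The helper names (`parse_ts`, `categorization_lag_days`, `flag_unusual_rows`) and the 365-day threshold are hypothetical choices for illustration, not part of the competition materials.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # Assumption: timestamps are ISO-style strings such as "2022-05-27"
    # or "2022-05-27 10:15:00"; adjust if the files use another format.
    return datetime.fromisoformat(value.strip())


def categorization_lag_days(row: dict) -> float:
    """Days from purchase to merchant categorization (negative = categorized first)."""
    purchased = parse_ts(row["PURCHASED_AT"])
    categorized = parse_ts(row["MERCHANT_CATEGORIZED_AT"])
    return (categorized - purchased).total_seconds() / 86400


def flag_unusual_rows(rows, max_lag_days=365):
    """Return (index, lag) for rows categorized before purchase or more than
    max_lag_days after it. The 365-day cut-off is an arbitrary illustration."""
    flagged = []
    for i, row in enumerate(rows):
        lag = categorization_lag_days(row)
        if lag < 0 or lag > max_lag_days:
            flagged.append((i, lag))
    return flagged


if __name__ == "__main__":
    # The two date pairs quoted in the post, plus one ordinary row.
    rows = [
        {"PURCHASED_AT": "2022-05-27", "MERCHANT_CATEGORIZED_AT": "2022-05-20"},
        {"PURCHASED_AT": "2020-05-29", "MERCHANT_CATEGORIZED_AT": "2022-05-31"},
        {"PURCHASED_AT": "2022-05-01", "MERCHANT_CATEGORIZED_AT": "2022-05-02"},
    ]
    print(flag_unusual_rows(rows))  # [(0, -7.0), (1, 732.0)]
```

A negative lag marks rows where the merchant was categorized before the purchase; a very large positive lag marks the multi-year gaps. Whether either is an error or expected behaviour is exactly the open question above.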

Lastly, just out of curiosity:

1) Why so many target variables?

11 Jul 2022, 22:07

Are we trying to predict what next transaction a User will make, or what next transaction we can recommend to them?

13 Jul 2022, 07:43
wuuthraad

The competition info page says: "The client is interested in EDA, features and a strong classification model."

Does that mean the "client" wants visualizations, or do they want strong features that they may not have thought of? One can get strong features without visualization (not saying visualization is unimportant); I just wanted a bit more clarity on that part.

13 Jul 2022, 08:33
Amy_Bray
Zindi

If you missed the webinar, you can watch it over here.

There was a question about duplicates across train and test. We have checked, and there are no exact duplicates; however, there could be cases where, for example, I buy coffee from the same place for the same amount on the way to work and on the way home, so that might show up twice.
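The distinction between exact duplicates and repeated everyday purchases can be checked by matching test rows against train on all columns versus on a subset of columns. A minimal sketch, assuming rows are loaded as dicts (e.g. via `csv.DictReader`); apart from `PURCHASED_AT`, the column names used here (`USER_ID`, `MERCHANT_NAME`, `PURCHASE_VALUE`) are hypothetical placeholders, as are the helpers `row_key` and `cross_set_matches`.

```python
def row_key(row: dict, columns) -> tuple:
    # Build a hashable key from the chosen columns (missing columns -> None).
    return tuple(row.get(c) for c in columns)


def cross_set_matches(train_rows, test_rows, columns):
    """Indices of test rows whose values on `columns` also occur in train."""
    train_keys = {row_key(r, columns) for r in train_rows}
    return [i for i, r in enumerate(test_rows) if row_key(r, columns) in train_keys]


if __name__ == "__main__":
    # Hypothetical rows: same customer, merchant and amount, different times.
    train = [{"USER_ID": "u1", "MERCHANT_NAME": "Cafe", "PURCHASE_VALUE": "250",
              "PURCHASED_AT": "2022-05-02 08:00"}]
    test = [{"USER_ID": "u1", "MERCHANT_NAME": "Cafe", "PURCHASE_VALUE": "250",
             "PURCHASED_AT": "2022-05-02 18:00"}]
    all_columns = sorted(train[0])
    print(cross_set_matches(train, test, all_columns))  # [] (no exact duplicate)
    print(cross_set_matches(train, test, ["USER_ID", "MERCHANT_NAME", "PURCHASE_VALUE"]))  # [0]
```

Matches on the reduced column set are expected for habitual purchases like the coffee example, so they do not by themselves indicate leakage between train and test.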

15 Jul 2022, 07:28