Should I use machine learning models like RandomForest, DecisionTree, CatBoost, etc., or is there another approach to recommendation systems?
That's what I'm wondering too. I have no clue about recommendation-system approaches, but initially I thought this could also be done as a classification task with the common tree-based ML models, since there are many features in the datasets.
But after inspecting the data, I'm wondering how to come up with the negative samples the model also needs to learn from. Each row in the orders table is essentially a positive data point for a particular customer and vendor, and the information in all the other columns only exists because the order was placed; otherwise it isn't available. So I'm wondering whether that data is of any use.
That is what I was thinking too. The only useful file would be vendors.csv, which contains important features like vendor rating, location, type of food delivered, etc.
Can anyone more experienced help us out here?
As for building recommendation systems, I don't know much beyond a technique called collaborative filtering, but I believe this is a problem that can be modeled with traditional methods.
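For reference, collaborative filtering in its simplest (memory-based) form scores a customer-vendor pair by similarity to the vendors that customer already ordered from. A minimal numpy sketch; the interaction matrix here is made up for illustration:

```python
import numpy as np

# Hypothetical customer x vendor interaction matrix (1 = ordered, 0 = no order).
R = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
], dtype=float)

# Item-item cosine similarity between vendor columns.
norms = np.linalg.norm(R, axis=0, keepdims=True)
norms[norms == 0] = 1.0  # avoid division by zero for vendors with no orders
sim = (R.T @ R) / (norms.T @ norms)

# Score every vendor for customer 0: weighted sum of similarities
# to the vendors they already interacted with.
scores = R[0] @ sim
print(scores)  # higher score = stronger recommendation
```

The same idea scales to the real data by building the matrix from the orders table, though sparse representations would be needed at that size.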
From the little analysis I've done, there are about 10,000 unique customers in the test set, 12 unique locations, and 100 unique vendors/restaurants. For a combination of these three variables, we are to predict whether that combination occurred; in other words, given a customer at a location, did they make an order from a particular vendor? For example, given A1S2D3 X 12 X 44: did the customer with ID A1S2D3, at location 12, make an order from vendor 44? Predict yes (1) or no (0).
Given the above scenario, one can generate negative samples for the train set by creating "CUSTOMER ID X LOCATION X VENDOR" combinations that don't exist in the given order data. These samples are taken as negative on the assumption that the customer didn't make an order for that combination. The implication is that the dataset would explode in size, which raises the question of how that affects modelling quality, and also how the other variables that are only recorded when an order is made would be matched to the negative samples. I think this approach was adopted by some people in a similar competition, which I must say was difficult to model (the LB scores are evidence).
NB: the test set can be taken to be SampleSubmission, as that's what is eventually evaluated. Also, I observed that not all customers in the test set have the same ID X LOCATION X VENDOR combinations: some customers appear in only one location, while others appear in multiple locations. Therefore there won't be a total of 10,000 x 12 x 100 samples to be predicted.
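A sketch of that negative-sampling idea, restricted (per the NB above) to customer-location pairs that actually appear, so the full 10,000 x 12 x 100 grid is never enumerated. All IDs and the vendor list are made up:

```python
import random

# Hypothetical positive order triples: (customer_id, location, vendor_id).
positives = {
    ("A1S2D3", 12, 44),
    ("A1S2D3", 12, 7),
    ("Q9W8E7", 3, 44),
}
vendors = [7, 44, 61, 88]

# Only pair each customer with locations they actually appear in.
customer_locations = {(c, loc) for c, loc, _ in positives}

rng = random.Random(0)

def sample_negatives(n_per_pair=2):
    """Draw vendor combinations that never occur in the order data."""
    negatives = set()
    for c, loc in customer_locations:
        candidates = [v for v in vendors if (c, loc, v) not in positives]
        for v in rng.sample(candidates, min(n_per_pair, len(candidates))):
            negatives.add((c, loc, v))
    return negatives

negs = sample_negatives()
print(len(negs))  # a controlled number of negatives per customer-location pair
```

Keeping `n_per_pair` small (rather than taking every missing combination) is one way to stop the dataset from exploding while still giving the model negatives to learn from.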
Another alternative for dealing with the lack of negative samples is to treat the problem as one-class classification, e.g. using anomaly or novelty detection techniques. I have doubts about their modelling effectiveness, though, as they are not very popular for this kind of task.
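For completeness, here is what that one-class framing could look like with scikit-learn's IsolationForest, fit only on positive orders; candidate combinations scoring as outliers would be treated as negatives. This is a sketch under the assumption that orders can be featurized numerically (the feature columns below are invented):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical numeric features for positive orders only
# (e.g. vendor rating, distance, order hour) drawn from a tight distribution.
rng = np.random.RandomState(0)
X_pos = rng.normal(loc=[4.5, 2.0, 19.0], scale=[0.3, 0.5, 2.0], size=(200, 3))

clf = IsolationForest(random_state=0).fit(X_pos)

# Score new candidate combinations: +1 = looks like a real order, -1 = outlier.
X_new = np.array([
    [4.4, 2.1, 20.0],   # close to the training distribution
    [1.0, 15.0, 3.0],   # far from anything seen
])
print(clf.predict(X_new))
```

The doubt raised above is fair: this only flags combinations whose features look unlike real orders, which is a weaker signal than learning from explicit negatives.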
Btw I'm still trying to formulate this competition problem.
Use the Surprise framework.