There are NaN 'customer_id' and 'id' values in the Test set (about 40200 of them), as well as many NaN 'CID X LOC_NUM X VENDOR' values in the Train set. I'm wondering what the point of keeping them is if they can't be used to make predictions.
Yeah, I noticed that too, but I was able to get the test set to match the sample submission file by merging the test customers file onto the test locations, instead of the test locations onto the test customers. Do the same for the train set, then cross-check against the sample submission when you're done.
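Here is a minimal pandas sketch of that merge direction. It assumes that "merging the customers file onto the locations" means using the locations table as the left frame of a left join. The tiny DataFrames, the column names `customer_id`, `location_number` and `vendor_id`, and the sample-submission IDs are all hypothetical stand-ins, not the competition's real schema. `how="cross"` needs pandas 1.2 or later.

```python
import pandas as pd

# Hypothetical toy versions of the customers and locations files.
customers = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3"],
    "gender": ["M", "F", "M"],
})
locations = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2"],
    "location_number": [0, 1, 0],
})

# Locations as the left frame: one row per saved location, with customer
# attributes attached. C3 has no location, so it does not appear.
loc_first = locations.merge(customers, on="customer_id", how="left")

# Customers as the left frame: every customer is kept, so C3 shows up
# with a NaN location_number.
cust_first = customers.merge(locations, on="customer_id", how="left")

print(len(loc_first))   # 3
print(len(cust_first))  # 4

# Build 'CID X LOC_NUM X VENDOR' IDs by pairing every location row with
# every (hypothetical) vendor.
vendors = pd.DataFrame({"vendor_id": [100, 200]})
pairs = loc_first.merge(vendors, how="cross")
pairs["ID"] = (
    pairs["customer_id"]
    + " X " + pairs["location_number"].astype(str)
    + " X " + pairs["vendor_id"].astype(str)
)

# Hypothetical sample submission to cross-check against.
sample = pd.DataFrame({"ID": [
    "C1 X 0 X 100", "C1 X 0 X 200",
    "C1 X 1 X 100", "C1 X 1 X 200",
    "C2 X 0 X 100", "C2 X 0 X 200",
]})
print(set(pairs["ID"]) == set(sample["ID"]))  # True
```

With customers as the left frame, a customer with no saved location (C3 here) survives as a row with a NaN `location_number`. That is one way NaN location or ID values can creep in. With locations as the left frame, only rows that actually have a location remain.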