Hi, I see some sku_name & date duplicates, and only in the train data.
Does anyone know if they are real duplicates that we should eliminate (I mean, is it a kind of data entry problem)? Or is there some other reason for them to exist?
Good question ... I *think* there was a price change in the middle of the month and you get the data on the two sides of that.
Thanks, I have just checked. Not all of the duplicates involve a price change. For the ABEMULAASHL sku there was no change, yet we have 4 records for 2019-10. And even where there was a price change, in most cases we see something like ABEAHAMASHL (2 duplicates for one price and two for another).
Another strange thing is that all the duplicates are for 2019-10... That's why I thought it may be some data collection problem.
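
In case anyone wants to reproduce this check, here is a minimal pandas sketch. It assumes the columns are called `sku_name`, `date` and `price` (only `sku_name` is confirmed in this thread, the other two names are guesses), and the toy frame and its price values are made up just to mirror the two patterns described above.

```python
import pandas as pd

# Toy stand-in for the train data. The `date` and `price` column names and
# all price values are assumptions made for illustration only.
train = pd.DataFrame({
    "sku_name": ["ABEMULAASHL"] * 4 + ["ABEAHAMASHL"] * 4 + ["OTHERSKU"],
    "date": ["2019-10"] * 8 + ["2019-09"],
    "price": [10.0, 10.0, 10.0, 10.0, 5.0, 5.0, 6.0, 6.0, 3.0],
})

# Keep every row whose (sku_name, date) pair occurs more than once.
dupes = train[train.duplicated(subset=["sku_name", "date"], keep=False)]

# For each duplicated pair: how many rows, and how many distinct prices.
summary = (
    dupes.groupby(["sku_name", "date"])["price"]
    .agg(n_rows="size", n_prices="nunique")
    .reset_index()
)
print(summary)
```

A pair with `n_prices == 1` is a plain repeat with no price change, while `n_prices > 1` would be consistent with a mid-month price change.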
Nice spot Ililily ... this data is a bit broken ... looks like duplicates. I'd suggest just ignoring 44893, 44895, 44575 and 44577?
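
A rough sketch of ignoring those rows, assuming the four numbers are index labels of the train DataFrame (if they are values of an ID column instead, filter on that column). The toy frame and its sku names are hypothetical.

```python
import pandas as pd

# Row identifiers suggested above; assumed here to be index labels.
ids_to_ignore = [44893, 44895, 44575, 44577]

# Hypothetical toy frame standing in for the real train data.
train = pd.DataFrame(
    {"sku_name": ["SKU_A", "SKU_A", "SKU_B", "SKU_B", "SKU_C"]},
    index=[44575, 44576, 44577, 44893, 44895],
)

# Keep only the rows whose index label is not in the ignore list.
cleaned = train[~train.index.isin(ids_to_ignore)]
print(cleaned)
```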