Sendy Logistics Challenge
$7,000 USD
Predict the estimated time of arrival (ETA) for motorbike deliveries in Nairobi
1176 data scientists enrolled, 431 on the leaderboard
23 August 2019—26 November 2019
Is there a general way to approach outliers?
published 17 Sep 2019, 11:11

Should outliers be kept in your modelling process or excluded?

If the test set were free of them, eliminating them would make sense, but in this case the test set contains them too. Keeping them will hurt your predictors, while eliminating them would leave you with a model that only produces good results on your training set. Any ideas on how to approach the problem here?

In statistics, deleting outliers is generally not a good approach. Do it only if the outliers are obvious errors. As Mohammed_jedidi said (is it a bike or an airplane?), in this data a rider sometimes forgets to click in the app at the appropriate time. In such cases the outliers should be deleted completely. Without that kind of clearly bad data, never delete an outlier: it worsens the performance.
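To make the "bike or airplane" check concrete, here is a minimal sketch that drops only rows whose implied speed is physically implausible. The field names `distance_km` and `duration_s`, the sample rows, and the 120 km/h cap are all assumptions for illustration, not values from the competition data.

```python
# Hypothetical records: distance in km, pickup-to-arrival time in seconds.
deliveries = [
    {"order": "A", "distance_km": 4.0, "duration_s": 720},    # 20 km/h
    {"order": "B", "distance_km": 10.0, "duration_s": 5},     # 7200 km/h, app clicked too late
    {"order": "C", "distance_km": 12.0, "duration_s": 2400},  # 18 km/h
    {"order": "D", "distance_km": 3.0, "duration_s": 9000},   # 1.2 km/h, slow but possible
]

MAX_SPEED_KMH = 120.0  # assumed upper bound for a motorbike, tune to taste


def implied_speed_kmh(row):
    """Average speed implied by the recorded distance and duration."""
    return row["distance_km"] / (row["duration_s"] / 3600.0)


def split_implausible(rows, max_speed_kmh=MAX_SPEED_KMH):
    """Separate rows that are physically implausible from the rest."""
    kept, dropped = [], []
    for row in rows:
        # Non-positive durations are checked first to avoid dividing by zero.
        if row["duration_s"] <= 0 or implied_speed_kmh(row) > max_speed_kmh:
            dropped.append(row)
        else:
            kept.append(row)
    return kept, dropped


kept, dropped = split_implausible(deliveries)
print("kept:", [r["order"] for r in kept])
print("dropped:", [r["order"] for r in dropped])
```

Note that order D is slow but still possible, so it stays in: only the rows that cannot be real deliveries are removed.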

Take time to analyse what might cause the outliers.

Or take a log transform to reduce the skew of the distribution.
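A rough sketch of the log idea, assuming it is applied to the delivery time you are predicting. The durations below are made up, and `skewness` is a small helper written for this example. You would train on the transformed values and convert predictions back with `math.expm1`.

```python
import math


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up delivery times in seconds, with one very long delivery.
durations_s = [300, 450, 600, 720, 900, 1200, 1500, 2400, 7200]

# log1p(x) = log(1 + x), which also behaves well for values near zero.
log_durations = [math.log1p(v) for v in durations_s]

print("skew before:", round(skewness(durations_s), 3))
print("skew after: ", round(skewness(log_durations), 3))

# Predictions made on the log scale are mapped back with expm1.
restored = [math.expm1(v) for v in log_durations]
```

The long delivery is still in the data, but it pulls the fit much less on the log scale, so nothing has to be deleted.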

Do you know for sure whether the test set (public/private) is free of outliers?