Should outliers be kept in your modelling process or excluded?
If the test set were free of them, removing them would be the obvious choice, but here the test set contains them too. Keeping them will hurt your predictors, while removing them would leave a model that only performs well on data resembling the cleaned training set. Any ideas on how to approach this problem?
In statistics, deleting outliers is generally not a good approach. Only do it when an outlier is obviously an error. As Mohammed_jedidi said (is it a bike or an airplane?), in this data a rider may have forgotten to click the app at the right time. Such cases should be deleted completely, but apart from that kind of bad data, never delete outliers, because doing so worsens performance.
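A minimal sketch of "delete only the obvious errors", assuming each record has a distance and a duration. The `Trip` class, its fields and both thresholds (`MAX_SPEED_KMH`, `MAX_DURATION_H`) are hypothetical; the thread does not name the actual columns or limits. Records implying an airplane-like speed, a zero duration, or a ride left running for more than a day are dropped, while extreme but plausible rides are kept.

```python
from dataclasses import dataclass

# Hypothetical sanity limits for a bike ride; adjust them to your own data.
MAX_SPEED_KMH = 60.0     # faster than this is "an airplane", not a bike
MAX_DURATION_H = 12.0    # longer than this suggests the rider forgot to stop the app


@dataclass
class Trip:
    trip_id: int
    distance_km: float
    duration_h: float


def is_obvious_error(trip: Trip) -> bool:
    """True when the record cannot be a real bike ride."""
    if trip.duration_h <= 0 or trip.duration_h > MAX_DURATION_H:
        return True
    return trip.distance_km / trip.duration_h > MAX_SPEED_KMH


def drop_obvious_errors(trips):
    """Remove only impossible records; keep every other (possibly extreme) trip."""
    return [t for t in trips if not is_obvious_error(t)]


trips = [
    Trip(1, 5.0, 0.25),   # 20 km/h, ordinary ride
    Trip(2, 300.0, 1.0),  # 300 km/h, impossible on a bike
    Trip(3, 2.0, 0.0),    # zero duration, recording error
    Trip(4, 3.0, 5.0),    # slow and long, but plausible
    Trip(5, 4.0, 30.0),   # 30 h, app left running
]
kept = drop_obvious_errors(trips)
print([t.trip_id for t in kept])  # [1, 4]
```

Trip 4 survives even though it is unusual: the rule targets records that are physically impossible, not ones that are merely far from the mean.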
Take time to analyse what might be causing the outliers.
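One way to start that analysis is to flag candidate outliers for inspection instead of deleting them. This is a sketch using the common 1.5 × IQR rule (a swapped-in heuristic, not something the thread prescribes) on made-up trip durations in minutes; `flag_outliers` is a hypothetical helper.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower/upper fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the indices of values outside the IQR fences (nothing is removed)."""
    low, high = iqr_bounds(values, k)
    return [i for i, v in enumerate(values) if v < low or v > high]


# Made-up durations in minutes; the last one looks suspicious.
durations_min = [12, 15, 14, 18, 20, 16, 13, 17, 19, 240]
for i in flag_outliers(durations_min):
    print(f"row {i}: {durations_min[i]} min -> inspect before deciding")
```

Each flagged row can then be checked by hand (or against other columns) to decide whether it is a recording error or a genuine extreme value.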
Alternatively, apply a log transform to reduce the skew of the distribution.
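A sketch of that idea, assuming a non-negative, right-skewed target (the durations below are made up). `log1p` is used so that zeros are safe, and `expm1` maps values back to the original scale. A model trained on the logged target needs this inverse applied to its predictions. `skewness` is a small hypothetical helper computing the (biased) sample skewness.

```python
import math


def skewness(values):
    """Biased sample skewness: the mean of the cubed z-scores."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / std) ** 3 for v in values) / n


# Made-up, strongly right-skewed durations (each roughly double the last).
durations = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
logged = [math.log1p(v) for v in durations]

print(f"skew before: {skewness(durations):.2f}")
print(f"skew after:  {skewness(logged):.2f}")

# Predictions made on the log scale must be mapped back with expm1.
restored = [math.expm1(v) for v in logged]
```

The transform keeps every row, so extreme trips still inform the model, but they no longer dominate a squared-error loss the way they do on the raw scale.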
Do you know for sure whether the test set (public/private) is free of outliers?