Hey guys, how does your CV score compare with your LB score under the competition rules? Sharing this will help us raise the bar and deliver better models in general.
NOTE: The only rule stated by the host is that the model must NOT use future data for modeling.
My latest: CV - 0.739, LB - 0.9933
My best: CV - 2.16, LB - 2.13
What's your CV strategy? How many folds do you use? Thanks.
Custom k-folds (based on the competition test set), 10 folds
CV - 1.64, LB - 2.15
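For anyone comparing CV setups: the post above does not say how the custom folds were built, so the following is only one possible reading, not that poster's method. It is a minimal sketch of time-ordered folds, assuming the rows are already sorted by time. The function name `expanding_window_folds` and the sample count are made up for illustration.

```python
def expanding_window_folds(n_samples, n_folds=10):
    """Yield (train_idx, val_idx) pairs for rows sorted by time.

    The rows are cut into n_folds + 1 contiguous blocks. Fold k trains on
    the first k blocks and validates on the next block, so every validation
    row is later than every training row.
    """
    if n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    block = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * block
        # The last fold absorbs any leftover rows.
        val_end = n_samples if k == n_folds else train_end + block
        yield list(range(train_end)), list(range(train_end, val_end))


# Hypothetical example: 110 hourly rows split into 10 folds.
folds = list(expanding_window_folds(n_samples=110, n_folds=10))
for train_idx, val_idx in folds[:2]:
    print(len(train_idx), "train rows ->", val_idx[0], "to", val_idx[-1])
```

Validating only on rows that come after the training rows keeps the CV score closer to what a model sees at prediction time, which may narrow a CV vs LB gap if the gap comes from leakage across time.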
Did you use historical data, especially historical Energy data?
Yes, I have. So far there is no rule from the organizers against using it. But yes, we will have to explain the approach and its real-world application.
I am a bit confused about the use of future data.
Does the restriction cover only future values of the energy consumption field (in lag features, for example), or all future data?
If it is the latter, how can we predict energy consumption for the first hours of the first day? In that case only a few training samples can be used to train the model.
Hi @ahmedattia,
In past discussions, it was stated that future values must not be used as features in the model. This covers the future values of any KPI as well as of the target variable.
However, we can train on the complete data, since that captures the instantaneous physical relationships between the target and the independent KPIs. For the test samples in the first hours, your prediction will then be based on these instantaneous relationships.
PS: The above thoughts are solely mine and do not represent the organizers' point of view.
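To make the "no future values as features" reading concrete, here is a small pandas sketch. It is an illustration under assumptions, not the organizers' definition: the column name `energy` and the toy values are hypothetical, and the rows are assumed to be sorted by hour.

```python
import pandas as pd

# Toy hourly series, one row per hour in time order.
# The column name "energy" is hypothetical.
df = pd.DataFrame({"energy": [10.0, 12.0, 11.0, 15.0, 14.0]})

# shift(1) moves each value one row down, so row t holds the value from t-1.
df["energy_lag1"] = df["energy"].shift(1)

# A negative shift would pull the value from t+1 into row t, which is the
# kind of future information the rule forbids.
# df["energy_lead1"] = df["energy"].shift(-1)

# Mean of the previous 3 hours, excluding the current hour.
df["energy_prev3_mean"] = df["energy"].shift(1).rolling(3).mean()

print(df)
```

The first rows have no history, so their lag columns are NaN. That matches the point above: for the earliest test hours, the prediction has to lean on the same-hour KPIs rather than on lagged values.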
Mine: CV - 1.33, LB - 1.68
I am trying XGBoost, but my CV score is still not decreasing. Any suggestions?
Hi, I have an off-topic question: what is the best way to merge the data?