Hi ZINDI, I am still confused and need a clarification, please.
Which one of these two strategies is correct?
- Strategy 1: train a separate model per field (train on field 1 then predict on field 1, train on field 2 then predict on field 2, and likewise for fields 3 and 4)
- Strategy 2: train on all 4 fields without mixing information from different fields, then predict on all 4 fields
From my point of view, you should train on individual fields. Predictions on the test set should come from different models, each trained on one field's data, but the hyperparameters of the four models should be the same. For instance, if you are using a tree-based algorithm and you used 100 estimators to model field 1, that number should stay constant across the other fields. That's my take.
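For what it's worth, here is a minimal sketch of that per-field setup: one model per field, all sharing the same hyperparameters. It assumes scikit-learn, a regression target, and a random forest as the tree-based algorithm. The data is synthetic and the names `SHARED_PARAMS`, `fit_per_field` and `predict_per_field` are made up for illustration; none of this comes from the competition itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hyperparameters shared by every field's model (assumed values).
SHARED_PARAMS = {"n_estimators": 100, "random_state": 0}


def fit_per_field(X, y, field_ids, params=SHARED_PARAMS):
    """Fit one model per field, all with the same hyperparameters."""
    models = {}
    for field in np.unique(field_ids):
        mask = field_ids == field
        # Each model only ever sees rows from its own field.
        models[int(field)] = RandomForestRegressor(**params).fit(X[mask], y[mask])
    return models


def predict_per_field(models, X, field_ids):
    """Predict each row with the model trained on that row's field."""
    preds = np.full(len(X), np.nan)  # rows from unseen fields stay NaN
    for field, model in models.items():
        mask = field_ids == field
        if mask.any():
            preds[mask] = model.predict(X[mask])
    return preds


# Synthetic stand-in for the four fields (50 rows each, 3 features).
rng = np.random.default_rng(0)
field_ids = np.repeat([1, 2, 3, 4], 50)
X = rng.normal(size=(200, 3))
y = X[:, 0] * field_ids + rng.normal(scale=0.1, size=200)

models = fit_per_field(X, y, field_ids)
preds = predict_per_field(models, X, field_ids)
```

Note that a row from a field with no trained model gets no prediction here (it stays NaN), which is exactly the limitation of this strategy when a new field shows up.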
Thank you @javablack for your response.
But in production, if we want to deploy our solution on a new field x, should we train our model on that new field for it to be operational?
I think that in practice we should not have to retrain our model each time for each new farmer (user). That would be weird!
Yes, I totally agree with you. However, if you think critically about it, the aim of the competition is to find the best model, one that can generalize to any field given exactly the same features as the four fields. Maybe the people who want the solution have more data, and they will take the model and hyperparameters submitted by the competition winner and use them to train a big general model.