Wazihub Soil Moisture Prediction Challenge
$8,000 USD
Predict soil humidity using sensor data from low-cost DIY Internet of Things in Senegal
738 data scientists enrolled, 96 on the leaderboard
29 July 2019—21 October 2019
Can the model use irrigation data from the future?
published 3 Oct 2019, 11:47

For the model, can we assume that we will know the irrigation schedule for the days we need to forecast?

No, the only future data one is allowed to use are the peak soil humidity values. Remember the model is supposed to help determine irrigation schedules, so knowing them beforehand is counterintuitive.

But then why can we even use the peak soil humidity, since it will be unknown when the model is deployed?

I understand that our goal is to determine irrigation schedules, which is why I was surprised that there was still irrigation in the days to forecast for each field. It seems to me that a model that takes future irrigation values as input would be useful, as you could input hypothetical irrigation schedules and retrieve the predicted soil humidity, conditional on the input irrigation schedule. It is difficult to predict soil humidity accurately when you do not know at what times the field is irrigated, as irrigation is the variable most highly associated with soil humidity.
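To make the what-if idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the function name `simulate_humidity`, the simple update rule (humidity decays each step and rises with irrigation), and the `decay` and `gain` values are all hypothetical and are not taken from the competition data or rules.

```python
from typing import List


def simulate_humidity(
    start: float,
    irrigation: List[float],
    decay: float = 0.9,  # hypothetical drying-out factor per step
    gain: float = 0.1,   # hypothetical humidity gain per unit of irrigation
) -> List[float]:
    """Roll a toy humidity model forward under a given irrigation schedule."""
    levels = []
    level = start
    for amount in irrigation:
        # Assumed dynamics: soil dries out a little, irrigation adds moisture.
        level = decay * level + gain * amount
        levels.append(level)
    return levels


# Compare two hypothetical schedules from the same starting humidity.
no_irrigation = simulate_humidity(0.5, [0, 0, 0, 0])
irrigate_day_two = simulate_humidity(0.5, [0, 1, 0, 0])
```

Comparing the two outputs step by step shows how a candidate schedule would change the predicted humidity, which is the kind of conditional forecast described above.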

Don't worry, I was also confused by that, but we are definitely not allowed to use it, as it is considered future information which we won't have in reality. If they were being nice, they would give us test data with no irrigation at all, but that would not be realistic, so instead they give us the peak soil humidity values, with which one can augment one's forecasts.

Can you please help me with this question: is the peanuts context data correct? (I mean, the Water_Need_1day column is zero!)

And for "Water_Need_2day", we have a value for each day instead of a value aggregated over two days!

Hi there!

The important thing here is to understand what the eventual output of the model has to be, and the only way to do that is to read the "Info" and "Data" tabs thoroughly to fully understand what is being asked. Eventually what is relevant and what is irrelevant becomes clear.

When in doubt, remember the golden rule of any model in production: If the historical data for a variable was bad without explanation, the variable should not be used.
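One way to apply that rule is to flag columns that never change before deciding what to feed the model. The sketch below is only an illustration: the helper `constant_columns` is hypothetical, and the values are made up rather than taken from the competition files (only the column names come from this thread).

```python
from typing import Dict, List


def constant_columns(table: Dict[str, List[float]]) -> List[str]:
    """Return the names of columns whose values never change."""
    return [name for name, values in table.items() if len(set(values)) <= 1]


# Made-up illustrative values, not taken from the competition files.
context = {
    "Water_Need_1day": [0.0, 0.0, 0.0, 0.0],
    "Water_Need_2day": [1.2, 1.4, 1.1, 1.3],
}

# Columns that carry no information are treated as suspect and left out.
suspect = constant_columns(context)
usable = {name: values for name, values in context.items() if name not in suspect}
```

A constant column is only one symptom of bad historical data, so a check like this is a starting point rather than a complete audit.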

OK, thank you very much for this advice :) Just one final quick question, please: when you said "so they give us the peak soil humidity with which one can augment one's forecasts instead", did you mean that this value should act as an indicator for our forecasting results, telling us whether we are going the right way or should try another path? Is that what those peaks are for?

So in reality, the model will be predicting future soil moisture levels for which we don't know what the irrigation schedule will be. But seeing as this is not the case for our test data, we probably have to use the peak values to "reset" our forecasts to a certain level before letting the model predict again until the next peak, which will allow our predictions to look like the test data. I personally don't want to use the irrigation schedule explicitly, as it may not always be available, but I am sure many of the top solutions use it anyway.
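Here is a rough sketch of that "reset" idea, under stated assumptions. The function `forecast_with_resets`, the constant `decay` factor standing in for a real model, and the way peaks are passed in (a mapping from step index to peak value) are all hypothetical choices for illustration, not the competition's data format or anyone's actual solution.

```python
from typing import Dict, List


def forecast_with_resets(
    last_observed: float,
    n_steps: int,
    peaks: Dict[int, float],
    decay: float = 0.9,  # hypothetical stand-in for a real forecasting model
) -> List[float]:
    """Forecast n_steps ahead, resetting to a known peak whenever one is given.

    peaks maps a 0-based step index to the provided peak humidity value.
    """
    predictions = []
    level = last_observed
    for step in range(n_steps):
        if step in peaks:
            # A peak value is available for this step: reset the forecast to it.
            level = peaks[step]
        else:
            # Otherwise let the (toy) model carry the forecast forward.
            level = level * decay
        predictions.append(level)
    return predictions


forecast = forecast_with_resets(last_observed=0.5, n_steps=5, peaks={2: 0.8}, decay=0.5)
```

Between peaks the forecast follows the model alone, and at each peak it jumps to the supplied value before the model takes over again.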

Yeah, that's what I thought too. It's like a station or a reset, like you said, from which to continue the forecasting. Anyway, thanks again, pal, for your help :)

I wish @Zindi would help us figure out exactly what may and may not be used.

Hi Zindians,

You may not use any information from the future.