Good afternoon. And how will it be checked which time interval the winners have chosen? It is noticeable that for Afghanistan, increasing the interval by one year improves the metric both on validation and on the "public" leaderboard.
@D_S, while waiting for @amyflorida626 to respond: I think it's very easy to check the time interval used. You are expected to submit all your code, and according to this:
########
4. Code used for data processing. Note: we will evaluate the potential of the submitted code for data processing according to the technical document (from the RAW data to the resulting maps); the submission will not be accepted if the methodology is evaluated as unrepeatable. The submitted script is limited to python and Google
########
Going outside the stipulated timing window is clearly an unacceptable approach, as the data processing is indeed an integral part of the task.
Happy Hacking!!!!
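For what it's worth, the acquisition window is spelled out explicitly in any Earth Engine script, so a reviewer can read it straight off the submitted code. A minimal sketch, assuming the Sentinel-2 surface reflectance collection and a placeholder geometry roughly around Nangarhar (all values here are illustrative, not taken from any actual submission):

```python
# Minimal sketch: the time interval used is visible directly in the code.
# Collection ID, AOI box and cloud threshold are assumed example values.
import ee

ee.Initialize()

# Hypothetical area of interest (rough box around Nangarhar, Afghanistan).
aoi = ee.Geometry.Rectangle([70.0, 34.0, 71.5, 34.8])

# The stipulated window for Afghanistan (April 2022) appears literally in
# filterDate, so checking which interval a solution used only requires
# reading this line of the submitted script.
s2 = (ee.ImageCollection('COPERNICUS/S2_SR')
      .filterBounds(aoi)
      .filterDate('2022-04-01', '2022-05-01')
      .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)))

print(s2.size().getInfo())
```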
According to this thread by @antoine_saget, "Does the timespans also limit data acquisition ?" - Zindi, @Lolletti says:
...Indeed, if the model uses a time series in the past it might be operationally more useful for a near real time assessment. However, if using future data leads to a better accuracy, this can also be an interesting research finding in the context of the challenge. I would suggest, if possible, to assess the performances of the 2 methods and report the comparison.
The way I understood it is that we can use a different timespan from that of the competition's datasets?
However, I got worse results trying that, compared to @D_S.
I agree with you on that; my understanding is that timespans beyond the ones given are OK. As far as I understand, the given timespans are just information on when the labels were gathered. I don't see why there would be a limitation on the timespans of the data if this can lead to better performance?
But again, the answers given on that matter are not 100% clear, and a final, clear answer on precisely what data (not labels, we mean data from satellites) is authorized would be great!
I meant that the winner did not use the April 2022 interval for Afghanistan and the 2019 to 2020 interval for Iran and Sudan, but completely different intervals. The intervals are clearly spelled out in the task; I wanted to clarify with the organizers whether one would be disqualified for this.
I don't see how such a limitation would make sense. I understand this limitation on labels, as labelling is expensive. But Sentinel-2 has been providing FREE optical data since 2015, so why arbitrarily limit optical data timespans if more of it enables better performance?
My current best solution uses time-series Sentinel-2 optical data from 2021 to 2022 for Afghanistan (but no extra labels). If this is not authorized, can any of the organizers please let us know?
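As a purely hypothetical illustration of what such an extended time series might look like as model features, one could build monthly composites over 2021-2022; the collection, band choice, compositing and AOI below are assumptions for the sketch, not the actual pipeline described above:

```python
# Hypothetical sketch of a 2021-2022 Sentinel-2 feature stack (monthly NDVI medians).
# All names and parameter choices are illustrative assumptions.
import ee

ee.Initialize()

aoi = ee.Geometry.Rectangle([70.0, 34.0, 71.5, 34.8])  # placeholder AOI

def monthly_ndvi(year, month):
    """Median NDVI composite for one calendar month."""
    start = ee.Date.fromYMD(year, month, 1)
    col = (ee.ImageCollection('COPERNICUS/S2_SR')
           .filterBounds(aoi)
           .filterDate(start, start.advance(1, 'month'))
           .map(lambda img: img.normalizedDifference(['B8', 'B4']).rename('NDVI')))
    return col.median().rename(f'NDVI_{year}_{month:02d}')

# Stack 24 monthly composites (Jan 2021 - Dec 2022) into one multi-band image
# that can then be sampled at the (unchanged) labelled points.
months = [(y, m) for y in (2021, 2022) for m in range(1, 13)]
features = ee.Image.cat([monthly_ndvi(y, m) for y, m in months])
print(features.bandNames().getInfo())
```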
That's why I asked about the labels for Afghanistan, whether it is allowed to go outside the interval, so that it would not turn out later that it was in fact allowed.
@antoine_saget @D_S What if we use the stipulated time interval for Afghanistan? Would that make any difference in this scenario? I think @zindi has already stated clearly, for us to see and know, exactly what they want from us. We all have different points of view on this project, but we shouldn't be carried away by our LB scores; making sure we build a very robust model should be the paramount task.
My take !!!
I don't think it's a matter of points of view; I think it's a matter of understanding the rules. The rules leave two possible interpretations, and for me it's not clear at all which one is meant by the organizers, given the explanations they have given so far (be it on the Info, Data or Discussions pages):
1. The given timeranges are only meant as a guide for the labeled data and don't apply to the features (Sentinel-2 time series); in this case it's OK to use time series beyond (before and/or after) the given timerange, as long as the labels stay within it. (For me this interpretation makes the most sense, but it's up to the organizers.)
OR
2. The given timeranges apply to both labels and features; in this case it's not OK to use time series beyond the given timerange.
I hope @amyflorida626 or @Lolletti can give a final answer to this question?
@antoine_saget Ohh... I think I understand your point. According to your interpretations above, I would say both interpretations can still be followed, according to @Lolletti as quoted by @JuliusFx, which says:
...Indeed, if the model uses a time series in the past it might be operationally more useful for a near real time assessment. However, if using future data leads to a better accuracy, this can also be an interesting research finding in the context of the challenge. I would suggest, if possible, to assess the performances of the 2 methods and report the comparison.
But to me, I will rather stick to your second interpretation, which agrees perfectly well with my own interpretation and with what I have done so far on this project, pending when we hear from Zindi again.
Hi Antoine,
Again, the time span refers to the labelled data (train and test). Evaluation of the model performance will be against such data. However, no limitations are given by the rules on the data that participants want to use for training the model.
Thank you, it's 100% clear now =)
Hi, please see the Info page. It is clear there how participants will be evaluated.
Accuracy assessment of the cropland extent maps using test samples, and the balance between training sample dataset size and classification accuracy.
The cropland extent of the test regions at 10m spatial resolution, with the following further specifications:
a) For Afghanistan (Nangarhar province): temporal cropland extent distribution of April 2022.
b) For Iran and Sudan (both with one-degree by one-degree region): cropland extent distribution during the time period July 2019 ~ June 2020.
Training samples with the features used for the classification procedure. We will provide training samples, and the participants can add training samples by themselves. In addition, we encourage the participants to use a small training sample dataset, so we will limit the maximum number of total training samples in each test region to 1,000.
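To make the excerpt above concrete, here is a minimal, hypothetical sketch of the kind of accuracy assessment it describes: capping training samples at 1,000 per test region and scoring cropland predictions against held-out test samples. File names, column names and the classifier are assumptions for illustration only, not the organizers' evaluation code:

```python
# Hypothetical sketch: respect the 1,000-sample-per-region limit, then report
# overall accuracy and the confusion matrix on the test samples.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

train = pd.read_csv('train_samples.csv')   # assumed: feature columns + 'region' + 'is_crop'
test = pd.read_csv('test_samples.csv')

# Cap the training set at 1,000 samples per test region, as stated above.
train = train.groupby('region', group_keys=False).apply(
    lambda df: df.sample(n=min(len(df), 1000), random_state=0))

feature_cols = [c for c in train.columns if c not in ('region', 'is_crop')]
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(train[feature_cols], train['is_crop'])

pred = clf.predict(test[feature_cols])
print('Overall accuracy:', accuracy_score(test['is_crop'], pred))
print(confusion_matrix(test['is_crop'], pred))
```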