I have just been catching up on some of the discussions about which features are allowed in this competition. I want to summarize them and have Zindi confirm them:
1. Field ID (not allowed) - this is not allowed because it leaks information on the field labels. Many field IDs close to each other have the same labels. I believe this was not considered when the fields were labeled, and that is why it leaks information. It is also the reason you can't use any 'ordering' of the data as a feature: ordering lets you recreate the same leak you have in Field ID.
2. row_loc and column_loc (not allowed) - you can't use row_loc and column_loc (or coordinates) because your model needs to generalise on inherent crop properties and not on spatial position. You want to be able to predict crop type based on band observations only, not on whether the farm is close to other (similar) farms or where it is located. This appears valid to me and I agree with the organisers on this.
3. Date of Observation (not allowed) - you are not allowed to use the absolute values of the date of observation. I think this is because, for your model to generalise, you don't want to tie it to the exact dates on which the observations were taken. The band observations are well spread over the planting season, so you don't have to use the exact dates. You may use the dates to see closeness to harvest or how far gone the crop is in planting season I believe, to mark a deviation in crop nutrient properties, but using exact dates will not generalise well.
4. Tiles (unsure): I am adding this because @DrFad said in another thread that he believes they're simply like folders and should not be used as features. I agree, but I also think they are categories of observations, and for this specific project we may need to account for possible variations between observations. The downside of using them is the closeness of crop fields: some tiles may contain more fields of certain crop types than others, which would mean your model may not generalise well. I'd prefer that they not be used either.
I am making this summary because many things have been disallowed in the course of this competition, and new people joining may not have gone through the discussions or have context on what is and is not allowed. I'd also like Zindi to clarify and list all disallowed features for the benefit of all participants. It also helps to know how fair one's position on the leaderboard is.
Well said @Alchemi. This definitely clarifies the restricted features and will put everyone on level ground. I just hope those who have used these features inform Zindi via email so that such entries can be disqualified.
We also need Zindi or Radiant to confirm the restriction on the "tiles" feature. It inherently provides some information gain because it groups similar crops to some extent. In my view, "tiles" shouldn't be used. I haven't used it so far.
"You may use the dates to see closeness to harvest or how far gone the crop is in planting season I believe"
Is there a statement by Zindi that supports this? Zindi has said "dates cannot be used". My reading is that this covers not just absolute dates but also relative ones. Relative dates, even the order of dates, are a strong feature. Zindi, please clarify whether date order, or the distance between dates, may be used.
Zindi, may we use the dates to see closeness to harvest or how far gone the crop is in planting season? Can we use dates at all, such as days between images?
Please, could you add another point to your list:
Private sharing is not allowed either.
Thank you for this summary.
4. Tiles (not allowed). You may use them to mosaic the fields into one image, but you may not use them as a field in modeling.
Thanks, Zindi, for the clarification.
Thanks @Alchemi for consolidating and helping to clarify the rules re: spatial and temporal coordinate usage. Really helpful to see it all in one place!
I have the same understanding of the rules, and the same confusion, as @robga - I thought neither absolute nor relative usage of dates is allowed. For example, using relative temporal coordinates could mean ordering the data from earliest to latest date, calculating time elapsed or spectral changes between one or more relative steps in dates, or creating other features that capture the seasonality of the planting and harvest cycle.
Re: some additional clarity on the use of spatial coordinates, I understand that neither absolute nor relative pixel positions for each field are allowed. So for relative pixel positions, this would mean we can't use things like the distance from one field to another, the boundary shape or orientation of a field, or any other features that represent local spatial relationships between pixels? I think this would rule out the use of any convolutional neural net or classical computer vision filter/kernel approach to recognize spatial patterns between pixels.
@Zindi, @RadiantEarth - could you please clarify whether the use of relative temporal or spatial coordinates is permitted?
I also noticed in the official rules section that there's a typo which may cause some confusion:
"This is a computer vision challenge. Models that use metadata such as dates or spatial coordinates will not be accepted as a winning solution. You may use the dates to reconstruct the 2x2 grid (00 01 02 03) into a single mosaic."
I assume this is supposed to say "You may use the tile IDs to reconstruct the 2x2 grid (00 01 02 03) into a single mosaic"?
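If tile IDs are indeed what is meant, the mosaic step might look something like the sketch below. It is only an illustration: the function name `mosaic_2x2` is made up, and the row-major placement of the tiles (00 top-left, 01 top-right, 02 bottom-left, 03 bottom-right) is my assumption, not something the rules state. The tile IDs decide where pixels go and never appear in the output.

```python
def mosaic_2x2(tiles):
    """Stitch four equally sized tiles into one image.

    tiles: dict mapping tile ID ("00".."03") to a 2-D list of pixel rows.
    ASSUMPTION: row-major layout (00 top-left, 01 top-right,
    02 bottom-left, 03 bottom-right). Check the dataset documentation.
    """
    # Join each row of the left tile with the matching row of the right tile.
    top = [left + right for left, right in zip(tiles["00"], tiles["01"])]
    bottom = [left + right for left, right in zip(tiles["02"], tiles["03"])]
    # Stack the top half above the bottom half.
    return top + bottom


if __name__ == "__main__":
    demo = {
        "00": [[1, 2], [3, 4]],
        "01": [[5, 6], [7, 8]],
        "02": [[9, 10], [11, 12]],
        "03": [[13, 14], [15, 16]],
    }
    for row in mosaic_2x2(demo):
        print(row)
```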
Thanks @Alchemi for summarizing these. Here are some clarifications on these points to address new questions that are raised in this thread:
- Field ID: You are not allowed to use Field ID as a "feature" input to the model. But you can use Field ID in your preprocessing to find all the pixels belonging to a specific field and derive features such as area from them. The reason you are not allowed to use Field ID is that this number is just an index, has no correlation with crop type in the real world, and cannot be used on any other dataset.
- Location: You are not allowed to use absolute location information (row_loc and column_loc) or relative location information (such as the closeness of two fields) as a "feature" input to the model. As with Field ID, location information has no correlation with crop type in the real world.
- Date of Observation: You are not allowed to use the absolute or relative value of Date of Observation as a "feature" input to the model. For example, 2019/08/05 is a date of observation; you should not use this value as a feature in your model. As with Field ID, you can use the date values in your preprocessing to derive any feature that you want. Let's say you want to interpolate the value of a band between two observations: you are allowed to use the dates to implement your interpolation. But the final "feature" input to your model should not contain any absolute date value. You can also use dates to order the data before inputting them into the model, as @daveluo asked. This can help your prediction, as the observations are not equally spaced in time (mostly due to the omission of cloudy images), and the phenology of the crop is key to detecting the crop type.
- Tile IDs: You are not allowed to use absolute or relative Tile ID as a "feature" input to the model. This information again has no relationship with crop type in the real world. You are allowed to use the Tile ID in preprocessing, for example to merge the four tiles if you want.
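To make the preprocessing-versus-feature distinction above concrete, here is a minimal sketch of one possible reading. Everything in it is hypothetical: the function names, the `(field_id, obs_date, band_value)` tuple layout, and the choice of a fixed grid of day offsets are illustrative assumptions, not the competition's data format. Field ID groups pixels and dates order and interpolate the observations, but the feature lists contain only a pixel count (as an area proxy) and band values.

```python
from collections import defaultdict
from datetime import date


def interpolate_band(obs_dates, values, grid_days):
    """Linearly interpolate one band onto a fixed grid of day offsets.

    obs_dates: sorted, distinct datetime.date objects.
    values: one band value per observation date.
    grid_days: day offsets from the first observation (hypothetical grid).
    Dates drive the interpolation only; the output holds band values alone.
    """
    offsets = [(d - obs_dates[0]).days for d in obs_dates]
    out = []
    for g in grid_days:
        if g <= offsets[0]:
            out.append(values[0])  # clamp before the first observation
        elif g >= offsets[-1]:
            out.append(values[-1])  # clamp after the last observation
        else:
            # Find the pair of observations that brackets this grid point.
            for i in range(1, len(offsets)):
                if g <= offsets[i]:
                    t = (g - offsets[i - 1]) / (offsets[i] - offsets[i - 1])
                    out.append(values[i - 1] + t * (values[i] - values[i - 1]))
                    break
    return out


def build_field_features(pixels, grid_days):
    """pixels: iterable of (field_id, obs_date, band_value), one per pixel per date.

    Returns {field_id: feature_list}. The key is bookkeeping for joining
    predictions back to fields; it is not part of the feature list.
    """
    per_field = defaultdict(lambda: defaultdict(list))
    for field_id, obs_date, value in pixels:
        per_field[field_id][obs_date].append(value)

    features = {}
    for field_id, by_date in per_field.items():
        obs_dates = sorted(by_date)  # dates used only to order observations
        means = [sum(by_date[d]) / len(by_date[d]) for d in obs_dates]
        area = len(by_date[obs_dates[0]])  # pixel count on the first date
        features[field_id] = [area] + interpolate_band(obs_dates, means, grid_days)
    return features


if __name__ == "__main__":
    sample = [
        (1, date(2019, 6, 1), 0.25), (1, date(2019, 6, 1), 0.75),
        (1, date(2019, 6, 11), 1.0), (1, date(2019, 6, 11), 1.0),
        (2, date(2019, 6, 1), 0.125), (2, date(2019, 6, 11), 0.625),
    ]
    print(build_field_features(sample, grid_days=[0, 5, 10]))
```

Whether resampling onto a regular grid like this is the best way to handle the unevenly spaced observations is a modelling choice; the point of the sketch is only that the dates and IDs stay on the preprocessing side.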
Hello! I have used tiles in my predictions. What should I do to delete my last submission?
Send an email to Zindi@zindi asking them to remove the submission for you.
Also check that you didn't include Field_id in the train data.
Dr Fad is also amazed at the scores; even @taviv should check.
When this post was made, Alchemi's score was 1.15 and now his score is 1.16. Really, I don't understand what is going on in this competition. Is everyone just pretending not to be using those features, or what??
Btw, good move downranking yourself. Looking forward to seeing Dr Fad's solution, with code this time, not just a description please.
Hi Mohammed, any particular reason why you are interested in my code?
I can't comprehend this. But to be realistic, everyone using the not-allowed features should kindly email Zindi to remove the submission. @sbs complained about it two days ago and still nothing has changed.
@Datasciensyash, hope you are doing well. Please can you confirm whether you mistakenly included the restricted features in your model, i.e. Field_id, row_loc, column_loc and tiles? If so, you can send an email to Zindi to remove the entry. Otherwise, great job attaining that score. All the best.