ICLR Workshop Challenge #2: Radiant Earth Computer Vision for Crop Detection from Satellite Imagery
$7,000 USD
Identify crop type using satellite imagery, and win a trip to present your work at ICLR 2020 in Addis Ababa.
3 February–15 March 2020 23:59
274 data scientists enrolled, 39 on the leaderboard
Starter Notebook (currently top of leaderboard)
published 10 Feb 2020, 06:13

Hi all,

I have a starter notebook up. It's very rough since I haven't had much time, but if anything is unclear, let me know and I'll add explanations where necessary :)

The approach I took is to sample all the pixels of each field and store the data in a nice big dataframe. Then we treat this as a tabular classification problem. Throwing a simple random forest classifier at the data gives a score of ~1.30.
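Here is a minimal sketch of the pixel-sampling step. It assumes the bands for one tile have already been loaded into a NumPy array of shape (dates, bands, height, width), and that a raster of field IDs is available where 0 means "no field". The function name `pixels_to_dataframe` and the `<band>_<date>` column naming are illustrative choices, not taken from the notebook.

```python
import numpy as np
import pandas as pd


def pixels_to_dataframe(stack, field_ids, labels, band_names, dates):
    """Flatten every field pixel into one dataframe row (hypothetical helper).

    stack:      array of shape (n_dates, n_bands, H, W) with band values
    field_ids:  array of shape (H, W); 0 means "not part of a field" (assumption)
    labels:     dict mapping field ID -> crop label (missing IDs become NaN)
    band_names: e.g. ["B02", "B03", ...], in the same order as the stack
    dates:      date strings, in the same order as the stack
    """
    n_dates, n_bands, _, _ = stack.shape
    mask = field_ids > 0  # keep only pixels that belong to some field
    # stack[:, :, mask] has shape (n_dates, n_bands, n_pixels);
    # reshape + transpose gives one row per pixel, one column per (date, band)
    values = stack[:, :, mask].reshape(n_dates * n_bands, -1).T
    columns = [f"{b}_{d}" for d in dates for b in band_names]
    df = pd.DataFrame(values, columns=columns)
    df.insert(0, "field_id", field_ids[mask])  # same pixel order as `values`
    df["label"] = df["field_id"].map(labels)
    return df
```

Each row is then one training example, so any tabular classifier can be fit on the band columns, with `label` as the target.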

I'm sure you will all make lots of improvements to this shortly :) One request: if you use this code/approach, please give a little acknowledgement in your submission/code. "Based on the starter code by JW" is enough; it just gives a little attribution.

Notebook: https://colab.research.google.com/drive/1DPizsNT7GUK776TRDmk5rZVMsB1kJY5H

Great starter notebook.

This error is occurring:

FileNotFoundError: [Errno 2] No such file or directory: '/content/data/00/20190825/0_B03_20190825.tif'

Thanks a lot for this. You are a gem. Will mention your great work and give credits for sure.

This path should be changed to reflect the directory on your own system (computer) where you have the files.

When running on Colab, that's where the data was downloaded. Change paths to reflect your own system if running locally.
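One way to make the notebook portable is to keep the data location in a single setting and build every band path from it. The sketch below mirrors the file pattern visible in this thread (`<data dir>/<tile>/<date>/<tile[1]>_<band>_<date>.tif`); `DATA_DIR`, `tile_band_path` and `missing_files` are hypothetical names, not part of the notebook.

```python
from pathlib import Path

# Hypothetical setting: on Colab the data ended up in /content/data;
# point this at your own folder when running locally.
DATA_DIR = Path("/content/data")


def tile_band_path(data_dir, tile, date, band):
    """Path of one band for one tile on one date, following the pattern in this thread."""
    return Path(data_dir) / tile / date / f"{tile[1]}_{band}_{date}.tif"


def missing_files(paths):
    """Return the paths that don't exist, so a wrong DATA_DIR shows up before the loading loop."""
    return [p for p in paths if not Path(p).exists()]
```

For example, `tile_band_path(DATA_DIR, "00", "20190825", "B03")` reproduces the path from the FileNotFoundError above, and running `missing_files` over all expected paths first lists every file that the loop would fail on.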

The description says there are 3316 training fields, but my dataframe only has 3286.

There are a few field IDs missing. Zindi is looking into it and should have an update shortly. For now, using the data as is works fine.
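To see exactly which fields are affected (30 of them in the case above: 3316 − 3286), a set difference between the IDs in the label file and the IDs that made it into the pixel dataframe is enough. This is a small stdlib sketch; `missing_field_ids` is a hypothetical helper name.

```python
def missing_field_ids(expected_ids, df_field_ids):
    """Field IDs listed in the labels that never appear in the pixel dataframe (sorted)."""
    return sorted(set(expected_ids) - set(df_field_ids))
```

Passing the label file's ID column and the dataframe's `field_id` column (names depend on your setup) gives the list you could compare against Zindi's update.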

Any ideas for improvements? The RF seems to give different predictions every time I run it.

Its predictions will vary slightly from run to run; that's normal. To get better results, you can try adding features (for example, calculate NDVI for each date). You could also try different model types (CatBoost, XGBoost, maybe something like fastai tabular or NODE) or just tweak the parameters of the Random Forest model (n_estimators=1000, for example).
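A minimal scikit-learn sketch of the Random Forest step. Passing `random_state` makes repeated runs on the same data give identical predictions; the run-to-run variation otherwise comes from the random bootstrap samples and feature subsets each tree sees. Averaging pixel probabilities into one prediction per field is an assumption of this sketch (one reasonable way back from pixels to fields), not necessarily what the notebook does, and `fit_predict_fields` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def fit_predict_fields(X_train, y_train, X_test, test_field_ids,
                       n_estimators=1000, seed=42):
    """Train an RF on pixel rows and return mean class probabilities per field.

    X_train, X_test: 2-D arrays of pixel features (e.g. band values per date)
    y_train:         crop label for each training pixel
    test_field_ids:  field ID for each test pixel, same order as X_test
    """
    # Fixed random_state -> reproducible trees and predictions
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    model.fit(X_train, y_train)
    # One column per class, in the order of model.classes_
    pixel_probs = pd.DataFrame(model.predict_proba(X_test), columns=model.classes_)
    pixel_probs["field_id"] = np.asarray(test_field_ids)
    # Assumption: average pixel-level probabilities to get one row per field
    return pixel_probs.groupby("field_id").mean()
```

Swapping `RandomForestClassifier` for another model with a `predict_proba` method leaves the rest of the function unchanged, which makes it easy to compare the alternatives listed above.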

Can you please explain what you mean by NDVI? Also, LightGBM and XGBoost don't seem to work as well as RF here.

It's a measure of vegetation (VI stands for vegetation index; NDVI is the Normalized Difference Vegetation Index). In this case we have several image bands, including the red part of the spectrum (band 4) and the near-infrared part of the spectrum (band 8). It's usually calculated as `ndvi = (nir-red)/(nir+red)`. You can do this for each date and get some vegetation-specific numbers that the model might be able to use to good effect.
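A small pandas sketch of adding one NDVI column per date to the pixel dataframe. It assumes band columns named like `B04_<date>` (red) and `B08_<date>` (near-infrared); that naming is a choice made for this sketch, and `add_ndvi` is a hypothetical helper. Pixels where red + NIR is zero get NaN instead of a division-by-zero error.

```python
import numpy as np
import pandas as pd


def add_ndvi(df, dates, red="B04", nir="B08"):
    """Return a copy of df with an NDVI_<date> column: (nir - red) / (nir + red)."""
    out = df.copy()
    for d in dates:
        r = out[f"{red}_{d}"].astype(float)
        n = out[f"{nir}_{d}"].astype(float)
        denom = (n + r).replace(0, np.nan)  # zero denominator -> NaN
        out[f"NDVI_{d}"] = (n - r) / denom
    return out
```

Depending on the model, you may need to fill those NaN values (for example with 0) before fitting.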

Wow, I didn't know about vegetation indices. Thanks!

Were you able to load the data into EOLearn?

This notebook was really helpful, thanks man!

I keep getting this error from the starter code:

ValueError                                Traceback (most recent call last)
<ipython-input-27-f8f030b35458> in <module>()
     17       # Load im
---> 18       im = load_file(f"/content/drive/My Drive/mlhub-tutorials-master/mlhub-tutorials-master/notebooks/2020 CV4A Crop Type Challenge/data/{t}/{d}/{t[1]}_{b}_{d}.tif")
     20       # Going four levels deep. Each second on the outside is four weeks in this loop


/usr/local/lib/python3.6/dist-packages/tifffile/tifffile.py in read_array(self, dtype, count, out)
   6400         n = fh.readinto(result)
   6401         if n != size:
-> 6402             raise ValueError(f'failed to read {size} bytes')
   6404         if not result.dtype.isnative:
ValueError: failed to read 24474240 bytes

The only thing I can think of is that the TIFF file might be corrupted. Maybe try copying the files to your local directory and opening them with `load_file` one by one to see whether it's just one image causing trouble?
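A sketch of the "open them one by one" check. It takes whatever reader you already use (the notebook's `load_file`, or `tifffile.imread`), tries every file, and collects the ones that raise instead of stopping at the first failure. `find_unreadable` is a hypothetical helper name.

```python
def find_unreadable(paths, loader):
    """Try loader(path) on each file; return (path, error) pairs for the ones that fail."""
    bad = []
    for p in paths:
        try:
            loader(str(p))
        except Exception as exc:  # e.g. ValueError: failed to read N bytes
            bad.append((str(p), f"{type(exc).__name__}: {exc}"))
    return bad
```

For example, with `from pathlib import Path` and `import tifffile`, calling `find_unreadable(sorted(Path("/content/data").rglob("*.tif")), tifffile.imread)` lists every file that can't be read. If only one or two show up, re-copying or re-downloading just those files may be enough.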