Hi!
So, it's time to share our approaches!
Data: I use 11-channel images (RGB + NIR + NDVI + mask + scaled B5, B6, B7, B8A + scaled mask), resized to (150, 150) and center-cropped to (100, 100), then concatenated along the time dimension: (11 time slices, 11 channels, 100, 100).
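That preprocessing can be sketched as follows; this is a minimal numpy-only version, with nearest-neighbour resize standing in for whatever interpolation was actually used, and `build_sample`/`slices` as hypothetical names:

```python
import numpy as np

def nn_resize(img, size):
    """Nearest-neighbour resize of an (H, W, C) array to (size, size, C)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def center_crop(img, size):
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def build_sample(slices):
    """slices: list of 11 (H, W, 11) arrays, one per acquisition date."""
    out = [center_crop(nn_resize(s, 150), 100) for s in slices]
    # stack dates, move channels first: (11 times, 11 channels, 100, 100)
    return np.stack(out).transpose(0, 3, 1, 2)
```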
Model: the model is one ResNet50 per time slice; the embeddings from the last convolutional layer are concatenated along the time dimension and go to a 2-layer biLSTM with attention. A simple linear layer then takes the attention outputs and the poolings (mean and max) from each LSTM.
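A shape-level sketch of that architecture in PyTorch. For brevity a tiny shared CNN stands in for the per-slice ResNet50s, and the class count (9) and all names are placeholders, not the author's actual code:

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Softmax-weighted sum over the time axis."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                          # x: (B, T, D)
        w = torch.softmax(self.score(x), dim=1)    # (B, T, 1)
        return (w * x).sum(dim=1)                  # (B, D)

class CNNLSTM(nn.Module):
    def __init__(self, in_ch=11, n_classes=9, hidden=128):
        super().__init__()
        # tiny CNN stand-in for a ResNet50 backbone, shared across slices
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(32, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.attn = TemporalAttention(2 * hidden)
        # attention output + mean pool + max pool over time -> linear head
        self.head = nn.Linear(3 * 2 * hidden, n_classes)

    def forward(self, x):                          # x: (B, T, C, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.lstm(feats)                  # (B, T, 2*hidden)
        pooled = torch.cat([self.attn(seq),
                            seq.mean(dim=1),
                            seq.max(dim=1).values], dim=1)
        return self.head(pooled)
```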
Augmentations: horizontal and vertical flips, ShiftScaleRotate, and RandomSizedCrop. Loss: CrossEntropy. Validation: a simple 5-fold split grouped by unique FieldID.
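The grouped split can be reproduced with scikit-learn's GroupKFold, using FieldID as the group key so that no field leaks between train and validation; the arrays here are placeholder data:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
X = rng.random((100, 4))                 # placeholder features
y = rng.integers(0, 9, 100)              # placeholder crop labels
field_ids = rng.integers(0, 20, 100)     # several observations per field

for train_idx, val_idx in GroupKFold(n_splits=5).split(X, y, groups=field_ids):
    # no FieldID appears in both the train and the validation fold
    assert not set(field_ids[train_idx]) & set(field_ids[val_idx])
```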
This approach gives ~0.43 on the private LB (~0.45 public).
The most important part is pseudo-labeling the test observations with this model (more precisely, with a blend of many models, haha). I then add the whole test set to my training data (using soft labels, so CrossEntropy is replaced with SoftCrossEntropy, https://discuss.pytorch.org/t/cross-entropy-for-soft-label/16093 ) and retrain the model (the whole test set is added to each training fold). This gives ~0.4 on the private LB.
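A soft cross-entropy of this kind (as in the linked thread) can be written in a few lines; for one-hot targets it reduces to the ordinary cross-entropy:

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, soft_targets):
    """Cross-entropy against soft probability targets, e.g. blended
    pseudo-labels over the test set."""
    return -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```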
Congrats! Thank you for sharing.
Nice, so you used a CNN. Impressive!
Thanks for sharing! Would you mind sharing your preprocessing steps for the channels? You mentioned you scaled some channels?
import cv2
import numpy as np

# old_image: cropped field image from SUBDATASET_1, shape (h_old, w_old, 6)
old_image = np.load(path_to_old_image_in_npy)
_h, _w, _c = old_image.shape
# rgb: the SUBDATASET_2 image (roughly half resolution); upsample to match.
# Note cv2.resize takes dsize as (width, height).
rgb = cv2.resize(rgb, (_w, _h))
rgb = np.dstack([old_image, rgb])  # stack along the channel axis
Just something like that, where old_image is an ndarray with the cropped field image from SUBDATASET_1 (h_old, w_old, 6) and rgb is an ndarray with the new SUBDATASET_2 image (mostly h_old // 2, w_old // 2, 5).
I divide each channel by 32768 (the max value for Sentinel-2 images) and add a BatchNorm2d before the first convolution of every ResNet.
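A sketch of that normalization, assuming the same scaling constant; the single conv layer here is a hypothetical stand-in for a ResNet50 stem:

```python
import torch
import torch.nn as nn

S2_MAX = 32768.0  # scaling constant quoted above

def with_input_norm(backbone, in_ch=11):
    """Prepend a BatchNorm2d over the 11 input channels, as described."""
    return nn.Sequential(nn.BatchNorm2d(in_ch), backbone)

# usage sketch
model = with_input_norm(nn.Conv2d(11, 64, 7, stride=2, padding=3))
x = torch.rand(2, 11, 100, 100) * S2_MAX
out = model(x / S2_MAX)  # channels scaled to [0, 1] before the network
```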
I also started working on a CNN+LSTM model but couldn't finish it due to lack of time. Well done! It's interesting to see what the B5, B6, B7, B8A bands (which have 20 m resolution vs 10 m for RGB and NIR) contribute to the model. The majority of fields are small, and my initial guess was that 20 m resolution is not enough to represent useful features.
Congrats! It's really an interesting solution.
I added them on the last day of the competition, because I thought the same. The additional channels boosted the model from 0.45 to 0.43 on the public LB, but I didn't have time to tune the model. There could be a better way to utilize the information from the additional channels.
Congrats!