- TL;DR: An ensemble of 1D CNN models
- 2D CNN approaches seemed limited, since even a human could hardly tell whether an image belonged to class `1` or `0`, so I decided to switch to a 1D CNN approach.
- A key to success here is normalization of the input data. I found the recommended band-specific normalization factors to underperform and opted to normalize all channels by 4500 (I didn't try too many things here, though I think that better normalization could have led to an even better result).
- Used lots of batchnorm and large dropout to stabilize training.
- Kernel size of the CNN layers was an important parameter.
- Trained on all 6 channels.
The final result was an ensemble of 1D CNN models with different kernel sizes. The dataset and model code is:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Global flag from the notebook; set to True at inference time (value assumed here)
TEST = False

# Dataset
class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, images, labels, test=TEST):
        self.images = images
        self.labels = labels
        self.test = test

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        # Flatten all channels into one 1D signal and scale by a single factor
        image = image.flatten()
        image = image / 4500.0
        image = torch.tensor(image).float()
        if self.test:
            return image
        label = self.labels[index]
        label = torch.tensor(label, dtype=torch.float)
        # Shape (1,) so that it matches the model output
        return image, torch.unsqueeze(label, 0)

# Model
class CNNModel(nn.Module):
    def __init__(self):
        super().__init__()  # required before registering submodules
        # Define the CNN layers
        self.conv1 = nn.Conv1d(in_channels=1, out_channels=32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm1d(32)
        self.conv2 = nn.Conv1d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm1d(64)
        self.conv3 = nn.Conv1d(in_channels=64, out_channels=128, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm1d(128)
        self.conv4 = nn.Conv1d(in_channels=128, out_channels=256, kernel_size=3, padding=1)
        self.bn4 = nn.BatchNorm1d(256)
        self.conv5 = nn.Conv1d(in_channels=256, out_channels=128, kernel_size=3, padding=1)
        self.bn5 = nn.BatchNorm1d(128)
        self.pool = nn.MaxPool1d(kernel_size=2)
        self.gap = nn.AdaptiveAvgPool1d(1)
        self.fc1 = nn.Linear(128, 64)
        self.fc2 = nn.Linear(64, 1)
        self.dropout = nn.Dropout(p=0.6)

    def forward(self, x):
        # (batch, length) -> (batch, 1, length): a single input channel
        x = x.view(x.size(0), 1, -1)
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.pool(F.relu(self.bn2(self.conv2(x))))
        x = self.pool(F.relu(self.bn3(self.conv3(x))))
        x = self.pool(F.relu(self.bn4(self.conv4(x))))
        x = self.pool(F.relu(self.bn5(self.conv5(x))))
        # Global average pooling over the length dimension
        x = self.gap(x)
        x = x.view(x.size(0), -1)
        x = self.dropout(x)
        x = F.relu(self.fc1(x))
        # Sigmoid output: probability of class 1
        x = torch.sigmoid(self.fc2(x))
        return x
```
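The post says the final result was an ensemble over different kernel sizes but does not show how the members were combined. Below is a minimal sketch of one way this could look, assuming a plain mean of the predicted probabilities. `SmallCNN1d`, `ensemble_predict`, the kernel sizes `(3, 5, 7)` and the input length of 96 are all hypothetical choices made for illustration, and the network is a reduced stand-in rather than the model used in the competition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallCNN1d(nn.Module):
    """Reduced, hypothetical 1D CNN with a configurable kernel size."""

    def __init__(self, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2  # keeps the length unchanged for odd kernel sizes
        self.conv1 = nn.Conv1d(1, 16, kernel_size=kernel_size, padding=padding)
        self.bn1 = nn.BatchNorm1d(16)
        self.conv2 = nn.Conv1d(16, 32, kernel_size=kernel_size, padding=padding)
        self.bn2 = nn.BatchNorm1d(32)
        self.pool = nn.MaxPool1d(kernel_size=2)
        self.gap = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        x = x.view(x.size(0), 1, -1)
        x = self.pool(F.relu(self.bn1(self.conv1(x))))
        x = self.pool(F.relu(self.bn2(self.conv2(x))))
        x = self.gap(x).flatten(1)
        return torch.sigmoid(self.fc(x))


def ensemble_predict(models, x):
    """Average the sigmoid outputs of all models (assumed mean ensemble)."""
    with torch.no_grad():
        preds = [m.eval()(x) for m in models]  # eval() disables dropout/BN updates
    return torch.stack(preds).mean(dim=0)


KERNEL_SIZES = (3, 5, 7)  # hypothetical values, not taken from the post
models = [SmallCNN1d(k) for k in KERNEL_SIZES]
batch = torch.rand(4, 96)  # 4 flattened, already normalized samples
probs = ensemble_predict(models, batch)  # shape (4, 1)
```

In practice each member would be trained separately (for example once per fold and kernel size) before its predictions are averaged.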
@nymfree congrats on your win. Can you explain why you chose a normalization factor of 4500? Did it come from experiments, or is there a particular reason for it?
Thanks. For satellite imaging, spectral bands have different normalization factors. If I remember correctly, it is 3000, 2500 and 2500 for blue, green and red respectively. And some other values for the other channels.
I found that channel-specific normalization did not perform better than a single normalization factor for all channels. I experimented with 3000, 3500, 4000, 4500 and 5000; 4500 gave the best CV.
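The two normalization options compared here can be sketched as follows. This is only an illustration: the `normalize` helper, the `(channels, height, width)` layout, the channel order (blue, green, red first) and the last three per-band factors are assumptions, since the post only recalls values for the first three bands.

```python
import numpy as np


def normalize(image, factor=4500.0):
    """Divide by one scalar, or by one factor per channel for a (C, H, W) array."""
    image = np.asarray(image, dtype=np.float32)
    factor = np.asarray(factor, dtype=np.float32)
    if factor.ndim == 1:
        # One factor per channel: reshape so it broadcasts over height and width
        factor = factor.reshape(-1, 1, 1)
    return image / factor


image = np.full((6, 2, 2), 4500.0)  # toy 6-channel patch

# Single factor for all channels, as used in the final solution
single = normalize(image, 4500.0)

# Band-specific factors: 3000/2500/2500 as recalled in the post;
# the remaining three values are placeholders, not real recommendations
per_band = normalize(image, [3000.0, 2500.0, 2500.0, 4500.0, 4500.0, 4500.0])
```

Which variant works better is an empirical question; here the single factor was picked by comparing cross-validation scores.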
Congratulations on your performance, colleague. How did you solve the class imbalance problem? I created several balanced datasets with the same number of class 1 and the same number of class 0. But in each dataset, I used different samples from class 0. I did an ensemble of 2D convolutional neural networks and obtained an AUC of 0.93 on my test set, which corresponds to 25% of the training set. When I submitted my official score, it was 0.504. My other attempt was to leave the dataset imbalanced and use focal loss along with ResNet50. I achieved an AUC of 0.95 on my test set and an AUC of 0.501 on the leaderboard. I don’t know what I did so wrong for the models to degrade so much on the official test set. Below is a bit of the code with ResNet50.
I didn't do anything special regarding class imbalance. I trained on the whole dataset using 5 folds. The folds were stratified such that they contained about the same proportion of 1s and 0s.
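A stratified split like the one described keeps the ratio of 1s and 0s roughly equal in every fold. The plain-Python sketch below shows the idea; `stratified_folds` and the toy labels are hypothetical, and in practice scikit-learn's `StratifiedKFold` is the usual tool for this (the post does not say which implementation was used).

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Return n_splits lists of validation indices with similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return [sorted(fold) for fold in folds]


labels = [1] * 20 + [0] * 80  # toy imbalanced labels, 20% positives
folds = stratified_folds(labels)

for fold_id, val_idx in enumerate(folds):
    positives = sum(labels[i] for i in val_idx)
    print(f"fold {fold_id}: {len(val_idx)} samples, {positives} positives")
```

Each fold serves once as the validation set while the model is trained on the other four.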
2D convs were not successful in this competition, in my opinion. Another possible reason for getting a high local AUC but an LB score equivalent to random guessing is that you didn't use the `id_map.csv` file to properly sort your predictions. There is some other discussion on the forum about this.
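To illustrate why the ordering matters: if predictions are written in the order of the test array while the submission expects another order, every row gets the wrong score and the AUC collapses to chance. The sketch below assumes a hypothetical layout in which `id_map.csv` has a column `ID` (submission identifier) and a column `id` (index into the test array). The real column names may differ, so check the competition files.

```python
import csv
import io

# Hypothetical contents of id_map.csv; the column names are assumptions
ID_MAP_CSV = "ID,id\nID_A,2\nID_B,0\nID_C,1\n"

# predictions[i] is the model output for the i-th sample of the test array
predictions = [0.9, 0.1, 0.5]

rows = list(csv.DictReader(io.StringIO(ID_MAP_CSV)))

# Look up each submission ID's prediction through its test-array index
submission = [(row["ID"], predictions[int(row["id"])]) for row in rows]

for submission_id, prob in submission:
    print(submission_id, prob)
```

With a real file, `io.StringIO(ID_MAP_CSV)` would be replaced by an opened file handle.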