Huge thanks to my teammates @nymfree and @DJOE. We could not have done this without the teamwork.
Our solution consists of four key stages:
1. Exploratory Data Analysis (EDA) & Data Preparation
We perform EDA on composite images to determine the optimal band combination for flood probability prediction. The best-performing combination for this task was Moisture Stress.
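As a rough illustration, here is a minimal sketch of how a Moisture Stress composite can be computed, assuming Sentinel-2-style bands and the common normalized-difference moisture formulation (B8A − B11) / (B8A + B11); the band names and formula are illustrative assumptions, not necessarily the exact recipe in our notebooks:

```python
import numpy as np

def moisture_stress(b8a: np.ndarray, b11: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalized-difference moisture index: higher values = wetter pixels."""
    return (b8a - b11) / (b8a + b11 + eps)

# Dummy reflectance arrays standing in for real image bands
b8a = np.random.rand(224, 224).astype(np.float32)
b11 = np.random.rand(224, 224).astype(np.float32)
ms = moisture_stress(b8a, b11)  # the channel fed to the stage-2 classifier
```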
2. Image Classification
Using the Moisture Stress images, we train an image classifier (eva02_tiny_patch14_224) to predict the probability of flooding at each location. This probability significantly improves overall model performance, both as a predictive feature on its own and later in the normalization stage.
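A minimal sketch of how such a classifier can be instantiated with timm; the single-logit head, `in_chans=1` channel handling, and the absence of a training loop are assumptions rather than the exact setup:

```python
import timm
import torch

# eva02_tiny_patch14_224 from timm with a single-logit head for P(flood);
# in_chans=1 (timm adapts the patch embedding) is an assumption for the
# single-band Moisture Stress input
model = timm.create_model(
    "eva02_tiny_patch14_224", pretrained=True, num_classes=1, in_chans=1
)

x = torch.randn(8, 1, 224, 224)        # a batch of Moisture Stress images
flood_prob = torch.sigmoid(model(x))   # per-location flood probability
```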
3. Modelling
We trained nine different models using a combination of flood probability, lagged precipitation values, rolling statistics, exponentially weighted moving averages (EWMA), and event-time indicators. We used 10-fold CV based on StratifiedGroupKFold for all the models; the CV scores for each are shown below.
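To make the feature families and the split concrete, here is a minimal sketch; the column names, window sizes, and placeholder data are illustrative assumptions rather than the exact configuration:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedGroupKFold

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values(["event_id", "t"]).copy()
    g = df.groupby("event_id")["precipitation"]
    for lag in (1, 2, 3):                       # lagged precipitation
        df[f"precip_lag{lag}"] = g.shift(lag)
    for w in (3, 7, 14):                        # rolling statistics
        df[f"precip_rollmean{w}"] = g.transform(lambda s: s.rolling(w).mean())
        df[f"precip_rollstd{w}"] = g.transform(lambda s: s.rolling(w).std())
    for span in (3, 7):                         # EWMA features
        df[f"precip_ewm{span}"] = g.transform(lambda s: s.ewm(span=span).mean())
    df["event_t"] = df["t"]                     # event-time indicator
    return df

# Placeholder frame: 20 events x 30 time steps
df = pd.DataFrame({
    "event_id": np.repeat(np.arange(20), 30),
    "t": np.tile(np.arange(30), 20),
    "precipitation": np.random.rand(600),
    "label": np.random.randint(0, 2, 600),
})
df = build_features(df)

# 10-fold StratifiedGroupKFold: stratify on the label, group by event so
# all rows from one event/location stay in the same fold
cv = StratifiedGroupKFold(n_splits=10, shuffle=True, random_state=42)
for fold, (tr_idx, va_idx) in enumerate(
        cv.split(df, y=df["label"], groups=df["event_id"])):
    pass  # train one of the nine models on tr_idx, validate on va_idx
```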
We ensemble the model predictions using Nelder-Mead optimization (a sketch follows below) and apply flood-probability normalization to improve generalization and predictive performance. The normalization DOES NOT USE THE LEAKED ROW ORDER shown by @snow; instead, we normalize using the flood probability from the stage-2 image classifier. This is a generalizable approach, since the flood probability is a genuinely predictive feature.
The CV scores above are without the normalization step, by the way! So you can see how ensembling a diverse range of models can lead to a huge CV improvement.
Now the OOF score after normalizing: 0.00201654
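For the Nelder-Mead ensembling step, here is a minimal sketch of a weight search over stacked out-of-fold predictions, assuming an RMSE objective and placeholder data:

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
oof_preds = rng.random((9, 1000))   # placeholder: one row of OOF preds per model
y_true = rng.random(1000)           # placeholder: ground-truth targets

def blend_loss(weights, preds, y):
    w = np.abs(weights)
    w = w / w.sum()                  # keep a convex combination of models
    return np.sqrt(mean_squared_error(y, w @ preds))

w0 = np.full(oof_preds.shape[0], 1.0 / oof_preds.shape[0])  # equal-weight start
res = minimize(blend_loss, w0, args=(oof_preds, y_true), method="Nelder-Mead")
best_w = np.abs(res.x) / np.abs(res.x).sum()
```

Constraining the weights to a convex combination keeps the blend on the same scale as the individual model predictions.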
Leaderboard Scores
GBDT Modelling
Basically, all the features I used across the GBDT models (XGBoost and LightGBM) are shown below, plus the flood probability from stage 2 and the precipitation from the original data. I then tuned each model using Optuna.
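A hedged sketch of what the Optuna tuning can look like for one LightGBM model; the search space, fixed parameters, and placeholder data are illustrative, not the exact configuration used (in practice the 10-fold CV from stage 3 replaces the single split here):

```python
import lightgbm as lgb
import numpy as np
import optuna
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((5000, 20))   # placeholder: engineered features + stage-2 prob
y = rng.random(5000)         # placeholder: target

def objective(trial):
    params = {
        "objective": "regression",
        "metric": "rmse",
        "verbosity": -1,
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 16, 256),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
        "feature_fraction": trial.suggest_float("feature_fraction", 0.5, 1.0),
        "bagging_fraction": trial.suggest_float("bagging_fraction", 0.5, 1.0),
        "bagging_freq": 1,
    }
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=42)
    booster = lgb.train(
        params, lgb.Dataset(X_tr, y_tr),
        num_boost_round=2000,
        valid_sets=[lgb.Dataset(X_va, y_va)],
        callbacks=[lgb.early_stopping(100, verbose=False)],
    )
    pred = booster.predict(X_va, num_iteration=booster.best_iteration)
    return float(np.sqrt(np.mean((y_va - pred) ** 2)))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
```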
Conclusion
Our success in this challenge came from combining complementary approaches: while the GBDT models provided robust predictions, the deep learning models captured additional complexity, and ensembling them together led to a significant improvement.
This post covers my contributions using GBDT. My teammates will share their insights into deep learning approaches and other aspects of our pipeline. Feel free to ask any questions!
GitHub link:
Kindly star the repo if you find it insightful; it encourages us to open-source more of our work! I also hope it encourages winners (as we have seen in some recently concluded competitions) who are unwilling to share their winning solutions with the community to do so, since we are all here to learn.
Wow, this is impressive. Congratulations to your team 🎉
Thank you!
Congratulations guys, nicely done
Thank you!
Congratulations. Thank you for sharing your solutions, this will help others to learn
You're welcome
Congratulations! I'm looking forward to using the image features you extracted to measure their impact on my model.
Nice! We have provided the notebooks to do that, specifically the first-stage and second-stage notebooks.
ConvNets / TabTransformer Modelling
Using @koleshjr's great feature-engineering work, I developed three models based on convnets and transformers. Small, shallow networks worked best here. Additionally, it was important to leverage fastai's tried-and-tested training pipeline (its data pre-processing and normalization methods being particularly important).
1d CNN model
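A minimal PyTorch sketch of the idea: a shallow 1D CNN applied over the continuous feature vector. The layer sizes and pooling choice are illustrative assumptions, not the exact architecture:

```python
import torch
import torch.nn as nn

class Conv1dModel(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=3, padding=1),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(hidden, 1)

    def forward(self, _, x_cont):
        # fastai's tabular learner passes (categoricals, continuous);
        # the unused categorical slot arrives as `_`
        x = x_cont.unsqueeze(1)                  # (B, F) -> (B, 1, F)
        return self.head(self.body(x).squeeze(-1))

out = Conv1dModel()(None, torch.randn(8, 20))    # -> (8, 1)
```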
Gated conv model
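A sketch of the gated-convolution idea, where a sigmoid-activated branch gates a value branch (GLU-style); again, the sizes are assumptions:

```python
import torch
import torch.nn as nn

class GatedConvModel(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.value = nn.Conv1d(1, hidden, kernel_size=3, padding=1)
        self.gate = nn.Conv1d(1, hidden, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.head = nn.Linear(hidden, 1)

    def forward(self, _, x_cont):
        x = x_cont.unsqueeze(1)                          # (B, 1, F)
        h = self.value(x) * torch.sigmoid(self.gate(x))  # gated activation
        return self.head(self.pool(h).squeeze(-1))
```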
Transformer model
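A sketch of a small transformer over per-feature tokens, keeping with the "small and shallow" theme; the embedding size and depth are assumptions:

```python
import torch
import torch.nn as nn

class TabTransformerModel(nn.Module):
    def __init__(self, d_model: int = 32, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(1, d_model)   # one token per continuous feature
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=2 * d_model,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, _, x_cont):
        tokens = self.embed(x_cont.unsqueeze(-1))        # (B, F, d_model)
        return self.head(self.encoder(tokens).mean(dim=1))
```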
Notice that the forward method takes two parameters. The `_` is used by fastai for bookkeeping purposes.
These three models are very diverse and are therefore good candidates for ensembling.
Impressive work, congrats 🎊
Thanks for sharing your solution, this is really informative.
You're welcome
Congratulations!
Thanks