I've been experimenting with models that take as input the image of a location and the time series of precipitation data for that event and try to predict, for each day independently, if a flooding event happened that day (so 730 outputs with sigmoid activations to predict a probability for each day).
However, I can't seem to get a score below 0.0032 with this approach. When I visualize my best model's predictions on the validation set, it almost never detects the exact day the flood happens, although it sometimes correctly detects whether there was a flooding event at all in the time series.
I suspect that there are too few examples in the training set to accurately predict the day a flooding happened, which is the entire point of this competition. Would it make more sense to make an assumption like "there is at most one flooding event in a given time series" and adapt the model accordingly?
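To make the idea concrete, here is one way that assumption could be encoded (purely a sketch, not something I've tried on this data): instead of 730 independent sigmoids, predict a single softmax over 731 classes, one per possible flood day plus a "no flood" class. The layer sizes are arbitrary placeholders.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

SERIES_LEN = 730  # days per time series in this competition

# Hypothetical "at most one flood" model: one softmax over
# 731 classes = flood on day d (0..729) or no flood (class 730).
precip_in = layers.Input(shape=(SERIES_LEN, 1), name="precipitation")
x = layers.Bidirectional(layers.LSTM(64))(precip_in)
out = layers.Dense(SERIES_LEN + 1, activation="softmax")(x)

model = keras.Model(precip_in, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

The label for each series would then be the flood day index, or 730 when no flood occurred, so the model can no longer predict several flood days at once.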
Any thoughts and experience shared would be very welcome.
I've been asked for more details on how I use the images in my model. I made a custom neural network with two inputs. One is a convolutional image encoder that maps the (128, 128, 6) image input to an output vector; this vector is then repeated 730 times, once per precipitation value (remember, we only have one image per time series), and concatenated to each value. A Bidirectional LSTM layer (bidirectional because we are looking at the data post-hoc) with a sigmoid output for each day then maps this to a per-day probability, as I said before.
Hope this helps!
Are you using Keras or PyTorch? And do you use any feature engineering? I've replicated the same neural net architecture in TensorFlow but I can't reach 0.0032.
I used Keras, and split the dataset so that the training and validation sets have the same proportion of time series containing a flooding event (47%).
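If it helps, that kind of split can be done with scikit-learn's stratified splitting (a sketch with random placeholder data; `has_flood` stands in for a per-series 0/1 flag that you'd compute from your labels):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 series of 730 daily precipitation values,
# and a hypothetical has_flood flag (1 if any day of the series floods)
rng = np.random.default_rng(0)
X_series = rng.random((100, 730, 1))
has_flood = rng.integers(0, 2, size=100)

# stratify=has_flood keeps the flood/no-flood ratio
# (nearly) identical in both splits
X_tr, X_val, y_tr, y_val = train_test_split(
    X_series, has_flood, test_size=0.2, stratify=has_flood, random_state=42
)
```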
By the way, I tried removing the images from the model, training only on the time series, and got pretty much the same results. The images seem to be useless for me.