Kolesh, what was the secret to 0.68? 😅 I really struggled with this one.
The secret to 0.68 using ML was honestly high compute power 😅😂😂. I used lags in the range (1, 175) and rolling means with windows [30, 60, 120, 240, 480, 960] on all the data. More than 175 lags didn't improve the score much, which is why I hit a plateau. Happy to hear people are using a 1D CNN; that's new to me, and I hope @HungryLearner will post his solution for learning.
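For illustration, the lag and rolling-mean feature setup described above could be built with pandas roughly like this (the column name `signal` and the helper name are stand-ins, not the actual competition columns):

```python
import numpy as np
import pandas as pd


def make_features(df, lag_range=range(1, 176),
                  roll_windows=(30, 60, 120, 240, 480, 960), col="signal"):
    """Build lag and rolling-mean features for one time series column.

    Collecting the new columns in a dict and concatenating once avoids
    the DataFrame fragmentation you'd get from 175+ individual inserts.
    """
    out = {}
    for k in lag_range:
        out[f"lag_{k}"] = df[col].shift(k)          # value k steps back (NaN at start)
    for w in roll_windows:
        out[f"rmean_{w}"] = df[col].rolling(w, min_periods=1).mean()
    return pd.concat([df, pd.DataFrame(out, index=df.index)], axis=1)


# Tiny demo on a toy series
df = pd.DataFrame({"signal": np.arange(10, dtype=float)})
feats = make_features(df, lag_range=range(1, 4), roll_windows=(2, 3))
```

With the full settings above this produces 175 lag columns plus 6 rolling means per input column, which explains why the compute cost grows so quickly.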
Ah, that's quite some features! Sounds like you've been cooking up quite the recipe for success. Hitting that magic 0.68 score is no small feat.
The task can easily be seen as signal segmentation. In the same way, ECG signal segmentation can be solved with 1D networks like a UNet or an LSTM.
The only tweak is that this particular dataset has a block-based class arrangement, so using a large stride is fundamental for summarizing the signal quickly.
BTW:
Indeed, I understand that the creativity needed for ML solutions here is outstanding, while the computing power required is very high due to the sheer amount of time-based (line-based) data to be processed. The discussion board shows some evidence of this computing challenge, and the otherwise sparse leaderboard says a lot.
Wow. I will try to learn that.
The only secret to fast training and potentially high scores in this competition is to use a 1D CNN for segmentation.
Using a 1D UNet with a batch size of 1, so that each CSV file is processed as a single batch, is the only secret I know for this competition.
I respect anyone able to use classical ML for this competition and still score above 0.6. Waiting to see such a solution approach as events unfold.
Happy Hacking!!!
So you had 78 training examples? How did you organise the feature and target variables?
You can treat all the columns of the CSV files except the Target column as your input channels. With this in mind, you have a 1D signal with 3 features (or 4 if time is included).
The 78 examples become 78 different cases to work with, just as you would work with 78 images.
However, due to the varied length of each signal, you may need to use a batch size of 1 to avoid errors from your dataloader, or else hard-code the collate function.
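If you do want batches larger than 1, the hand-written collate function mentioned above usually pads each item to the longest series in the batch. A rough numpy sketch (the shapes, the -1 padding label, and the mask convention are my assumptions, not the author's code):

```python
import numpy as np


def pad_collate(batch, pad_value=0.0):
    """Pad variable-length (features, labels) pairs to the longest in the batch.

    Assumes each item is (x: [T, C] float array, y: [T] label array).
    Returns stacked features, labels (padded with -1), and a validity mask
    so the loss can ignore padded time steps.
    """
    max_len = max(x.shape[0] for x, _ in batch)
    xs, ys, masks = [], [], []
    for x, y in batch:
        pad = max_len - x.shape[0]
        xs.append(np.pad(x, ((0, pad), (0, 0)), constant_values=pad_value))
        ys.append(np.pad(y, (0, pad), constant_values=-1))   # -1 marks padding
        masks.append(np.pad(np.ones(x.shape[0]), (0, pad)))  # 1 = real step
    return np.stack(xs), np.stack(ys), np.stack(masks)
```

With a mask like this you can keep a normal batch size and simply exclude padded positions from the loss, instead of forcing batch size 1.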
Writing a simple 1D CNN UNet can give you surprisingly fast training and a high score.
Hint: to improve performance, you may consider:
1. Increasing the number of stages and/or the number of layers per stage (blocks). The standard UNet uses 4 stages; you can increase this.
2. Increasing the stride to quickly capture distant relationships between points.
3. Increasing the kernel size to widen the window each convolution operates over.
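For anyone who wants a concrete starting point, a minimal 4-stage 1D UNet along these lines might look like the sketch below. The channel widths, kernel size, and class count are placeholder choices (not the winning configuration), and it assumes the input length is divisible by 8 so the pooling and upsampling stages line up:

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """Two same-padding Conv1d + ReLU layers at a fixed channel width."""
    def __init__(self, cin, cout, k=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(cin, cout, k, padding=k // 2), nn.ReLU(),
            nn.Conv1d(cout, cout, k, padding=k // 2), nn.ReLU())

    def forward(self, x):
        return self.net(x)


class UNet1D(nn.Module):
    """Minimal 1D UNet: encoder with max-pool downsampling, decoder with
    transposed convolutions and skip connections, per-timestep class logits."""
    def __init__(self, in_ch=4, n_classes=5, widths=(32, 64, 128, 256)):
        super().__init__()
        self.downs = nn.ModuleList()
        c = in_ch
        for w in widths:
            self.downs.append(Block(c, w))
            c = w
        self.pool = nn.MaxPool1d(2)
        self.ups = nn.ModuleList()
        self.up_blocks = nn.ModuleList()
        for w in reversed(widths[:-1]):
            self.ups.append(nn.ConvTranspose1d(c, w, 2, stride=2))  # double length
            self.up_blocks.append(Block(2 * w, w))  # 2*w: upsampled + skip
            c = w
        self.head = nn.Conv1d(c, n_classes, 1)

    def forward(self, x):                      # x: [B, C, T], T divisible by 8
        skips = []
        for i, down in enumerate(self.downs):
            x = down(x)
            if i < len(self.downs) - 1:        # bottom stage is not pooled
                skips.append(x)
                x = self.pool(x)
        for up, blk, s in zip(self.ups, self.up_blocks, reversed(skips)):
            x = blk(torch.cat([up(x), s], dim=1))
        return self.head(x)                    # [B, n_classes, T] logits
```

Training it is then the usual per-timestep cross-entropy over the class dimension; hint 1 above corresponds to lengthening `widths`, and hints 2–3 to the pooling stride and `k`.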
Happy Hacking!!!
Any useful resources to learn about this? Papers With Code, Kaggle kernels, etc.?
Very fascinating approach @HungryLearner, thanks for sharing and congrats. I was convinced the only secret for this task was seq2seq LSTMs/RNNs.
I created a PyTorch RNN model with the 4 features as input. I didn't like having a batch size of 1, so I worked around the varied sequence lengths by zero-padding and inserting an extra class label for the padded positions. I played around with batch size, seq_length (used 5000), hidden_size (64), and num_layers (3), and precomputed the sequences into easily indexed numpy arrays (shape 5000x4 with the corresponding 5000 labels) so the dataloader runs faster.
Along with some data wrangling, those were the main challenges I faced in getting my model working. I haven't decided yet whether I'll share my code in the discussions, but here is a TensorFlow guide I found useful during my learning a while back: https://www.tensorflow.org/tutorials/structured_data/time_series
Keen to see others' approaches as well!
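The zero-padding-plus-extra-label trick described above might look roughly like this sketch: split each series into fixed-length windows, zero-pad the last one, and give padded steps their own class index (the helper name, shapes, and default pad label are my guesses, not the poster's code):

```python
import numpy as np


def chunk_series(x, y, seq_len=5000, pad_label=None):
    """Split one (T, C) series into fixed-length windows for batched RNN training.

    The last window is zero-padded and its padded steps get an extra class
    index, so every precomputed array has shape (seq_len, C) / (seq_len,).
    `pad_label` defaults to one past the largest label seen (an assumption).
    """
    if pad_label is None:
        pad_label = int(y.max()) + 1
    n = int(np.ceil(len(x) / seq_len))
    pad = n * seq_len - len(x)
    x = np.pad(x, ((0, pad), (0, 0)))                      # zero-pad features
    y = np.pad(y, (0, pad), constant_values=pad_label)     # extra class for padding
    return x.reshape(n, seq_len, x.shape[1]), y.reshape(n, seq_len)
```

Each returned window can then be indexed directly by a dataloader, which is what makes the precomputed-numpy-array approach fast.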
@HungryLearner amazing approach! Would be very nice if you could post your notebook. Congrats!
I need to stop being fixated on just GBDTs for tabular data. I need to get out of my comfort zone and start learning about NN techniques for dealing with tabular data. None of these ideas came to my mind, honestly. Thank you all for sharing. Back to learning.
Wow bro, amazing approach. It would be very helpful if you could provide the full notebook for your approach.
Smiles... Yeah @Koleshjr, Papers With Code, Kaggle kernels, etc. That's another secret to this amazing challenge. 👍 A particular publicly shared Kaggle competition-winning notebook with a 1D UNet model architecture (structure), with further editing from me, did the magic for me.
@MICADEE did you also use a 1d cnn?
A particular publicly shared Kaggle competition-winning notebook with a 1D UNet model architecture (structure), with further editing from me, did the magic for me. Though I was unable to ensemble my different model notebooks after using different parameters; maybe that would have improved my overall score further.
Thanks for sharing, and congratulations on such an amazing score. Just a question though: when do you decide to use a NN for tabular data? Is it trial and error, preference, or are there actual hints in the data that help you think the NN way? @MICADEE @HungryLearner @DanielBruintjies
There are several published works on the use of NNs for tabular data. But as you've mentioned, their applicability has to do with the data and the needs of the task.
For this challenge, I saw the task as a segmentation one. In fact, there is a statement on the data/overview page that the classes repeat for stretches of time:
"The adjacent data points have the same time interval, and each scenario in the data lasts for several consecutive time points. Within a single time point of data, only one scenario can occur, but the same scenario may occur in different time periods within one ..."
This clearly tells me that we are facing a time-window segmentation task. Following the success of various CNNs in 2D image segmentation, I could easily translate the task into the need for a 1D variant of a 2D segmentation architecture like UNet.
So, in essence, the decision for me came from a preliminary elucidation of the data and the task ahead. It was never a trial-and-error decision. I didn't even try a single classical ML approach in this challenge.
"But as you've mentioned their applicability has to do with the data and task need." Gotcha, I appreciate it.
@MICADEE could you kindly share the Kaggle link to the 1D UNet?
Keywords to use when searching: 1d unet, 1d segmentation
Kaggle:
https://www.kaggle.com/code/akashsuper2000/pytorch-u-net-model
https://www.kaggle.com/code/super13579/u-net-1d-cnn-with-pytorch
Github:
https://github.com/jjongjjong/ECG_segmentation_1DUnet/blob/main/notebooks/ECG_segmentation.ipynb
You can even find other 1D segmentation code, like a 1D ViT or a 1D Swin Transformer, on GitHub.
Thanks @HungryLearner
@Koleshjr Sorry for the delay in responding. I have been on and off these days. Here is my Kaggle link to the edited 1D UNet model architecture.
Model Architecture (structure) Source:
Cheers !!!
Wow! I probably wouldn't have gone this route even if I had 2 more weeks. It's nice to see a totally different approach to this problem. Congratulations @HungryLearner, there's still so much to learn. I get stunned every day.
Hello everyone,
albeit a bit late, I'd like to share what I think were the main steps of my solution for getting a good score in this challenge. I chose the same general approach as @HungryLearner and @MICADEE, a 1D UNet, and don't have much to add as to why it is a good choice. However, I didn't reach my best score right away; there were some additional details that translated into higher scores as I went along. So let me share some of my submissions' scores and what I did/changed along the way to reach them.
- Score of 0.631: I started off with a 1D UNet that incorporated some Transformer-like attention layers at the highest stage of aggregation within the UNet. I did this to test whether attention could help with this challenge, because I thought it would be essential to capture the interactions between time intervals that are far apart, and Transformer attention is usually very good at this. However, it did not work out for me. I still think the approach might work, but scaling up turned out to work much better with a pure CNN UNet in my case.
- Score of 0.714: As mentioned above, I dropped the attention part and switched to a pure 1D UNet; after some cross-validation to tune the hyperparameters (kernel size, number of channels and layers, etc.) I got a score above 0.7. Let me mention here that I agree with others that this challenge required a lot of computing power; to do the necessary cross-validation runs, my Kaggle notebooks were maxed out for almost a full week.
- Score of 0.748: The next jump came after some further parameter tweaking and, in particular, after adding some hand-engineered features. I calculated statistical time-series features using a sliding window of fixed length. Some of them turned out to work well, others did not. Including the good features gave me another boost in performance.
- Score of 0.776: When looking at my model's output, I noticed that predictions were more accurate when the corresponding target was close to the center of the subsequences I used as input. With an input length of only 2500 this makes sense, as the model has much more information about the surroundings of time points in the middle of the sequence than it does for those near the ends. This is where I have to applaud @HungryLearner for the idea of using a batch size of 1 and inputting the full time series as a whole, as this probably avoids the problem of model performance differing with the relative position inside the input (sub-)sequence. While I considered increasing the input length as well, I opted for a different approach where my model predicts each value several times, so that each value has a prediction corresponding to it being in the center but also to it being at the edges of the input window. I then fit a simple logistic regression to aggregate these predictions into one single model prediction. This fix for the performance drop-off at the edges of the prediction window helped a lot.
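A simplified sketch of that overlapping-prediction idea: slide the model's window over the full series with a stride smaller than the window, so each position is predicted several times, then combine the per-position predictions. Here they are simply averaged for brevity; the actual solution fit a logistic regression over them, and `predict_fn`, the window length, and the stride are assumed names and values:

```python
import numpy as np


def overlapped_predict(predict_fn, x, win=2500, stride=1250, n_classes=5):
    """Predict a full (T, C) series with overlapping windows.

    `predict_fn` is assumed to map a (win, C) slice to (win, n_classes)
    class probabilities. Each position's probabilities are accumulated
    from every window covering it and then averaged.
    """
    T = x.shape[0]
    probs = np.zeros((T, n_classes))
    counts = np.zeros(T)
    starts = list(range(0, max(T - win, 0) + 1, stride))
    if T >= win and starts[-1] != T - win:
        starts.append(T - win)          # make sure the tail is covered
    for s in starts:
        probs[s:s + win] += predict_fn(x[s:s + win])
        counts[s:s + win] += 1
    return probs / counts[:, None]
```

Replacing the mean with a learned aggregator (such as the logistic regression mentioned above, fed the center and edge predictions as separate inputs) is what lifted the score here.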
- Score of 0.791: The last boost came from ensembling four of my best-performing models and from using the full training set. I had not used the whole training set for most of my submissions, so that I could be sure they performed well on a hold-out validation set. However, after using CV to determine a good number of epochs to train, I started training on the full set, and along with ensembling this improved my score one last time.
That's a short summary of what I did along the way during this competition. I hope everyone enjoyed it; congratulations to everyone on the leaderboard, and good luck in your upcoming challenges!
Thanks for such a detailed explanation, and congrats on winning. Truly deserved.