I joined the competition late and, honestly, I didn't know how to tackle this problem. I now have a combined, clean df, but if I split it into train and test I end up with a single label. Any insights are welcome. I just need some guidance, thanks!
If you split df on Lapse == '?' and Lapse != '?' alone, you will end up with a single label in train. What you can do instead is split df by year(NP2_EFFECTDATE): anything before 2020 should go in your train. Then you can split on Lapse == '?' to get your test.
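In pandas, that could look roughly like the sketch below. It is only one reading of the suggestion: the toy rows are made up, and it assumes NP2_EFFECTDATE parses as a date and that Lapse is stored as strings with '?' marking the rows to predict. Adjust it to whatever your real df looks like.

```python
import pandas as pd

# Toy stand-in for the combined, cleaned df (values are invented for illustration).
df = pd.DataFrame({
    "NP2_EFFECTDATE": ["2018-03-01", "2019-07-15", "2019-11-30", "2020-02-10", "2020-05-20"],
    "Lapse": ["0", "1", "0", "?", "?"],
})

# Assumption: the column is parseable as a date.
df["NP2_EFFECTDATE"] = pd.to_datetime(df["NP2_EFFECTDATE"])
year = df["NP2_EFFECTDATE"].dt.year

# Rows with an unknown label are the ones to predict.
is_test = df["Lapse"] == "?"

# Train: effect date before 2020, excluding any unlabeled rows.
train = df[(year < 2020) & ~is_test].copy()
# Test: the unlabeled rows.
test = df[is_test].copy()

# Check that train now has more than one label.
print(train["Lapse"].value_counts())
print(len(train), len(test))
```

If `value_counts()` still shows a single label on your real data, it is worth checking how Lapse is encoded for the pre-2020 rows before training anything.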
I really appreciate it, that was helpful!