We combine 2000 data points from the Train and Test sets.
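For anyone wanting to reproduce the pooling step, here is a minimal sketch of what it might look like. The 1600/400 split, the 10 features, the 3 classes, and the `make_fake_set` helper are all assumptions made up for illustration. Only the total of 2000 points comes from the post.

```python
import random

random.seed(0)

def make_fake_set(n_points, n_features=10, n_classes=3):
    """Hypothetical stand-in for loading a real Train or Test set."""
    features = [[random.gauss(0, 1) for _ in range(n_features)]
                for _ in range(n_points)]
    labels = [random.randrange(n_classes) for _ in range(n_points)]
    return features, labels

# Assumed sizes: 1600 + 400 = 2000 points in total.
X_train, y_train = make_fake_set(1600)
X_test, y_test = make_fake_set(400)

# Pool the two sets, keeping features and labels aligned by position.
X_all = X_train + X_test
y_all = y_train + y_test

print(len(X_all), len(y_all))
```

Once the sets are pooled, the original Test set is no longer held out, so any accuracy reported afterwards needs a fresh split or cross-validation on the pooled data.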
Not at all. The dataset is too small for a deep learning model to find useful patterns. I have tried LSTM and GRU, but I only managed to raise the test set accuracy to 0.37 after spending days tuning the hyperparameters and the structure of the model.