The approach was simple: tokenize at the character level with the Keras tokenizer, so each amino acid becomes its own token. An Embedding layer then maps each amino acid to a 100-dimensional vector, followed by a 1D convolution with 4096 filters. The convolution's outputs are reduced by a global max pool and a global average pool, the two pooled vectors are concatenated, and the result is fed to a Dense layer for the final classification. I had to train on a TPU since the dataset was huge. Will share the code soon.
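In the meantime, here is a minimal sketch of the architecture described above in Keras. The convolution kernel size, maximum sequence length, vocabulary size, and number of output classes were not stated in the post, so the values below (`kernel_size=3`, `MAX_LEN=512`, `VOCAB_SIZE=26`, `NUM_CLASSES=10`) are placeholder assumptions, not the actual configuration:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Assumptions: these hyperparameters were not given in the original post.
VOCAB_SIZE = 26    # ~20 amino-acid characters plus extras and padding
MAX_LEN = 512      # padded/truncated sequence length
NUM_CLASSES = 10   # number of target classes

# Character-level tokenization with the Keras tokenizer.
tokenizer = keras.preprocessing.text.Tokenizer(char_level=True)
tokenizer.fit_on_texts(["MKTAYIAKQR"])  # toy example sequence

inputs = keras.Input(shape=(MAX_LEN,), dtype="int32")
# Each amino-acid token is embedded into 100 dimensions.
x = layers.Embedding(VOCAB_SIZE, 100)(inputs)
# 1D convolution with 4096 filters (kernel size is an assumption).
x = layers.Conv1D(4096, kernel_size=3, activation="relu")(x)
# Concatenate global max pooling and global average pooling.
max_pool = layers.GlobalMaxPooling1D()(x)
avg_pool = layers.GlobalAveragePooling1D()(x)
x = layers.Concatenate()([max_pool, avg_pool])
# Dense layer for the final classification.
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = keras.Model(inputs, outputs)
```

The concatenated pooled vector has 4096 + 4096 = 8192 features before the classification head. For TPU training, the same model would typically be built inside a `tf.distribute.TPUStrategy` scope.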