Some of the labelling in the dataset looks very strange. For example, the two transactions TransactionId_86851 and TransactionId_24271 occurred at almost the same time on the same accountId, yet TransactionId_24271 is labelled as fraudulent and TransactionId_86851 is not, even though TransactionId_86851 has the larger amount. This is just one example; there are many others like it. Which leads to the next question: is the data labelling really correct? Because this causes a big problem when trying to find patterns.
My understanding is that the transactions that are labelled as fraud are from account holders who called in and said that their account was hacked or something.
So maybe the account holder for transaction 24271 did not make the transaction and a hacker did, so they called in to report it as fraud, whereas the account holder for transaction 86851 actually made the transaction.
This is possible; however, the interval between the two transactions is too small, and many of them are the same. On the other hand, this makes detecting patterns harder.
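One way to check how common this is would be to list every pair of transactions on the same account that are very close in time but carry different labels. The snippet below is only a rough sketch: the field names (`id`, `AccountId`, `time`, `fraud`), the 60-second window, and the toy rows are all assumptions made for illustration, not the real dataset schema or real records.

```python
from datetime import datetime, timedelta
from itertools import groupby

def conflicting_pairs(rows, window=timedelta(seconds=60)):
    """Return (id_a, id_b) pairs on the same account, at most `window`
    apart in time, whose fraud labels differ."""
    ordered = sorted(rows, key=lambda r: (r["AccountId"], r["time"]))
    pairs = []
    for _, group in groupby(ordered, key=lambda r: r["AccountId"]):
        txs = list(group)
        for i, a in enumerate(txs):
            for b in txs[i + 1:]:
                if b["time"] - a["time"] > window:
                    break  # later transactions are even further away
                if a["fraud"] != b["fraud"]:
                    pairs.append((a["id"], b["id"]))
    return pairs

# Toy rows (hypothetical, not taken from the competition data).
toy_rows = [
    {"id": "T1", "AccountId": "A", "time": datetime(2019, 1, 1, 12, 0, 0), "fraud": 0},
    {"id": "T2", "AccountId": "A", "time": datetime(2019, 1, 1, 12, 0, 20), "fraud": 1},
    {"id": "T3", "AccountId": "A", "time": datetime(2019, 1, 1, 15, 0, 0), "fraud": 0},
    {"id": "T4", "AccountId": "B", "time": datetime(2019, 1, 1, 12, 0, 5), "fraud": 0},
]

print(conflicting_pairs(toy_rows))
```

If the real data produces a long list here, that would support the suspicion that the labels are noisy rather than the two transactions above being a one-off.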
This is a good sign that another feature could be engineered here.
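For instance, one candidate feature is the number of seconds since the previous transaction on the same account, so that a model can see when transactions arrive in rapid bursts. This is just a minimal sketch of that idea: the field names (`id`, `AccountId`, `time`), the feature name `secs_since_prev`, and the sample rows are hypothetical and would need to be mapped onto the actual columns.

```python
from datetime import datetime

def add_gap_feature(rows):
    """Return copies of `rows` in time order, each with `secs_since_prev`:
    seconds since the previous transaction on the same account
    (None for an account's first transaction)."""
    last_seen = {}  # AccountId -> time of its most recent transaction
    out = []
    for r in sorted(rows, key=lambda r: r["time"]):
        prev = last_seen.get(r["AccountId"])
        gap = None if prev is None else (r["time"] - prev).total_seconds()
        out.append({**r, "secs_since_prev": gap})
        last_seen[r["AccountId"]] = r["time"]
    return out

# Sample rows (hypothetical, for illustration only).
sample = [
    {"id": "T1", "AccountId": "A", "time": datetime(2019, 1, 1, 12, 0, 0)},
    {"id": "T2", "AccountId": "A", "time": datetime(2019, 1, 1, 12, 0, 20)},
    {"id": "T3", "AccountId": "A", "time": datetime(2019, 1, 1, 15, 0, 0)},
    {"id": "T4", "AccountId": "B", "time": datetime(2019, 1, 1, 12, 0, 5)},
]

for row in add_gap_feature(sample):
    print(row["id"], row["AccountId"], row["secs_since_prev"])
```

A similar pass in reverse time order would give the gap to the next transaction, which might also be worth trying.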