Xente Fraud Detection Challenge
$4,500 USD
Accurately classify the fraudulent transactions from Xente's e-commerce platform
1175 data scientists enrolled, 547 on the leaderboard
20 May 2019 – 23 September 2019
Ways to extract features from TransactionStartTime
published 3 Sep 2019, 12:00

Hey everyone, I just joined, and this is my first time dealing with time-based data :D. I want to know if it is possible to use built-in Pandas functions to easily extract time-based features, like how many hours have passed since a user's last transaction, or how many transactions a user made without fraud. Or do I have to use groupbys and write some for loops? Thanks

Hey Blenz!

Just joined too. I haven't analyzed the data yet, but this is how it can be done using pandas:

import pandas as pd
df["hour_column"] = pd.DatetimeIndex(df["timestamp_column"]).hour  # hour of day, 0-23
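If the column has already been parsed with pd.to_datetime(), the .dt accessor gives the same hour and a few other calendar features in one go. A minimal sketch on made-up timestamps, assuming the column is called TransactionStartTime as in the thread title:

```python
import pandas as pd

# Toy data: the column name comes from the thread title, the timestamps are invented.
df = pd.DataFrame({
    "TransactionStartTime": [
        "2018-11-15 02:18:49",
        "2018-11-15 14:05:10",
        "2018-11-17 09:30:00",
    ]
})

# Parse the strings to datetimes once, then use the .dt accessor.
ts = pd.to_datetime(df["TransactionStartTime"])
df["hour"] = ts.dt.hour
df["day_of_week"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
df["day"] = ts.dt.day
df["month"] = ts.dt.month

print(df)
```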

Hey, thanks for the answer, but I'm not talking about extracting the hour/minute/second from a timestamp; I already did that via pd.to_datetime() and then df.column.dt.hour. I'm referring to extracting features after grouping transactions by CustomerId. For example, for the N transactions made by a user U, I want to know the difference in hours between transaction N and transaction N-1, and so on, plus other time-based features.
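This doesn't need a for loop: sort by customer and time, then call diff() inside a groupby. A minimal sketch on invented data, assuming the CustomerId and TransactionStartTime columns mentioned in the thread; cumcount() is added as one more time-ordered feature (number of earlier transactions per customer):

```python
import pandas as pd

# Invented toy data using the column names mentioned in the thread.
df = pd.DataFrame({
    "CustomerId": ["C1", "C2", "C1", "C1", "C2"],
    "TransactionStartTime": pd.to_datetime([
        "2018-11-15 02:00:00",
        "2018-11-15 03:00:00",
        "2018-11-15 05:30:00",
        "2018-11-16 05:30:00",
        "2018-11-15 04:00:00",
    ]),
})

# Sort so that "previous transaction" means "previous in time" for each customer.
df = df.sort_values(["CustomerId", "TransactionStartTime"]).reset_index(drop=True)

# diff() within each customer gives a Timedelta to that customer's previous
# transaction; each customer's first transaction gets NaT.
gap = df.groupby("CustomerId")["TransactionStartTime"].diff()
df["hours_since_prev"] = gap.dt.total_seconds() / 3600  # NaT becomes NaN

# cumcount() numbers each customer's transactions 0, 1, 2, ... in time order,
# i.e. how many earlier transactions that customer has made.
df["n_prev_transactions"] = df.groupby("CustomerId").cumcount()

print(df)
```

Other gaps work the same way, e.g. diff(2) for the gap to transaction N-2.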

Hi. You could use groupby, but you'd have to deal with the empty results: the first transaction in each customer's group has no previous transaction to compare against, so it ends up with a missing (or zero) value. You could either make a feature out of that (e.g. a "first transaction" flag) or just use the new column as it is.
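Both options can be sketched like this, assuming the same hypothetical CustomerId/TransactionStartTime columns on invented data. In pandas the first row of each group actually comes out of diff() as NaN rather than zero; the fill value -1 below is an arbitrary choice:

```python
import pandas as pd

# Invented toy data with the thread's column names; C2 has a single transaction.
df = pd.DataFrame({
    "CustomerId": ["C1", "C1", "C2", "C3", "C3"],
    "TransactionStartTime": pd.to_datetime([
        "2018-11-15 02:00", "2018-11-15 06:00",
        "2018-11-15 03:00",
        "2018-11-15 01:00", "2018-11-15 01:30",
    ]),
}).sort_values(["CustomerId", "TransactionStartTime"]).reset_index(drop=True)

gap = df.groupby("CustomerId")["TransactionStartTime"].diff()
df["hours_since_prev"] = gap.dt.total_seconds() / 3600

# Option 1: turn the missing value into its own feature.
df["is_first_transaction"] = df["hours_since_prev"].isna().astype(int)

# Option 2: keep the column but fill the gaps with a sentinel value
# (-1 is arbitrary; pick whatever your model handles well).
df["hours_since_prev_filled"] = df["hours_since_prev"].fillna(-1)

print(df)
```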

@Blenz: the pandas groupby method is well suited to this kind of task. You can also run your own custom functions on each group using the .apply() method.
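As an example of a custom function per customer, here is a sketch of the "how many transactions without fraud" feature from the original question. It assumes a label column named FraudResult (1 = fraud, 0 = legit), which is not mentioned in the thread, and uses only labels from earlier transactions. For one value per row, transform() keeps the result aligned with the original rows. apply() suits per-customer summaries:

```python
import pandas as pd

# Invented toy data. FraudResult (1 = fraud, 0 = legit) is an ASSUMED label column.
df = pd.DataFrame({
    "CustomerId": ["C1", "C1", "C1", "C2", "C2"],
    "TransactionStartTime": pd.to_datetime([
        "2018-11-15 02:00", "2018-11-15 05:00", "2018-11-16 08:00",
        "2018-11-15 03:00", "2018-11-15 09:00",
    ]),
    "FraudResult": [0, 1, 0, 0, 0],
}).sort_values(["CustomerId", "TransactionStartTime"]).reset_index(drop=True)


def prior_legit_count(fraud: pd.Series) -> pd.Series:
    """Count legit (0) transactions strictly before each row of one customer."""
    legit = (fraud == 0).astype(int)
    # cumsum() includes the current row, so shift by one to keep only the past.
    return legit.cumsum().shift(1, fill_value=0)


# Per-row feature: transform() runs the custom function per customer and
# returns a result aligned with df's rows.
df["n_prior_legit"] = df.groupby("CustomerId")["FraudResult"].transform(prior_legit_count)

# Per-customer summary: apply() returns one value per CustomerId.
summary = df.groupby("CustomerId")["FraudResult"].apply(lambda s: int((s == 0).sum()))

print(df)
print(summary)
```

This only works where the label is known, i.e. on the training data.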

Thanks a lot for the input. But I won't be submitting to this competition, as it is too late.