Hey everyone, just joined, and this is my first time dealing with time data :D. I want to know if it is possible to use built-in functions in pandas to easily extract time-based features, like how many hours have passed since a user's last transaction, or how many transactions a user has made without committing fraud. Or do I have to use groupbys and write some for loops? Thanks
Just joined too. I have not analyzed the data yet, but this is how it can be done using pandas:
df["hour_column"] = pd.DatetimeIndex( df['timestamp_column'] ).hour
Hey, thanks for the answer, but I'm not talking about extracting the hour/minute/second from a timestamp; I already did that via pd.to_datetime() and then df.column.dt.hour. I'm referring to extracting features after grouping transactions by CustomerId. For example, for N transactions made by a user U, I want to know the difference in hours between transaction N and transaction N-1, and so on, plus other time-based features.
Hi. You could use the groupby option, but you'd have to worry about the first transaction in each group: it has no previous transaction to compare against, so the difference only has a real value from the 2nd transaction onward, and the first one comes out empty (or zero if you fill it). You should look at either making a feature out of that or using the new column as is.
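To make that concrete, here is a rough sketch of the groupby approach, not tested on the competition data. The toy rows and the column names "timestamp", "hours_since_last" and "is_first_txn" are made up for illustration; only CustomerId comes from this thread. It shifts each customer's timestamps by one row to get the previous transaction time, then takes the difference in hours.

```python
import pandas as pd

# Hypothetical toy data; only the CustomerId column name comes from the thread.
df = pd.DataFrame({
    "CustomerId": ["A", "A", "B", "A", "B"],
    "timestamp": pd.to_datetime([
        "2019-01-01 08:00", "2019-01-01 11:30", "2019-01-01 09:00",
        "2019-01-02 11:30", "2019-01-01 09:45",
    ]),
})

# Sort so that "previous row within the group" means "previous transaction".
df = df.sort_values(["CustomerId", "timestamp"])

# Timestamp of the same customer's previous transaction (NaT for the first one).
prev_time = df.groupby("CustomerId")["timestamp"].shift(1)

# Gap in hours; a customer's first transaction gets NaN rather than zero.
df["hours_since_last"] = (df["timestamp"] - prev_time).dt.total_seconds() / 3600

# Optional flag so the "no previous transaction" case becomes its own feature.
df["is_first_txn"] = df["hours_since_last"].isna()

print(df)
```

Whether you keep the NaN, fill it with some value, or rely on the flag column is a modelling choice; no for loops are needed either way.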
@Blenz: pandas' built-in groupby is the best fit for this kind of task. You can also apply your own custom functions to a dataframe using the built-in .apply() method.
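As a rough illustration of the custom-function route, here is a sketch for the "how many transactions without fraud" question from the first post. The toy data, the 0/1 label column "is_fraud", the helper legit_before and the output column "legit_txns_before" are all hypothetical names, not taken from the competition data. It counts, for each transaction, how many earlier non-fraud transactions the same customer has made.

```python
import pandas as pd

# Hypothetical toy data; "is_fraud" is an assumed 0/1 label column.
df = pd.DataFrame({
    "CustomerId": ["A", "A", "A", "A", "B", "B"],
    "timestamp": pd.to_datetime([
        "2019-01-01 08:00", "2019-01-01 09:00", "2019-01-01 10:00",
        "2019-01-01 11:00", "2019-01-01 08:30", "2019-01-01 12:00",
    ]),
    "is_fraud": [0, 0, 1, 0, 0, 1],
})
df = df.sort_values(["CustomerId", "timestamp"])


def legit_before(labels: pd.Series) -> pd.Series:
    """Number of non-fraud transactions strictly before each row."""
    running = (labels == 0).cumsum()      # legit count up to and including this row
    return running.shift(fill_value=0)    # move down one row to exclude the current one


# group_keys=False keeps the original row index so the result aligns with df.
df["legit_txns_before"] = (
    df.groupby("CustomerId", group_keys=False)["is_fraud"].apply(legit_before)
)

print(df)
```

Since the function returns one value per row, groupby(...).transform(legit_before) should give the same column; .apply() is used here only because it is the method mentioned above.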
Thanks a lot for the input. But I won't be submitting to this competition, as it is too late.