When choosing the models, simplicity and interpretability were the top priorities. The solution was written in R. It uses an ensemble of the auto-ARIMA and TBATS models to predict the number of clicks a client's ad receives one and two weeks into the future.
1. ETL Process:
a. Extract: The CSV data was loaded into a data frame for pre-processing.
b. Transform: The data was grouped by the 'ID' field and the 'clicks' field was summed within each group, so that the problem could be treated as a simple time series problem.
c. Load: The transformed data was passed to the models inside a 'for' loop.
2. Data Modelling:
a. The auto-ARIMA and the TBATS models were used.
b. ARIMA (Auto-Regressive Integrated Moving Average) explains a time series using its own past values (the autoregressive part), differencing of the series to make it stationary (the integrated part), and past forecast errors (the moving average part). The model was chosen for its simplicity and accuracy. Its main challenge is choosing the right orders (p, d, q), which is where auto-ARIMA comes in: it searches over combinations of these orders and keeps the one that scores best on an information criterion such as AICc.
c. TBATS (Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend, and Seasonal components) is a time series forecasting model designed to handle complex seasonal patterns, such as multiple or non-integer seasonal periods. It was used to capture the complex seasonality and non-linear trends present in the data. It is a robust model but tends to over-fit quickly; averaging its forecasts with those of auto-ARIMA limits this risk.
d. The final prediction is the mean of the forecasts from both models.
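For anyone who wants to try the approach, here is a minimal sketch of the steps above in R. It is not the original notebook. It assumes the `forecast` package, daily observations, a weekly seasonal period, and a date column named `date` (only 'ID' and 'clicks' are named in the write-up). The toy data frame `raw` is made up and stands in for the CSV.

```r
library(forecast)  # provides auto.arima(), tbats() and forecast()

# Hypothetical toy data standing in for the CSV; the `date` column and the
# daily granularity are assumptions, not taken from the write-up.
set.seed(42)
days <- seq(as.Date("2023-01-01"), by = "day", length.out = 56)
raw <- data.frame(
  ID     = rep(c("ad_1", "ad_2"), each = 112),
  date   = rep(rep(days, each = 2), times = 2),  # two raw rows per ID per day
  clicks = rpois(224, lambda = 20)
)

# Transform: one row per ID and day, with clicks summed.
daily <- aggregate(clicks ~ ID + date, data = raw, FUN = sum)
daily <- daily[order(daily$ID, daily$date), ]

horizon <- 14  # two weeks ahead, assuming daily observations

# Load and model: fit both models per ID in a for loop, then average.
predictions <- list()
for (id in unique(as.character(daily$ID))) {
  # frequency = 7 (weekly seasonality) is an assumption
  y <- ts(daily$clicks[daily$ID == id], frequency = 7)

  fc_arima <- forecast(auto.arima(y), h = horizon)$mean
  fc_tbats <- forecast(tbats(y), h = horizon)$mean

  # Ensemble: simple mean of the two point forecasts
  predictions[[id]] <- (as.numeric(fc_arima) + as.numeric(fc_tbats)) / 2
}
```

Under these assumptions, the first seven values of `predictions[["ad_1"]]` cover the first week ahead and the last seven cover the second week. With weekly rather than daily data, `horizon` would be 2 and the seasonal period would need to change accordingly.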
Thank you.
Superb, @the_specialist. I still find it a mystery when people train models with R. Indeed, the specialist. It would be great if you could share your notebook.