The standard ARIMA (autoregressive integrated moving average) model produces forecasts based only on the past values of the forecast variable. The model assumes that the future values of a variable depend linearly on its past values and on the values of past (random) shocks. The ARIMAX model is an extended version of the ARIMA model that also includes other independent (predictor) variables. It is also called the vector ARIMA or dynamic regression model.
The ARIMAX model is similar to a multivariate regression model, but it exploits the autocorrelation that may be present in the regression residuals to improve forecast accuracy.
This exercise set provides practice in forecasting with ARIMAX models. It also examines the statistical significance of the regression coefficients.
The exercises use ice cream consumption data. The dataset contains the following variables:
- Ice cream consumption in the United States (per capita)
- Average household income per week
- The price of ice cream
- Average temperature.
The dataset contains 30 observations, each corresponding to a four-week period between March 18, 1951 and July 11, 1953.
Load the dataset and plot the variables cons (ice cream consumption), temp (temperature), and income.
p3 <- ggplot(df, aes(x = X, y = income)) + geom_line() + ylab("Income") + xlab("Time")
grid.arrange(p1, p2, p3, ncol = 1, nrow = 3)
Estimate an ARIMA model for the ice cream consumption data. Then pass the model to the forecast function to obtain a forecast for the next six periods.
fcast_cons <- forecast(fit_cons, h = 6)
Plot the forecast.
Find the mean absolute scaled error (MASE) of the fitted ARIMA model.
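The MASE asked for above can be illustrated with a short base R sketch. In the forecast package, the accuracy function reports MASE among other measures. The helper below is only an assumed, simplified version for a non-seasonal series, with hypothetical names (`mase`, `actual`, `fitted_values`) and toy numbers rather than the ice cream data.

```r
# Sketch of MASE for a non-seasonal series: the model's mean absolute error
# divided by the in-sample mean absolute error of the naive forecast
# (each value predicted by the previous one).
# `mase`, `actual`, and `fitted_values` are hypothetical names.
mase <- function(actual, fitted_values) {
  model_mae <- mean(abs(actual - fitted_values))
  naive_mae <- mean(abs(diff(actual)))  # one-step naive forecast errors
  model_mae / naive_mae
}

# Toy numbers for illustration, not the ice cream data
actual <- c(1, 2, 4, 7)
fitted_values <- c(1.5, 2, 3, 8)
mase_value <- mase(actual, fitted_values)
print(mase_value)
```

A value below 1 means the model's in-sample errors are smaller on average than those of the naive forecast.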
Estimate an extended ARIMA model for the consumption data, with the temperature variable as an additional regressor (using the auto.arima function). Then forecast the next six periods. Note that this forecast requires an assumption about the expected temperature; assume that the temperature in the next six periods is given by the following vector:
fcast_temp <- c(70.5, 66, 60.5, 45.5, 36, 28)
Plot the resulting forecast.
Print a summary of the resulting forecast. Find the coefficient of the temperature variable, its standard error, and the MASE of the forecast. Compare this MASE with that of the initial forecast.
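The mechanics of forecasting with an external regressor can be sketched in base R. This is a minimal sketch under stated assumptions, not the exercise's solution: the data are simulated stand-ins (not the ice cream dataset), the AR(1) error order is fixed by hand instead of being chosen by auto.arima, and the names `fit_sim` and `fc` are hypothetical. Only the `fcast_temp` vector comes from the exercise.

```r
# Simulated stand-in data (NOT the ice cream dataset): 30 periods
set.seed(42)
n <- 30
temp <- 50 + 20 * sin(seq_len(n) / 5) + rnorm(n, sd = 3)
noise <- as.numeric(arima.sim(model = list(ar = 0.5), n = n, sd = 0.02))
cons <- 0.2 + 0.003 * temp + noise

# Regression on temp with AR(1) errors. The order is fixed by hand here;
# auto.arima from the forecast package would select it automatically.
fit_sim <- arima(cons, order = c(1, 0, 0), xreg = temp, method = "ML")

# Expected temperatures for the next six periods (vector from the exercise)
fcast_temp <- c(70.5, 66, 60.5, 45.5, 36, 28)

# Future regressor values must be supplied for every forecast period
fc <- predict(fit_sim, n.ahead = 6, newxreg = fcast_temp)

# Coefficient of the regressor and its standard error
temp_coef <- unname(coef(fit_sim)["temp"])
temp_se <- unname(sqrt(diag(fit_sim$var.coef))["temp"])
print(fc$pred)
print(c(coef = temp_coef, se = temp_se))
```

The key point is that the forecast horizon is determined by the future regressor values: one expected temperature is needed per forecast period.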
The coefficient of the temperature variable is 0.0028.
The standard error of this coefficient is 0.0007.
The mean absolute scaled error is 0.7354048, which is lower than that of the initial model (0.8200619).
Check the statistical significance of the temperature variable's coefficient. Is the coefficient statistically significant at the 5% level?
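One way to carry out this check is a z-test on the ratio of the coefficient to its standard error. The sketch below uses the two values reported above and assumes a normal approximation for the test statistic; the variable names are hypothetical.

```r
# Values reported in the forecast summary
temp_coef <- 0.0028
temp_se <- 0.0007

# z statistic and two-sided p-value under a normal approximation
z_stat <- temp_coef / temp_se
p_value <- 2 * pnorm(abs(z_stat), lower.tail = FALSE)

significant_5pct <- p_value < 0.05
print(c(z = z_stat, p = p_value))
print(significant_5pct)
```

The ratio is about 4, well above the 5% critical value of roughly 1.96, so under this approximation the coefficient is statistically significant at the 5% level.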
Additional regressors can be passed to the ARIMA modeling function, but only in the form of a matrix. Create a matrix with the following columns:
- The values of the temperature variable
- The values of the income variable
- The values of the income variable lagged one period
- The values of the income variable lagged two periods.
Print the matrix.
Note: the last three columns can be created by prepending two NAs to the vector of income values and passing the resulting vector to the embed function (with the dimension parameter equal to the number of columns to be created).
vars <- cbind(temp, income)
print(vars)
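The embed approach described in the note can be sketched as follows. The values are toy numbers chosen so the lags are easy to follow (they are not the dataset), and `income_lags` and `vars_demo` are hypothetical names.

```r
# Toy values for illustration only (not the ice cream data)
temp <- c(40, 45, 50, 55, 60)
income <- c(10, 20, 30, 40, 50)

# Prepend two NAs so that every original row gets a lag-1 and a lag-2 value;
# embed() then returns one column per lag, starting with lag 0.
income_lags <- embed(c(NA, NA, income), dimension = 3)
colnames(income_lags) <- c("income_lag0", "income_lag1", "income_lag2")

vars_demo <- cbind(temp = temp, income_lags)
print(vars_demo)
```

The first row has NA in both lag columns and the second row has NA in the lag-2 column, because no earlier income values exist for those periods.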
Use the resulting matrix to fit three extended ARIMA models, with the following variables as additional regressors:
- Temperature and income (lag 0)
- Temperature and income at lags 0 and 1
- Temperature and income at lags 0, 1, and 2.
Check the summary of each model and find the model with the lowest value of the Akaike information criterion (AIC).
Note that the AIC cannot be used to compare ARIMA models with different orders of differencing, because the number of observations differs. For example, the AIC value of a non-differenced model ARIMA(p, 0, q) cannot be compared with the corresponding value of a differenced model ARIMA(p, 1, q).
fit0 <- auto.arima(cons, xreg = vars[, 1:2])
print(fit0$aic)
The AIC can be used here because all of the models have the same order of differencing (0).
The model with the lowest AIC value is the first model.
Its AIC is equal to -113.3.
Use the model found in the previous exercise to forecast the next six periods, and plot the forecast. The forecast requires a matrix of the expected temperature and income for the next six periods; create this matrix from the expected temperature vector (fcast_temp) and the following expected income values: 91, 91, 93, 96, 96, 96.
Find the mean absolute scaled error of the model and compare it with the errors of the first two models in this exercise set.
The model with two external regressors has the lowest mean absolute scaled error (0.528).
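The matrix of future regressors can be built as sketched below, using the expected temperatures and incomes given in the exercises. The names `fcast_income` and `fcast_vars` and the column names are assumptions; the columns should match the order and names of the regressors used when fitting the model.

```r
# Expected values for the next six periods (from the exercises)
fcast_temp <- c(70.5, 66, 60.5, 45.5, 36, 28)
fcast_income <- c(91, 91, 93, 96, 96, 96)

# One row per forecast period, one column per regressor
fcast_vars <- cbind(temp = fcast_temp, income = fcast_income)
print(fcast_vars)

# With the forecast package, a matrix like this would be passed as
#   forecast(fit0, xreg = fcast_vars)
# where fit0 is the fitted model (not defined in this sketch).
```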