Residual analysis is a pivotal stage in time series modeling: it assesses the model’s goodness of fit and checks that the model’s underlying assumptions hold. Residuals, the differences between observed and predicted values, are examined in the following steps:
- Compute Residuals: Calculate residuals by subtracting the predicted values from the observed values.
- Plot Residuals: Visual inspection of residuals over time reveals trends, patterns, or seasonality. Ideally, a well-fitted model’s residuals appear random and centered around zero.
- Autocorrelation Function (ACF) of Residuals: Plotting the ACF of the residuals helps identify any lingering autocorrelation. Significant spikes in the ACF plot suggest unaccounted-for temporal dependencies.
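The steps above can be sketched in Python. The series and the naive one-step forecast standing in for a fitted model are hypothetical; in practice the residuals would come from your fitted model (e.g. `results.resid` from a statsmodels ARIMA fit), and the ACF is typically plotted with `statsmodels.graphics.tsaplots.plot_acf`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a random-walk series, with a naive one-step-ahead
# forecast standing in for a fitted model's predictions.
rng = np.random.default_rng(42)
observed = pd.Series(100 + np.cumsum(rng.normal(0, 1.0, 200)))
predicted = observed.shift(1).bfill()

# Step 1: compute residuals (observed minus predicted).
residuals = observed - predicted

# Step 2: residuals over time -- a well-fitted model's residuals look like
# random noise centered around zero (plot with residuals.plot() if desired).
print("mean residual:", residuals.mean())

# Step 3: sample ACF of the residuals at lags 1..10.
def sample_acf(x, nlags):
    """Sample autocorrelations of a 1-D series at lags 1..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    n = x.size
    return np.array([np.dot(x[: n - k], x[k:]) / denom for k in range(1, nlags + 1)])

acf_vals = sample_acf(residuals, nlags=10)
print("ACF:", np.round(acf_vals, 3))
```

Because the naive forecast differences a random walk, the residuals here are approximately white noise, so their ACF values stay near zero.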
Significance of Normality:
The normality assumption is foundational for statistical techniques like confidence interval estimation and hypothesis testing. Deviations from normality can lead to biased estimations and inaccurate conclusions. Time series models, including ARIMA and SARIMA, often assume residual normality, and if not met, the model may fail to accurately capture data patterns.
Implications of Deviations from Normality:
- Validity of Confidence Intervals: Constructing valid confidence intervals relies on the normality assumption. Non-normally distributed residuals may compromise the reliability of these intervals, leading to inaccurate uncertainty assessments.
- Outliers and Skewness: Histogram deviations from normality may signal outliers or skewness in the residuals. Identifying and addressing these issues is crucial for enhancing overall model performance.
In essence, ensuring normality in residuals is fundamental for robust time series modeling, aligning with the foundational assumptions of various statistical techniques. Violations of this assumption warrant attention to maintain the model’s accuracy and reliability.
Running residual analysis on the Analyze Boston data, we get:
Residual Over Time:
This plot shows differences between observed and predicted values over the course of the prediction period, illuminating the behaviour of the model residuals. It is essential to analyse residuals over time in order to evaluate the model’s performance and spot any possible systematic trends or oversights. Important factors to think about are:
- Random Patterns: Ideally, the residuals show no recurring patterns, demonstrating that the model has captured the underlying data structures.
- Centred around Zero: The residuals should be centred around zero; any discernible drift points to potential bias or incompleteness in the model.
- Heteroscedasticity: Variability in the residuals that changes over time is a sign of heteroscedasticity, signalling that the model does not sufficiently account for the inherent variability in the data.
- Outliers: Extreme values in the residuals can point to data points or events that the model missed.
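The checks above can be turned into quick numeric diagnostics. The residual series below is synthetic and purely illustrative; with a real model you would run the same checks on its residuals.

```python
import numpy as np
import pandas as pd

# Hypothetical residual series standing in for real model residuals.
rng = np.random.default_rng(7)
residuals = pd.Series(rng.normal(0, 1.0, 240))

# Centred around zero: the mean should be close to 0.
mean_resid = residuals.mean()

# Heteroscedasticity: compare the spread in the first and second halves --
# a ratio far from 1 suggests the variance changes over time.
first_half_std = residuals.iloc[:120].std()
second_half_std = residuals.iloc[120:].std()
std_ratio = first_half_std / second_half_std

# Outliers: points more than 3 standard deviations from the mean.
outliers = residuals[np.abs(residuals - mean_resid) > 3 * residuals.std()]

print("mean:", mean_resid)
print("std ratio:", std_ratio)
print("outlier count:", len(outliers))
```

For well-behaved residuals the mean is near zero, the ratio is near one, and outliers are rare.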
The absence of consistent trends indicates that the variance in the ‘logan_intl_flights’ data has been effectively captured. Residuals centred mainly around the mean indicate accurate model predictions, with any remaining deviations attributable to random noise. The absence of heteroscedasticity implies the models manage variability consistently, which strengthens their dependability over time.
ACF Residual Analysis:
The Autocorrelation Function (ACF) of the residuals, which measures their correlation across different lags, helps evaluate any temporal structure remaining in the residuals after model fitting.
Among the interpretations are:
- No Notable Spikes: If the ACF of the residuals rapidly decays to zero without noticeable spikes, the residuals are effectively independent, showing that the model has successfully captured the temporal dependencies.
- Significant Spikes: Notable spikes at particular lags suggest residual autocorrelation or leftover patterns, which calls for further investigation into alternative model structures or refinements.
Our ACF shows no notable spikes, suggesting that the model has successfully captured the temporal dependencies in the data.
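A quick way to flag "notable spikes" numerically is to compare each sample autocorrelation with the approximate 95% white-noise bound of ±1.96/√n (the bound that `plot_acf` shades by default). The two series below are synthetic: one behaves like clean residuals, the other deliberately leaves strong lag-1 dependence in place.

```python
import numpy as np

def significant_lags(series, nlags=20):
    """Lags whose sample autocorrelation exceeds the approximate
    95% white-noise bound of +/- 1.96 / sqrt(n)."""
    x = np.asarray(series, dtype=float) - np.mean(series)
    denom = np.dot(x, x)
    n = x.size
    bound = 1.96 / np.sqrt(n)
    acf = [np.dot(x[: n - k], x[k:]) / denom for k in range(1, nlags + 1)]
    return [k for k, rho in enumerate(acf, start=1) if abs(rho) > bound]

rng = np.random.default_rng(1)
white_noise = rng.normal(size=300)          # well-behaved "residuals"

# AR(1) series: strong dependence left in the "residuals" at lag 1.
ar1 = np.zeros(300)
eps = rng.normal(size=300)
for t in range(1, 300):
    ar1[t] = 0.8 * ar1[t - 1] + eps[t]

print("white noise:", significant_lags(white_noise))
print("AR(1):", significant_lags(ar1))
```

White noise should produce few or no flagged lags (roughly one in twenty by chance), while the AR(1) series is flagged at the early lags.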
In the next blog, we will look at the statistical test commonly used to assess whether significant autocorrelations exist in a time series at different lags.