The notes below summarize what I learned from Marco Peixeiro’s book, Time Series Forecasting in Python.
Understanding Time Series:
A time series is a chronological collection of measurements taken at regular intervals; such data arise in many different fields of study. Time series analysis identifies the patterns, trends, and behaviors embedded in the data over time in order to produce reliable forecasts of future values.
Decomposing Time Series:
Time series decomposition breaks the data into its basic components: trend, seasonality, and noise. The trend represents the long-term movement, seasonality captures recurring patterns, and noise accounts for the unexplained deviations. This breakdown improves understanding of the data’s structure and supports more accurate analysis and predictions.
Forecasting Project Lifecycle:
A forecasting project lifecycle runs from gathering data through deploying models. The key steps are exploratory data analysis, model selection (such as ARIMA or exponential smoothing), training the model on historical data, validating and testing it, deploying it, and then monitoring and maintaining it. The process is iterative, which keeps forecasts accurate and up to date.
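The train/validate step of that lifecycle can be sketched in a few lines. The data and the trivial mean model here are placeholders I chose for illustration, not the book’s example:

```python
import numpy as np

# Hypothetical series standing in for collected historical data.
data = np.array([112., 118., 132., 129., 121., 135., 148., 148.,
                 136., 119., 104., 118.])

# Hold out the last 4 observations for validation/testing.
train, test = data[:-4], data[-4:]

# "Model training": here a trivial mean model, purely for illustration.
forecast = np.full_like(test, train.mean())

# Validation: mean absolute error against the held-out data.
mae = np.abs(test - forecast).mean()
print(f"MAE: {mae:.2f}")
```

In a real project this evaluation loop would be repeated whenever new data arrives, which is what the monitoring and maintenance phase amounts to.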
Baseline Models:
Baseline models produce simple forecasts that any more sophisticated model should beat, serving as benchmarks. Examples include the mean (average) baseline, the naive baseline, and the seasonal baseline. They establish a performance floor and make it possible to judge whether a more advanced model adds real value.
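The three baselines named above each take one line of NumPy. The two years of monthly observations below are invented for the sketch:

```python
import numpy as np

# Two years of hypothetical monthly observations (illustrative values only).
history = np.array([30, 32, 45, 50, 48, 60, 75, 72, 58, 46, 38, 33,
                    34, 36, 48, 55, 52, 64, 80, 77, 62, 50, 41, 36],
                   dtype=float)

horizon = 12  # forecast the next 12 months

mean_forecast = np.full(horizon, history.mean())   # historical average
naive_forecast = np.full(horizon, history[-1])     # repeat the last observation
seasonal_forecast = history[-12:].copy()           # repeat the last full season

print(mean_forecast[0], naive_forecast[0], seasonal_forecast[0])
```

Whichever of these scores best on held-out data becomes the bar a more sophisticated model has to clear.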
Random Walk Model:
The random walk model is a strong foundation for time series forecasting. It asserts that the next value equals the most recent observation plus random error, so short-term movements are unpredictable. Its key features are persistence (the best forecast is simply the last observed value), its ability to capture noise, and its usefulness as a reference point when evaluating sophisticated models. If a sophisticated model cannot outperform the random walk, there may be no learnable pattern in the data beyond noise.
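A minimal sketch of this, with simulated data of my own rather than anything from the book: generate a random walk by accumulating noise, then forecast every future step as the last observed value and compare against the historical-mean baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a random walk: each value is the previous one plus random noise.
steps = rng.normal(0, 1, 500)
walk = np.cumsum(steps)

train, test = walk[:400], walk[400:]

# Under a random walk, the best available forecast at every horizon is
# simply the last observed value (persistence).
forecast = np.full(len(test), train[-1])
mse_naive = np.mean((test - forecast) ** 2)

# The historical-mean baseline ignores persistence entirely.
mse_mean = np.mean((test - train.mean()) ** 2)
print(f"naive MSE: {mse_naive:.2f}, mean MSE: {mse_mean:.2f}")
```

On random-walk data, no model should be able to beat the persistence forecast by more than chance, which is exactly why it makes such a demanding baseline.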
Using the “Economic Indicators” dataset from Analyze Boston, let’s determine a baseline (mean) forecast for the total number of international flights at Logan Airport.
The baseline model computes the historical average of international flights at Logan Airport. Depending on the dataset’s temporal granularity, the mean is computed per time unit, such as months or years. That historical average then serves as the forecast of future international flights.
In the visualization, the alignment of the blue (actual) and red (baseline) lines gives a quick indication of how well the baseline model performs. The assumption here is that future values will reflect the historical average.
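A sketch of how this baseline and plot might look in pandas. The column name `logan_intl_flights` does appear in the Analyze Boston dataset, but the values below are made up for illustration, and the three-month holdout is my own choice:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative stand-in for the "Economic Indicators" data; the values
# are invented, only the column name mirrors the real dataset.
df = pd.DataFrame(
    {"logan_intl_flights": [3200, 3400, 3900, 4100, 4300, 4600,
                            4800, 4700, 4200, 3800, 3500, 3300]},
    index=pd.date_range("2013-01-01", periods=12, freq="MS"),
)

train = df.iloc[:-3]          # history used to build the baseline
test = df.iloc[-3:].copy()    # months we pretend to forecast

# Mean baseline: forecast every future month as the historical average.
hist_mean = train["logan_intl_flights"].mean()
test["pred_mean"] = hist_mean

# Actual values in blue, baseline forecast in red.
fig, ax = plt.subplots()
ax.plot(df.index, df["logan_intl_flights"], "b-", label="actual")
ax.plot(test.index, test["pred_mean"], "r--", label="mean baseline")
ax.set_ylabel("International flights at Logan Airport")
ax.legend()
```

The gap between the red dashed line and the blue line over the holdout months is the error any serious forecasting model would need to shrink.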