Time Series Analysis: Applying the ARIMA Forecasting Model to the U.S. Unemployment Rate Using Python

David Hasan
Nov 9, 2020 · 6 min read

In this article, I apply the ARIMA forecasting model to the U.S. unemployment rate as time-series data and walk through the code I used to run the model in Python (Jupyter Notebook). At the end of the article, I include two YouTube videos for training purposes.

Step 1- Data preparation

First, we need to import the necessary dependencies into the Jupyter Notebook. The first group of libraries is needed for data manipulation, and the second group is needed for model development.

Import dependencies for data manipulation
Dependencies to develop the ARIMA forecasting model
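A minimal sketch of what those imports might look like; I am assuming pandas, NumPy, matplotlib, statsmodels, and pmdarima, since those are the typical packages for this kind of workflow, and the exact set used in the original notebook may differ:

```python
# Dependencies for data manipulation and visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Dependencies for developing the ARIMA forecasting model
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
import pmdarima as pm  # for the auto_arima order search
```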

Data for the three cases are pulled from the Federal Reserve Bank of St. Louis economic research website, fred.stlouisfed.org (FRED). The data consist of the seasonally adjusted unemployment rate for the United States, the State of Oregon, and the State of Nevada. I put the links to the data at the bottom of the article; you can also download the data in CSV format from the GitHub user content link here.

Importing data to Jupyter Notebook
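A sketch of the import step, assuming the CSVs were downloaded from FRED in their default layout (a DATE column plus one value column) and saved under hypothetical file names; adjust the names to whatever you downloaded:

```python
import pandas as pd

# Hypothetical file names; adjust to match the CSVs downloaded from FRED
us = pd.read_csv('UNRATE.csv', parse_dates=['DATE'], index_col='DATE')
oregon = pd.read_csv('ORUR.csv', parse_dates=['DATE'], index_col='DATE')
nevada = pd.read_csv('NVUR.csv', parse_dates=['DATE'], index_col='DATE')

# Combine the three monthly series into a single DataFrame
df = pd.concat([us, oregon, nevada], axis=1)
df.columns = ['US', 'Oregon', 'Nevada']
df = df.asfreq('MS')  # monthly observations dated on the first of the month
print(df.head())
```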

Step 2- Data Visualization

The following line graph displays the unemployment rate trends for the three time series: the U.S., Oregon, and Nevada. I also added a few annotations above the lines that may provide more context for readers. It is a simple example of producing a line graph with Python, the matplotlib package, and the ggplot style.

Line graph for three unemployment rates in the U.S., State of Oregon and Nevada
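One way such a graph could be produced with the ggplot style; the annotation text and coordinates below are purely illustrative, and the code assumes the combined df DataFrame from the import step:

```python
import pandas as pd
import matplotlib.pyplot as plt

plt.style.use('ggplot')

fig, ax = plt.subplots(figsize=(12, 6))
df.plot(ax=ax)  # 'US', 'Oregon', and 'Nevada' columns from the import step

# Illustrative annotation above the lines; adjust text and position as needed
ax.annotate('Post-2008 recession peak', xy=(pd.Timestamp('2010-01-01'), 13.5),
            fontsize=9)

ax.set_title('Seasonally Adjusted Unemployment Rate')
ax.set_xlabel('Date')
ax.set_ylabel('Unemployment rate (%)')
ax.legend(['U.S.', 'Oregon', 'Nevada'])
plt.show()
```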

Stationarity

Time-series data should be stationary. A stationary series is one whose statistical properties [mean, variance, and covariance] do not change over time. Note that series with seasonality or a trend are not stationary, because their values depend on the time at which they are observed [e.g., the temperature in winter is always low]. In principle, time-series data should be stationary before running the ARIMA forecasting model. Autocorrelation plots and order differencing are used to visualize the stationarity status. As displayed in the graph below, the first autocorrelation plot shows that the State of Nevada dataset is nonstationary: its mean, variance, and covariance change over time, and this plays a key role in determining the model criteria. This stationarity issue should therefore be resolved before the data are used. Toward this end, first- and second-order differencing were applied to display the stationarity of the dataset: first-order differencing was defined as 6 rows down, and second-order differencing as 12 rows down. Both differenced series are stationary, but the second-order series is more stationary than the first.

Autocorrelation and two orders of differencing to test the stationarity
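A sketch of how plots like these can be produced for the Nevada series: the first panel shows the autocorrelation of the raw series, and the next two show the series after first- and second-order differencing (this assumes the df DataFrame built earlier):

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

nevada = df['Nevada']

fig, axes = plt.subplots(3, 1, figsize=(10, 9))
plot_acf(nevada.dropna(), ax=axes[0], title='Autocorrelation: original series')
plot_acf(nevada.diff().dropna(), ax=axes[1],
         title='Autocorrelation: 1st-order differencing')
plot_acf(nevada.diff().diff().dropna(), ax=axes[2],
         title='Autocorrelation: 2nd-order differencing')
plt.tight_layout()
plt.show()
```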

ARIMA Parameters

  • AR(p) Autoregression: this component uses past values of the series (its own lags) as predictors in a linear regression for the forecast.
  • I(d) Integration: this component applies differencing, subtracting each observation from the one before it, to make the time series stationary.
  • MA(q) Moving Average: this component models the dependency between an observation and the residual errors from a moving average model applied to lagged observations.

A basic model was run with order (p,d,q) = (1,1,1) for the U.S. unemployment rate data. As noted, the U.S. unemployment dataset was stationary, and this first model yields a reasonable result: its AIC was approximately 776.
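A minimal sketch of fitting that baseline model with statsmodels, assuming the U.S. series is the 'US' column of the df DataFrame built earlier:

```python
from statsmodels.tsa.arima.model import ARIMA

us = df['US']  # U.S. seasonally adjusted unemployment rate

# Baseline ARIMA(1,1,1) model
model = ARIMA(us, order=(1, 1, 1))
result = model.fit()
print(result.summary())  # the summary table reports the AIC
```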

The left panel of the following figure displays the KDE of the residuals. Kernel density estimation (KDE) is a non-parametric way to estimate the probability density function (PDF) of a random variable. The function used here applies Gaussian kernels and includes automatic bandwidth determination.

Residual Results for the KDE Test
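One way to produce residual plots like these from the fitted model above; the KDE panel uses pandas' built-in Gaussian-kernel density plot:

```python
import matplotlib.pyplot as plt
import pandas as pd

residuals = pd.DataFrame(result.resid)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
residuals.plot(ax=axes[0], title='Residuals', legend=False)
residuals.plot(kind='kde', ax=axes[1], title='Residual density (KDE)', legend=False)
plt.show()
```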

Step 3- Running ARIMA model

Types of Forecasting

  1. Univariate forecasting: in this method, the forecasting model is applied to a single time-series dataset. The models in this article are univariate forecasting models.
  2. Multivariate forecasting [exogenous variables]: this sort of forecasting model relies on a multivariate dataset; each time series depends on other time series, such as forecasting hourly weather from temperature, pressure, wind speed, and wind direction.

The U.S. unemployment rate dataset was divided into training and test sets: 80% of the observations for training and the remaining 20% for the test split. The result demonstrates a forecast from 2012 to 2020, meaning that, regardless of seasonal variation, the U.S. unemployment rate can be projected out to the forecasting endpoints.

Line Graph for the Forecasting, Actual, and Training Unemployment Dataset
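A sketch of that train/test split and forecast, continuing with the us series and the baseline ARIMA order from above; the ratio is the 80/20 split described in the text:

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# 80% of the observations for training, the remaining 20% for testing
split = int(len(us) * 0.8)
train, test = us.iloc[:split], us.iloc[split:]

model = ARIMA(train, order=(1, 1, 1))
result = model.fit()

# Forecast over the test horizon and compare against the actual data
forecast = result.forecast(steps=len(test))

fig, ax = plt.subplots(figsize=(12, 6))
train.plot(ax=ax, label='Train')
test.plot(ax=ax, label='Actual')
forecast.plot(ax=ax, label='Forecast')
ax.legend()
plt.show()
```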

The next model was created with order (p,d,q) = (2,0,1). This model was selected as the best-fitting one because the U.S. unemployment rate series was stationary and did not need the differencing component. The graph visualizes three lines: the training data, the actual data, and the forecast. Running the AutoARIMA search yielded the most optimized result with the lowest AIC; the optimized model's AIC was 775.9.

ARIMA Model Statistics
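The order search itself can be sketched with the pmdarima package's auto_arima, which steps through candidate (p, d, q) orders and keeps the one with the lowest AIC (this assumes pmdarima is installed and the us series from earlier):

```python
import pmdarima as pm

# Stepwise search over (p, d, q) orders, minimizing AIC
auto_model = pm.auto_arima(us, seasonal=False, stepwise=True,
                           suppress_warnings=True, trace=True)
print(auto_model.summary())  # reports the selected order and its AIC
```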

A seasonal ARIMA model is applied in the next run. In the SARIMA final forecasting model, the line graph predicts the future trend of the U.S. unemployment rate. However, the unemployment rate is in an unusual situation because of the COVID-19 pandemic, and it will take time for the trend to return to its previous status as some businesses, particularly labor-intensive ones, adapt to the new conditions. Regardless of seasonal variation, the unemployment rate in the U.S. moves slightly down based on the final ARIMA forecasting model.

SARIMA Seasonal Final Forecasting Model for the U.S. Unemployment Rate
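A sketch of a seasonal model with statsmodels' SARIMAX; the (1, 1, 1)x(1, 1, 1, 12) orders and the 24-month horizon here are illustrative choices, not necessarily the ones behind the figure above:

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Illustrative seasonal ARIMA specification for monthly data
sarima = SARIMAX(us, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
sarima_result = sarima.fit(disp=False)

# Forecast the next 24 months with confidence intervals
pred = sarima_result.get_forecast(steps=24)
mean = pred.predicted_mean
ci = pred.conf_int()

fig, ax = plt.subplots(figsize=(12, 6))
us.plot(ax=ax, label='Observed')
mean.plot(ax=ax, label='SARIMA forecast')
ax.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], alpha=0.2)
ax.legend()
plt.show()
```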

You can use the following tutorial videos on YouTube to follow the Python code. Data preparation is illustrated in video part 1, and running the ARIMA model with Python is illustrated in video part 2.

Video part (1) includes the data preparation and data wrangling using Python (Jupyter Notebook)
Video part (2) includes applying the ARIMA models using Python (Jupyter Notebook)

Resources:

Maklin, C. (2019). ARIMA Model Python Example — Time Series Forecasting. Retrieved from https://towardsdatascience.com/machine-learning-part-19-time-series-and-autoregressive-integrated-moving-average-model-arima-c1005347b0d7

Brownlee, J. (2017). How to Create an ARIMA Model for Time Series Forecasting in Python. Retrieved from https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/

ARIMA in Python — Time Series Forecasting Part 2 — Datamites Data Science Projects. DataMites YouTube Channel, Sep 28, 2018. https://www.youtube.com/watch?v=D9y6dcy0xK8&feature=emb_title

Prabhakaran, S. (2019). ARIMA Model — Complete Guide to Time Series Forecasting in Python. Retrieved from https://www.machinelearningplus.com/time-series/arima-model-time-series-forecasting-python/

Banerjee, P. (2020). ARIMA Model for Time Series Forecasting. Python notebook using data from Time Series Analysis Dataset, Published on Jul 20, 2020. Retrieved from https://www.kaggle.com/prashant111/arima-model-for-time-series-forecasting
