How to convert non-stationary data into stationary data for an ARIMA model with Python
If you work with time series data, you have probably heard of ARIMA. It may even be the model you are using right now to forecast your data. ARIMA, like many other classical forecasting models, assumes that the data it is fitted to is stationary.
What is non-stationary data?
Non-stationary simply means that the statistical properties of your data, such as the mean and variance, change over time. Trends and seasonal effects are the usual causes. Consistency matters when fitting a model: if the data has trends or seasonal effects, it is less consistent, and that hurts the accuracy of the forecasts.
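To make the idea concrete, here is a minimal sketch that checks whether the mean of a series drifts over time. The series is synthetic (an assumed trend plus a yearly cycle plus noise), not the passenger data, and the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (an assumption, not the passenger data):
# an upward trend plus a yearly cycle plus a little noise.
rng = np.random.default_rng(0)
months = pd.date_range("2009-01-01", periods=120, freq="MS")
t = np.arange(120)
values = 1000 + 10 * t + 200 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 20, 120)
series = pd.Series(values, index=months)

# Compare the mean of the first and second half of the series.
first_half_mean = series.iloc[:60].mean()
second_half_mean = series.iloc[60:].mean()

# A 12-month rolling mean smooths out the yearly cycle and follows the trend.
rolling_mean = series.rolling(window=12).mean().dropna()

print(first_half_mean, second_half_mean)
print(rolling_mean.iloc[[0, -1]])
```

If the two halves have clearly different means, or the rolling mean keeps climbing, the series is not stationary.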
Example dataset
Here is an example of some time series data:
This is the number of air passengers each month in the United Kingdom, from 2009 to 2019. You can fetch the data here.
As we can see, the data has strong seasonality: a peak as people go on their summer holidays, and a tiny bump in the winter as people travel to avoid the cold. If you would like to use a forecasting model, you need to change this into stationary data.
Differencing
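Differencing is a popular method for removing trends and seasonality. It works by subtracting the previous observation from the current observation. For seasonal data, you can also subtract the observation from one full season earlier (12 months for monthly data).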
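As a rough illustration of what differencing does, the sketch below builds an idealized series (an assumed straight-line trend plus a pattern that repeats exactly every 12 steps) and differences it twice: once against the previous value and once against the value 12 steps earlier. The numbers are made up for the example.

```python
import numpy as np
import pandas as pd

# Synthetic series (an assumption): linear trend plus a pattern repeating every 12 steps.
t = np.arange(48)
seasonal_pattern = np.array([0, 5, 10, 30, 60, 90, 120, 110, 60, 20, 5, 10])
series = pd.Series(100 + 3 * t + seasonal_pattern[t % 12], dtype="float64")

# First-order differencing: current value minus the previous value.
first_diff = series.diff()
manual_diff = series - series.shift(1)

# Seasonal differencing at lag 12: current value minus the value 12 steps earlier.
stationary = first_diff.diff(12).dropna()

print(first_diff.equals(manual_diff))
print(len(series), len(stationary))
print(stationary.abs().max())
```

For this idealized series the result is completely flat, because the trend and the seasonal pattern are both removed. Real data will leave some noise behind, and the first 13 observations are lost to the two differencing steps.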
Assuming you are using pandas:
df_diff = df.diff().diff(12).dropna()
This short line should do the job: diff() removes the trend and diff(12) removes the yearly seasonality. Just make sure that your date column is the index; otherwise you will run into issues when plotting the graph.
If you would rather keep your default integer index, create a new dataframe instead: difference the numerical column on its own, then join it back to the time column.
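One way to get the dates into the index, sketched on a small made-up table. The column names TIME and Passengers match the ones used in this post, but the values and the date format are assumptions.

```python
import pandas as pd

# Small made-up table for illustration only.
df = pd.DataFrame({
    "TIME": ["2009-01", "2009-02", "2009-03", "2009-04"],
    "Passengers": [100, 120, 90, 150],
})

# Parse the strings into timestamps and use them as the index.
df["TIME"] = pd.to_datetime(df["TIME"], format="%Y-%m")
df = df.set_index("TIME")

# Differencing now keeps the dates attached to each value.
df_diff = df.diff().dropna()
print(df_diff)
```

With the dates in the index, calling plot() on the differenced dataframe puts time on the x-axis without any extra arguments.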
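import pandas as pd
# Difference the passenger counts; the first 13 rows become NaN and are dropped.
diff_v2 = df['Passengers'].diff().diff(12).dropna()
time_series = df['TIME']
df_diff_v2 = pd.concat([time_series, diff_v2], axis=1).reset_index().dropna()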
The concatenation produces NaN values because the differenced passengers series is 13 rows shorter than the time series, so the first 13 dates have no matching value. We use the dropna() function to drop those rows.
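The alignment behaviour is easier to see on a tiny example. This sketch uses made-up numbers and a lag of 2 in place of 12 so that the table stays short; the names demo, demo_diff, joined and cleaned are only for illustration.

```python
import pandas as pd

# Made-up numbers; a lag of 2 stands in for the lag of 12 used on the real data.
demo = pd.DataFrame({
    "TIME": pd.date_range("2009-01-01", periods=6, freq="MS"),
    "Passengers": [10, 14, 13, 19, 20, 28],
})

# The differenced series loses its first 1 + 2 = 3 rows after dropna().
demo_diff = demo["Passengers"].diff().diff(2).dropna()

# concat aligns on the index, so rows 0 to 2 get NaN in the Passengers column.
joined = pd.concat([demo["TIME"], demo_diff], axis=1)
print(joined)

# Dropping those rows leaves only dates that have a differenced value.
cleaned = joined.dropna().reset_index(drop=True)
print(cleaned)
```

Here reset_index(drop=True) renumbers the rows without keeping the old index as a column, which is an alternative to resetting the index and then dropping the 'index' column.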
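import matplotlib as mpl
import matplotlib.ticker  # makes mpl.ticker available
df_diff_v2 = df_diff_v2.drop(columns=['index'])  # remove the old index column added by reset_index()
ax = df_diff_v2.plot(x='TIME')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))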
The formatter is there because we are dealing with large tick values; it adds thousands separators to the y-axis labels. It is not needed if your values are below a thousand.
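To see what the '{x:,.0f}' format string does on its own, here it is applied to two plain numbers with Python's built-in str.format. The sample values are arbitrary.

```python
# The format spec used by the tick formatter, applied to plain numbers.
tick_format = "{x:,.0f}"

large = tick_format.format(x=1234567.0)  # comma as thousands separator, no decimals
small = tick_format.format(x=950.4)      # below a thousand, so no separator appears

print(large)
print(small)
```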
Now your data can be used for your ARIMA model.
If you found this tutorial helpful, check out the rest of the website and sign up to my mailing list to get more blog posts like this, along with short essays relating to technology.