ARMAX models

ARMAX models#

This tutorial demonstrates ARX and ARMAX models described in the previous page.

We want to see if the preditions of an ordinary linear regression model are improved by expanding it with an AR(p) model. This is what we call the ARX(p) formulation:

(38)#\[y_t = \sum_{i=1}^p \varphi_i y_{t-i} + \sum_{k=0}^r \beta_k x_{t-k} + \varepsilon_t\]

Autoregressive models (AR) are based on the idea that the current value of the series, \(y_t\), can be explained as a function of \(p\) past values, \(y_{t−1}\), \(y_{t−2}\), …, \(y_{t−p}\). They are essentially linear regression models which take lagged copies of the dependent variable as explanatory variables. Other variables can be added, leading to the so-called ARX model (AR model with eXplanatory variables).

This example has one explanatory variable \(x\) which lagged values may also be used up to a certain lag \(r\). Further expanding this model with a MA(q) term yields the ARMAX(p,q) fomulation:

(39)#\[y_t = \sum_{i=1}^p \varphi_i y_{t-i} + \sum_{j=1}^q \theta_j \varepsilon_{t-j} + \sum_{k=0}^r \beta_k x_{t-k} + \varepsilon_t\]

The target of this exercise is to find whether the ARMAX expansion improves the prediction metrics of an ordinary linear model.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

import arviz as az
from cmdstanpy import CmdStanModel

Data#

The data comprises the heating power, indoor and outdoor air temperatures of a single-family house, with a 2 hour time step between measurements.

df = pd.read_csv('data/armaxdata.csv')
df.head()

	datetime	Ti	Te	P
0	2023-04-07 20:00:00	21.017708	8.008333	1127.547691
1	2023-04-07 22:00:00	21.022083	6.254167	1095.384059
2	2023-04-08 00:00:00	21.023750	5.258333	1073.813890
3	2023-04-08 02:00:00	21.010625	3.381250	1101.343786
4	2023-04-08 04:00:00	21.009167	3.118750	1131.881637

# A plot of the dependent variable vs. each of the explanatory variables
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].plot(pd.to_datetime(df['datetime']), df['P'])
axes[1].scatter(df['Ti']-df['Te'], df['P'])
axes[0].set_ylabel("Heating power (W)")
axes[0].set_xlabel("Time")
axes[1].set_xlabel("Ti-Te (C)")

Text(0.5, 0, 'Ti-Te (C)')

_images/6cfe293948468c104f5ee7bcae960abf92ee0b28ded9b17fa4b0c01c6dd83559.png

We may expect a linear relationship between the heating power \(y\) and the indoor-outdoor temperature difference \(x=T_i-T_e\). However, because of the building’s thermal inertia exceeding the time step size, it is likely that each current value \(y_t\) depends on previous instances \(y_{t-1}\) and \(x_{t-1}\).

Let us use about two thirds of the dataset for model training, and the last third to watch the energy use forecasted by fitted models:

N = len(df)
N_train = round(2/3*N)
X = df['Ti']-df['Te']
y = df.loc[:N_train-1,'P']

Linear regression#

We have Stan models of an ordinary linear regression model, and of ARX and ARMAX model of any orders (p,q,r):

linreg = CmdStanModel(stan_file='models/linearregression.stan')
arx = CmdStanModel(stan_file='models/arx.stan')
armax = CmdStanModel(stan_file='models/armax.stan')

Let us start by fitting an ordinary linear regression model which we will use later for model comparison reference:

# Data declaration for the linear regression model
linreg_data = {
    "N": N_train,
    "K": 1,
    "x": X.values[:N_train, np.newaxis],
    "y": y.values,
}

# Fitting the LR model
fit_linreg = linreg.sample(data=linreg_data, show_progress=False)

20:45 - cmdstanpy - INFO - CmdStan start processing
20:45 - cmdstanpy - INFO - Chain [1] start processing
20:45 - cmdstanpy - INFO - Chain [2] start processing
20:45 - cmdstanpy - INFO - Chain [3] start processing
20:45 - cmdstanpy - INFO - Chain [4] start processing

20:46 - cmdstanpy - INFO - Chain [1] done processing
20:46 - cmdstanpy - INFO - Chain [4] done processing
20:46 - cmdstanpy - INFO - Chain [2] done processing
20:46 - cmdstanpy - INFO - Chain [3] done processing

Now, the choice of ARMAX orders (p,q,r) is not trivial. There is no universal rule telling us how many previous time steps will make the energy forecasts more robust. The general recommendation is to fit several models of increasing complexity, until out model comparison metrics tell us to stop. Let us fit a few models, which is fast once the initial model has been compiled.

arx1_data = {
    "N_train": N_train,
    "N": N,
    "K": 1,
    "P": 1,
    "R": 1,
    "x": X.values[:, np.newaxis] ,
    "y": y.values,
}

fit_arx1 = arx.sample(data=arx1_data, show_progress=False)

arx2_data = {
    "N_train": N_train,
    "N": N,
    "K": 1,
    "P": 2,
    "R": 1,
    "x": X.values[:, np.newaxis] ,
    "y": y.values,
}

fit_arx2 = arx.sample(data=arx2_data, show_progress=False)

arx3_data = {
    "N_train": N_train,
    "N": N,
    "K": 1,
    "P": 2,
    "R": 2,
    "x": X.values[:, np.newaxis] ,
    "y": y.values,
}

fit_arx3 = arx.sample(data=arx3_data, show_progress=False)

20:47 - cmdstanpy - INFO - CmdStan start processing
20:47 - cmdstanpy - INFO - Chain [1] start processing
20:47 - cmdstanpy - INFO - Chain [2] start processing
20:47 - cmdstanpy - INFO - Chain [3] start processing
20:47 - cmdstanpy - INFO - Chain [4] start processing
20:55 - cmdstanpy - INFO - Chain [2] done processing
20:57 - cmdstanpy - INFO - Chain [4] done processing
20:57 - cmdstanpy - INFO - Chain [1] done processing
20:57 - cmdstanpy - INFO - Chain [3] done processing
20:58 - cmdstanpy - INFO - CmdStan start processing
20:58 - cmdstanpy - INFO - Chain [1] start processing
20:58 - cmdstanpy - INFO - Chain [2] start processing
20:58 - cmdstanpy - INFO - Chain [3] start processing
20:58 - cmdstanpy - INFO - Chain [4] start processing
21:22 - cmdstanpy - INFO - Chain [1] done processing
21:24 - cmdstanpy - INFO - Chain [2] done processing
21:25 - cmdstanpy - INFO - Chain [3] done processing
21:28 - cmdstanpy - INFO - Chain [4] done processing
21:28 - cmdstanpy - WARNING - Some chains may have failed to converge.
	Chain 4 had 1 divergent transitions (0.1%)
	Use the "diagnose()" method on the CmdStanMCMC object to see further information.
21:28 - cmdstanpy - INFO - CmdStan start processing
21:28 - cmdstanpy - INFO - Chain [1] start processing
21:28 - cmdstanpy - INFO - Chain [2] start processing
21:28 - cmdstanpy - INFO - Chain [3] start processing
21:28 - cmdstanpy - INFO - Chain [4] start processing
21:53 - cmdstanpy - INFO - Chain [1] done processing
21:55 - cmdstanpy - INFO - Chain [4] done processing
21:59 - cmdstanpy - INFO - Chain [2] done processing
21:59 - cmdstanpy - INFO - Chain [3] done processing

Let us add an ARMAX model to the pool:

armax_data = {
    "N_train": N_train,
    "N": N,
    "K": 1,
    "P": 1,
    "Q": 1,
    "R": 1,
    "x": X.values[:, np.newaxis] ,
    "y": y.values,
}

fit_armax = armax.sample(data=armax_data, show_progress=False)

21:59 - cmdstanpy - INFO - CmdStan start processing
21:59 - cmdstanpy - INFO - Chain [1] start processing
21:59 - cmdstanpy - INFO - Chain [2] start processing
21:59 - cmdstanpy - INFO - Chain [3] start processing
21:59 - cmdstanpy - INFO - Chain [4] start processing

06:22:10 - cmdstanpy - INFO - Chain [4] done processing
06:22:11 - cmdstanpy - INFO - Chain [2] done processing
06:22:13 - cmdstanpy - INFO - Chain [1] done processing
06:22:13 - cmdstanpy - INFO - Chain [3] done processing
06:22:13 - cmdstanpy - WARNING - Some chains may have failed to converge.
	Chain 1 had 40 divergent transitions (4.0%)
	Chain 2 had 20 divergent transitions (2.0%)
	Chain 3 had 29 divergent transitions (2.9%)
	Chain 4 had 37 divergent transitions (3.7%)
	Use the "diagnose()" method on the CmdStanMCMC object to see further information.

We now compare all these models in order to only select one for posterior analysis.

idata_linreg = az.from_cmdstanpy(
    posterior = fit_linreg,
    posterior_predictive="y_hat",
    log_likelihood="log_lik",
)
idata_arx1 = az.from_cmdstanpy(
    posterior = fit_arx1,
    posterior_predictive="y_hat",
    log_likelihood="log_lik",
)
idata_arx2 = az.from_cmdstanpy(
    posterior = fit_arx2,
    posterior_predictive="y_hat",
    log_likelihood="log_lik",
)
idata_arx3 = az.from_cmdstanpy(
    posterior = fit_arx3,
    posterior_predictive="y_hat",
    log_likelihood="log_lik",
)
idata_armax = az.from_cmdstanpy(
    posterior = fit_armax,
    posterior_predictive="y_hat",
    log_likelihood="log_lik",
)

df_comp_loo = az.compare({"linreg": idata_linreg, "arx1": idata_arx1, "arx2": idata_arx2, "arx3": idata_arx3, "arxmax": idata_armax}, ic="loo")
df_comp_loo

arviz - WARNING - Array contains NaN-value.
/home/simon/anaconda3/envs/poem/lib/python3.13/site-packages/arviz/stats/stats.py:797: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  warnings.warn(
/home/simon/anaconda3/envs/poem/lib/python3.13/site-packages/arviz/stats/stats.py:797: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  warnings.warn(
/home/simon/anaconda3/envs/poem/lib/python3.13/site-packages/arviz/stats/stats.py:797: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  warnings.warn(
/home/simon/anaconda3/envs/poem/lib/python3.13/site-packages/arviz/stats/stats.py:797: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  warnings.warn(

	rank	elpd_loo	p_loo	elpd_diff	weight	se	dse	warning	scale
arx2	0	-103.396246	4.856709	0.000000	1.000000e+00	7.235023	0.000000	True	log
arx3	1	-104.615270	6.405578	1.219024	3.599302e-12	7.672517	1.520233	True	log
arx1	2	-107.561120	4.663292	4.164874	2.050683e-12	5.673898	4.558640	True	log
arxmax	3	-108.130806	5.292030	4.734560	5.088846e-13	5.821429	4.730052	True	log
linreg	4	-151.296510	2.012209	47.900264	0.000000e+00	1.916102	7.322429	False	log

It looks like the best model according to the LOO criterion is the second ARX model we defined, using \(p=2\) and \(r=1\). The other ARX and ARMAX models come close, but all are a significany improvement from the linear regression in terms of elpd.

Posterior analysis#

fs = fit_arx2.summary(percentiles=(5, 50, 95))
fs.loc[['alpha', 'phi[1]', 'phi[2]', 'beta[1,1]', 'beta[1,2]', 'sigma']]

	Mean	MCSE	StdDev	MAD	5%	50%	95%	ESS_bulk	ESS_tail	R_hat
alpha	-38.785400	1.155390	50.603400	48.393000	-121.478000	-38.993800	47.127700	1923.76	1776.38	1.00199
phi[1]	1.060290	0.008094	0.273956	0.259596	0.621424	1.055360	1.522520	1168.82	1226.96	1.00239
phi[2]	-0.214849	0.006826	0.235300	0.224055	-0.612585	-0.209717	0.158765	1213.09	1318.39	1.00252
beta[1,1]	5.004210	0.120778	4.197610	4.010650	-1.900230	4.966280	11.955700	1270.83	1554.13	1.00208
beta[1,2]	8.193960	0.205776	6.866260	6.412250	-3.228970	8.318390	19.202300	1142.12	1384.58	1.00306
sigma	30.057400	0.148224	5.751920	5.271160	22.308100	29.101100	40.549800	1559.94	1990.28	1.00150

y_hat = idata_arx2.posterior_predictive["y_hat"]

fig, axes = plt.subplots(figsize=(10, 4))
axes.plot(pd.to_datetime(df['datetime']), df['P'], label='data')
axes.plot(pd.to_datetime(df['datetime']), y_hat.mean(("chain", "draw")), c="C1", label="posterior mean")
az.plot_hdi(pd.to_datetime(df['datetime']), y_hat, smooth=False)
axes.set_ylabel("consumption")
axes.set_xlabel("time")
axes.legend()

<matplotlib.legend.Legend at 0x7f6f8cfd4d70>

_images/d4b9d44e10668c653d7e913e4b9aac34656dd039c0708c6d413a9d55e7a612d6.png

We can see the posterior high density intervals increase drastically after the end of the training period.

ARMAX models

Contents

ARMAX models#

Data#

Linear regression#

Posterior analysis#