Forecasting occurs at time $t$ before a new value $y_t$ has been observed. There are two types of forecasts:
- Marginal - A forecast made at k time steps into the future only. For example, marginal forecasting with $k=1$ will simulate from the forecast distribution for $y_t$. Marginal forecasting with $k=2$ will simulate from the forecast distribution for $y_{t+1}$, and so on.
- Path - A joint forecast made from 1 to k steps into the future. Path forecasting with $k=3$ will simulate dependent observations $y_t, y_{t+1}, y_{t+2}$. Path forecasting is important when computing functions across multiple forecast horizons. The most common example is forecasting the sum $\sum_{h=1:k} y_{t+k-1}$. This is accomplished by path forecasting $k$ steps ahead, and then summing each path to get a random sample from the forecast distribution for the sum.
Naturally, forecasting $k-$steps ahead requires knowing all predictors $X$ that far into the future as well. Marginal forecasting $k-$steps ahead accepts $X_{t+k-1}$ as an argument. Path forecasting accepts the predictors in an array. Each row contains the predictors at one time step, starting with $X_t$ in the first row, down to $X_{t+k-1}$ in the last row.
This marginal forecast method works for Poisson and Bernoulli DGLMs. Below we run through a basic usage with the following steps:
- Manually define a simple DGLM with an intercept and 2 regression predictors
- Define the future predictors, $X_{t+k-1}$
- Forecast by drawing a random sample at $k=5$, and getting a $95\%$ credible interval
- Forecast the mean only, without drawing a random sample
import numpy as np
from pybats.dglm import dlm, pois_dglm, bern_dglm, bin_dglm
from pybats.analysis import analysis
# Define Poisson and Bernoulli DGLMs, along with a normal DLM, with 1 trend term and 2 regression predictors
a0 = np.array([1, 1, 1])
R0 = np.eye(3)/10
mod_n = dlm(a0, R0, ntrend=1, nregn=2, deltrend=1, delregn=.9, discount_forecast=True)
mod_p = pois_dglm(a0, R0, ntrend=1, nregn=2, deltrend=1, delregn=.9)
mod_bern = bern_dglm(a0, R0, ntrend=1, nregn=2, deltrend=1, delregn=.9)
# New predictors for k=5 steps into the future
X = np.array([1,2])
First we're going to forecast from the Poisson DGLM, which will draw a random sample from the forecast distribution. We'll use the forecast samples to calculate a $95\%$ credible interval.
forecast_samples = mod_p.forecast_marginal(k = 5, X=X, nsamps=1000)
# Get the 95% credible intervals
np.percentile(forecast_samples, [2.5, 97.5])
We can repeat this analysis using the normal DLM:
forecast_samples = mod_n.forecast_marginal(k = 5, X=X, nsamps=1000)
# Get the 95% credible intervals
np.percentile(forecast_samples, [2.5, 97.5])
Next we'll test the models by setting mean_only=True
. This will return only the mean value, without simulation from the forecast distribution. It is much faster if the mean is the only value of interest.
Notice that we skipped simulating from the Bernoulli DGLM, because Bernoulli outcomes are only $0$ or $1$ anyway. The mean of the forecast distribution in a Bernoulli DGLM is equal to the probability of the outcome being $1$.
# Setting the flag mean_only=True returns the exact mean, without simulation
m_p = mod_p.forecast_marginal(k = 5, X=X, mean_only=True)
ans = [70.38455503971952]
assert (np.equal(np.round(ans[0], 5), np.round(m_p, 5)).all())
# Test the Bernoulli DGLM forecast mean is correct
# Setting the flag mean_only=True returns the exact mean, without simulation
m_bern = mod_bern.forecast_marginal(k=5, X=X, mean_only=True)
ans = [0.9771442828394032]
assert (np.equal(np.round(ans[0], 5), np.round(m_bern, 5)).all())
# Test the Bernoulli DGLM forecast mean is correct
# Setting the flag state_mean_var=True returns the exact mean and variance, without simulation
m_n, v = mod_n.forecast_marginal(k = 5, X=X, state_mean_var=True)
ans = [4., 1.82222222]
assert(np.equal(np.round(ans[0], 5), np.round(m_n, 5)).all())
assert(np.equal(np.round(ans[1], 5), np.round(v, 5)).all())
dglm.forecast_marginal
has three special features beyond returning just a random sample from the forecast distribution:
- Setting
mean_only = True
makes the function return the mean of the forecast distribution, instead of a random sample. - Setting
state_mean_and_var = True
makes the function return the mean and variance of the linear predictor, $\lambda_t = F_t^{'} \theta_t$. - Passing in a value for
y
makes the function return the log probability density ofy
under the forecast distribution, $p(y)$.
We'll repeat the same tests from above on a binomial DGLM by specifying both the predictors X
and the number of trials n
.
mod_b = bin_dglm(a0, R0, ntrend=1, nregn=2, deltrend=1, delregn=.9)
# Test the Binomial DGLM
m_bin = mod_b.forecast_marginal(n = 10, k=5, X=X, mean_only=True)
ans = [9.771442828394033]
assert (np.equal(np.round(ans[0], 5), np.round(m_bin, 5)).all())
There are two functions for path forecasting with DGLMs. forecast_path_copula
is the default method, and is part of my new research. It dramatically speeds up path forecasting in DGLMs. The traditional forecast_path
function is only used when dglm.forecast_path
is called with the flag copula=FALSE
.
Below we'll test path forecasting - with a copula, as is the default - for $k=2$ steps into the future. The Poisson model will have $2$ trend terms - an intercept and a slope - and 1 regression term. This test will confirm that marginal forecasting and path forecasting lead to identical forecast means for $k=1$ and $k=2$ steps ahead.
a0 = np.array([1, 1, 1])
R0 = np.eye(3)/100
mod_p = pois_dglm(a0, R0, ntrend=2, nregn=1, deltrend=1, delregn=.9)
# The predictors for the next 2 time steps, X_t and X_(t+1), in a matrix with k=2 rows
X = np.array([[1], [0.5]])
# Test the Poisson DGLM by comparing the path and marginal forecast distributions
samps_copula = mod_p.forecast_path_copula(k = 2, X=X, nsamps=100000)
m_samp = samps_copula.mean(axis=0)
m_marg = np.array([mod_p.forecast_marginal(k=1, X = X[0], mean_only=True),
mod_p.forecast_marginal(k=2, X=X[1], mean_only=True)]).reshape(-1)
assert (np.equal(np.round(m_samp - m_marg, 1), np.zeros(2)).all())
samps_path = mod_p.forecast_path(k = 2, X=X, nsamps=2000, copula=False)
m_samp = samps_path.mean(axis=0)
assert (np.equal(np.round(m_samp - m_marg, 0), np.zeros(2)).all())
dglm.forecast_path
has two special features beyond returning just a random sample from the forecast distribution. These only work when using forecast_path_copula
, and are not yet available for a normal DLM.
- Passing in a value (a vector of length $k$) for
y
makes the function return the log probability density ofy
under the forecast distribution, $p(y)$. - Setting
return_cov = True
makes the function return the covariance in the copula distribution. This will be different than the covariance in the forecast distribution, which can be calculated directly from a random sample.