The PyBATS library supports many types of DGLMs - Poisson, Bernoulli, Normal (a DLM), and Binomial.
A Dynamic Generalized Linear Model is defined by:
- The exponential family of the observations $y_t$: Normal, Poisson, Bernoulli, or Binomial
- The components in the state vector: Trend, Regression, Seasonal, Holiday, and Latent Factor
A DGLM is a linear state space model, defined by the dynamic regression:
$$ \begin{aligned} y_t &\sim p(\lambda_t) \\ \lambda_{t} &= F_{t}' \theta_{t} \end{aligned} $$Where:
- $p(\lambda_t)$ is the forecast distribution for the observation $y_t$
- $\theta_t$ is the state vector, containing the model coefficients
- $F_t$ is the regression vector, containing all of the predictors $X_t$
A PyBATS DGLM has three core methods.
dglm.update
updates the state vector $\theta_t$ after observing $y_t$.dglm.forecast_marginal
simulates from the forecast distribution for $y_{t+k}$, $k$ time steps into the future.dglm.forecast_path
simulates from the joint forecast distribution for $y_{t+1}:y_{t+k}$ from today through to $k$ steps into the future.
Below is a very simple example of manually defining a DGLM. However, in most situations, DGLMs will be defined automatically through analysis
or through define_models
.
This model has 1 trend coefficent and 2 regression coefficients. The trend is simply an intercept. The 2 regression coefficients mean that we have 2 predictor variables.
import numpy as np
from pybats.dglm import pois_dglm
a = np.array([1, 1, 1])
R = np.eye(3)
mod = pois_dglm(a, R, ntrend=1, nregn=2, deltrend=1, delregn=.9)
mod.get_coef()
mod.get_coef
provides the mean and standard deviation of the state vector $\theta$. The mean vector and full variance matrix is also available using mod.a
and mod.R
.
deltrend
is the discount factor on the trend component. A discount factor of 1 indicates no expected change over time.
delregn
is the discount factor on the regression component. A discount factor of 0.9 indicates that we expect about a 10% change in the coefficient every time step. This is a very low discount factor. Most of the time, discount factors should be set between $0.95 - 1$.
Discounting old information is what makes this model dynamic, because we are telling the model that the coefficients should change over time.
We will make up an observation $y$ and predictors $X$ and demonstrate how the model updates:
y = 5
X = np.array([1,2])
mod.update(y = y, X = X)
Below, we'll see how the mean and variance of the state vector $\theta_t$ has now changed from the initial definition.
mod.get_coef()
We can forecast from this model using future values of our predictors, X_future
. The output are simulated values from the forecast distribution.
X_future = np.array([2,1])
forecast_samples = mod.forecast_marginal(k = 1, X = X_future, nsamps=1000)
forecast_samples[:10]
Path forecasting is used to forecast over multiple step steps at once. For example, let's forecast the next 3 time steps. X_future
is still the future values of our predictors, but it now has 3 rows.
X_future = np.array([[2,1], [2,2], [1,1]])
forecast_samples = mod.forecast_path(k = 3, X = X_future, nsamps=1000)
forecast_samples[:10]
A Bernoulli DGLM models $0-1$ observations:
$$ \begin{aligned} y_t &\sim Bern(\pi_t) \\ \pi_t &= 1/(1 + e^{-\lambda_t}) \\ \lambda_{t} &= F_{t}' \theta_{t} \end{aligned} $$Where
- Bern($\pi_t$) is the Bernoulli distribution with probability $\pi_t$
- $\lambda_t$ is transformed through the logistic function to become $\pi_t$
A Poisson DGLM models non-negative integers:
$$ \begin{aligned} y_t &\sim Pois(\mu_t) \\ \mu_t &= e^{\lambda_t} \\ \lambda_{t} &= F_{t}' \theta_{t} \end{aligned} $$Where
- Pois($\mu_t$) is the Poisson distribution with rate parameter $\mu_t$
- $\mu_t$ is exponent of $\lambda_t$
A normal DLM has a slightly different form than other DGLMs:
$$ \begin{aligned} y_t &= \mu_t + \epsilon_t \\ \mu_t &= \lambda_{t} = F_{t}' \theta_{t} \\ \epsilon_t &\sim N(0, \sigma^2_t) \end{aligned} $$Where
- N($0, \sigma^2_t$) is the Normal distribution with mean $0$ and variance $\sigma^2_t$
- $\sigma^2_t$ is also a dynamic parameter, with its own evolution through time. The discount factor
mod.delVar
is set by the modeler, and controls the rate at which $\sigma^2$ changes. Typically it is set between $0.95-1$. The mean of $\sigma_t^2$ is stored asmod.s
A Binomial DGLM models the sum of independent $0-1$ observations:
$$ \begin{aligned} y_t &\sim Bin(n_t, pi_t) \\ \pi_t &= 1/(1 + e^{-\lambda_t}) \\ \lambda_{t} &= F_{t}' \theta_{t} \end{aligned} $$Where
- Bin($n_t, pi_t$) is the Binomial distribution with $n_t$ independent trials, each having probability $\pi_t$
- $\lambda_t$ is transformed through the logistic function to become $\pi_t$