The class Dynamic Generalized Linear Model (DGLM), which is the core of the PyBATS package.

The PyBATS library supports many types of DGLMs - Poisson, Bernoulli, Normal (a DLM), and Binomial.

A Dynamic Generalized Linear Model is defined by:

  • The exponential family of the observations $y_t$: Normal, Poisson, Bernoulli, or Binomial
  • The components in the state vector: Trend, Regression, Seasonal, Holiday, and Latent Factor

A DGLM is a linear state space model, defined by the dynamic regression:

$$ \begin{aligned} y_t &\sim p(\lambda_t) \\ \lambda_{t} &= F_{t}' \theta_{t} \end{aligned} $$

Where:

  • $p(\lambda_t)$ is the forecast distribution for the observation $y_t$
  • $\theta_t$ is the state vector, containing the model coefficients
  • $F_t$ is the regression vector, containing all of the predictors $X_t$

A PyBATS DGLM has three core methods.

  • dglm.update updates the state vector $\theta_t$ after observing $y_t$.
  • dglm.forecast_marginal simulates from the forecast distribution for $y_{t+k}$, $k$ time steps into the future.
  • dglm.forecast_path simulates from the joint forecast distribution for $y_{t+1}:y_{t+k}$ from today through to $k$ steps into the future.

class dglm[source]

dglm(a0=None, R0=None, nregn=0, ntrend=0, nlf=0, nhol=0, seasPeriods=[], seasHarmComponents=[], deltrend=1, delregn=1, dellf=1, delhol=1, delseas=1, rho=1, interpolate=True, adapt_discount=False, adapt_factor=0.5, discount_forecast=False)

Below is a very simple example of manually defining a DGLM. However, in most situations, DGLMs will be defined automatically through analysis or through define_models.

This model has 1 trend coefficent and 2 regression coefficients. The trend is simply an intercept. The 2 regression coefficients mean that we have 2 predictor variables.

import numpy as np
from pybats.dglm import pois_dglm

a = np.array([1, 1, 1])
R = np.eye(3)
mod = pois_dglm(a, R, ntrend=1, nregn=2, deltrend=1, delregn=.9)
mod.get_coef()
Mean Standard Deviation
Intercept 1 1.0
Regn 1 1 1.0
Regn 2 1 1.0

mod.get_coef provides the mean and standard deviation of the state vector $\theta$. The mean vector and full variance matrix is also available using mod.a and mod.R.

deltrend is the discount factor on the trend component. A discount factor of 1 indicates no expected change over time.

delregn is the discount factor on the regression component. A discount factor of 0.9 indicates that we expect about a 10% change in the coefficient every time step. This is a very low discount factor. Most of the time, discount factors should be set between $0.95 - 1$.

Discounting old information is what makes this model dynamic, because we are telling the model that the coefficients should change over time.

dglm.update[source]

dglm.update(y=None, X=None, phi_mu=None, phi_sigma=None, analytic=True, phi_samps=None, **kwargs)

Update the DGLM state vector mean and covariance after observing 'y', with covariates 'X'.

We will make up an observation $y$ and predictors $X$ and demonstrate how the model updates:

y = 5
X = np.array([1,2])
mod.update(y = y, X = X)

Below, we'll see how the mean and variance of the state vector $\theta_t$ has now changed from the initial definition.

mod.get_coef()
Mean Standard Deviation
Intercept 0.6 0.92
Regn 1 0.6 0.97
Regn 2 0.2 0.63

dglm.forecast_marginal[source]

dglm.forecast_marginal(k, X=None, nsamps=1, mean_only=False, phi_mu=None, phi_sigma=None, analytic=True, phi_samps=None, state_mean_var=False, y=None)

Simulate from the forecast distribution at time t+k.

We can forecast from this model using future values of our predictors, X_future. The output are simulated values from the forecast distribution.

X_future = np.array([2,1])
forecast_samples = mod.forecast_marginal(k = 1, X = X_future, nsamps=1000)
forecast_samples[:10]
array([17, 10, 51, 10, 21, 10, 21, 11, 19, 20])

dglm.forecast_path[source]

dglm.forecast_path(k, X=None, nsamps=1, copula=True, phi_mu=None, phi_sigma=None, phi_psi=None, analytic=True, phi_samps=None, **kwargs)

Simulate from the path (joint) forecast distribution from 1 to k steps ahead.

Path forecasting is used to forecast over multiple step steps at once. For example, let's forecast the next 3 time steps. X_future is still the future values of our predictors, but it now has 3 rows.

X_future = np.array([[2,1], [2,2], [1,1]])
forecast_samples = mod.forecast_path(k = 3, X = X_future, nsamps=1000)
forecast_samples[:10]
array([[47, 40, 12],
       [25, 19,  4],
       [54, 42,  6],
       [20, 10,  6],
       [ 3,  8,  4],
       [10, 10, 10],
       [40, 36,  8],
       [ 5,  5,  1],
       [ 0,  1,  1],
       [11,  6,  2]])

class bern_dglm[source]

bern_dglm(a0=None, R0=None, nregn=0, ntrend=0, nlf=0, nhol=0, seasPeriods=[], seasHarmComponents=[], deltrend=1, delregn=1, dellf=1, delhol=1, delseas=1, rho=1, interpolate=True, adapt_discount=False, adapt_factor=0.5, discount_forecast=False) :: dglm

A Bernoulli DGLM models $0-1$ observations:

$$ \begin{aligned} y_t &\sim Bern(\pi_t) \\ \pi_t &= 1/(1 + e^{-\lambda_t}) \\ \lambda_{t} &= F_{t}' \theta_{t} \end{aligned} $$

Where

class pois_dglm[source]

pois_dglm(a0=None, R0=None, nregn=0, ntrend=0, nlf=0, nhol=0, seasPeriods=[], seasHarmComponents=[], deltrend=1, delregn=1, dellf=1, delhol=1, delseas=1, rho=1, interpolate=True, adapt_discount=False, adapt_factor=0.5, discount_forecast=False) :: dglm

A Poisson DGLM models non-negative integers:

$$ \begin{aligned} y_t &\sim Pois(\mu_t) \\ \mu_t &= e^{\lambda_t} \\ \lambda_{t} &= F_{t}' \theta_{t} \end{aligned} $$

Where

  • Pois($\mu_t$) is the Poisson distribution with rate parameter $\mu_t$
  • $\mu_t$ is exponent of $\lambda_t$

class dlm[source]

dlm(*args, n0=1, s0=1, delVar=1, **kwargs) :: dglm

A normal DLM has a slightly different form than other DGLMs:

$$ \begin{aligned} y_t &= \mu_t + \epsilon_t \\ \mu_t &= \lambda_{t} = F_{t}' \theta_{t} \\ \epsilon_t &\sim N(0, \sigma^2_t) \end{aligned} $$

Where

  • N($0, \sigma^2_t$) is the Normal distribution with mean $0$ and variance $\sigma^2_t$
  • $\sigma^2_t$ is also a dynamic parameter, with its own evolution through time. The discount factor mod.delVar is set by the modeler, and controls the rate at which $\sigma^2$ changes. Typically it is set between $0.95-1$. The mean of $\sigma_t^2$ is stored as mod.s

class bin_dglm[source]

bin_dglm(a0=None, R0=None, nregn=0, ntrend=0, nlf=0, nhol=0, seasPeriods=[], seasHarmComponents=[], deltrend=1, delregn=1, dellf=1, delhol=1, delseas=1, rho=1, interpolate=True, adapt_discount=False, adapt_factor=0.5, discount_forecast=False) :: dglm

A Binomial DGLM models the sum of independent $0-1$ observations:

$$ \begin{aligned} y_t &\sim Bin(n_t, pi_t) \\ \pi_t &= 1/(1 + e^{-\lambda_t}) \\ \lambda_{t} &= F_{t}' \theta_{t} \end{aligned} $$

Where