The DCMM (dynamic count mixture model) combines a Bernoulli DGLM and a Poisson DGLM. The Bernoulli DGLM models whether an observation is zero or non-zero. Conditional on a non-zero outcome, the observation follows a Poisson distribution shifted up by one, so it is always at least 1. This is useful for modeling time series with more zeros than expected under a Poisson distribution, which is frequently the case for low-valued count time series.
In more formal terms, a DCMM models observations $y_t$ as: $$ z_{t} \sim \text{Bern}(\pi_{t}) \quad \text{and} \quad y_{t} \mid z_{t} = \begin{cases} 0, & \text{if } z_{t} = 0,\\ 1 + x_{t}, \quad x_{t} \sim \text{Pois}(\mu_{t}), & \text{if } z_{t} = 1. \end{cases} $$
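To make the mixture concrete, the following is a minimal sketch that simulates from the distribution above using plain NumPy. It is not part of PyBATS: the function name `simulate_dcmm` and the values of `pi` and `mu` are hypothetical, and both parameters are held fixed over time for simplicity, whereas in a DCMM they evolve with $t$.

```python
import numpy as np

def simulate_dcmm(pi, mu, size, rng):
    """Draw `size` observations from the zero/shifted-Poisson mixture."""
    z = rng.binomial(1, pi, size=size)   # z_t ~ Bern(pi); z_t = 1 means a non-zero outcome
    x = rng.poisson(mu, size=size)       # x_t ~ Pois(mu)
    return z * (1 + x)                   # y_t = 0 if z_t = 0, otherwise 1 + x_t

rng = np.random.default_rng(0)
pi, mu = 0.3, 1.5                        # hypothetical, time-constant parameters
y = simulate_dcmm(pi, mu, size=100_000, rng=rng)

print("share of zeros:", (y == 0).mean())   # should be close to 1 - pi
print("sample mean:   ", y.mean())          # should be close to pi * (1 + mu)
```

Under this model the probability of a zero is $1 - \pi_t$ and the mean is $\pi_t (1 + \mu_t)$, which is what the two printed summaries approximate.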
A DCMM can be used in the same way as a DGLM, with the standard methods dcmm.update, dcmm.forecast_marginal, and dcmm.forecast_path. There are equivalent helper functions as well: a full analysis can be run with analysis_dcmm, and define_dcmm helps to initialize a DCMM. These helper functions assume that the same predictors X are used for the Bernoulli and Poisson DGLMs.
The only difference from using a standard dglm is that, outside of analysis_dcmm, the update and forecast functions do not automatically recognize whether the DCMM includes latent factors, nor do they automatically call a copula for path forecasting. The modeler therefore needs to call the appropriate method explicitly, such as dcmm.forecast_path_copula for path forecasting with a copula.
A quick example of using analysis_dcmm to model simulated sales data follows. Another example with a DCMM can also be found here.
import pandas as pd
import numpy as np
from pybats.shared import load_sales_example2
from pybats.analysis import analysis_dcmm
from pandas.tseries.holiday import USFederalHolidayCalendar
data = load_sales_example2()
data.head()
prior_length = 25 # Number of days of data used to set prior
k = 7 # Forecast horizon
rho = 0.5 # Random effect discount factor to increase variance of forecast distribution
forecast_samps = 1000 # Number of forecast samples to draw
forecast_start = pd.to_datetime('2018-01-01') # Date to start forecasting
forecast_end = pd.to_datetime('2018-05-01') # Date to stop forecasting
holidays = USFederalHolidayCalendar.rules
mod, samples = analysis_dcmm(data['Sales'].values, data[['Price', 'Promotion']].values,
                             k, forecast_start, forecast_end,
                             nsamps=forecast_samps,
                             prior_length=prior_length,
                             seasPeriods=[7], seasHarmComponents=[[1,2,3]],
                             dates=data.index, holidays=holidays,
                             rho=rho,
                             ret=['model', 'forecast'])
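Once forecast samples are available, they are typically reduced to point forecasts and intervals. The sketch below shows one way to do that with NumPy. It does not call PyBATS: it builds a stand-in array with hypothetical parameters, and it assumes that the returned samples are laid out as (number of samples, number of forecast dates, forecast horizon k). Check the shape of the real `samples` array before relying on this layout.

```python
import numpy as np

# Stand-in for the `samples` array returned by the analysis.
# Assumed layout: (nsamps, number of forecast dates, horizon k).
rng = np.random.default_rng(1)
nsamps, n_dates, k = 1000, 5, 7
shape = (nsamps, n_dates, k)
samples = rng.binomial(1, 0.4, size=shape) * (1 + rng.poisson(2.0, size=shape))

# Summaries across the sample axis, each with shape (n_dates, k)
median_forecast = np.median(samples, axis=0)             # point forecast
prob_zero = (samples == 0).mean(axis=0)                  # forecast probability of zero sales
lower, upper = np.percentile(samples, [5, 95], axis=0)   # 90% forecast interval

print(median_forecast.shape, prob_zero.shape)
print("1-step-ahead P(zero) on the first forecast date:", prob_zero[0, 0])
```

The probability of a zero is worth reporting alongside the median for count data like this, since the median alone hides how often the forecast distribution puts mass at zero.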
Because the DCMM is effectively a container for a Poisson and a Bernoulli DGLM, we can access each of them individually. The coefficients in the Bernoulli DGLM affect the probability of a non-zero observation, and the coefficients in the Poisson DGLM impact the size of any non-zero observations. To illustrate, we'll take a look at the holiday coefficients in both DGLMs.
pois_hol = mod.pois_mod.get_coef('hol')
bern_hol = mod.bern_mod.get_coef('hol')
coef = pd.DataFrame({'Holidays': [h.name for h in holidays],
                     'Pois Mean': pois_hol['Mean'],
                     'Pois Std Dev': pois_hol['Standard Deviation'],
                     'Bern Mean': bern_hol['Mean'],
                     'Bern Std Dev': bern_hol['Standard Deviation']}).round(2)
coef
The largest negative coefficients are for Christmas and New Year's Day, which means that those holidays are more likely to have very low or $0$ sales.
The largest positive coefficients are for July 4th and Memorial Day, which means that those holidays are likely to have increased sales.
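To see how a coefficient translates into sales, the sketch below works through a hypothetical holiday effect. It assumes the usual link functions (logit for the Bernoulli DGLM, log for the Poisson DGLM), and all numbers are invented for illustration; none are taken from the fitted model above.

```python
import numpy as np

def logistic(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical values for illustration only; not taken from the fitted model.
bern_baseline = 0.5                 # Bernoulli linear predictor on an ordinary day (logit scale)
pois_baseline = 1.0                 # Poisson linear predictor on an ordinary day (log scale)
bern_hol, pois_hol = -1.2, -0.4     # hypothetical holiday coefficients

p_regular = logistic(bern_baseline)               # P(non-zero sales), ordinary day
p_holiday = logistic(bern_baseline + bern_hol)    # P(non-zero sales), holiday
mu_regular = np.exp(pois_baseline)                # Poisson mean, ordinary day
mu_holiday = np.exp(pois_baseline + pois_hol)     # Poisson mean, holiday

# Expected sales under the mixture: E[y] = pi * (1 + mu)
print(f"P(non-zero):    {p_regular:.2f} -> {p_holiday:.2f}")
print(f"Poisson mean:   {mu_regular:.2f} -> {mu_holiday:.2f}")
print(f"Expected sales: {p_regular * (1 + mu_regular):.2f} -> {p_holiday * (1 + mu_holiday):.2f}")
```

A negative Bernoulli coefficient lowers the chance of any sale at all, while a negative Poisson coefficient scales the Poisson mean by the exponential of the coefficient, so the two effects compound in the expected sales.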