A Dynamic Count Mixture Model, or DCMM, is the combination of a Bernoulli and Poisson DGLM as described in Berry and West (2019).

The Bernoulli DGLM models whether an observation is zero or non-zero. Conditional on a non-zero outcome, the observation follows a Poisson distribution shifted up by one. This is useful for modeling time series with more zeros than expected under a Poisson distribution, which is frequently the case for low-valued count time series.

In more formal terms, a DCMM models observations $y_t$ as: $$z_{t} \sim \text{Bern}(\pi_{t}) \quad \text{and} \quad y_{t} \mid z_{t} = \begin{cases} 0, & \text{if } z_{t} = 0,\\ 1 + x_{t}, \quad x_{t} \sim \text{Pois}(\mu_{t}), & \text{if } z_{t} = 1. \end{cases}$$
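To make the definition concrete, the mixture implies $P(y_t = 0) = 1 - \pi_t$ and $P(y_t = k) = \pi_t e^{-\mu_t} \mu_t^{k-1} / (k-1)!$ for $k \geq 1$, so that $E[y_t] = \pi_t (1 + \mu_t)$. The following is a minimal sketch using only the Python standard library. The function name `dcmm_pmf` and the parameter values are illustrative assumptions and are not part of PyBATS.

```python
import math

def dcmm_pmf(k, pi, mu):
    """P(y = k) under the mixture: zero with probability 1 - pi,
    otherwise 1 + Pois(mu). Hypothetical helper, not a PyBATS function."""
    if k == 0:
        return 1.0 - pi
    # For k >= 1, y = 1 + x with x ~ Pois(mu), so x = k - 1
    return pi * math.exp(-mu) * mu ** (k - 1) / math.factorial(k - 1)

pi, mu = 0.6, 2.0          # illustrative values, not estimates
mean = pi * (1.0 + mu)     # E[y] = pi * (1 + mu)

# Zero probability of a plain Poisson with the same mean, for comparison
p0_poisson = math.exp(-mean)
p0_dcmm = dcmm_pmf(0, pi, mu)

# Sanity check: the probabilities should sum to (almost exactly) one
total = sum(dcmm_pmf(k, pi, mu) for k in range(60))

print(f"mean = {mean:.2f}, total probability = {total:.6f}")
print(f"P(y=0): DCMM = {p0_dcmm:.3f}, Poisson with same mean = {p0_poisson:.3f}")
```

With these values the DCMM puts probability 0.40 on zero, while a Poisson distribution with the same mean of 1.8 puts only about 0.17 there, which is the excess-zero behavior the mixture is designed to capture.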

## class dcmm

dcmm(a0_bern=None, R0_bern=None, nregn_bern=0, ntrend_bern=0, nlf_bern=0, nhol_bern=0, seasPeriods_bern=[], seasHarmComponents_bern=[], deltrend_bern=1, delregn_bern=1, delhol_bern=1, delseas_bern=1, dellf_bern=1, a0_pois=None, R0_pois=None, nregn_pois=0, ntrend_pois=0, nlf_pois=0, nhol_pois=0, seasPeriods_pois=[], seasHarmComponents_pois=[], deltrend_pois=1, delregn_pois=1, delhol_pois=1, delseas_pois=1, dellf_pois=1, rho=1, interpolate=True, adapt_discount=False)

A DCMM can be used in the same way as a DGLM, with the standard methods dcmm.update, dcmm.forecast_marginal, and dcmm.forecast_path. There are equivalent helper functions as well. A full analysis can be run with analysis_dcmm, and define_dcmm helps to initialize a DCMM. These helper functions assume that the same predictors X are used for the Bernoulli and Poisson DGLMs.

The only difference from using a standard DGLM is that, outside of analysis_dcmm, the update and forecast functions do not automatically detect whether the DCMM includes latent factors, nor do they call a copula for path forecasting. The modeler therefore needs to call the appropriate method explicitly, such as dcmm.forecast_path_copula for path forecasting with a copula.

A quick example of using analysis_dcmm to model simulated sales data follows. Another example with a DCMM can also be found here.

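import pandas as pd
import numpy as np

from pybats.shared import load_sales_example2
from pybats.analysis import analysis_dcmm
from pandas.tseries.holiday import USFederalHolidayCalendar

data = load_sales_example2()   # Simulated daily sales data bundled with PyBATS
data.head()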


| Date | Sales | Price | Promotion |
|---|---|---|---|
| 2014-06-01 | 15.0 | 1.11 | 0.0 |
| 2014-06-02 | 13.0 | 2.19 | 0.0 |
| 2014-06-03 | 6.0 | 0.23 | 0.0 |
| 2014-06-04 | 2.0 | -0.05 | 1.0 |
| 2014-06-05 | 6.0 | -0.14 | 0.0 |

prior_length = 25   # Number of days of data used to set prior
k = 7               # Forecast horizon
rho = 0.5           # Random effect discount factor to increase variance of forecast distribution
forecast_samps = 1000  # Number of forecast samples to draw
forecast_start = pd.to_datetime('2018-01-01') # Date to start forecasting
forecast_end = pd.to_datetime('2018-05-01')   # Date to stop forecasting
holidays = USFederalHolidayCalendar.rules

mod, samples = analysis_dcmm(data['Sales'].values, data[['Price', 'Promotion']].values,
                             k, forecast_start, forecast_end,
                             nsamps=forecast_samps,
                             prior_length=prior_length,
                             seasPeriods=[7], seasHarmComponents=[[1,2,3]],
                             dates=data.index, holidays=holidays,
                             rho=rho,
                             ret=['model', 'forecast'])

beginning forecasting
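
The forecast samples requested with nsamps are Monte Carlo draws from the forecast distribution. As a rough illustration of what such draws look like and how they might be summarized, the sketch below simulates from the mixture definition with fixed $\pi_t$ and $\mu_t$ using only the Python standard library. This is not how PyBATS generates forecasts, which also accounts for uncertainty in the model coefficients. The helper names `sample_poisson` and `sample_dcmm` and all parameter values are hypothetical.

```python
import math
import random
import statistics

def sample_poisson(mu, rng):
    """Draw from Pois(mu) with Knuth's multiplication method (fine for small mu)."""
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_dcmm(pi, mu, rng):
    """Draw y: zero with probability 1 - pi, otherwise 1 + Pois(mu)."""
    if rng.random() >= pi:
        return 0
    return 1 + sample_poisson(mu, rng)

rng = random.Random(0)   # fixed seed so the sketch is reproducible
pi, mu = 0.6, 2.0        # illustrative values, not fitted
draws = [sample_dcmm(pi, mu, rng) for _ in range(20000)]

# Typical summaries of a set of forecast draws
summary = {
    "mean": statistics.fmean(draws),
    "median": statistics.median(draws),
    "p_zero": draws.count(0) / len(draws),
    "q90": sorted(draws)[int(0.9 * len(draws))],
}
print(summary)
```

The simulated mean should land close to $\pi_t (1 + \mu_t) = 1.8$ and the share of zeros close to $1 - \pi_t = 0.4$. The same kind of summary (mean, median, quantiles, probability of zero) can be computed from any set of forecast draws for a given date and horizon.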


Because the DCMM is effectively a container for a Poisson and a Bernoulli DGLM, we can access each of them individually. The coefficients in the Bernoulli DGLM affect the probability of a non-zero observation, and the coefficients in the Poisson DGLM impact the size of any non-zero observations. To illustrate, we'll take a look at the holiday coefficients in both DGLMs.

pois_hol = mod.pois_mod.get_coef('hol')
bern_hol = mod.bern_mod.get_coef('hol')

coef = pd.DataFrame({'Holidays': [h.name for h in holidays],
                     'Pois Mean': pois_hol['Mean'],
                     'Pois Std Dev': pois_hol['Standard Deviation'],
                     'Bern Mean': bern_hol['Mean'],
                     'Bern Std Dev': bern_hol['Standard Deviation']}).round(2)
coef

| | Holidays | Pois Mean | Pois Std Dev | Bern Mean | Bern Std Dev |
|---|---|---|---|---|---|
| Hol 1 | New Years Day | -0.94 | 0.66 | -0.78 | 1.29 |
| Hol 2 | Martin Luther King Jr. Day | 0.20 | 0.43 | 0.08 | 1.41 |
| Hol 3 | Presidents Day | -0.27 | 0.46 | 0.27 | 1.39 |
| Hol 4 | Memorial Day | 0.62 | 0.40 | 0.04 | 1.41 |
| Hol 5 | July 4th | 1.10 | 0.43 | 0.04 | 1.41 |
| Hol 6 | Labor Day | 0.21 | 0.43 | 0.03 | 1.41 |
| Hol 7 | Columbus Day | -0.04 | 0.45 | 0.21 | 1.39 |
| Hol 8 | Veterans Day | -0.15 | 0.48 | 0.03 | 1.41 |
| Hol 9 | Thanksgiving | 0.14 | 0.43 | 0.16 | 1.39 |
| Hol 10 | Christmas | -1.97 | 1.11 | -1.10 | 1.23 |

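The largest negative coefficients, in both the Poisson and the Bernoulli DGLM, are for Christmas and New Years Day, which means that these holidays are more likely to have very low or $0$ sales.

The largest positive coefficients are the Poisson coefficients for July 4th and Memorial Day, which means that non-zero sales on these holidays are likely to be larger. The corresponding Bernoulli coefficients are close to zero, so the probability of a non-zero sale is largely unchanged.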
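
To put these coefficients on a more interpretable scale, assume the usual link functions for these DGLMs: a log link for the Poisson mean $\mu_t$ and a logit link for the Bernoulli probability $\pi_t$, as in Berry and West (2019). Under that assumption, exponentiating a Poisson coefficient gives a multiplicative effect on $\mu_t$, and exponentiating a Bernoulli coefficient gives a multiplicative effect on the odds of a non-zero sale. The sketch below applies this to the posterior means in the table. It ignores posterior uncertainty, and the dictionary names are illustrative.

```python
import math

# Posterior means copied from the holiday coefficient table above
coef_means = {
    "Christmas": {"pois": -1.97, "bern": -1.10},
    "New Years Day": {"pois": -0.94, "bern": -0.78},
    "July 4th": {"pois": 1.10, "bern": 0.04},
    "Memorial Day": {"pois": 0.62, "bern": 0.04},
}

effects = {}
for holiday, c in coef_means.items():
    effects[holiday] = {
        # Assumed log link: exp(coef) multiplies the Poisson mean mu_t
        "mu_multiplier": math.exp(c["pois"]),
        # Assumed logit link: exp(coef) multiplies the odds of a non-zero sale
        "odds_multiplier": math.exp(c["bern"]),
    }

for holiday, e in effects.items():
    print(f"{holiday:14s} mu x {e['mu_multiplier']:.2f}, "
          f"odds x {e['odds_multiplier']:.2f}")
```

On Christmas, for example, the Poisson mean is scaled by roughly 0.14 and the odds of any sale by roughly 0.33, while on July 4th the Poisson mean is roughly tripled and the odds of a sale barely move. Because $y_t = 1 + x_t$ for non-zero outcomes, the multiplier applies to $\mu_t$, the mean of $x_t$, rather than directly to expected sales.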