The DCMM (dynamic count mixture model) combines a Bernoulli DGLM and a Poisson DGLM. The Bernoulli DGLM models the probability that the observation is non-zero. Conditional on a non-zero outcome, the observation follows a Poisson distribution shifted up by one. This is useful for modeling time series with more zeros than a Poisson distribution would predict, which is frequently the case for low-valued count time series.

In more formal terms, a DCMM models observations $y_t$ as: $$ z_{t} \sim \text{Bern}(\pi_{t}) \quad \text{and} \quad y_{t} \mid z_{t} = \begin{cases} 0, & \text{if } z_{t} = 0,\\ 1 + x_{t}, \quad x_{t} \sim \text{Pois}(\mu_{t}), & \text{if } z_{t} = 1. \end{cases} $$
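To make the two-stage structure concrete, the following is a minimal simulation sketch of the model above using only NumPy. It is not part of PyBATS: the function name `simulate_dcmm` and the constant values chosen for $\pi_t$ and $\mu_t$ are assumptions made purely for illustration.

```python
import numpy as np

def simulate_dcmm(pi, mu, rng):
    """Draw y_t from the DCMM observation model (illustrative helper, not a PyBATS function).

    pi : probability that each observation is non-zero (Bernoulli part)
    mu : mean of the Poisson part x_t, so a non-zero y_t equals 1 + x_t
    """
    pi = np.asarray(pi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    z = rng.binomial(1, pi)   # z_t ~ Bern(pi_t)
    x = rng.poisson(mu)       # x_t ~ Pois(mu_t)
    return z * (1 + x)        # 0 when z_t = 0, otherwise 1 + x_t

rng = np.random.default_rng(0)
T = 10_000
pi = np.full(T, 0.3)          # assumed constant for the illustration
mu = np.full(T, 1.5)          # assumed constant for the illustration

y = simulate_dcmm(pi, mu, rng)
zero_frac = np.mean(y == 0)        # should be close to 1 - pi
nonzero_mean = y[y > 0].mean()     # should be close to 1 + mu
print(zero_frac, nonzero_mean)
```

With these settings, roughly 70% of the simulated observations are zero, far more than a plain Poisson distribution with the same overall mean would produce.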

A DCMM can be used in the same way as a DGLM, with the standard methods dcmm.update, dcmm.forecast_marginal, and dcmm.forecast_path. There are equivalent helper functions as well. A full analysis can be run with analysis_dcmm, and define_dcmm helps to initialize a DCMM. These helper functions assume that the same predictors X are used for the Bernoulli and Poisson DGLMs.

The only difference from using a standard DGLM is that, outside of analysis_dcmm, the update and forecast functions do not automatically detect whether the DCMM includes latent factors, and they do not automatically call a copula for path forecasting. The modeler therefore needs to call the appropriate method explicitly, such as dcmm.forecast_path_copula for path forecasting with a copula.

A quick example of using analysis_dcmm to model simulated sales data follows. Another example with a DCMM can also be found here.

import pandas as pd
import numpy as np

from pybats.shared import load_sales_example2
from pybats.analysis import analysis_dcmm
from pandas.tseries.holiday import USFederalHolidayCalendar


data = load_sales_example2()
data.head()

prior_length = 25   # Number of days of data used to set prior
k = 7               # Forecast horizon
rho = 0.5           # Random effect discount factor to increase variance of forecast distribution
forecast_samps = 1000  # Number of forecast samples to draw
forecast_start = pd.to_datetime('2018-01-01') # Date to start forecasting
forecast_end = pd.to_datetime('2018-05-01')   # Date to stop forecasting
holidays = USFederalHolidayCalendar.rules

mod, samples = analysis_dcmm(data['Sales'].values, data[['Price', 'Promotion']].values,
                             k, forecast_start, forecast_end,
                             nsamps=forecast_samps,
                             prior_length=prior_length,
                             seasPeriods=[7], seasHarmComponents=[[1,2,3]],
                             dates=data.index, holidays=holidays,
                             rho=rho,
                             ret=['model', 'forecast'])

beginning forecasting
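Forecast samples like these are usually summarized before plotting or scoring. The sketch below shows one way to do that with NumPy. It makes two assumptions: that the samples array is ordered as (number of samples, number of forecast dates, horizon k), and that a synthetic stand-in array (`samples_demo`, a hypothetical name) is acceptable in place of the real `samples`, so the snippet runs on its own.

```python
import numpy as np

# Synthetic stand-in for the forecast samples; the axis order
# (nsamps, forecast dates, k) is an assumption, check samples.shape first.
rng = np.random.default_rng(1)
nsamps, n_dates, k = 1000, 5, 7
samples_demo = rng.poisson(3.0, size=(nsamps, n_dates, k))

# Point forecast and a 90% interval, taken across the sample axis
median = np.median(samples_demo, axis=0)
lower, upper = np.percentile(samples_demo, [5, 95], axis=0)

# Forecast probability of a zero outcome for each date and horizon
prob_zero = np.mean(samples_demo == 0, axis=0)

print(median.shape, lower.shape, upper.shape, prob_zero.shape)
```

The zero probability is often the quantity of interest for intermittent series, since it comes directly from the Bernoulli part of the DCMM.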

Because the DCMM is effectively a container for a Poisson and a Bernoulli DGLM, we can access each of them individually. The coefficients in the Bernoulli DGLM affect the probability of a non-zero observation, and the coefficients in the Poisson DGLM impact the size of any non-zero observations. To illustrate, we'll take a look at the holiday coefficients in both DGLMs.

pois_hol = mod.pois_mod.get_coef('hol')
bern_hol = mod.bern_mod.get_coef('hol')

coef = pd.DataFrame({'Holidays':[h.name for h in holidays],
                     'Pois Mean': pois_hol['Mean'],
                     'Pois Std Dev': pois_hol['Standard Deviation'],
                     'Bern Mean': bern_hol['Mean'],
                     'Bern Std Dev': bern_hol['Standard Deviation']}).round(2)
coef

The largest negative coefficients, in both the Poisson and the Bernoulli DGLM, are for Christmas and New Years Day, which means that those days are more likely to have very low or $0$ sales.

The largest positive coefficients in the Poisson DGLM are for July 4th and Memorial Day, which means that non-zero sales on those days are likely to be higher than usual.

	Sales	Price	Promotion
Date
2014-06-01	15.0	1.11	0.0
2014-06-02	13.0	2.19	0.0
2014-06-03	6.0	0.23	0.0
2014-06-04	2.0	-0.05	1.0
2014-06-05	6.0	-0.14	0.0

	Holidays	Pois Mean	Pois Std Dev	Bern Mean	Bern Std Dev
Hol 1	New Years Day	-0.94	0.66	-0.78	1.29
Hol 2	Martin Luther King Jr. Day	0.20	0.43	0.08	1.41
Hol 3	Presidents Day	-0.27	0.46	0.27	1.39
Hol 4	Memorial Day	0.62	0.40	0.04	1.41
Hol 5	July 4th	1.10	0.43	0.04	1.41
Hol 6	Labor Day	0.21	0.43	0.03	1.41
Hol 7	Columbus Day	-0.04	0.45	0.21	1.39
Hol 8	Veterans Day	-0.15	0.48	0.03	1.41
Hol 9	Thanksgiving	0.14	0.43	0.16	1.39
Hol 10	Christmas	-1.97	1.11	-1.10	1.23
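To read these coefficients on a more intuitive scale, they can be exponentiated. The sketch below assumes that the Poisson DGLM uses a log link and the Bernoulli DGLM a logit link, so a coefficient on a 0/1 holiday indicator acts as a multiplier on $\mu_t$ or on the odds of a non-zero sale, respectively. The coefficient means are copied from the table above; the dictionary names are illustrative only.

```python
import numpy as np

# Posterior means copied from the holiday coefficient table
pois_coef = {"July 4th": 1.10, "Memorial Day": 0.62, "Christmas": -1.97}
bern_coef = {"New Years Day": -0.78, "Christmas": -1.10}

# Assumed log link: the holiday multiplies the Poisson mean mu_t by exp(coef)
pois_multiplier = {h: float(np.exp(c)) for h, c in pois_coef.items()}

# Assumed logit link: the holiday multiplies the odds of a non-zero sale by exp(coef)
bern_odds_ratio = {h: float(np.exp(c)) for h, c in bern_coef.items()}

for h, m in pois_multiplier.items():
    print(f"{h}: Poisson mean multiplied by {m:.2f}")
for h, r in bern_odds_ratio.items():
    print(f"{h}: odds of a non-zero sale multiplied by {r:.2f}")
```

These are point summaries only. The standard deviations in the table, especially for the Bernoulli coefficients, are large relative to the means, so the effects are uncertain.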

DCMM

class dcmm[source]
