Functions to find point forecasts using simulated values from the forecast distribution.

Forecast samples are the output from dglm.forecast_marginal, dglm.forecast_path, and most commonly, analysis. All of the functions in this module accept an array of forecast samples and produce a series of point forecasts, such as the forecast mean or median.

When using analysis, the samples are sequentially simulated from the forecast distribution at each specified time step. The forecast samples are placed in a 3 dimensional array, whose axes are nsamps $\times$ forecast length $\times$ k, where:

• nsamps is the number of samples drawn from the forecast distribution
• forecast length is the number of time steps between forecast_start and forecast_end in analysis
• k is the forecast horizon, or the number of steps that were forecast ahead

The point forecast will be calculated over nsamps, meaning that the output will be a 2 dimensional array of size forecast length $\times$ k.

More generally, all of the point forecasts are calculated from an array and assume that random samples are stored along the first dimension.

mean[source]

mean(samps)

Find the mean point forecasts.

The mean point forecast is theoretically optimal for minimizing squared error loss, $L = \sum_i (y_i - f_i)^2$.

An example below demonstrates how to use the function:

from pybats.shared import load_us_inflation
from pybats.analysis import analysis
import pandas as pd

forecast_start = '2000-Q1'
forecast_end = '2013-Q4'

mod, samples = analysis(Y = data.Inflation.values[1:], X=None, family="normal",
k = 4, prior_length = 12,
forecast_start = forecast_start, forecast_end = forecast_end,
dates=data.Date,
ntrend = 2, deltrend=.99,
nsamps = 5000)

forecast = mean(samples)

beginning forecasting

forecast.shape

(56, 4)
start = data[data.Date == forecast_start].index[0]
end = data[data.Date == forecast_end].index[0] + 1
dates = pd.to_datetime(data[start:end].Date)

forecast = pd.DataFrame(forecast, index=dates)

Date
2000-01-01 0.99 0.96 0.87 0.80
2000-04-01 1.00 0.94 0.88 0.81
2000-07-01 1.06 1.01 0.93 0.88
2000-10-01 1.07 1.03 0.96 0.97
2001-01-01 1.13 1.05 1.04 1.02

The forecasts are made on each date before seeing the observation. So the $1-$Step ahead forecast is for the date listed in that row. The $2-$Step Ahead forecast is project the mean for the date listed in the next row, and so on.

This view allows you to easily see the forecasts and how the model projects into the future. Looking across the first row, it is clear that the model has a negative local slope, because the forecasts generally decrease as they go further into the future.

median[source]

median(samps)

Find the median point forecasts.

The median point forecast is theoretically optimal for minimizing absolute deviation loss, $L = \sum_i |y_i - f_i|$.

forecast = median(samples)

forecast = pd.DataFrame(forecast, index=dates)

Date
2000-01-01 0.97 0.96 0.87 0.80
2000-04-01 1.02 0.96 0.90 0.82
2000-07-01 1.04 1.00 0.93 0.88
2000-10-01 1.06 1.02 0.96 0.99
2001-01-01 1.13 1.05 1.05 1.02

m_one_median[source]

m_one_median(samps)

Find the (-1)-median point forecasts.

The $\left(-1\right)-$median is theoretically optimal for minimizing absolute percent error, $L=\sum_i |y_i - f_i|/y_i$. It is always lower than the median, and is only defined for non-zero $y$.

We'll use a new example, because the $(-1)-$median can produce strange results when $y_i$ is close to $0$.

from pybats.shared import load_sales_example

Y = data['Sales'].values

k = 4
forecast_start = 15
forecast_end = 30

mod, samples = analysis(Y, X, family="poisson",
forecast_start=forecast_start, forecast_end=forecast_end,
k=k, prior_length=6,
rho=.5, deltrend=0.95, delregn=0.95)

beginning forecasting

forecast = m_one_median(samples)

forecast = pd.DataFrame(forecast)