Forecast samples are the output from dglm.forecast_marginal
, dglm.forecast_path
, and most commonly, analysis
. All of the functions in this module accept an array of forecast samples and produce a series of point forecasts, such as the forecast mean or median.
When using analysis
, the samples are sequentially simulated from the forecast distribution at each specified time step. The forecast samples are placed in a 3 dimensional array, whose axes are nsamps $\times$ forecast length $\times$ k, where:
- nsamps is the number of samples drawn from the forecast distribution
- forecast length is the number of time steps between
forecast_start
andforecast_end
inanalysis
- k is the forecast horizon, or the number of steps that were forecast ahead
The point forecast will be calculated over nsamps, meaning that the output will be a 2 dimensional array of size forecast length $\times$ k.
More generally, all of the point forecasts are calculated from an array and assume that random samples are stored along the first dimension.
The mean point forecast is theoretically optimal for minimizing squared error loss, $L = \sum_i (y_i - f_i)^2$.
An example below demonstrates how to use the function:
from pybats.shared import load_us_inflation
from pybats.analysis import analysis
import pandas as pd
data = load_us_inflation()
forecast_start = '2000-Q1'
forecast_end = '2013-Q4'
mod, samples = analysis(Y = data.Inflation.values[1:], X=None, family="normal",
k = 4, prior_length = 12,
forecast_start = forecast_start, forecast_end = forecast_end,
dates=data.Date,
ntrend = 2, deltrend=.99,
nsamps = 5000)
forecast = mean(samples)
forecast.shape
start = data[data.Date == forecast_start].index[0]
end = data[data.Date == forecast_end].index[0] + 1
dates = pd.to_datetime(data[start:end].Date)
forecast = pd.DataFrame(forecast, index=dates)
forecast.columns = ['1-Step Ahead', '2-Step Ahead', '3-Step Ahead', '4-Step Ahead']
forecast.round(2).head()
The forecasts are made on each date before seeing the observation. So the $1-$Step ahead forecast is for the date listed in that row. The $2-$Step Ahead forecast is project the mean for the date listed in the next row, and so on.
This view allows you to easily see the forecasts and how the model projects into the future. Looking across the first row, it is clear that the model has a negative local slope, because the forecasts generally decrease as they go further into the future.
The median point forecast is theoretically optimal for minimizing absolute deviation loss, $L = \sum_i |y_i - f_i|$.
forecast = median(samples)
forecast = pd.DataFrame(forecast, index=dates)
forecast.columns = ['1-Step Ahead', '2-Step Ahead', '3-Step Ahead', '4-Step Ahead']
forecast.round(2).head()
The $\left(-1\right)-$median is theoretically optimal for minimizing absolute percent error, $L=\sum_i |y_i - f_i|/y_i$. It is always lower than the median, and is only defined for non-zero $y$.
We'll use a new example, because the $(-1)-$median can produce strange results when $y_i$ is close to $0$.
from pybats.shared import load_sales_example
data = load_sales_example()
Y = data['Sales'].values
X = data['Advertising'].values
k = 4
forecast_start = 15
forecast_end = 30
mod, samples = analysis(Y, X, family="poisson",
forecast_start=forecast_start, forecast_end=forecast_end,
k=k, prior_length=6,
rho=.5, deltrend=0.95, delregn=0.95)
forecast = m_one_median(samples)
forecast = pd.DataFrame(forecast)
forecast.columns = ['1-Step Ahead', '2-Step Ahead', '3-Step Ahead', '4-Step Ahead']
forecast.round(2).head()