- Introduction
- Analytic Posteriors
- Posterior Estimation
- Forward-filtering Backward-sampling
- Non-Gaussian Extension
- Augmented FFBS Sampler
- Discussion
In this document we detail how dynamic linear models (DLMs) can be implemented in Theano (or similar tensor libraries), as well as a complementary Rao-Blackwellized sampler tailored to the structure of DLMs. Furthermore, we provide an example of how DLMs can be extended–via Gaussian scale-mixtures–to model non-Gaussian observations. The example is a Pólya-Gamma extension to forward-filtering backward-sampling for negative-binomial observations.
Introduction
For a proper introduction to dynamic linear models and their estimation, see Harrison & West (1999), Petris, Petrone & Campagnoli (2009), Gamerman & Lopes (2006), and Pole, West & Harrison (1994). The focus of this document is on some practical details behind DLM estimation in Python–and, particularly, libraries like Theano (Bergstra, Breuleux, Bastien, Lamblin, Pascanu, Desjardins, Turian, Warde-Farley & Bengio 2010). Throughout, we use some custom random variable objects from symbolic-pymc (Willard 2019). This is largely because symbolic-pymc offers a necessary non-standard distribution, and its random variables work well within the scan operations we employ.
In the future, we plan to build optimizations that automate much of the work done here (and more). This exposition sets the foundation for such work by first motivating the use and generality of Bayesian frameworks like DLMs, then by demonstrating the analytic steps that produce customized, efficient samplers. These are the steps that would undergo automation in the future.
A DLM with prior \(\theta_0 \sim \operatorname{N}\left( m_0, C_0 \right)\) is defined as follows:
\[\begin{align} y_t &= F_t^{\top} \theta_{t} + \epsilon_t, \quad \epsilon_t \sim \operatorname{N}\left( 0, V_t \right) \label{eq:basic-dlm-obs} \\ \theta_t &= G_t \theta_{t-1} + \nu_t, \quad \nu_t \sim \operatorname{N}\left( 0, W_t \right) \label{eq:basic-dlm-state} \end{align}\]
for \(t \in \{1, \dots, T\}\), \(y_t \in \mathbb{R}\), and \(\theta_t \in \mathbb{R}^{M}\).
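Before moving to Theano, the recursion can be written directly in NumPy. The following is only a minimal, illustrative sketch; the parameter values and generator seed are placeholders and are not the simulation settings used later.

import numpy as np

rng = np.random.default_rng(123)

T, M = 100, 2
G = np.array([[1.0, 0.1], [0.0, 1.0]])  # G_t
F = np.array([[1.0], [0.0]])            # F_t
W = np.diag([0.2, 0.1])                 # W_t (placeholder values)
V = 10.0                                # V_t (placeholder value)

theta = rng.multivariate_normal(np.zeros(M), 1000.0 * np.eye(M))  # theta_0
y = np.empty(T)
for t in range(T):
    # theta_t = G_t theta_{t-1} + nu_t,  nu_t ~ N(0, W_t)
    theta = G @ theta + rng.multivariate_normal(np.zeros(M), W)
    # y_t = F_t^T theta_t + epsilon_t,  epsilon_t ~ N(0, V_t)
    y[t] = (F.T @ theta).item() + rng.normal(0.0, np.sqrt(V))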
The most “notationally” faithful representation of the timeseries model in \(\eqref{eq:basic-dlm-state}\) using Theano is provided in Listing 2. It represents the notion of a recursion–to the best of Theano’s ability–by way of the scan operator.
import numpy as np
import theano
import theano.tensor as tt
import matplotlib.pyplot as plt
from cycler import cycler
from matplotlib.collections import LineCollection
from theano.printing import debugprint as tt_dprint
from symbolic_pymc.theano.random_variables import NormalRV, MvNormalRV, GammaRV
plt.style.use('ggplot')
plt_orig_cycler = plt.rcParams['axes.prop_cycle']
plt.rc('text', usetex=True)

# theano.config.cxx = ""
# theano.config.mode = "FAST_COMPILE"
tt.config.compute_test_value = 'ignore'
N_obs_tt = tt.iscalar("N_obs")
N_theta_tt = tt.iscalar("N_theta")

G_tt = tt.specify_shape(tt.matrix(), [N_theta_tt, N_theta_tt])
G_tt.name = 'G_t'

F_tt = tt.specify_shape(tt.col(), [N_theta_tt, 1])
F_tt.name = 'F_t'

rng_state = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(1234)))
rng_init_state = rng_state.get_state()
rng_tt = theano.shared(rng_state, name='rng', borrow=True)
rng_tt.tag.is_rng = True
rng_tt.default_update = rng_tt

m_0_tt = tt.zeros([N_theta_tt])
m_0_tt.name = "m_0"
C_0_tt = 1000.0 * tt.eye(N_theta_tt)
C_0_tt.name = "C_0"

theta_0_rv = MvNormalRV(m_0_tt, C_0_tt, rng=rng_tt, name='theta_0')

phi_W_true = np.r_[5.0, 10.0]
phi_W_tt = theano.shared(phi_W_true, name='phi_W')
W_tt = tt.eye(N_theta_tt) * tt.inv(phi_W_tt)
W_tt.name = "W_t"

phi_V_true = 0.1
phi_V_tt = theano.shared(phi_V_true, name='phi_V')
V_tt = tt.inv(phi_V_tt)
V_tt.name = "V_t"
def state_step(theta_tm1, G_t, W_t, N_theta, rng):
    nu_rv = MvNormalRV(tt.zeros([N_theta]), W_t, rng=rng, name='nu')
    theta_t = G_t.dot(theta_tm1) + nu_rv
    return theta_t


theta_t_rv, theta_t_updates = theano.scan(fn=state_step,
                                          outputs_info={"initial": theta_0_rv, "taps": [-1]},
                                          non_sequences=[G_tt, W_tt, N_theta_tt, rng_tt],
                                          n_steps=N_obs_tt,
                                          strict=True,
                                          name='theta_t')


def obs_step(theta_t, F_t, V_t, rng):
    eps_rv = NormalRV(0.0, tt.sqrt(V_t), rng=rng, name='eps')
    y_t = F_t.T.dot(theta_t) + eps_rv
    return y_t


Y_t_rv, Y_t_updates = theano.scan(fn=obs_step,
                                  sequences=[theta_t_rv],
                                  non_sequences=[F_tt, V_tt, rng_tt],
                                  strict=True,
                                  name='Y_t')
Throughout we’ll use data sampled from \(\eqref{eq:basic-dlm-state}\) for demonstration purposes. Our simulation has the following model parameter values:
\[\begin{equation} \begin{gathered} T = 200,\quad M = 2 \\ W_t = \operatorname{diag}\left( \phi_W \right) ,\quad V_t = \operatorname{diag}\left( \phi_V \right) \\ \phi_W = \left(1.1, 10\right),\quad \phi_V = 0.7 \\ G_t = \begin{pmatrix} 1 & 0.1 \\ 0 & 1 \\ \end{pmatrix},\quad F_t = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \\ \theta_0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \end{gathered} \label{eq:sim-settings} \end{equation}\]
A sample from \(\eqref{eq:sim-settings}\) is generated in Listing 3.
from theano import function as tt_function

dlm_sim_values = {
    N_obs_tt: 200,
    N_theta_tt: 2,
    G_tt: np.r_['0,2',
                [1.0, 0.1],
                [0.0, 1.0]].astype(tt.config.floatX),
    F_tt: np.r_[[[1.0],
                 [0.0]]].astype(tt.config.floatX)
}

rng_tt.get_value(borrow=True).set_state(rng_init_state)

simulate_dlm = tt_function([N_obs_tt, N_theta_tt, G_tt, F_tt],
                           [Y_t_rv, theta_t_rv],
                           givens={theta_0_rv: np.r_[0.0, 0.0]},
                           updates=Y_t_updates)

y_sim, theta_t_sim = simulate_dlm(dlm_sim_values[N_obs_tt], dlm_sim_values[N_theta_tt],
                                  dlm_sim_values[G_tt], dlm_sim_values[F_tt])

rng_sim_state = rng_tt.get_value(borrow=True).get_state()
In Figure 4 we plot a sample from the model in Listing 2 for a fixed RNG seed.
plt.clf()
fig, ax = plt.subplots(figsize=(8, 4.8))
_ = ax.plot(y_sim, label=r'$y_t$', color='black', linewidth=0.7)

plt.legend()
plt.tight_layout()

Analytic Posteriors
The standard DLM is essentially a Kalman filter and enjoys many well-documented closed-form results. In the following, we will simply state the relevant prior predictive and posterior results.
Given all of the prior information and observed data up to time \(t\), denoted \(D_t\), these distributions are given by the following:
\[\begin{align} \theta_{t} \mid D_{t-1} &\sim \operatorname{N}\left( a_{t}, R_{t} \right) \\ y_{t} \mid D_{t-1} &\sim \operatorname{N}\left( f_{t}, Q_{t} \right) \end{align}\]
The prior predictive moments are as follows:
\[\begin{equation} \begin{gathered} a_t = G_t m_{t-1}, \quad R_t = G_t C_{t-1} G_t^\top + W_t \\ f_t = F_t^\top a_{t}, \quad Q_t = F_t^\top R_{t} F_t + V_t \end{gathered} \label{eq:dlm-prior-predictive} \end{equation}\]
We’ll also want to compute the posterior moments for \(\theta_t \mid D_t\), which are as follows:
\[\begin{equation} \begin{gathered} m_t = a_{t} + R_t F_t Q_t^{-1} \left(y_t - f_t\right), \quad C_t = R_t - R_t F_t Q_t^{-1} F_t^\top R_t \end{gathered} \label{eq:dlm-post-moments} \end{equation}\]
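For reference, here is a direct NumPy transcription of \(\eqref{eq:dlm-prior-predictive}\) and \(\eqref{eq:dlm-post-moments}\) for a single step. It is only an illustrative sketch (the function and argument names are ours) and, as discussed further below, this naive form tends to accumulate numerical errors.

import numpy as np

def naive_filter_step(y_t, m_prev, C_prev, F, G, W, V):
    """One naive DLM filtering step (illustrative only)."""
    a_t = G @ m_prev
    R_t = G @ C_prev @ G.T + W
    f_t = (F.T @ a_t).item()
    Q_t = (F.T @ R_t @ F).item() + V
    A_t = R_t @ F / Q_t                    # adaptive gain, R_t F_t Q_t^{-1}
    m_t = a_t + (A_t * (y_t - f_t)).ravel()
    C_t = R_t - Q_t * (A_t @ A_t.T)        # R_t - R_t F_t Q_t^{-1} F_t^T R_t
    return m_t, C_t, a_t, R_t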
These “filtered” moments/distributions are one “kind” of posterior result for a DLM, and they only take into account the data up to time \(t\). The other kind are the “smoothed” distributions, which provide posterior distributions for each time \(t\) given all observations preceding and following \(t\).
Notationally, we’ve used \(D_t\) to signify all of the observations and conditioning information up to time \(t\), so the smoothed distributions are given by \(\theta_t \mid D_T\), in contrast to the filtered \(\theta_t \mid D_t\).
The smoothed \(\theta_t\) distributions are still Gaussian, i.e. \(\left(\theta_t \mid D_T\right) \sim \operatorname{N}\left(s_t, S_t\right)\), and their moments are as follows:
\[\begin{equation} \begin{aligned} s_t &= m_t + C_t G_{t+1}^\top R_{t+1}^{-1} \left( s_{t+1} - a_{t+1} \right) \\ S_t &= C_t - C_t G_{t+1}^\top R_{t+1}^{-1} \left( R_{t+1} - S_{t+1} \right) R_{t+1}^{-1} G_{t+1} C_t \end{aligned} \label{eq:dlm-smooth-moments} \end{equation}\]
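The corresponding backward recursion in \(\eqref{eq:dlm-smooth-moments}\) is just as direct in its naive form; again, only a sketch with names of our own choosing:

import numpy as np

def naive_smoother_step(m_t, C_t, a_next, R_next, s_next, S_next, G_next):
    """One naive backward smoothing step (illustrative only)."""
    B_t = C_t @ G_next.T @ np.linalg.inv(R_next)
    s_t = m_t + B_t @ (s_next - a_next)
    S_t = C_t - B_t @ (R_next - S_next) @ B_t.T
    return s_t, S_t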
In most cases, models will not be as simple as the standard DLM. Even so, these basic closed-form solutions can still be relevant. For instance, efficient MCMC algorithms can be constructed using these closed-form results for conditionally linear models. In those cases, we can compute the posterior moments–in closed-form–conditional on samples generated by other means.
The standard approach is called forward-filtering backward-sampling (FFBS) and uses smoothed posteriors \(\theta_t \mid \theta_{t+1}, D_T\) conditioned on all other parameters.
We’ll build up to forward-backward sampling in what follows, but, first, we need to establish how the requisite quantities can be computed symbolically.
Posterior Estimation
In Listings 6 and 7, we demonstrate how the posterior moments in \(\eqref{eq:dlm-post-moments}\) and \(\eqref{eq:dlm-smooth-moments}\) can be computed in Theano.
Unfortunately, if we attempt to implement the exact closed-form updates in \(\eqref{eq:dlm-post-moments}\) or \(\eqref{eq:dlm-smooth-moments}\), our results will be fraught with numerical errors. This is a very basic issue with naively implemented Kalman filters. The solution to these issues usually involves some analytic reformulations that compensate for the covariance matrix subtractions. The standard approaches generally use some form of matrix decomposition that directly accounts for the positive semi-definite nature of the covariance matrices.
The approach taken here is based on the singular value decomposition (SVD) and effectively computes only one symmetric “half” of the updated covariances. The SVD also allows for easy inversions. See Zhang & Li (1996) for more details, or Petris, Petrone & Campagnoli (2009) for a concise overview of the procedure in the context of DLMs.
import warnings

warnings.filterwarnings("ignore", category=FutureWarning, message="Using a non-tuple sequence")

from theano.tensor.nlinalg import matrix_dot


def tt_finite_inv(x, eps_truncate=False):
    """Compute the element-wise reciprocal with special handling for small inputs.

    Parameters
    ==========
    x: Tensor-like
        The value for which the reciprocal, i.e. `1/x`, is computed.
    eps_truncate: bool (optional)
        Determines whether or not a floating-point epsilon truncation is used to
        upper-bound the returned values.
        If not (the default), infinite values are simply set to zero.
    """
    if eps_truncate:
        eps = np.finfo(getattr(x, 'dtype', None) or theano.config.floatX).eps
        return tt.minimum(tt.inv(x), np.reciprocal(np.sqrt(eps)))
    else:
        y = tt.inv(x)
        res_subtensor = y[tt.isinf(y)]
        return tt.set_subtensor(res_subtensor, 0.0)
SVD-based Filtering
The SVD forms of the filtering equations in \(\eqref{eq:dlm-post-moments}\) are produced through creative use of the SVDs of their component matrices. Using a slightly modified version of the formulation established in Petris, Petrone & Campagnoli (2009), the SVD for a matrix \(M\) is given by \(M = U_{M} D_{M} V_{M}^\top\). A symmetric matrix then takes the form \(M = U_{M} D_{M} U_{M}^\top\) and its “square-root” is given by \(M = N_M^\top N_M\) with \(N_M = S_{M} U_{M}^\top\) and \(S_{M} = D_{M}^{1/2}\). Likewise, matrix (generalized) inverses take the form \(M^{-1} = U_{M} D_{M}^{-1} U_{M}^\top\), with square-root \(N_{M^{-1}} = S_{M}^{-1} U_{M}^\top\).
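These identities are easy to check numerically; the following sketch does so in NumPy for an arbitrary, randomly generated symmetric positive-definite matrix:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
M = A @ A.T                               # symmetric positive (semi-)definite

U_M, d_M, _ = np.linalg.svd(M)            # M = U_M diag(d_M) U_M^T
N_M = np.diag(np.sqrt(d_M)) @ U_M.T       # N_M = S_M U_M^T

assert np.allclose(N_M.T @ N_M, M)                    # M = N_M^T N_M
assert np.allclose(U_M @ np.diag(1.0 / d_M) @ U_M.T,
                   np.linalg.inv(M))                  # M^{-1} = U_M D_M^{-1} U_M^T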
The idea here is that we can combine these SVD identities to derive a square-root relationship between the SVD of \(C_t^{-1}\) and the SVDs of \(C_{t-1}\), \(W_t\), \(V_t\), and \(R_t\); we can then easily invert \(C_t^{-1}\) to arrive at the desired, numerically stable SVD of \(C_t\).
First, note that \(N_{R_t}^\top N_{R_t} = G_t C_{t-1} G_t^\top + W_t = R_t\) for
\[\begin{equation} \begin{aligned} N_{R_t} &= \begin{pmatrix} S_{C_{t-1}} U_{C_{t-1}}^\top G_t^\top \\ N_{W_t} \end{pmatrix} \end{aligned} . \label{eq:N_R_t} \end{equation}\]
From this, we know that the SVD of \(R_t\) can be easily derived from the SVD of its square root, \(N_{R_t}\), i.e. \(U_{R_t} = V_{N_{R_t}}\) and \(S_{R_t} = D_{N_{R_t}}\). In other words, we can obtain a matrix’s SVD by computing the SVD of its “half”, which is itself entirely comprised of previous SVD components. The inherent symmetry of our covariance matrices is nicely preserved because we’re only ever using and computing one “half” of these matrices.
With the updated SVD of \(R_t\), we can use the identity \(C_t^{-1} = F_t V_t^{-1} F_t^\top + R_t^{-1}\)–obtained via the classic Sherman-Morrison-Woodbury matrix inverse identity–to employ the same technique as before and produce the SVD of \(C_t^{-1}\) by way of the SVD of yet another block square-root matrix,
\[\begin{equation} \begin{aligned} N_{C_t^{-1}} &= \begin{pmatrix} N_{V_t^{-1}} F_t^\top U_{R_t} \\ S_{R_t}^{-1} \end{pmatrix} \end{aligned} . \label{eq:N_C_t_inv} \end{equation}\]
Again, we compute the SVD of \(N_{C_t^{-1}}\) at this step and obtain \(V_{N_{C_t^{-1}}}\) and \(D_{N_{C_t^{-1}}}\).
This time, the block square-root matrix relationship isn’t so direct, and we have to multiply by \(U_{R_t}\): \(U_{R_t} N_{C_t^{-1}}^\top N_{C_t^{-1}} U_{R_t}^\top = C_t^{-1}\). However, since the additional \(U_{R_t}\) terms are orthogonal, we are able to derive the SVD of \(C_t\) as \(U_{C_t} = U_{R_t} V_{N_{C_t^{-1}}}\) and \(S_{C_t} = D_{N_{C_t^{-1}}}^{-1}\).
These quantities are computed in Listing 6.
from theano.tensor.nlinalg import svd

y_tt = tt.specify_shape(tt.col(), [N_obs_tt, 1])
y_tt.name = 'y_t'


def filtering_step(y_t, m_tm1, U_C_tm1, S_C_tm1, F_t, G_t, N_W_t, N_V_t_inv):
    """Compute the sequential posterior state and prior predictive parameters."""
    # R_t = N_R.T.dot(N_R)
    N_R = tt.join(0,
                  matrix_dot(S_C_tm1, U_C_tm1.T, G_t.T),
                  N_W_t)

    # TODO: All this could be much more efficient if we only computed *one* set of singular
    # vectors for these non-square matrices.
    _, d_N_R_t, V_N_R_t_T = svd(N_R)

    U_R_t = V_N_R_t_T.T
    S_R_t = tt.diag(d_N_R_t)
    S_R_t_inv = tt.diag(tt_finite_inv(d_N_R_t))

    N_C_t_inv = tt.join(0,
                        matrix_dot(N_V_t_inv, F_t.T, U_R_t),
                        S_R_t_inv)
    _, d_N_C_t_inv, V_N_C_t_inv_T = svd(N_C_t_inv)

    U_C_t = U_R_t.dot(V_N_C_t_inv_T.T)
    d_C_t = tt_finite_inv(tt.square(d_N_C_t_inv))
    D_C_t = tt.diag(d_C_t)
    S_C_t = tt.diag(tt.sqrt(d_C_t))

    C_t = matrix_dot(U_C_t, D_C_t, U_C_t.T)

    a_t = G_t.dot(m_tm1)
    f_t = F_t.T.dot(a_t)
    # A_t = R_t @ F_t @ inv(Q_t) = C_t @ F_t @ inv(V_t)
    m_t = a_t + matrix_dot(C_t, F_t, N_V_t_inv.T, N_V_t_inv, y_t - f_t)

    return [m_t, U_C_t, S_C_t, a_t, U_R_t, S_R_t]


_, d_C_0_tt, Vt_C_0_tt = svd(C_0_tt)
U_C_0_tt = Vt_C_0_tt.T
S_C_0_tt = tt.diag(tt.sqrt(d_C_0_tt))

_, d_W_tt, Vt_W_tt = svd(W_tt)
U_W_tt = Vt_W_tt.T
s_W_tt = tt.sqrt(d_W_tt)
N_W_tt = tt.diag(s_W_tt).dot(U_W_tt.T)

_, D_V_tt, Vt_V_tt = svd(tt.as_tensor_variable(V_tt, ndim=2) if V_tt.ndim < 2 else V_tt)
U_V_tt = Vt_V_tt.T
S_V_inv_tt = tt.diag(tt.sqrt(tt_finite_inv(D_V_tt, eps_truncate=True)))
N_V_inv_tt = S_V_inv_tt.dot(U_V_tt.T)

filter_res, filter_updates = theano.scan(fn=filtering_step,
                                         sequences=y_tt,
                                         outputs_info=[
                                             {"initial": m_0_tt, "taps": [-1]},
                                             {"initial": U_C_0_tt, "taps": [-1]},
                                             {"initial": S_C_0_tt, "taps": [-1]},
                                             # a_t, U_R_t, S_R_t
                                             {}, {}, {},
                                         ],
                                         non_sequences=[F_tt, G_tt, N_W_tt, N_V_inv_tt],
                                         strict=True,
                                         name='theta_filtered')

(m_t, U_C_t, S_C_t, a_t, U_R_t, S_R_t) = filter_res
SVD-based Smoothing
We can use the techniques above to produce SVD versions of the smoothing equations in \(\eqref{eq:dlm-smooth-moments}\). In this case, some extra steps are required in order to SVD-decompose \(S_t\) in the same manner as \(R_t\) and \(C_t^{-1}\) were.
First, notice that our target, \(S_t\), is a difference of matrices, unlike the matrix sums that comprised \(R_t\) and \(C_t^{-1}\) above. Furthermore, \(S_t\) is given as a difference of a (transformed) difference. To address the latter, we start by expanding \(S_t\) and setting \(B_t = C_t G_{t+1}^\top R_{t+1}^{-1}\) to obtain
\[\begin{equation} \begin{aligned} S_t &= C_t - B_t R_{t+1} B_t^\top + B_t S_{t+1} B_t^\top \\ &= H_t + B_t S_{t+1} B_t^\top \end{aligned} \label{eq:S_t_decomp} \end{equation}\]
Having turned \(S_t\) into a sum of two terms, we can now consider another blocked SVD-based square-root reformulation, which starts with the reformulation of \(H_t\).
We can use the definition \(R_{t+1} = G_{t+1} C_t G_{t+1}^\top + W_{t+1}\) from \(\eqref{eq:dlm-prior-predictive}\) to get
\[\begin{equation} \begin{aligned} H_t &= C_t - B_t R_{t+1} B_t^\top \\ &= C_t - C_t G_{t+1}^\top R_{t+1}^{-1} G_{t+1} C_t \\ &= C_t - C_t G_{t+1}^\top \left(G_{t+1} C_t G_{t+1}^\top + W_{t+1}\right)^{-1} G_{t+1} C_t . \end{aligned} \end{equation}\]
This form of \(H_t\) fits the Woodbury identity and results in \(H_t^{-1} = G_{t+1}^\top W_{t+1}^{-1} G_{t+1} + C_t^{-1}\), which is amenable to our square-root formulation.
Specifically, \(H_t^{-1} = U_{C_t} N_{H_t}^{-\top} N_{H_t}^{-1} U_{C_t}^\top\), where
\[\begin{equation} \begin{aligned} N_{H_t}^{-1} &= \begin{pmatrix} N_{W_{t+1}}^{-1} G_{t+1} U_{C_t} \\ S_{C_t}^{-1} \end{pmatrix} \end{aligned} . \label{eq:N_H_t_inv} \end{equation}\]
By inverting the SVD of \(N_{H_t}^{-1}\) we obtain the SVD of \(H_t\) as \(U_{H_t} = U_{C_t} V_{N_{H_t}^{-1}}\) and \(D_{H_t} = {D_{N_{H_t}^{-1}}}^{-2} = S_{H_t}^2\).
Finally, using \(\eqref{eq:S_t_decomp}\) and \(\eqref{eq:N_H_t_inv}\) we can derive the last blocked square-root decomposition \(S_t = N_{S_t}^\top N_{S_t}\):
\[\begin{equation} \begin{aligned} N_{S_t} &= \begin{pmatrix} S_{H_t} U_{H_t}^\top \\ S_{S_{t+1}} U_{S_{t+1}}^\top B_t^\top \end{pmatrix} \end{aligned} . \label{eq:N_S_t} \end{equation}\]
Again, we take the SVD of \(N_{S_t}\) and derive the SVD of \(S_t\) as \(U_{S_t} = V_{N_{S_t}}\) and \(D_{S_t} = D_{N_{S_t}}^2 = S_{S_t}^2\).
def smoother_step(m_t, U_C_t, S_C_t, a_tp1, U_R_tp1, S_R_tp1, s_tp1, U_S_tp1, S_S_tp1, G_tp1, N_W_tp1_inv):
    """Smooth a series starting from the "forward"/sequentially computed posterior moments."""
    N_C_t = S_C_t.dot(U_C_t.T)

    S_R_tp1_inv = tt_finite_inv(S_R_tp1)
    N_R_tp1_inv = S_R_tp1_inv.dot(U_R_tp1.T)

    # B_t = C_t @ G_tp1.T @ inv(R_tp1)
    B_t = matrix_dot(N_C_t.T, N_C_t, G_tp1.T, N_R_tp1_inv.T, N_R_tp1_inv)

    S_C_t_inv = tt_finite_inv(S_C_t)

    # U_C_t @ N_H_t_inv.T @ N_H_t_inv @ U_C_t.T = G_tp1.T @ W_tp1_inv @ G_tp1 + C_t_inv
    N_H_t_inv = tt.join(0,
                        matrix_dot(N_W_tp1_inv, G_tp1, U_C_t),
                        S_C_t_inv)
    _, d_N_H_t_inv, V_N_H_t_T = svd(N_H_t_inv)

    # H_t = inv(U_C_t @ N_H_t_inv.T @ N_H_t_inv @ U_C_t.T) = C_t - B_t @ R_tp1 @ B_t.T
    U_H_t = U_C_t.dot(V_N_H_t_T.T)
    S_H_t = tt.diag(tt_finite_inv(d_N_H_t_inv))

    # S_t = N_S_t.T.dot(N_S_t) = C_t - matrix_dot(B_t, R_tp1 - S_tp1, B_t.T)
    N_S_t = tt.join(0,
                    S_H_t.dot(U_H_t.T),
                    matrix_dot(S_S_tp1, U_S_tp1.T, B_t.T))
    _, d_N_S_t, V_N_S_t_T = svd(N_S_t)

    U_S_t = V_N_S_t_T.T
    S_S_t = tt.diag(d_N_S_t)

    s_t = m_t + B_t.dot(s_tp1 - a_tp1)

    return [s_t, U_S_t, S_S_t]


N_W_inv_tt = tt.diag(tt_finite_inv(s_W_tt, eps_truncate=True)).dot(U_W_tt.T)

m_T = m_t[-1]
U_C_T = U_C_t[-1]
S_C_T = S_C_t[-1]

# These series only go from N_obs - 1 to 1
smoother_res, _ = theano.scan(fn=smoother_step,
                              sequences=[
                                  {"input": m_t, "taps": [-1]},
                                  {"input": U_C_t, "taps": [-1]},
                                  {"input": S_C_t, "taps": [-1]},
                                  {"input": a_t, "taps": [1]},
                                  {"input": U_R_t, "taps": [1]},
                                  {"input": S_R_t, "taps": [1]}
                              ],
                              outputs_info=[
                                  {"initial": m_T, "taps": [-1]},
                                  {"initial": U_C_T, "taps": [-1]},
                                  {"initial": S_C_T, "taps": [-1]},
                              ],
                              non_sequences=[G_tt, N_W_inv_tt],
                              go_backwards=True,
                              strict=True,
                              name='theta_smoothed_obs')

(s_t_rev, U_S_t_rev, S_S_t_rev) = smoother_res

s_t = s_t_rev[::-1]
U_S_t = U_S_t_rev[::-1]
S_S_t = S_S_t_rev[::-1]

s_t = tt.join(0, s_t, [m_T])
U_S_t = tt.join(0, U_S_t, [U_C_T])
S_S_t = tt.join(0, S_S_t, [S_C_T])
Example
Listing 8 computes the filtered and smoothed means for our simulated series, and Figure 9 shows the results.
filter_smooth_dlm = tt_function([y_tt, N_theta_tt, G_tt, F_tt], [m_t, s_t])

# phi_W_tt.set_value(phi_W_true)
# phi_V_tt.set_value(phi_V_true)
phi_W_tt.set_value(np.r_[100.0, 100.0])
phi_V_tt.set_value(1.5)

(m_t_sim, s_t_sim) = filter_smooth_dlm(y_sim, dlm_sim_values[N_theta_tt],
                                       dlm_sim_values[G_tt], dlm_sim_values[F_tt])
from cycler import cycler

bivariate_cycler = plt_orig_cycler * cycler('linestyle', ['-', '--'])

plt.close(fig='all')
fig, ax = plt.subplots(figsize=(8, 4.8))

ax.set_prop_cycle(bivariate_cycler)
ax.plot(theta_t_sim, label=r'$\theta_t$', linewidth=0.8, color='black')
ax.autoscale(enable=False)
ax.plot(m_t_sim, label=r'$E[\theta_t \mid D_{t}]$', alpha=0.9, linewidth=0.8)
ax.plot(s_t_sim, label=r'$E[\theta_t \mid D_{T}]$', alpha=0.9, linewidth=0.8)

plt.legend(framealpha=0.4)
plt.tight_layout()

Forward-filtering Backward-sampling
We can use the smoothing and filtering steps in the previous section to perform more efficient MCMC estimation than would otherwise be possible without the Rao-Blackwellization inherent to both steps.
Forward-filtering backward-sampling (Frühwirth-Schnatter 1994) works by first computing the forward filtered moments, allowing one to draw \(\theta_T\) from \(\left(\theta_T \mid D_T\right) \sim \operatorname{N}\left(m_T, C_T\right)\) and, subsequently, \(\theta_t\) from \(\left(\theta_t \mid \theta_{t+1}, D_T \right) \sim \operatorname{N}\left(h_t, H_t\right)\).
The latter distribution’s moments are easily derived from the filtered and smoothed moments:
\[\begin{equation} \begin{gathered} h_t = m_t + B_t \left(\theta_{t+1} - a_{t+1}\right) \\ H_t = C_t - B_t R_{t+1} B^\top_t \end{gathered} \label{eq:ffbs-moments} \end{equation}\]
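As a point of reference, a single backward-sampling step implementing \(\eqref{eq:ffbs-moments}\) naively (i.e. without the SVD machinery) might look like the following sketch; the names are ours and purely illustrative.

import numpy as np

def naive_ffbs_step(rng, theta_next, m_t, C_t, a_next, R_next, G_next):
    """Draw theta_t given theta_{t+1} using the FFBS moments above (sketch)."""
    B_t = C_t @ G_next.T @ np.linalg.inv(R_next)
    h_t = m_t + B_t @ (theta_next - a_next)
    H_t = C_t - B_t @ R_next @ B_t.T
    return rng.multivariate_normal(h_t, H_t)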
Since all the quantities in \(\eqref{eq:ffbs-moments}\) appear in the filtering and smoothing moments, we can use the SVD-based approach described earlier to perform the updates and sampling. We reproduce the relevant subset of calculations in Listing 10.
def ffbs_step(m_t, U_C_t, S_C_t, a_tp1, U_R_tp1, S_R_tp1, theta_tp1, F_tp1, G_tp1, N_W_tp1_inv, rng):
    """Perform forward-filtering backward-sampling."""
    S_C_t_inv = tt_finite_inv(S_C_t)

    # H_t_inv = U_C_t @ N_H_t_inv.T @ N_H_t_inv @ U_C_t.T = G_tp1.T @ W_tp1_inv @ G_tp1 + C_t_inv
    N_H_t_inv = tt.join(0,
                        matrix_dot(N_W_tp1_inv, G_tp1, U_C_t),
                        S_C_t_inv)
    _, d_N_H_t_inv, V_N_H_t_inv_T = svd(N_H_t_inv)

    U_H_t = U_C_t.dot(V_N_H_t_inv_T.T)
    s_H_t = tt_finite_inv(d_N_H_t_inv)

    N_C_t = S_C_t.dot(U_C_t.T)

    S_R_tp1_inv = tt_finite_inv(S_R_tp1)
    N_R_tp1_inv = S_R_tp1_inv.dot(U_R_tp1.T)

    # B_t = C_t @ G_tp1.T @ inv(R_tp1)
    # B_t = matrix_dot(U_H_t * s_H_t, s_H_t * U_H_t.T,
    #                  G_tp1.T, N_W_tp1_inv.T, N_W_tp1_inv)
    B_t = matrix_dot(N_C_t.T, N_C_t, G_tp1.T, N_R_tp1_inv.T, N_R_tp1_inv)

    h_t = m_t + B_t.dot(theta_tp1 - a_tp1)
    h_t.name = 'h_t'

    # TODO: Add an option or optimization to use the SVD to sample in
    # `MvNormalRV`.
    # theta_t = MvNormalRV(h_t, H_t, rng=rng, name='theta_t_ffbs')
    theta_t = h_t + tt.dot(U_H_t, s_H_t *
                           MvNormalRV(tt.zeros_like(h_t),
                                      tt.eye(h_t.shape[0]),
                                      rng=rng))

    # These are statistics we're gathering for other posterior updates
    theta_tp1_diff = theta_tp1 - G_tp1.dot(theta_t)
    f_tp1 = F_tp1.T.dot(theta_t)

    # Sequentially sample/update quantities conditional on `theta_t` here...
    return [theta_t, theta_tp1_diff, f_tp1]


# C_T = matrix_dot(U_C_T, tt.square(S_C_T), U_C_T.T)
# theta_T_post = MvNormalRV(m_T, C_T, rng=rng_tt)
theta_T_post = m_T + matrix_dot(U_C_T, S_C_T,
                                MvNormalRV(tt.zeros_like(m_T),
                                           tt.eye(m_T.shape[0]),
                                           rng=rng_tt))
theta_T_post.name = "theta_T_post"

ffbs_output, ffbs_updates = theano.scan(fn=ffbs_step,
                                        sequences=[
                                            {"input": m_t, "taps": [-1]},
                                            {"input": U_C_t, "taps": [-1]},
                                            {"input": S_C_t, "taps": [-1]},
                                            {"input": a_t, "taps": [1]},
                                            {"input": U_R_t, "taps": [1]},
                                            {"input": S_R_t, "taps": [1]}
                                        ],
                                        outputs_info=[
                                            {"initial": theta_T_post, "taps": [-1]},
                                            # theta_tp1_diff, f_tp1
                                            {}, {},
                                        ],
                                        non_sequences=[F_tt, G_tt, N_W_inv_tt, rng_tt],
                                        go_backwards=True,
                                        strict=True,
                                        name='ffbs_samples')

(theta_t_post_rev, theta_t_diff_rev, f_t_rev) = ffbs_output

theta_t_post = tt.join(0, theta_t_post_rev[::-1], [theta_T_post])

# We need to add the missing end-points onto these statistics...
f_t_post = tt.join(0, f_t_rev[::-1], [F_tt.T.dot(theta_T_post)])
Quantities besides the state values, \(\theta_t\), can be sampled sequentially (i.e. within the function ffbs_step in Listing 10), or after FFBS when all \(\theta_t \mid D_T\) have been sampled. These quantities can use the conditionally Gaussian form of \(\left(\theta_t \mid \theta_{t+1}, D_T \right)\) to derive Gibbs steps, further Rao-Blackwellize hierarchical quantities, or apply any other means of producing posterior samples conditional on \(\left(\theta_t \mid \theta_{t+1}, D_T \right)\).
In our simulation example, we will further augment our original model by adding the classic conjugate gamma priors to our previously fixed state and observation precision parameters, \(\phi_W\) and \(\phi_V\), respectively:
\[\begin{equation} \begin{gathered} \phi_{W} \sim \operatorname{Gamma}\left( a_W, b_W \right) ,\quad \phi_{V} \sim \operatorname{Gamma}\left( a_V, b_V \right) \end{gathered} \label{eq:phi-gamma-priors} \end{equation}\]
This classical conjugate prior allows one to derive simple closed-form posteriors for a Gibbs sampler conditional on \(y_t\), \(\theta_t\), and \(\theta_{t-1}\), as follows:
\[\begin{equation} \begin{gathered} \phi_{W} \mid D_T \sim \operatorname{Gamma}\left( a_W + \frac{N}{2}, b_W + \frac{1}{2} \sum_{t=1}^N \left( \theta_t - G_t \theta_{t-1} \right)^2 \right) ,\quad \phi_{V} \mid D_T \sim \operatorname{Gamma}\left( a_V + \frac{N}{2}, b_V + \frac{1}{2} \sum_{t=1}^N \left( y_t - F_t^\top \theta_t \right)^2 \right) \end{gathered} \label{eq:phi-gamma-posteriors} \end{equation}\]
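In plain NumPy terms, one such conditional draw could look like the following sketch (the helper name is ours; note that `numpy.random.Generator.gamma` takes a scale argument, so the posterior rate is inverted):

import numpy as np

def sample_phi_W(rng, theta, G, a_W, b_W):
    """Draw phi_W from its conjugate gamma conditional given the sampled states (sketch)."""
    resid = theta[1:] - theta[:-1] @ G.T          # theta_t - G_t theta_{t-1}
    a_post = a_W + 0.5 * resid.shape[0]
    b_post = b_W + 0.5 * np.square(resid).sum(axis=0)
    return rng.gamma(shape=a_post, scale=1.0 / b_post)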
Those posterior computations are implemented in Listings 11 and 12, and they are used to update the shared Theano variables for \(\phi_W\) and \(\phi_V\) within a Gibbs sampling loop in Listing 13.
Simulation Example
phi_W_a, phi_W_b = theano.shared(np.r_[2.5, 2.5]), theano.shared(np.r_[0.5, 0.5])

phi_W_a_post_tt = phi_W_a + N_obs_tt * 0.5
phi_W_SS_tt = tt.square(theta_t_diff_rev).sum(0)
phi_W_b_post_tt = phi_W_b + 0.5 * phi_W_SS_tt
phi_W_post_tt = GammaRV(phi_W_a_post_tt, phi_W_b_post_tt, rng=rng_tt, name='phi_W_post')

phi_V_a, phi_V_b = theano.shared(0.125), theano.shared(0.25)

phi_V_a_post_tt = phi_V_a + N_obs_tt * 0.5
phi_V_SS_tt = tt.square(y_tt - f_t_post).sum()
phi_V_b_post_tt = phi_V_b + 0.5 * phi_V_SS_tt
phi_V_post_tt = GammaRV(phi_V_a_post_tt, phi_V_b_post_tt, rng=rng_tt, name='phi_V_post')
ffbs_dlm = tt_function([y_tt, N_obs_tt, N_theta_tt, G_tt, F_tt],
                       [theta_t_post, phi_W_post_tt, phi_V_post_tt,
                        phi_W_SS_tt, phi_V_SS_tt],
                       updates=ffbs_updates)

rng_tt.get_value(borrow=True).set_state(rng_sim_state)

phi_W_0 = phi_W_a.get_value() / phi_W_b.get_value()
phi_V_0 = phi_V_a.get_value() / phi_V_b.get_value()

phi_W_tt.set_value(phi_W_0)
phi_V_tt.set_value(phi_V_0)

chain = 0
theta_label = r'$\theta_t \mid D_T$'
phi_W_label = r'$\phi_W \mid D_T$'
phi_V_label = r'$\phi_V \mid D_T$'
theta_t_post_sim, phi_W_post_sim, phi_V_post_sim = None, None, None
posterior_samples = {theta_label: [[]], phi_W_label: [[]], phi_V_label: [[]]}

for i in range(1000):
    theta_t_post_sim, phi_W_post_sim, phi_V_post_sim, phi_W_SS_sim, phi_V_SS_sim = ffbs_dlm(
        y_sim,
        dlm_sim_values[N_obs_tt], dlm_sim_values[N_theta_tt],
        dlm_sim_values[G_tt], dlm_sim_values[F_tt])

    # Update variance precision parameters
    phi_W_tt.set_value(phi_W_post_sim)
    phi_V_tt.set_value(phi_V_post_sim)

    posterior_samples[theta_label][chain].append(theta_t_post_sim)
    posterior_samples[phi_W_label][chain].append(phi_W_post_sim)
    posterior_samples[phi_V_label][chain].append(phi_V_post_sim)

    print(f'i={i},\tphi_W={phi_W_post_sim}\t({phi_W_SS_sim}),\tphi_V={phi_V_post_sim} ({phi_V_SS_sim})')

posterior_samples = {k: np.asarray(v) for k, v in posterior_samples.items()}
Figure 14 shows the posterior \(\theta_t\) samples and Figure 15 plots the posterior sample traces.
plt.clf()
fig, ax = plt.subplots(figsize=(8, 4.8))
ax.autoscale(enable=False)

# bivariate_cycler = cycler('linestyle', ['-', '--']) * plt_orig_cycler
# ax.set_prop_cycle(bivariate_cycler)

thetas_shape = posterior_samples[theta_label][0].shape

cycle = ax._get_lines.prop_cycler

bivariate_obs_cycler = cycler('linestyle', ['-', '--']) * cycler('color', ['black'])

ax.set_prop_cycle(bivariate_obs_cycler)
ax.plot(theta_t_sim, label=r'$\theta_t$', linewidth=1.0)

ax.autoscale(enable=True)
ax.autoscale(enable=False)

for d in range(thetas_shape[-1]):
    styles = next(cycle)
    thetas = posterior_samples[theta_label][0].T[d].T

    theta_lines = np.empty(thetas_shape[:-1] + (2,))
    theta_lines.T[0] = np.tile(np.arange(thetas_shape[-2]), [thetas_shape[-3], 1]).T
    theta_lines.T[1] = thetas.T

    ax.add_collection(
        LineCollection(theta_lines,
                       label=theta_label,
                       alpha=0.3, linewidth=0.9,
                       **styles)
    )

plt.tight_layout()
plt.legend(framealpha=0.4)

import arviz as az

az_trace = az.from_dict(posterior=posterior_samples)
az.plot_trace(az_trace, compact=True)

plt.close('all')
fig, ax = plt.subplots(figsize=(8, 4.8))

ax.plot(y_sim, label=r'$y_t$', linewidth=1.0, color='black')

ax.autoscale(enable=True)
ax.autoscale(enable=False)

f_t_ordinates = np.dot(posterior_samples[theta_label][0], dlm_sim_values[F_tt].squeeze())

f_t_lines = np.empty(f_t_ordinates.shape + (2,))
f_t_lines.T[0] = np.tile(np.arange(f_t_ordinates.shape[1]), [f_t_ordinates.shape[0], 1]).T
f_t_lines.T[1] = f_t_ordinates.T

ax.add_collection(
    LineCollection(f_t_lines,
                   label=r'$E[y_t \mid \theta_t, D_T]$',
                   alpha=0.3, linewidth=0.9, color='red')
)

plt.tight_layout()
plt.legend(framealpha=0.4)

Non-Gaussian Extension
Let’s say we want to model count observations that are driven by a smooth time-varying process. We can assume a negative-binomial observation model with a log-link function–in standard GLM fashion (McCullagh & Nelder 1989)–and connect it to the same state and observation dynamics as the basic DLM in \(\eqref{eq:basic-dlm-state}\) via its mean \(\mu_t = \exp\left(\eta_t\right)\):
\[\begin{equation} \begin{aligned} Y_t &\sim \operatorname{NB}\left(r, p_t\right) \\ E[Y_t \mid \theta_t] &= \mu_t = \exp\left( F_t^\top \theta_t \right) \\ \theta_t &= G_t \theta_{t-1} + \nu_t, \quad \nu_t \sim \operatorname{N}\left( 0, W \right) \end{aligned} \label{eq:nb-dlm} \end{equation}\]
Under the parameterization \(p_t = \frac{\mu_t}{\mu_t + r}\), the negative-binomial density function takes the following form:
\[\begin{equation} \begin{aligned} \operatorname{p}\left(Y_t = y_t \mid r, p_t\right) &= \frac{\Gamma\left( y_t + r \right)}{y_t!\,\Gamma(r)} \left( 1 - p_t \right)^r \left( p_t \right)^{y_t} \\ &= \frac{\Gamma\left( y_t + r \right)}{y_t!\,\Gamma(r)} \left( \frac{r}{r + \mu_t} \right)^r \left( \frac{\mu_t}{r + \mu_t} \right)^{y_t} \\ &= \frac{\Gamma\left( y_t + r \right)}{y_t!\,\Gamma(r)} \frac{\left( \mu_t / r \right)^{y_t}}{\left( 1 + \mu_t / r \right)^{r + y_t}} \\ &= \frac{\Gamma\left( y_t + r \right)}{y_t!\,\Gamma(r)} \frac{\left( e^{\eta_t - \log r} \right)^{y_t}}{\left( 1 + e^{\eta_t - \log r} \right)^{r + y_t}} . \end{aligned} \label{eq:nb-pmf} \end{equation}\]
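As a quick sanity check of this parameterization, the last expression can be compared against `scipy.stats.nbinom`, whose “success” probability corresponds to \(1 - p_t = r / (r + \mu_t)\) here; the values below are arbitrary.

import numpy as np
from scipy import stats
from scipy.special import gammaln

r, eta, y = 10.0, 0.3, 4
mu = np.exp(eta)
p = mu / (mu + r)

# Log of the last expression in the negative-binomial density above
log_pmf = (gammaln(y + r) - gammaln(y + 1) - gammaln(r)
           + y * (eta - np.log(r)) - (r + y) * np.log1p(np.exp(eta - np.log(r))))

assert np.isclose(log_pmf, stats.nbinom.logpmf(y, r, 1.0 - p))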
The logit-inverse form in \(\eqref{eq:nb-pmf}\) has a Gaussian scale-mixture representation in the Pólya-Gamma distribution (Polson, Scott & Windle 2013). Said scale-mixture is as follows:
\[\begin{equation} \begin{aligned} \frac{e^{\psi a}}{\left(1 + e^{\psi}\right)^b} &= 2^{-b} e^{\kappa \psi} \int^{\infty}_{0} e^{-\frac{\omega}{2} \psi^2} \operatorname{p}(\omega) d\omega \\ &= 2^{-b} \int^{\infty}_{0} e^{-\frac{\omega}{2} \left( \psi - \frac{\kappa}{\omega} \right)^2} e^{\frac{\kappa^2}{2 \omega} } \operatorname{p}(\omega) d\omega . \end{aligned} \label{eq:pg-identity} \end{equation}\]
where \(\kappa = a - b / 2\) and \(\omega \sim \operatorname{PG}\left(b, 0\right)\).
When the Gaussian scale-mixture identity of \(\eqref{eq:pg-identity}\) is applied to our observation model in \(\eqref{eq:nb-pmf}\), \(a = y_t\), \(b = r + y_t\), \(\kappa = (y_t - r)/2\), and \(\psi = \eta_t - \log r\).
From the scale mixture formulation, we obtain the following augmented joint observation model density:
\[\begin{equation} \begin{aligned} \operatorname{p}(\theta_t \mid \omega, y_t, r, D_{t-1}) &\propto \exp\left(-\frac{\omega}{2} \left( F_t^\top \theta_t - \left( \log r + \frac{y_t - r}{2 \omega} \right) \right)^2 \right) \operatorname{p}(\theta_t \mid D_{t-1}) \\ &= e^{-\frac{\omega}{2} \left(y^*_t - F_t^\top \theta_t \right)^2} \operatorname{p}(\theta_t \mid D_{t-1}) \\ &\propto \operatorname{p}\left( y^*_t \mid F_t^\top \theta_t, \omega^{-1} \right) \operatorname{p}(\theta_t \mid D_{t-1}) \end{aligned} \label{eq:nb-aug-obs} \end{equation}\]
where \(y^*_t = \log r + \frac{y_t - r}{2 \omega}\).
The reformulation at the end of \(\eqref{eq:nb-aug-obs}\) characterizes a distribution of “virtual” observations,
\[\begin{equation} y^*_t = F_t^\top \theta_t + \epsilon^*_t,\quad \epsilon^*_t \sim \operatorname{N}\left(0, V^*_t \right) , \label{eq:nb-virtual-obs} \end{equation}\]
with \(V^*_t = \operatorname{diag}\left( \omega^{-1}_t \right)\).
The augmented observation equation in \(\eqref{eq:nb-virtual-obs}\) recovers our original DLM formulation in \(\eqref{eq:basic-dlm-obs}\) and–by comparison–details the changes needed to use the Pólya-Gamma scale-mixture. Specifically, we need to sample \(\omega_t \sim \operatorname{PG}\left(r + y_t, F_t^\top \theta_t - \log r \right)\) and perform the following substitutions: \(y_t \to y^*_t\) and \(V_t \to V^*_t\).
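In plain NumPy terms, one augmentation step, using the same `pypolyagamma` sampler that appears in Listing 24, could look like the following sketch (the helper name and arguments are ours):

import numpy as np
from pypolyagamma import PyPolyaGamma

def pg_augment(pg, y, theta, F, r):
    """Draw omega_t and form the virtual observations/variances (sketch)."""
    psi = theta @ F.ravel() - np.log(r)      # F_t^T theta_t - log(r)
    omega = np.empty_like(psi)
    pg.pgdrawv(r + y, psi, omega)            # omega_t ~ PG(r + y_t, psi_t)
    y_star = np.log(r) + (y - r) / (2.0 * omega)
    V_star = 1.0 / omega
    return y_star, V_star, omega

# e.g. pg = PyPolyaGamma(12345), with y and r as double-precision values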
Simulation Example
Listing 17 creates a Theano graph for the negative-binomial model defined in \(\eqref{eq:nb-dlm}\).
from symbolic_pymc.theano.random_variables import NegBinomialRV

r_tt = tt.iscalar('r')


def nb_obs_step(theta_t, F_t, r, rng):
    mu_t = tt.exp(F_t.T.dot(theta_t))
    p_t = mu_t / (mu_t + r)
    y_t = NegBinomialRV(r, (1. - p_t), rng=rng, name='y_t')
    return y_t, p_t


nb_obs_res, nb_Y_t_updates = theano.scan(fn=nb_obs_step,
                                         sequences=[theta_t_rv],
                                         non_sequences=[F_tt, r_tt, rng_tt],
                                         outputs_info=[
                                             # y_t, p_t
                                             {}, {},
                                         ],
                                         strict=True,
                                         name='Y_t')

nb_Y_t_rv, nb_p_t_tt = nb_obs_res
Listing 18 specifies parameters for a simulation from \(\eqref{eq:nb-dlm}\) and samples a series.
nb_dlm_sim_values = dlm_sim_values.copy()
nb_dlm_sim_values[F_tt] = np.array([[1.0],
                                    [0.0]], dtype=theano.config.floatX)
nb_dlm_sim_values[G_tt] = np.array([[1.0, 0.1],
                                    [0.0, 0.8]], dtype=theano.config.floatX)
nb_dlm_sim_values[r_tt] = 1000

phi_W_tt.set_value(np.r_[10.0, 10.0])

rng_tt.get_value(borrow=True).set_state(rng_init_state)

simulate_nb_dlm = tt_function([N_obs_tt, N_theta_tt, G_tt, F_tt, r_tt],
                              [nb_Y_t_rv, theta_t_rv, nb_p_t_tt],
                              givens={theta_0_rv: np.r_[1.0, 0.5]},
                              updates=nb_Y_t_updates)

sim_nb_res = simulate_nb_dlm(nb_dlm_sim_values[N_obs_tt],
                             nb_dlm_sim_values[N_theta_tt],
                             nb_dlm_sim_values[G_tt],
                             nb_dlm_sim_values[F_tt],
                             nb_dlm_sim_values[r_tt])

nb_y_sim, nb_theta_t_sim, nb_p_t_sim = sim_nb_res
In Figure 19 we plot the sample generated in Listing 18.
plt.clf()
fig, ax = plt.subplots(figsize=(8, 4.8))
_ = ax.plot(nb_y_sim, label=r'$y_t$', color='black', linewidth=1.2, drawstyle='steps-pre')

plt.legend()
plt.tight_layout()

plt.clf()
fig, ax = plt.subplots(figsize=(8, 4.8))
ax.autoscale(enable=False)

bivariate_obs_cycler = cycler('linestyle', ['-', '--']) * cycler('color', ['black'])

ax.set_prop_cycle(bivariate_obs_cycler)
ax.plot(nb_theta_t_sim, label=r'$\theta_t$', linewidth=1.0)

ax.autoscale(enable=True)

plt.tight_layout()
plt.legend(framealpha=0.4)

plt.clf()
fig, ax = plt.subplots(figsize=(8, 4.8))

ax.plot(nb_p_t_sim, label=r'$p_t$', linewidth=1.0, color='black')

plt.tight_layout()
plt.legend(framealpha=0.4)

Augmented FFBS Sampler
In order to create a FFBS sampler for our Pólya-Gamma DLM in \(\eqref{eq:nb-aug-obs}\), we need to update the filtering code to use time-varying “virtual” observation variances, \(V^*_t\). After this change is made, all Theano graphs that depend on the resulting objects need to be recreated, as well. This is done in Listing 22.
V_t_tt = tt.specify_shape(tt.col(), [N_obs_tt, 1])
V_t_tt.name = 'V_t'


def nb_filtering_step(y_t, V_t, m_tm1, U_C_tm1, S_C_tm1, F_t, G_t, N_W_t):
    N_V_t_inv = tt.diag(tt_finite_inv(tt.sqrt(V_t), eps_truncate=True))
    return filtering_step(y_t, m_tm1, U_C_tm1, S_C_tm1, F_t, G_t, N_W_t, N_V_t_inv)


filter_res, filter_updates = theano.scan(fn=nb_filtering_step,
                                         sequences=[y_tt, V_t_tt],
                                         outputs_info=[
                                             {"initial": m_0_tt, "taps": [-1]},
                                             {"initial": U_C_0_tt, "taps": [-1]},
                                             {"initial": S_C_0_tt, "taps": [-1]},
                                             # a_t, U_R_t, S_R_t
                                             {}, {}, {}
                                         ],
                                         non_sequences=[F_tt, G_tt, N_W_tt],
                                         strict=True,
                                         name='theta_filtered')

(m_t, U_C_t, S_C_t, a_t, U_R_t, S_R_t) = filter_res

m_T = m_t[-1]
U_C_T = U_C_t[-1]
S_C_T = S_C_t[-1]

C_T = matrix_dot(U_C_T, tt.square(S_C_T), U_C_T.T)
theta_T_post = MvNormalRV(m_T, C_T, rng=rng_tt)
theta_T_post.name = "theta_T_post"

ffbs_output, ffbs_updates = theano.scan(fn=ffbs_step,
                                        sequences=[
                                            {"input": m_t, "taps": [-1]},
                                            {"input": U_C_t, "taps": [-1]},
                                            {"input": S_C_t, "taps": [-1]},
                                            {"input": a_t, "taps": [1]},
                                            {"input": U_R_t, "taps": [1]},
                                            {"input": S_R_t, "taps": [1]}
                                        ],
                                        outputs_info=[
                                            {"initial": theta_T_post, "taps": [-1]},
                                            # theta_tp1_diff, f_tp1
                                            {}, {},
                                        ],
                                        non_sequences=[F_tt, G_tt, N_W_inv_tt, rng_tt],
                                        go_backwards=True,
                                        strict=True,
                                        name='ffbs_samples')

(theta_t_post_rev, theta_t_diff_rev, f_t_rev) = ffbs_output

theta_t_post = tt.join(0, theta_t_post_rev[::-1], [theta_T_post])

f_t_post = tt.join(0, f_t_rev[::-1], [F_tt.T.dot(theta_T_post)])

phi_W_a_post_tt = phi_W_a + N_obs_tt * 0.5
phi_W_b_post_tt = phi_W_b + 0.5 * tt.square(theta_t_diff_rev).sum(0)
phi_W_post_tt = GammaRV(phi_W_a_post_tt, phi_W_b_post_tt, rng=rng_tt, name='phi_W_post')
Simulation Example
In Listing 24 we sample the initial values and create Theano terms for posterior/updated \(\omega_t\)-related values.
from pypolyagamma import PyPolyaGamma
from symbolic_pymc.theano.random_variables import PolyaGammaRV

y_raw_tt = theano.shared(nb_y_sim.astype(theano.config.floatX), name='y_raw_t')

r_sim = np.array(nb_dlm_sim_values[r_tt], dtype='double')

# XXX: testing
F_t_theta_0 = np.dot(nb_theta_t_sim, nb_dlm_sim_values[F_tt])

omega_0 = np.empty(nb_y_sim.shape[0], dtype='double')
PyPolyaGamma(12344).pgdrawv(r_sim + nb_y_sim.squeeze(),
                            F_t_theta_0.squeeze() - np.log(r_sim),
                            omega_0)
omega_0 = np.expand_dims(omega_0, -1)

y_aug_0 = np.log(r_sim) + (nb_y_sim - r_sim) / (2.0 * omega_0)

omega_t_tt = theano.shared(omega_0, name='omega_t')

omega_post_tt = PolyaGammaRV(r_tt + y_raw_tt, theta_t_post.dot(F_tt) - tt.log(r_tt), rng=rng_tt, name='omega_post')

y_aug_post_tt = tt.log(r_tt) + (y_raw_tt - r_tt) / (2.0 * omega_post_tt)
y_aug_post_tt.name = 'y_aug_post'
Finally, the sampler steps are defined and executed in Listing 25.
nb_ffbs_dlm = tt_function([N_obs_tt, N_theta_tt, y_tt, G_tt, F_tt, V_t_tt, r_tt],
                          [theta_t_post, phi_W_post_tt, omega_post_tt, y_aug_post_tt],
                          updates=ffbs_updates)

rng_tt.get_value(borrow=True).set_state(rng_sim_state)

phi_W_tt.set_value(np.r_[10.0, 10.0])

chain = 0

theta_label = r'$\theta_t \mid D_T$'
phi_W_label = r'$\phi_W \mid D_T$'
omega_label = r'$\omega_t \mid D_T$'

theta_t_post_sim, phi_W_post_sim, omega_post_sim, y_aug_post_sim = None, None, None, None
nb_posterior_samples = {theta_label: [[]], phi_W_label: [[]], omega_label: [[]]}

V_t_sim = np.reciprocal(omega_0)
y_aug_sim = y_aug_0

for i in range(1000):
    nb_ffbs_res = nb_ffbs_dlm(
        nb_dlm_sim_values[N_obs_tt],
        nb_dlm_sim_values[N_theta_tt],
        y_aug_sim,
        nb_dlm_sim_values[G_tt],
        nb_dlm_sim_values[F_tt],
        V_t_sim,
        nb_dlm_sim_values[r_tt])

    theta_t_post_sim, phi_W_post_sim, omega_post_sim, y_aug_post_sim = nb_ffbs_res

    phi_W_tt.set_value(phi_W_post_sim)
    omega_t_tt.set_value(omega_post_sim)
    V_t_sim = np.reciprocal(omega_post_sim)
    y_aug_sim = y_aug_post_sim

    nb_posterior_samples[theta_label][chain].append(theta_t_post_sim)
    nb_posterior_samples[phi_W_label][chain].append(phi_W_post_sim)
    nb_posterior_samples[omega_label][chain].append(omega_post_sim)

    print(f'i={i},\tphi_W={phi_W_post_sim}')

nb_posterior_samples = {k: np.asarray(v) for k, v in nb_posterior_samples.items()}
Figure 26 shows the posterior \(\theta_t\) samples, Figure 27 plots the posterior sample traces, and Figure 28 shows \(p_t \mid \theta_t\).
plt.clf()
fig, ax = plt.subplots(figsize=(8, 4.8))
ax.autoscale(enable=False)

thetas_shape = nb_posterior_samples[theta_label][0].shape

cycle = ax._get_lines.prop_cycler

bivariate_obs_cycler = cycler('linestyle', ['-', '--']) * cycler('color', ['black'])

ax.set_prop_cycle(bivariate_obs_cycler)
ax.plot(nb_theta_t_sim, label=r'$\theta_t$', linewidth=1.0)

ax.autoscale(enable=True)
ax.autoscale(enable=False)

for d in range(thetas_shape[-1]):
    styles = next(cycle)
    thetas = nb_posterior_samples[theta_label][0].T[d].T

    theta_lines = np.empty(thetas_shape[:-1] + (2,))
    theta_lines.T[0] = np.tile(np.arange(thetas_shape[-2]), [thetas_shape[-3], 1]).T
    theta_lines.T[1] = thetas.T

    ax.add_collection(
        LineCollection(theta_lines,
                       label=theta_label,
                       alpha=0.05, linewidth=0.9,
                       **styles)
    )

plt.tight_layout()
plt.legend(framealpha=0.4)

import arviz as az

az_trace = az.from_dict(posterior=nb_posterior_samples)
az.plot_trace(az_trace, compact=True)

plt.close('all')
fig, ax = plt.subplots(figsize=(8, 4.8))

ax.plot(nb_p_t_sim, label=r'$p_t$', linewidth=1.0, color='black')

ax.autoscale(enable=True)
ax.autoscale(enable=False)

mu_t_sim = np.exp(np.dot(nb_posterior_samples[theta_label][0], nb_dlm_sim_values[F_tt].squeeze()))
p_t_sim = mu_t_sim / (mu_t_sim + nb_dlm_sim_values[r_tt])

p_t_lines = np.empty(p_t_sim.shape + (2,))
p_t_lines.T[0] = np.tile(np.arange(p_t_sim.shape[1]), [p_t_sim.shape[0], 1]).T
p_t_lines.T[1] = p_t_sim.T

ax.add_collection(
    LineCollection(p_t_lines,
                   label=r'$p_t \mid \theta_t, D_T$',
                   alpha=0.3, linewidth=0.9, color='red')
)

plt.tight_layout()
plt.legend(framealpha=0.4)

COVID–19 Example
As another example, we’ll use data from the COVID–19 outbreak in New York and concentrate only on the positive test counts. We do not make any assertions about the results; we simply want to apply our implementation to a real-life dataset. If anything, we would like to highlight the potential provided by these flexible and constructive Bayesian frameworks and offer a concrete starting point for motivated statisticians and Python developers.
To start, Listing 29 pulls and formats the COVID–19 data and Figure 30 plots it.
import pandas as pd

url = 'https://covidtracking.com/api/v1/states/daily.csv'
states = pd.read_csv(url, parse_dates=['date'], index_col=['state', 'date']).sort_index()
state = 'NY'

counts = states.loc[state, ['positive', 'total']].diff().loc['2020-03-20':]
counts.columns = ['new_positive_tests', 'new_total_tests']
plt.clf()
fig, ax = plt.subplots(figsize=(8, 4.8))
_ = ax.plot(counts.new_positive_tests, label=r'$y_t$', color='black', linewidth=1.2, drawstyle='steps-pre')

plt.legend()
plt.tight_layout()

Our example model is a simple Gaussian walk with an unknown evolution variance (i.e. again, modeled by a gamma prior on the precision \(\phi_W\)) given by the following values:
\[\begin{equation} \begin{gathered} F_t = \begin{pmatrix} 1 \end{pmatrix} ,\quad G_t = \begin{pmatrix} 1 \end{pmatrix} \\ W_t = \begin{pmatrix} \phi_{W}^{-1} \end{pmatrix} , \quad \phi_{W} \sim \operatorname{Gamma}\left(2.5, 0.5\right) , \quad \phi_{W} = 10 \end{gathered} \label{eq:ny-model-settings} \;. \end{equation}\]
Again, we set \(r = 1000\), effectively making our negative-binomial a Poisson approximation. Listing 31 sets the initial values in \(\eqref{eq:ny-model-settings}\) and Listing 32 specifies the sampler loop.
nb_dlm_ny_values = {
    N_obs_tt: counts.new_positive_tests.shape[0],
    N_theta_tt: 1
}
nb_dlm_ny_values[F_tt] = np.array([[1.0]], dtype=theano.config.floatX)
nb_dlm_ny_values[G_tt] = np.array([[1.0]], dtype=theano.config.floatX)
nb_dlm_ny_values[r_tt] = 1000

phi_W_a.set_value(np.r_[2.5])
phi_W_b.set_value(np.r_[0.5])
phi_W_tt.set_value(np.r_[10.0])

rng_tt.get_value(borrow=True).set_state(rng_init_state)

y_raw_tt.set_value(np.expand_dims(counts.new_positive_tests, -1).astype(theano.config.floatX))

r_ny = np.array(nb_dlm_ny_values[r_tt], dtype='double')

omega_0 = np.empty(counts.new_positive_tests.shape[0], dtype='double')
PyPolyaGamma(12344).pgdrawv(r_ny + counts.new_positive_tests.values,
                            F_t_theta_0.squeeze() - np.log(r_ny),
                            omega_0)
omega_0 = np.expand_dims(omega_0, -1)

y_aug_0 = np.log(r_ny) + (counts.new_positive_tests.values[:, np.newaxis] - r_ny) / (2.0 * omega_0)

omega_t_tt.set_value(omega_0)
nb_ffbs_dlm = tt_function([N_obs_tt, N_theta_tt, y_tt, G_tt, F_tt, V_t_tt, r_tt],
                          [theta_t_post, phi_W_post_tt, omega_post_tt, y_aug_post_tt],
                          # mode='FAST_COMPILE',
                          updates=ffbs_updates)

rng_tt.get_value(borrow=True).set_state(rng_sim_state)

chain = 0

theta_label = r'$\theta_t \mid D_T$'
phi_W_label = r'$\phi_W \mid D_T$'
omega_label = r'$\omega_t \mid D_T$'

theta_t_post_sim, phi_W_post_sim, omega_post_sim, y_aug_post_sim = None, None, None, None
nb_ny_posterior_samples = {theta_label: [[]], phi_W_label: [[]], omega_label: [[]]}

V_t_sim = np.reciprocal(omega_0)
y_aug_sim = y_aug_0

for i in range(5000):
    nb_ffbs_res = nb_ffbs_dlm(
        nb_dlm_ny_values[N_obs_tt],
        nb_dlm_ny_values[N_theta_tt],
        y_aug_sim,
        nb_dlm_ny_values[G_tt],
        nb_dlm_ny_values[F_tt],
        V_t_sim,
        nb_dlm_ny_values[r_tt])

    theta_t_post_sim, phi_W_post_sim, omega_post_sim, y_aug_post_sim = nb_ffbs_res

    phi_W_tt.set_value(phi_W_post_sim)
    omega_t_tt.set_value(omega_post_sim)
    V_t_sim = np.reciprocal(omega_post_sim)
    y_aug_sim = y_aug_post_sim

    nb_ny_posterior_samples[theta_label][chain].append(theta_t_post_sim)
    nb_ny_posterior_samples[phi_W_label][chain].append(phi_W_post_sim)
    nb_ny_posterior_samples[omega_label][chain].append(omega_post_sim)

    print(f'i={i},\tphi_W={phi_W_post_sim}')

# Convert and thin samples
nb_ny_posterior_samples = {k: np.asarray(v)[:, 1000:] for k, v in nb_ny_posterior_samples.items()}
import arviz as az

az_trace = az.from_dict(posterior=nb_ny_posterior_samples)
az.plot_trace(az_trace, compact=True)

fig, ax = plt.subplots(figsize=(8, 4.8))

thetas_shape = nb_ny_posterior_samples[theta_label][0].shape

cycle = ax._get_lines.prop_cycler

bivariate_obs_cycler = cycler('linestyle', ['-', '--']) * cycler('color', ['black'])

ax.set_prop_cycle(bivariate_obs_cycler)

for d in range(thetas_shape[-1]):
    styles = next(cycle)
    thetas = nb_ny_posterior_samples[theta_label][0].T[d].T

    theta_lines = np.empty(thetas_shape[:-1] + (2,))
    theta_lines.T[0] = np.tile(np.arange(thetas_shape[-2]), [thetas_shape[-3], 1]).T
    theta_lines.T[1] = thetas.T

    ax.add_collection(
        LineCollection(theta_lines,
                       label=theta_label,
                       alpha=0.05, linewidth=0.9,
                       **styles)
    )

ax.autoscale(enable=True)

plt.tight_layout()
plt.legend(framealpha=0.4)

import scipy.stats

plt.close('all')
fig, ax = plt.subplots(figsize=(8, 4.8))

ax.plot(counts.new_positive_tests.values, label=r'$y_t$', linewidth=1.0, color='black')

mu_t_sim = np.exp(np.dot(nb_ny_posterior_samples[theta_label][0], nb_dlm_ny_values[F_tt].squeeze()))
p_t_sim = mu_t_sim / (mu_t_sim + nb_dlm_ny_values[r_tt])

y_t_sim = scipy.stats.nbinom.rvs(nb_dlm_ny_values[r_tt], (1. - p_t_sim)).squeeze()

y_t_lines = np.empty(y_t_sim.shape + (2,))
y_t_lines.T[0] = np.tile(np.arange(y_t_sim.shape[1]), [y_t_sim.shape[0], 1]).T
y_t_lines.T[1] = y_t_sim.T

ax.add_collection(
    LineCollection(y_t_lines,
                   label=r'$y_t \mid \theta_t, D_T$',
                   alpha=0.3, linewidth=0.9, color='red')
)

ax.autoscale(enable=True)

plt.tight_layout()
plt.legend(framealpha=0.4)

Discussion
We’ve implemented a highly extensible Bayesian timeseries framework with respectable model coverage (e.g. ARIMA models, polynomial trends, Fourier seasonality, dynamic regression, etc.), and shown how such a basic class can be extended to non-Gaussian observations. Additionally, we were able to show how an existing model-specialized sampler, i.e. FFBS, can be adapted alongside such an extension using the conditional linearity provided by a Gaussian scale-mixture.
In general, the same scale-mixture and conditional linearity techniques can be employed for other observation distributions (e.g. Beta, Binomial), as well as state-of-the-art shrinkage and sparsity priors (Bhadra, Datta, Polson & Willard 2016; Datta & Dunson 2016).
Although we used Gibbs sampling, these techniques do not necessitate it; one can use Metropolis–or any other mathematically sound–steps instead.
Bayesian frameworks like these have considerable flexibility and are amenable to much analytic creativity; unfortunately, we don’t often see this–directly, at least–in practice. There are clearly some roadblocks for the average applied statistics/data science developer when attempting to implement these models straight from published papers. Those roadblocks often involve subtle numerical stability issues and proficiency with sometimes non-standard mathematics.
We hope that future automations will be able to address the more straightforward numerical stability issues by–for instance–automating the SVD-based reformulations used herein. The scale-mixture manipulations are likewise amenable to certain automations but, at the very least, it would serve the Bayesian community–and all who desire to construct non-trivial, principled statistical models–well to develop low-level tools that codify and generate such scale-mixture variations. In this way, even experimentation by knowledgeable developers will be much easier and less error-prone.
Bibliography
Harrison & West, Bayesian Forecasting and Dynamic Models, Springer (1999). ↩︎
Petris, Petrone & Campagnoli, Dynamic Linear Models, Springer (2009). ↩︎
Gamerman & Lopes, Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, CRC Press (2006). ↩︎
Pole, West & Harrison, Applied Bayesian Forecasting and Time Series Analysis, CRC Press (1994). ↩︎
Bergstra, Breuleux, Bastien, Lamblin, Pascanu, Desjardins, Turian, Warde-Farley & Bengio, Theano: A CPU and GPU Math Expression Compiler, in: Proceedings of the Python for Scientific Computing Conference (SciPy) (2010). ↩︎
Willard, Symbolic-Pymc (2019). link. ↩︎
Zhang & Li, Fixed-Interval Smoothing Algorithm Based on Singular Value Decomposition, in: Proceedings of the 1996 IEEE International Conference on Control Applications, 916–921 (1996). ↩︎
Petris, Petrone & Campagnoli, Dynamic Linear Models, Springer (2009). ↩︎
Frühwirth-Schnatter, Data Augmentation and Dynamic Linear Models, Journal of Time Series Analysis, 15(2), 183–202 (1994). link. ↩︎
McCullagh & Nelder, Generalized Linear Models, CRC press (1989). ↩︎
Polson, Scott & Windle, Bayesian Inference for Logistic Models Using Pólya-Gamma Latent Variables, Journal of the American Statistical Association, 108(504), 1339–1349 (2013). link. ↩︎
Bhadra, Datta, Polson & Willard, Default Bayesian analysis with global-local shrinkage priors, Biometrika, 103(4), 955–969 (2016). doi. ↩︎
Datta & Dunson, Bayesian Inference on Quasi-Sparse Count Data, Biometrika, 103(4), 971–983 (2016). ↩︎