- Introduction
- The Hidden Markov Model
- Sampling Model Graphs
- Log-likelihood
- PyMC3
`Distribution`

s - Discussion

# Introduction

This document describes how Theano graphs can be used as a focal point for Bayesian model specification, and how these graphs can be bridged with PyMC3 objects and–as a result–produce a dramatically more robust and simplified foundation for PyMC3.

Yes, PyMC3 does use Theano–and, thus, Theano graphs–but it doesn’t explicitly build a graph for the relationships between random variables. In the following, we call such graphs [Theano] *sample-space graphs* and we’ll show how all the forms of sampling performed by PyMC3 (i.e. prior and posterior predictive) can be implemented using these graphs. We’ll also demonstrate how the log-likelihood graphs currently used by PyMC3 can be constructed from such graphs.

Before we go any further, let’s look at a simple example of a sample-space graph.

Consider the model in Equation .

\[\begin{equation} \label{eq:simple-model} Y = a Z + b, \quad Z \sim \operatorname{N}\left( 0, \sigma^2 \right) \;, \end{equation}\]

Equation can be represented by the Theano graph in Listing 1.

When \(Y\) is an observed variable, the Theano graph represented by `Y_rv`

is our sample-space graph of interest. The graph for `Z_rv`

is also a sample-space graph, and, in some cases, we’ll refer to `Y_rv`

as a *model graph* due to its more direct correspondence to a model of interest (e.g. Equation ).

Naturally, the Theano tensors objects `Z_rv`

and `Y_rv`

are random variables, but–more specifically–the former is a direct output of a `RandomVariable`

operator, while the latter is a function of the aforementioned `RandomVariable`

output. This distinction will be important in what follows, but it suffices to say that both represent random draws from a distribution.

We can sample either of them–or both–by simply compiling the graphs into functions and evaluating those functions. This is demonstrated in Listing 2.

Throughout this exposition we’ll use a non-trivial hidden Markov model (HMM) as our guiding example. These sorts of time-series are difficult to implement in PyMC3 due to their reliance on the challenging `Scan`

function, symbolic shape issues in PyMC3, and the general complexity involved in writing their log-likelihoods and sampling functions–especially when one needs to iterate on hierarchical changes to such models. As a result, anything that can improve their ease of use within PyMC3 is a worthwhile consideration.

For other types of time-series that are represented and computed as Theano model graphs see Willard (2020). In those examples, custom samplers are constructed by hand in Theano; however, here we’re interested in using PyMC3 to generate posterior samples.

That said, we’ll demonstrate how the sample-space graphs for HMMs can be specified with relative ease, and we’ll build up to the construction of a custom PyMC3 `Distribution`

class based entirely on transformations of the original sample-space graph.

Ultimately, the process of deriving this `Distribution`

class demonstrates how all of the model information and objects required by PyMC3 can be derived from a sample-space graph. As a result, automation of this process would provide a new foundation for PyMC; one that can leverage Theano to perform all the complicated sampling and doesn’t suffer from the same symbolic shape limitations, and one for which it is much simpler to reason about a model systematically. This new PyMC would also set the stage for specialized optimizations that could dramatically improve performance.

# Sampling Model Graphs

In this section, we show how all the sampling functionality of PyMC3 is already provided by Theano.

## Prior Predictive Sampling

In Listing 7 we compile a Theano function that is able to draw samples from the model in Listing 6. More specifically, we created a function that computes \(\left( s_t, y_t \right) \sim \left( S_t, Y_t \right)\).

The samples produced by the compiled function `hmm_sampler`

are effectively the same type of samples that `pymc3.sample_prior_predictive`

would produce; however, in this case, Theano automatically handles variable dependencies in a way that `pymc3.sample_prior_predictive`

currently cannot–plus, it has all the advantages of Theano compilation (e.g. algebraic optimizations, C-compiled functions).

Figure 8 plots the samples in `theano_samples`

. Unfortunately, sampling from the prior predictive produces very “random” series, but, if we wanted to–say–generate samples conditional on very specific values of the seasonal transition matrix parameters, \(\xi^{(S_t)}_t\), we could compile a different sampling function that takes those parameters as arguments. Simply put, we want a function that computes \(\left( s_t, y_t \right) \sim \left( S_t, Y_t \mid \xi^{(S_t)}_t \right)\).

Listing 9 compiles such a function in Theano and uses it draw samples given specific values of \(\xi^{(S_t)}_t\) that demonstrate higher probabilities of staying in–and transitioning to–the zero value state during weekdays and late hours.

## Posterior Predictive Sampling

Now that we know how to draw samples conditional on terms

Listing 12 demonstrates a fundamental limitation with the sample-space graph models: we can’t condition on arbitrary random variables.

The problem is that our original sample-space model graph in Listing 6 generates the vector of all \(S_t\), \(S_{0:T}\), in the same Theano `Scan`

that generate all the \(Y_t\); in other words, \(Y_{0:T}\) isn’t a function of \(S_{0:T}\) according the the Theano graph.

Simply put, we need to create a graph that “condition on” \(S_{0:T}\), or–in other words–convert the `Scan`

output `S_rv`

into an input.

# Log-likelihood

In contrast to Theano sample-space graphs, there are Theano measure-space graphs, which–for our purposes–will correspond to Theano graphs that compute the log-likelihoods of terms in a model.

PyMC3 creates these kinds of Theano graphs in `Distribution.logp`

–and the methods that call it.

Working from our original sample-space model graph in Listing 6, we can create a log-likelihood graph with a simple two-step process

- for each
`RandomVariable`

create new variables to serve as inputs to a log-likelihood, then - replace each
`RandomVariable`

with its log-likelihood graphs–the latter being dependent on the previously created input variables.

Fortunately, we can use the existing `Distribution.logp`

implementations to complete the second step.

To demonstrate, Listing 15 follows the above two steps in order to construct a log-likelihood graph for the \(S_0\) term.

The log-likelihood in Listing 15 is–of course–conditional on the \(\pi_0\) term, but we could easily create a joint log-likelihood by performing the same operation for \(\pi_0\) and adding the two log-likelihood graphs.

Unfortunately, more steps are needed when `Scan`

s are involved. If we want to create a log-likehood graph for \(S_{0:T}\), then–just like the conditional sample-space graph in Listing 14–we need to first transform the `Scan`

so that it “conditions on” \(S_{0:T}\)–i.e. converts its output `S_rv`

into an input.

The situation for \(Y_{0:T}\) is a little more complicated; it requires log-likelihood conversions in the `emissions_tt`

term outside of the `Scan`

**and** an update to the `Scan`

so that it uses the log-likelihoods derived from `emissions_tt`

. An implementation is given in Listing 17.

The functionality for automating these two steps in the non-`Scan`

case already exist in `symbolic-pymc`

, and the requisite functionality for simple case of `Scan`

was introduced in #113 and #114.

The latter changes introduce a “push-out” optimization that helps expose `RandomVariable`

s hidden within the internal sub-graphs of `Scan`

operators.

For instance, in the original Theano graph model, \(S_t\) is a `RandomVariable`

created within the `Scan`

operator’s inner-graph (via the step function run by `Scan`

). Had \(S_t\) not been specified as an output of the inner function `state_step`

, this “push-out” optimization would redefine the model so that it is.

Additionally, functions were added in #114 that automate the process of turning `state_step`

into the `Y_given_S_step`

in Listing 14. This is how we can automate the construction of conditional sample-space graphs and log-likelihood graphs.

# PyMC3 `Distribution`

s

In order to use a complicated model like ours in Listing 6 within PyMC3, we need to construct a PyMC3 `Distribution`

class using the conditional sampler functions and the log-likelihood graphs we derived above from the original model graph. Basically, the log-likelihood graphs comprise the body of the `Distribution.logp`

method, and the conditional samplers comprise the `Distribution.random`

method.

Unfortunately, the shape issues and sampler problems of PyMC3 aren’t actually removed in this process, since we’re moving out of the Theano framework in which those problems are solved.

Listing 18 constructs the `Distribution`

class for \(Y_{0:T}\)–or `Y_rv`

.

The `Distribution`

objects constructed in this way can now be used to define and estimate a regular PyMC3 model, as demonstrated in Listing 19.

# Discussion

Regarding the compilation of Theano “model graphs” to PyMC3 (i.e. functionality that would allow us to produce posterior predictive samples using Theano while still being able to estimate), I’ve added most of the key functionality in PR #113 and #114.

Part of the functionality introduced there allows us to automatically produce new Theano graphs with inputs for each random variable dependency embedded in a `Scan`

. It’s currently being used just to produce the log-likelihood graphs, but it’s also what we need in order to construct a fast Theano `function`

that produces posterior predictive samples. Writing a posterior predictive sampler that does this is the next step.

The existing `symbolic_pymc.theano.pymc3.graph_model`

function does this for simple Theano model graphs, but it does not work with the `Scan`

s we need to use in order to define time-series model graphs.

# Bibliography

Willard, Dynamic Linear Models in Theano, *Brandon T. Willard*, (2020). link. ↩︎

## Comments

comments powered by Disqus