- Introduction
- The Hidden Markov Model
- Sampling Model Graphs
- Log-likelihood
- PyMC3 `Distribution`s
- Discussion
Introduction
This document describes how Theano graphs can be used as a focal point for Bayesian model specification, and how these graphs can be bridged with PyMC3 objects and–as a result–produce a dramatically more robust and simplified foundation for PyMC3.
Yes, PyMC3 does use Theano–and, thus, Theano graphs–but it doesn’t explicitly build a graph for the relationships between random variables. In the following, we call such graphs [Theano] sample-space graphs and we’ll show how all the forms of sampling performed by PyMC3 (i.e. prior and posterior predictive) can be implemented using these graphs. We’ll also demonstrate how the log-likelihood graphs currently used by PyMC3 can be constructed from such graphs.
Before we go any further, let’s look at a simple example of a sample-space graph.
Consider the model in Equation \(\eqref{eq:simple-model}\).
\[\begin{equation} \label{eq:simple-model} Y = a Z + b, \quad Z \sim \operatorname{N}\left( 0, \sigma^2 \right) \;, \end{equation}\]
Equation \(\eqref{eq:simple-model}\) can be represented by the Theano graph in Listing 1.
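Listing 1 isn't reproduced here, but a minimal sketch of such a graph could look like the following. The `NormalRV` constructor is assumed to come from `symbolic-pymc`'s Theano random variables; the names and exact signatures used in Listing 1 may differ.

```python
import theano.tensor as tt

# Hypothetical sketch (not Listing 1 itself): `NormalRV` is assumed to be
# symbolic-pymc's Theano normal RandomVariable constructor.
from symbolic_pymc.theano.random_variables import NormalRV

a_tt = tt.scalar("a")
b_tt = tt.scalar("b")
sigma_tt = tt.scalar("sigma")

# Z ~ N(0, sigma^2): the direct output of a `RandomVariable` operator
Z_rv = NormalRV(0.0, sigma_tt, name="Z")

# Y = a * Z + b: a deterministic function of a `RandomVariable` output
Y_rv = a_tt * Z_rv + b_tt
```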
When \(Y\) is an observed variable, the Theano graph represented by `Y_rv` is our sample-space graph of interest. The graph for `Z_rv` is also a sample-space graph, and, in some cases, we'll refer to `Y_rv` as a model graph due to its more direct correspondence to a model of interest (e.g. Equation \(\eqref{eq:simple-model}\)).

Naturally, the Theano tensor objects `Z_rv` and `Y_rv` are random variables, but–more specifically–the former is a direct output of a `RandomVariable` operator, while the latter is a function of the aforementioned `RandomVariable` output. This distinction will be important in what follows, but it suffices to say that both represent random draws from a distribution.
We can sample either of them–or both–by simply compiling the graphs into functions and evaluating those functions. This is demonstrated in Listing 2.
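Continuing the sketch above, compiling and evaluating those graphs, roughly in the spirit of Listing 2, might look like this (how the underlying random number state gets seeded and updated per call is glossed over here):

```python
import theano

# Compile the sample-space graphs into callable functions and evaluate them.
# (Handling of the rng state is elided in this sketch.)
sample_Z = theano.function([sigma_tt], Z_rv)
sample_Y = theano.function([a_tt, b_tt, sigma_tt], Y_rv)

z_draw = sample_Z(1.0)            # a draw from Z ~ N(0, 1)
y_draw = sample_Y(2.0, 0.5, 1.0)  # a draw from Y = 2 * Z + 0.5
```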
Throughout this exposition we'll use a non-trivial hidden Markov model (HMM) as our guiding example. These sorts of time-series are difficult to implement in PyMC3 due to their reliance on the challenging `Scan` function, symbolic shape issues in PyMC3, and the general complexity involved in writing their log-likelihoods and sampling functions–especially when one needs to iterate on hierarchical changes to such models. As a result, anything that can improve their ease of use within PyMC3 is a worthwhile consideration.
For other types of time-series that are represented and computed as Theano model graphs see Willard (2020). In those examples, custom samplers are constructed by hand in Theano; however, here we’re interested in using PyMC3 to generate posterior samples.
That said, we'll demonstrate how the sample-space graphs for HMMs can be specified with relative ease, and we'll build up to the construction of a custom PyMC3 `Distribution` class based entirely on transformations of the original sample-space graph.

Ultimately, the process of deriving this `Distribution` class demonstrates how all of the model information and objects required by PyMC3 can be derived from a sample-space graph. As a result, automation of this process would provide a new foundation for PyMC; one that can leverage Theano to perform all the complicated sampling and doesn't suffer from the same symbolic shape limitations, and one for which it is much simpler to reason about a model systematically. This new PyMC would also set the stage for specialized optimizations that could dramatically improve performance.
Sampling Model Graphs
In this section, we show how all the sampling functionality of PyMC3 is already provided by Theano.
Prior Predictive Sampling
In Listing 7 we compile a Theano function that is able to draw samples from the model in Listing 6. More specifically, we create a function that computes \(\left( s_t, y_t \right) \sim \left( S_t, Y_t \right)\).
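As a rough stand-in for Listing 7, such a sampler could be compiled as follows, assuming the Listing 6 graph exposes the `Scan` outputs `S_rv` and `Y_rv` and that the relevant rng updates are handled:

```python
import theano

# A sketch of the prior predictive sampler: compile directly from the
# sample-space graph (`S_rv` and `Y_rv` are assumed to be the Scan outputs
# from Listing 6; rng update handling is elided).
hmm_sampler = theano.function([], [S_rv, Y_rv])

# Each call draws a full trajectory (s_t, y_t) ~ (S_t, Y_t)
theano_samples = hmm_sampler()
```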
The samples produced by the compiled function `hmm_sampler` are effectively the same type of samples that `pymc3.sample_prior_predictive` would produce; however, in this case, Theano automatically handles variable dependencies in a way that `pymc3.sample_prior_predictive` currently cannot–plus, it has all the advantages of Theano compilation (e.g. algebraic optimizations, C-compiled functions).
Figure 8 plots the samples in `theano_samples`. Unfortunately, sampling from the prior predictive produces very "random" series, but, if we wanted to–say–generate samples conditional on very specific values of the seasonal transition matrix parameters, \(\xi^{(S_t)}_t\), we could compile a different sampling function that takes those parameters as arguments. Simply put, we want a function that computes \(\left( s_t, y_t \right) \sim \left( S_t, Y_t \mid \xi^{(S_t)}_t \right)\).

Listing 9 compiles such a function in Theano and uses it to draw samples given specific values of \(\xi^{(S_t)}_t\) that demonstrate higher probabilities of staying in–and transitioning to–the zero-value state during weekdays and late hours.
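One way to build such a function, sketched below with hypothetical variable names since Listing 9 isn't reproduced here, is to swap the transition parameter term for a plain input tensor with `theano.clone` and compile against that input:

```python
import theano
import theano.tensor as tt

# `xi_rv` is a hypothetical name for the seasonal transition parameter term in
# the Listing 6 graph; we replace it with an explicit input tensor.
xi_in = tt.matrix("xi")
S_given_xi, Y_given_xi = theano.clone([S_rv, Y_rv], replace={xi_rv: xi_in})

# (s_t, y_t) ~ (S_t, Y_t | xi): the transition parameters are now arguments
conditional_sampler = theano.function([xi_in], [S_given_xi, Y_given_xi])
```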
Posterior Predictive Sampling
Now that we know how to draw samples conditional on given terms, the natural next step is posterior predictive sampling, which requires conditioning on the model's random variables themselves.
Listing 12 demonstrates a fundamental limitation with the sample-space graph models: we can’t condition on arbitrary random variables.
The problem is that our original sample-space model graph in Listing 6 generates the vector of all \(S_t\), \(S_{0:T}\), in the same Theano `Scan` that generates all the \(Y_t\); in other words, \(Y_{0:T}\) isn't a function of \(S_{0:T}\) according to the Theano graph.

Simply put, we need to create a graph that "conditions on" \(S_{0:T}\), or–in other words–convert the `Scan` output `S_rv` into an input.
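To make that distinction concrete, here is a deliberately toy example (not the post's HMM) of what converting a `Scan` output into an input means: the original step generates both the state and the observation inside one `Scan`, while the conditioned version takes the state sequence as an explicit `sequences` argument.

```python
import theano
import theano.tensor as tt
from theano.tensor.shared_randomstreams import RandomStreams

srng = RandomStreams(2309)

# Original, "joint" form: s_t and y_t are generated inside the same Scan, so
# the y output is *not* a function of a free-standing state sequence.
def joint_step(s_tm1):
    s_t = srng.binomial(p=0.25 + 0.5 * s_tm1)
    y_t = srng.normal(avg=s_t, std=1.0)
    return s_t, y_t

(S_toy, Y_toy), joint_updates = theano.scan(
    joint_step, outputs_info=[tt.constant(0, dtype="int64"), None], n_steps=50
)

# Conditioned form: the state sequence is now an input, so the y output really
# is a function of S_{0:T} in the resulting graph.
S_in = tt.lvector("S_in")

def y_given_s_step(s_t):
    return srng.normal(avg=s_t, std=1.0)

Y_given_S, cond_updates = theano.scan(y_given_s_step, sequences=[S_in])

conditioned_sampler = theano.function([S_in], Y_given_S, updates=cond_updates)
```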
Log-likelihood
In contrast to Theano sample-space graphs, there are Theano measure-space graphs, which–for our purposes–will correspond to Theano graphs that compute the log-likelihoods of terms in a model.
PyMC3 creates these kinds of Theano graphs in `Distribution.logp`–and the methods that call it.
Working from our original sample-space model graph in Listing 6, we can create a log-likelihood graph with a simple two-step process:

- for each `RandomVariable`, create new variables to serve as inputs to a log-likelihood, then
- replace each `RandomVariable` with its log-likelihood graph–the latter being dependent on the previously created input variables.
Fortunately, we can use the existing `Distribution.logp` implementations to complete the second step.
To demonstrate, Listing 15 follows the above two steps in order to construct a log-likelihood graph for the \(S_0\) term.
The log-likelihood in Listing 15 is–of course–conditional on the \(\pi_0\) term, but we could easily create a joint log-likelihood by performing the same operation for \(\pi_0\) and adding the two log-likelihood graphs.
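A sketch of those two steps for \(S_0\) could look like the following, assuming, as is typical for an HMM's initial state, that \(S_0 \sim \operatorname{Cat}\left(\pi_0\right)\); the variable names are hypothetical and not the ones used in Listing 15.

```python
import theano.tensor as tt
import pymc3 as pm

# Step 1: new variables to serve as the log-likelihood's inputs
s_0_in = tt.lscalar("s_0")
pi_0_in = tt.vector("pi_0")

# Step 2: replace the S_0 RandomVariable with its log-likelihood graph,
# reusing the existing PyMC3 Distribution.logp implementation
S_0_logp = pm.Categorical.dist(p=pi_0_in).logp(s_0_in)

# A joint log-likelihood would simply add the analogous graph for pi_0, e.g.
#   joint_logp = S_0_logp + pi_0_logp
```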
Unfortunately, more steps are needed when `Scan`s are involved. If we want to create a log-likelihood graph for \(S_{0:T}\), then–just like the conditional sample-space graph in Listing 14–we need to first transform the `Scan` so that it "conditions on" \(S_{0:T}\)–i.e. converts its output `S_rv` into an input.
The situation for \(Y_{0:T}\) is a little more complicated; it requires log-likelihood conversions in the `emissions_tt` term outside of the `Scan` and an update to the `Scan` so that it uses the log-likelihoods derived from `emissions_tt`. An implementation is given in Listing 17.
The functionality for automating these two steps in the non-`Scan` case already exists in `symbolic-pymc`, and the requisite functionality for the simple `Scan` case was introduced in #113 and #114. The latter changes introduce a "push-out" optimization that helps expose `RandomVariable`s hidden within the internal sub-graphs of `Scan` operators.
For instance, in the original Theano graph model, \(S_t\) is a `RandomVariable` created within the `Scan` operator's inner-graph (via the step function run by `Scan`). Had \(S_t\) not been specified as an output of the inner function `state_step`, this "push-out" optimization would redefine the model so that it is.
Additionally, functions were added in #114 that automate the process of turning `state_step` into the `Y_given_S_step` in Listing 14. This is how we can automate the construction of conditional sample-space graphs and log-likelihood graphs.
PyMC3 `Distribution`s
In order to use a complicated model like ours in Listing 6 within PyMC3, we need to construct a PyMC3 `Distribution` class using the conditional sampler functions and the log-likelihood graphs we derived above from the original model graph. Basically, the log-likelihood graphs comprise the body of the `Distribution.logp` method, and the conditional samplers comprise the `Distribution.random` method.
Unfortunately, the shape issues and sampler problems of PyMC3 aren’t actually removed in this process, since we’re moving out of the Theano framework in which those problems are solved.
Listing 18 constructs the `Distribution` class for \(Y_{0:T}\)–or `Y_rv`.
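Listing 18 itself isn't shown here; as a rough, hypothetical skeleton, such a class wires a derived log-likelihood graph and a compiled conditional sampler into `Distribution.logp` and `Distribution.random`, much like `pm.DensityDist` does for user-supplied log-likelihoods:

```python
import theano
import pymc3 as pm

# Hypothetical skeleton (not Listing 18): `logp_fn` builds the Theano
# log-likelihood graph for Y_{0:T}, and `sampler_fn` wraps the compiled
# conditional sampler for Y_{0:T} given S_{0:T}.
class HMMEmissions(pm.Distribution):
    def __init__(self, logp_fn, sampler_fn, shape=(), dtype=None,
                 testval=0, *args, **kwargs):
        if dtype is None:
            dtype = theano.config.floatX
        self._logp_fn = logp_fn
        self._sampler_fn = sampler_fn
        super().__init__(shape, dtype, testval, *args, **kwargs)

    def logp(self, value):
        # The derived measure-space (log-likelihood) graph
        return self._logp_fn(value)

    def random(self, point=None, size=None):
        # The derived conditional sampler, compiled and evaluated
        return self._sampler_fn(point, size)
```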
The `Distribution` objects constructed in this way can now be used to define and estimate a regular PyMC3 model, as demonstrated in Listing 19.
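A hypothetical usage along the lines of Listing 19, with dummy stand-ins for the derived graph builders, could look like this:

```python
import numpy as np
import pymc3 as pm

# Dummy stand-ins for the derived log-likelihood graph and conditional sampler
# (purely illustrative; not the post's actual functions)
y_obs = np.random.normal(size=100)

def toy_logp_fn(value):
    return pm.Normal.dist(mu=0.0, sigma=1.0).logp(value).sum()

def toy_sampler_fn(point, size):
    return np.random.normal(size=y_obs.shape)

with pm.Model():
    Y = HMMEmissions(
        "Y",
        logp_fn=toy_logp_fn,
        sampler_fn=toy_sampler_fn,
        shape=y_obs.shape,
        observed=y_obs,
    )
    # ...plus priors for the remaining terms (e.g. the transition parameters),
    # after which the model can be estimated with `pm.sample` and sampled from
    # with `pm.sample_posterior_predictive` as usual.
```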
Discussion
Regarding the compilation of Theano "model graphs" to PyMC3 (i.e. functionality that would allow us to produce posterior predictive samples using Theano while still being able to estimate the model), I've added most of the key functionality in PRs #113 and #114.
Part of the functionality introduced there allows us to automatically produce new Theano graphs with inputs for each random variable dependency embedded in a `Scan`. It's currently being used just to produce the log-likelihood graphs, but it's also what we need in order to construct a fast Theano function that produces posterior predictive samples. Writing a posterior predictive sampler that does this is the next step.
The existing `symbolic_pymc.theano.pymc3.graph_model` function does this for simple Theano model graphs, but it does not work with the `Scan`s we need to use in order to define time-series model graphs.
Bibliography
Willard, Brandon T. (2020). "Dynamic Linear Models in Theano." link.