Symbolic PyMC Radon Example in PyMC4

# Introduction

Symbolic PyMC is a library that provides tools for symbolic manipulation of Tensor library models in TensorFlow (TF) and Theano. Over time, we plan to add tools that are mostly specialized toward Bayesian model manipulation and mathematical identities relevant to MCMC.

The main approach taken by symbolic-pymc is relational/logic programming powered by a miniKanren implementation in pure Python (based on the kanren package).

As an example of how symbolic-pymc’s offerings can be used, we’ll create a model “optimizer” that approximates the re-centering and re-scaling commonly demonstrated on a hierarchical normal model for the radon dataset. This optimization is symbolic and effectively produces another equivalent model with better sampling properties.

A similar example already exists in Theano and PyMC3; it can be found in the project README. In this case, we will operate on TF graphs via PyMC4 and approximate the same optimization using a very different approach targeted toward the log-likelihood graph.

To get started, we download the radon dataset and define the initial model in Listings 1, 2, and 3.

In Listing 5, we estimate the model using the sample routine from PyMC4’s Radon example Notebook (reproduced in Listing 4). The same plots from the aforementioned notebook are also reproduced here in Figures 7 and 8.

# The Model’s Log-likelihood Graph

In order to apply our optimization, we need to obtain a graph of the log-likelihood function generated by the model in Listing 3. With the graph in-hand, we can perform the re-centering and re-scaling transform–in log-space–and produce a new log-likelihood graph that improves sampling.

This exercise introduces the TensorFlow function-graph backed by the class tensorflow.python.framework.func_graph.FuncGraph. FuncGraph is a subclass of the regular Graph objects upon which symbolic-pymc indirectly operates. Just like Theano’s FunctionGraphs, FuncGraph simply specializes a generic graph by specifying which constituent tensors are considered inputs and outputs.

In Listing 8, we use PyMC4’s internal mechanisms to build the log-likelihood function for our model and a corresponding list of initial values for the parameters.

From here we need FuncGraphs for each input to logpfn. Since logpfn is a tensorflow.python.eager.def_function.Function instance, every time it’s called with a specific tensor it may create a new function-object with its own FuncGraph. In other words, it dynamically generates function objects based on the inputs it’s given.

This specialization process can be performed manually using logpfn.get_concrete_function(*args), which necessarily produces a tensorflow.python.eager.function.ConcreteFunction with the desired FuncGraph. Listing 9 creates and extracts these two objects.

The outputs are now available in graph form as logpfn_fg.outputs.

# The Log-space Transform

Consider the following two equivalent hierarchical models,

$$$\begin{gathered} Y = X + \epsilon, \quad \epsilon \sim \operatorname{N}\left(0, \sigma^2\right) \\ X \sim \operatorname{N}\left(\mu, \tau^2\right) \end{gathered} \label{eq:model-1}$$$

$$$\begin{gathered} Y = \mu + \tau \cdot \tilde{X} + \epsilon, \quad \epsilon \sim \operatorname{N}\left(0, \sigma^2\right) \\ \tilde{X} \sim \operatorname{N}\left(0, 1\right) \;. \end{gathered} \label{eq:model-2}$$$

Models $$\eqref{eq:model-1}$$ and $$\eqref{eq:model-2}$$ are represented in (log) measure space, respectively, as follows:

\begin{align} \log p(Y, X) &= \log P(Y\mid X) + \log P(X) \nonumber \\ &= C - \frac{1}{2} \left(\frac{y}{\sigma} - \frac{x}{\sigma}\right)^2 - \frac{1}{2} \left(\frac{x}{\tau} - \frac{\mu}{\tau}\right)^2 \label{eq:log-model-1} \\ &= \tilde{C} - \frac{1}{2} \left(\frac{y}{\sigma} - \frac{\mu - \tau \cdot \tilde{x}}{\sigma}\right)^2 - \frac{1}{2} \tilde{x}^2 \label{eq:log-model-2} \;. \end{align}

Via term rewriting, Equation $$\eqref{eq:log-model-2}$$ is produced–in part–by applying the replacement rule $$x \to \mu + \tau \cdot \tilde{x}$$ to Equation $$\eqref{eq:log-model-1}$$, i.e.

\begin{align*} \tilde{C} - \frac{1}{2} \left(\frac{y}{\sigma} - \frac{\mu + \tau \cdot \tilde{x}}{\sigma}\right)^2 - \frac{1}{2} \left(\frac{\mu + \tau \cdot \tilde{x}}{\tau} - \frac{\mu}{\tau}\right)^2 \;. \end{align*}

For consistency, the transform must also be applied to the $$dx$$ term where/when-ever it is considered.

After a few algebraic simplifications, one obtains the exact form of Equation $$\eqref{eq:log-model-2}$$.

# Creating the miniKanren Goals

symbolic-pymc is designed to use miniKanren as a means of specifying mathematical relations. The degree to which an implementation of a mathematical relation upholds its known characteristics is–of course–always up to the developer. For the needs of PPLs like PyMC4, we can’t reasonably expect–or provide–capabilities at the level of automatic theorem proving or every relevant state-of-the-art symbolic math routine.

Even so, we do expect that some capabilities from within those more advanced areas of symbolic computing will eventually be required–or necessary–and we want to build on a foundation that allows them to be integrated and/or simply expressed. We believe that miniKanren is a great foundation for such work due to the core concepts it shares with symbolic computation, as well as its immense flexibility. It also maintains an elegant simplicity and is amenable to developer intervention at nearly all levels–often without the need for low- or DSL-level rewrites.

User-level development in miniKanren occurs within its DSL, which is a succinct relational/logic programming paradigm that–in our case–is entirely written in Python. This DSL provides primitive goals that can be composed and eventually evaluated by the run function. We refer the reader to any one of the many great introductions to miniKanren available at http://minikanren.org, or, for the specific Python package used here: this simple introduction.

For the matter at hand, we need to create goals that implement the substitution described above. The first step is to understand the exact TF graphs involved, and the best way to do that is to construct the relevant graph objects, observe them directly, and build “patterns” that match their general forms. Patterns are built with symbolic-pymc meta objects obtained from the mt helper “namespace”. Wherever we want to leave room for variation/ambiguity, we use a “logic variable” instead of an explicit TF (meta) object. Logic variables are created with var() and can optionally be given a string “name” argument that identifies them globally as a singleton-like object.

## Inspecting the TF Graphs

In our case, the log-density returned by PyMC4–via the TensorFlow Probability library (TFP)– uses tf.math.squared_difference to construct the “squared error” term in the exponential of a normal distribution. This term contains everything we need to construct the substitution as a pair of TF graph objects.

Listing 10 shows the graph produced by a normal distribution in TFP.

Instead of looking for the entire log-likelihood graph for a distribution, we can focus on only the SquaredDifference operators, since they contain all the relevant terms for our transformation.

More specifically, if we can identify “chains” of such terms, i.e. SquaredDifference(y, x) and SquaredDifference(x, mu), then we might be able to assume that the corresponding subgraph was formed from such a hierarchical normal model.

Listing 13 shows the SquaredDifference sub-graphs in the log-likelihood graph for our radon model. It demonstrates two instances of said SquaredDifference “chains”: they involve tensors named values_5 and values_1.

The names in the TFP graph are not based on the PyMC4 model objects, so, to make the graph output slightly more interpretable, Listing 15 attempts to re-association the labels.

## Graph Normalization

In general, we don’t want our “patterns” to be “brittle”, e.g. rely on explicit–yet variable–term orderings in commutative operators (e.g. a pattern that exclusively targets mt.add(x_lv, y_lv) and won’t match the equivalent mt.add(y_lv, x_lv)).

The grappler library in TensorFlow provides a subset of graph pruning/optimization steps. Ideally, a library like grappler would provide full-fledged graph normalization/canonicalization upon which we could base the subgraphs used in our relations.

While grappler does appear to provide some minimal algebraic normalizations, the extent to which these are performed and their breadth of relevant operator coverage isn’t clear; however, the normalizations that it does provide are worth using, so we’ll make use of them throughout.

Listing 17 provides a simple means of applying grappler.

In Listing 17 we run grappler on the log-likelihood graph for a normal random variable from Listing 10.

Listing 19 compares the computed outputs for the original and normalized graphs–given identical inputs.

From the output of Listing 21, we can see that grappler has performed some constant folding and has reordered the inputs in "add_1_1"–among other things.

## miniKanren Transform Relations

In Listing 23, we create miniKanren functions that identify the aforementioned SquaredDifference “chains” and perform the re-centering/scaling substitutions.

As a test, we will run our miniKanren relations on the log-likelihood graph for a normal-normal hierarchical model in Listing 26.

Listing 27 shows the form that a graph representing a hierarchical normal-normal model will generally take in TFP.

Listing 29 runs our transformation and Listing 32 prints the resulting graph.

## Missing Graph Simplifications

From Listing 32 we can see that grappler is not applying enough algebraic simplifications (e.g. it doesn’t remove multiplications with 1 or reduce the $$\left(\mu + x - \mu \right)^2$$ term in SquaredDifference).

Does missing this simplification amount to anything practical? Listing 34 demonstrates the difference between our model without the simplification and a manually constructed model without the redundancy in SquaredDifference.

The output of Listing 35 shows exactly how large the discrepancy can be for carefully chosen parameter values. More specifically, as tau_tf gets smaller and the magnitude of the difference x_tf - y_tf gets larger, the discrepancy can increase. Since such parameter values are likely to be visited during sampling, we should address this missing simplification.

In Listing 38 we create a goal that performs that aforementioned simplification for SquaredDifference.

We apply the simplification in Listing 38 and print the results in 39.

After simplification, the difference is now gone.

# Transforming the Log-likelihood Graph

Now, we’re ready to apply the transform to the radon model log-likelihood graph.

Listing 44 shows the replacements that were made throughout the graph. Two replacements were found and they appear to correspond to the un-centered normal distribution terms a and b in our model–as intended.

Likewise, Listing 46 shows SquaredDifference subgraphs that appear in the transformed log-likelihood.

# Creating a new Log-likelihood Function

Now that we have a transformed version of the original log-likelihood graph (i.e. logpfn_trans_tf), we need to create a new FuncGraph from it. Listing 48 provides a simple function that creates a new ConcreteFunction from an updated output node.

The new TF function, logpfn_new_cf, in Listing 48 is the function we are going to use for sampling from the new log-likelihood.

Listing 50 shows the difference between a transformed and non-transformed log-likelihood value given the same inputs.

# Sampling from the new Log-likelihood

In Listing 53, we reproduce the remaining steps of pm.inference.sampling.sample and–unnaturally–force the PyMC4 machinery to draw samples from our new transformed log-likelihood function.

# Discussion

The goals in the two separate run calls we used in Listing 23 could have been combined into a single run. This could’ve been accomplished using some “meta” steps (e.g. construct and evaluate a goal on-the-fly within a miniKanren) or special goals for reading from a miniKanren-generated dicts or association lists. Goals of this nature are not uncommon (e.g. type inference and inhabitation exmaples), and serve to demonstrate the great breadth of activity possible within relational context of miniKanren.

However, the point we want to make doesn’t require much sophistication. Instead, we wanted to demonstrate how a non-trivial “pattern” can be specified and matched using symbolic-pymc, and how easily those results could be used to transform a graph.

More specifically, our goal shift_squared_subso in 23 demonstrates the way in which we were able to specify desired structure(s) within a graph. We defined one pattern, Y_sqrdiffo, to match anywhere in the graph then another pattern, X_sqrdiffo, that relied on matched terms from Y_sqrdiffo and could also be matched/found anywhere else in the same graph.

Furthermore, our substitutions needed information from both “matched” subgraphs. Specifically, substitution pairs similar to (x, z + x). Within this framework, we could just as easily have included y–or any terms from either successfully matched subgraph–in the substitution expressions.

In sample-space, the search patterns and substitutions are much easier to specify exactly because they’re single-subgraph patterns that themselves are the subgraphs to be replaced (i.e. if we find a non-standard normal, replace it with a shifted/scaled standard normal). In log-space, we chose to find distinct subgraph “chains”, i.e. all (y - x)**2 and (x - z)**2 pairs (i.e. “connected” by an “unknown” term x), since these are produced by the log-likelihood form of hierarchical normal distributions.

As a result, we had a non-trivial structure/“pattern” to express–and execute. Using conventional graph search-and-replace functionality would’ve required much more orchestration and resulted considerably less flexible code with little-to-no reusability. In our case, the goals onceo and tf_graph_applyo are universal and the forms in shift_squared_subso can be easily changed to account for more sophisticated (or entirely distinct) patterns and substitutions.

Most related graph manipulation offerings make it easy to find a single subgraph that matches a pattern, but not potentially “co-dependent” and/or distinct subgraphs. In the end, the developer will often have to manually implement a “global” state and orchestrate multiple single-subgraph searches and their results.

For single search-and-replace objectives, this amount of manual developer intervention/orchestration might be excusable; however, for objectives requiring the evaluation of multiple graph transformation, this approach is mostly unmaintainable and extremely difficult to compartmentalize.

This demonstration barely even scratches the surface of what’s possible using miniKanren and relational programming for graph manipulation and symbolic statistical model optimization. As the symbolic-pymc project advances, we’ll cover examples in which miniKanren’s more distinct offerings are demonstrated.