Readable Strings and Relational Programming in Hy

Just some thoughts on a generalized repr for Hy and some connections with relational programming.

Introduction

In the past few months, I’ve been working on Hy a lot. It’s been great for translating symbolic computation ideas originating in the Lisp community or simply performing the generic meta-programming inherent to the subject.

One feature I’ve been missing the most is “readable” print-outs from the REPL. In this case, “readable” means “a string that can be eval’ed to [re-]produce the object it’s meant to represent”. Python calls the function(s) that produce these strings “repr”s and provides a generic repr function–with limited Python “readability” guarantees–and a __repr__ property for object/class-level customization.

test_obj = {"a": 1, "b": [2, 3]}

# Produce a readable string using `repr`
obj_repr_str = repr(test_obj)
print(obj_repr_str)

# Re-create the object from its readable string form
obj_from_repr = eval(obj_repr_str)
print(obj_from_repr)

print(test_obj == obj_from_repr)

{'a': 1, 'b': [2, 3]}
{'a': 1, 'b': [2, 3]}
True

There’s already a hy.contrib.hy-repr module that gets most of the way there, but it doesn’t implement the Python standard library’s reprlib.Repr. The class reprlib.Repr implements limits for the display lengths of the strings it produces, and its source code provides a few standard library implementations of primitive object reprs–which require only trivial changes to produce the desired Hy syntax.

For these reasons–and an overall interest in using and translating more of the Python standard library to Hy–I decided to try a quick refactoring of hy.contrib.hy-repr that implements reprlib.Repr.

The Hy `repr` Problem(s)

The translation of Hy AST to string form is fairly straight-forward. In most cases, one only needs to change the reprs for Python primitives and basic function calls (e.g. from func(1) to (func 1)); however, changing just a couple lines in repr/__repr__ functions for all the Python builtins is very annoying.

Furthermore, what about those custom object __repr__ methods? While one might be able to manually patch most–if not all–of the (Python-implemented) standard library objects, there are far too many 3rd-party library __repr__s with exactly the same trivial function-call form that can’t reasonably be patched.

Some approaches

The first few things that come to mind when considering a more general approach to Python-to-Hy __repr__ translation involve some use of the existing repr code. That might come in the form of string manipulation of repr output, which hy.contrib.hy-repr already does in some cases, or quite possibly some use of a repr function’s source or code object.

The latter seems like it has the potential to be more thorough and far-reaching, but also considerably more involved and computationally inefficient. Unfortunately, similar things can be said about the regex approach. Although it does seem a little easier to implement and–for limited cases–efficient enough for most purposes, it also comes across as much more brittle.

Fortunately, the latter is unnecessary, because, when the existing repr output is Python readable, it can be parsed by ast.parse. The function ast.parse effectively handles the regex work and yields the bulk of information needed for a Hy repr string: the function name and its (positional and keyword) arguments.

Let’s say we implement our own object and repr.

(defclass TestClass [object]
  (defn --init-- [self arg1 arg2 &optional kwarg1 kwarg2]
    (setv self.arg1 arg1
          self.arg2 arg2
          self.kwarg1 kwarg1
          self.kwarg2 kwarg2))
  (defn --repr-- [self]
    (.format "TestClass({}, {}, kwarg1={}, kwarg2={})"
             #* (lfor a [self.arg1 self.arg2
                         self.kwarg1 self.kwarg2]
                      (repr a)))))

(setv test-obj (TestClass 1 {"a" 1 "b" 2} :kwarg1 1 :kwarg2 "ok"))
(print (repr test-obj))

TestClass(1, {'a': 1, 'b': 2}, kwarg1=1, kwarg2='ok')

Since the results are readable, we can do the following:

(import ast astor)
(setv repr-ast (ast.parse (repr test-obj) :mode "eval"))
(print (astor.dump repr-ast))

Expression(
    body=Call(func=Name(id='TestClass'),
              args=[Num(n=1),
                    Dict(keys=[Str(s='a'), Str(s='b')],
                         values=[Num(n=1), Num(n=2)])],
              keywords=[keyword(arg='kwarg1', value=Num(n=1)),
                        keyword(arg='kwarg2', value=Str(s='ok'))]))

A Generalized Hy `repr` Prototype

With existing repr output converted to Python AST by Python itself (using ast.parse), we can produce readable Hy strings from the resulting AST objects.

In this scenario, we need only be concerned with the conversion of Python AST into readable Hy strings. This works like an inverse to the Hy compiler: in other words, a Hy decompiler. For repr purposes, only function call statements and their arguments need to be decompiled. Unfortunately, function arguments can consist of arbitrary Python/Hy objects, and that’s how the decompilation responsibilities start to expand. If we limit our scope to a reasonable subset of Python builtins/primitives, the results can still be quite effective, and won’t require a complete decompiler.

On the down-side, if a Hy repr implementation overrides the built-in repr, then arguments in existing repr/__repr__s might already be converted by the overridden repr; however, the results from ast.parse will undo/discard those results. Even so, custom class __repr__s aren’t guaranteed to use the built-in repr on their arguments, so attempts to salvage already-converted repr output are undeniably fraught with complications.

Working from the repr-produced AST above, I mocked-up a quick prototype for a generic Python-to-Hy conversion function.

(import ast)
(import builtins)

(import [hy.contrib.hy-repr [hy-repr :as -hy-repr]])

(defn ast-funcall-to-hy [ast-obj repr1
                         &optional [level 1]]
  "Turn Python `ast.Call` expressions into Hy `repr` strings.

XXX: Only a very minimal subset of Python-to-Hy AST is implemented.

This can be used to turn a \"readable\" `repr` result, via an actual \"read\" by
`ast.parse`, to Python AST then Hy AST.
"
  (assert (and (instance? ast.Expression ast-obj)
               (instance? ast.Call ast-obj.body)))
  (setv func-name (. ast-obj body func id))
  (setv eval-fn (fn [o]
                  (if (instance? ast.Name o)
                      o.id
                      (repr1 (ast.literal-eval o) (dec level)))))
  (setv func-args (lfor a (. ast-obj body args) (eval-fn a)))
  (setv func-kwargs (lfor k (. ast-obj body keywords)
                          (.format ":{} {}" k.arg (eval-fn k.value))))
  (.format "({})" (.join " " (+ [func-name] func-args func-kwargs))))


(setv test-ast (ast.parse "range(x, y, blah=1, bloh=\"ok\")" :mode "eval"))
(print (ast-funcall-to-hy test-ast (fn [x &rest y] (-hy-repr x))))

(range x y :blah 1 :bloh "ok")

ast-funcall-to-hy is an extremely narrow decompiler that only handles readable function calls (represented by ast.Call nodes), but, as part of a fallback sequence in a Hy repr implementation, it’s still pretty useful.

A function like ast-funcall-to-hy can be used in repr logic as follows:

(defn hy-repr [x &optional [level 1] [-repr (fn [x &rest y] (-hy-repr x))]]
  "Use `builtin.repr` results to generate readable Hy `repr` strings for cases
we haven't covered explicitly.
"
  (try
    (setv s (builtins.repr x))
    (when (not (.startswith s "<"))
      (do
        (setv repr-ast (ast.parse s :mode "eval"))
        (setv s (ast-funcall-to-hy repr-ast -repr))))
    s
    (except [Exception]
      (.format "<{} instance at {}>" x.__class__.__name__ (id x)))))

Now, for the example class, TestClass, we can demonstrate automatic conversion of its Python __repr__ implementation.

(setv test-ast (TestClass 1 {"a" 2 "b" 3} :kwarg1 1 :kwarg2 "ok"))
(print (.format "before: {}\nafter: {}"
                (repr test-ast)
                (hy-repr test-ast)))

before: TestClass(1, {'a': 2, 'b': 3}, kwarg1=1, kwarg2='ok')
after: (TestClass 1 {"a" 2  "b" 3} :kwarg1 1 :kwarg2 "ok")

Relational Programming

While considering all this, I kept thinking about how nice it would be to have a “bijective” compiler; in other words, the existing Hy compiler, which translates Hy-to-Python, and a Python-to-Hy (de)compiler. With a Python-to-Hy AST compiler, we could more broadly convert Python AST output–like the kind in our example above–to a repr/readable string in Hy.

The idea isn’t too crazy, especially since one can easily work backward from a lot of the logic in the existing Hy compiler. There will be some edge cases that result in non-bijective translations (i.e. some round-trip Hy/Python translations might only be equivalent and not exactly equal), but this isn’t necessarily a blocking issue. Decisions regarding “canonical” or reduced forms of Hy/Python AST might be necessary, especially if the resulting AST is intended to be more human readable than not.

Perhaps what’s more discouraging is the effort it would take to ensure that the compilation processes going both ways are–and stay–coherent during the course of development. For instance, when changes are made to the standard compilation process (i.e. Hy-to-Python), it’s likely that changes and tests would also be needed for the other direction.

This is where a paradigm like relational programming is particularly appealing: it provides a language for defining–and means for computing–the maps

\[\begin{equation*} \text{Hy Syntax} \longleftrightarrow \text{Python AST} \longleftrightarrow \text{Python Syntax} \; \end{equation*}\]

in a cohesive way.

My relational programming DSL of choice, miniKanren, already has an implementation in Hy: loghyc (and to be formally known as adderall). We’ve been using it to perform static code analysis and refactoring in the project hydiomatic, so there’s also a precedent for parsing Hy syntax in a relational context.

The missing/next step would be to output Python AST (instead of more Hy forms, like hydiomatic produces, for example). In the following sections, we will construct a small relational Hy/Python compiler as a proof-of-concept.

A Prototype Relational Compiler

Creating a bi-directional Hy/Python AST compiler in miniKanren involves the construction of goals “relating” the two AST forms. For simplicity, we’ll just consider function call expressions, like func(args) and (func args).

Also, since these kinds of relations are more easy to specify using constraints and subtle unification adjustments, we’ll use a prototype microKanren implementation in Hy that provides immediate access to those: hypoKanren.

Regardless, given the universality of miniKanren, the goals we construct should be directly translate-able to other implementations of miniKanren (even in completely different host languages).

The only obvious caveat to such translation is the availability of traditional cons semantics in the host language (i.e. the standard Lisp behavior of cons, car, cdr, and improper lists/cons pairs).

(import ast)
(import astor)
(import types)
(import [collections [Callable]])

(import hy.models)
(import [hy.compiler [asty hy-eval hy-compile]])

(import [hypoKanren.goals [*]])
(import [hypoKanren.core [*]])


(require [hy.contrib.walk [let]])
(require [hypoKanren.goals [*]])
(require [hypoKanren.core [*]])

First, let’s examine the general structure of the Python AST output generated by the Hy compiler for the Hy function-call given by `(func x :y z).

(astor.dump (hy-compile `(func x :y z) "__console__"))

Module(
    body=[Expr(value=Call(func=Name(id='func'), args=[Name(id='x')], keywords=[keyword(arg='y', value=Name(id='z'))]))])

In what follows, we’ll exclude the ast.Module and focus only on the src.Expr and its children.

AST Object Unification

To make existing Python AST objects amenable to the unification used by miniKanren, we implement unify specializations for ast.AST types. Our implementation simply generates unevaluated Hy forms, or Hy AST, that–when evaluated–would (re)create the ast.AST objects.

Alternatively, we could only ever use and create unevaluated Hy forms for Python AST. Providing unification for AST objects allows for more immediate integration with existing Python code and/or what it would most likely produce.

hypoKanren uses multipledispatch, so augmenting the unification process is easy. This is how we’ll add support for AST objects.

There’s already a good pure Python library for unification built upon multipledispatch, unfication. At a later time, it might be worthwhile to simply add support for Hy objects and use that library instead.

(import [multipledispatch [dispatch]])
(import [hypoKanren.unify [*]])
(import [hy.models [*]])
(import [hy.contrib.walk [prewalk]])


(defmacro/g! dispatch-unify-trans [disp-type trans-func &optional [func 'unify]]
  `(do
     #@((dispatch ~disp-type object object)
        (defn unify-post-walk [~g!u ~g!v ~g!s]
          (~func (~trans-func ~g!u) ~g!v ~g!s)))
     #@((dispatch object ~disp-type object)
        (defn unify-post-walk [~g!u ~g!v ~g!s]
          (~func ~g!u (~trans-func ~g!v) ~g!s)))
     #@((dispatch ~disp-type ~disp-type object)
        (defn unify-post-walk [~g!u ~g!v ~g!s]
          (~func (~trans-func ~g!u) (~trans-func ~g!v) ~g!s)))))

(defn py-ast-to-expr [x]
  (defn -py-ast-to-expr [u]
    (setv ast-expr
          `(~(HySymbol (+ "ast." (name (type u))))
            ~@(chain.from-iterable
                (lfor f u.-fields
                      :if (hasattr u f)
                      [(HyKeyword f) (getattr u f)]))))
    ast-expr)
  (prewalk (fn [y] (if (instance? ast.AST y)
                       (-py-ast-to-expr y)
                       y))
           x))

;; Python AST expansion pre-unification
(dispatch-unify-trans ast.AST (fn [x] (py-ast-to-expr x)))

;; One is an `ast.AST` object, the other an unevaluated `ast.AST`
;; object-generating form.
(setv unify-exa-1 (unify (ast.Expr :value [])
                         `(ast.Expr :value ~(var 0))
                         {}))

;; Both are `ast.AST` objects
(setv unify-exa-2 (unify (ast.Expr :value [])
                         (ast.Expr :value (var 0))
                         {}))

(= (.get unify-exa-1 (var 0))
   (.get unify-exa-2 (var 0))
   [])

True

Listing 16 illustrates unification of two ast.AST forms. The (var 0) objects are “logic variables” taking the value of sub-expressions that cause the two unify arguments to, well, unify. The third argument to unify is simply a dict that stores the logic variable/sub-expression mappings.

In other words, logic variables are like unknowns that unify(u, v, s) will “solve” in order to make u and v equal.

(unify (cons 'ast.Expr (var 0))
       (ast.Expr :value [(ast.Name :id "a")])
       {})

{(LVar 0): HyExpression([
  HyKeyword('value'),
  [HyExpression([
    HySymbol('ast.Name'),
    HyKeyword('id'),
    'a'])]])}

Listing 18 is a more interesting example that demonstrates partial/improper list unification. Since ast.AST objects are expanded into equal object-instantiating Hy AST forms, (cons 'ast.Expr (var 0)) is ultimately unified with a HyExpression (a subclass of list). Under the cons abstraction, (var 0) can be anything that–when consed with the symbol ast.Expr–will produce the expression (ast.Expr :value [(ast.Name :id "a")]). The result is the partial HyExpression comprising the arguments to the ast.Expr constructor–in other words, the cdr of the ast.AST form.

We will also need to unify some limited Hy AST forms; specifically, HySymbols. We will want to extract only the name part of a Hy symbol and relate that to Python ast.Names via one of the latter’s constructor arguments.

Similar to Python AST nodes, we will expand/lift/abstract HySymbols to Hy expressions that–when eval’ed–would construct them. We can only do this in very limited cases; otherwise, we could end up producing ever-expanding forms.

;; Hy AST expansion pre-unification
(defn unify-hysymbol [u v s]
  (cond
    [(= (first v) 'HySymbol)
     (print )
     (unify `(HySymbol ~(name u)) v s)]
    [True
     (unify u v s)]))

#@((dispatch HySymbol HyExpression object)
   (defn unify-post-walk [u v s]
     (unify-hysymbol u v s)))

#@((dispatch HyExpression HySymbol object)
   (defn unify-post-walk [u v s]
     (unify-hysymbol v u s)))

(unify 'a `(HySymbol ~(var 0)) {})

{(LVar 0): 'a'}

Listing 20 demonstrates the expansion and unification of Hy AST symbols.

Call-expression Goals

Next, we create the miniKanren goals that encapsulate the relationships between simple Hy and Python AST forms. In particular, we’ll limit ourselves to only variable reference and function call forms.

(defn listo [l]
  "A goal stating that `l` is a list."
  (conde
    [(== l []) s#]
    [(fresh [lcar lcdr]
            (== l (cons lcar lcdr))
            (listo lcdr))]
    [s# u#]))

The first AST relation is a simple one between HySymbols and ast.Names. This is where the HySymbol unification implemented above is used.

(defn hy-py-symbolo [hy-ast py-ast]
  "A goal relating Hy and Python AST symbol/name objects (e.g. variable and
 function references)."
  (fresh [symbol-name py-ctx]
         (== hy-ast `(HySymbol ~symbol-name))
         (== py-ast `(ast.Name :id ~symbol-name
                               :ctx (ast.Load)))))

Some Python ast.AST types have fields consisting of lists containing other ast.AST objects (e.g. the ast.Call expressions below). We need a goal that enforces a relation between the Hy and Python AST forms of each element in such lists.

(defn lapplyo [func l-in l-out]
  "A goal that applies the goal `func` between all elements in lists `l-in` and
 `l-out`."
  (conj+
    (listo l-in)
    (conde
      [(fresh [lcar lcdr lout-car lout-cdr]
              (== l-in (cons lcar lcdr))
              (func lcar lout-car)
              (lapplyo func lcdr lout-cdr)
              (== l-out (cons lout-car lout-cdr)))]
      [(== l-in [])
       (== l-out l-in)])))

Finally, we create a goal for the AST of call expressions like func(x, y, z) and (func x y z).

(defn hy-py-callo [hy-ast py-ast]
  "A goal relating call expressions in Python and Hy AST."
  (fresh [hy-op hy-args py-op py-args]
         ;; Hy AST form
         (== (cons hy-op hy-args) hy-ast)
         ;; Py AST form
         (== py-ast `(ast.Expr :value
                               (ast.Call :func
                                         ~py-op
                                         :args
                                         ~py-args
                                         :keywords
                                         [])))
         ;; These two must be related symbols
         (hy-py-symbolo hy-op py-op)
         ;; The arguments are related lists containing more of each AST type.
         (lapplyo hy-py-asto hy-args py-args)))

(defn hy-py-asto [hy-ast py-ast]
  "A goal for a 'branching' relation between multiple types of forms and their
 corresponding Python AST."
  (conde
    [(hy-py-symbolo hy-ast py-ast)]
    [(hy-py-callo hy-ast py-ast)]))

To demonstrate our [extremely] minimal relational compiler, we create a Hy function call expression and its corresponding Python AST.

(setv hy-ast-exa `(print x y z))
(setv py-ast-exa (. (hy-compile hy-ast-exa "__console__") body [0]))
(.format "hy_ast_exa = {}\npy_ast_exa = {}"
         hy-ast-exa
         (astor.dump py-ast-exa))

hy_ast_exa = HyExpression([
  HySymbol('print'),
  HySymbol('x'),
  HySymbol('y'),
  HySymbol('z')])
py_ast_exa = Expr(value=Call(func=Name(id='print'), args=[Name(id='x'), Name(id='y'), Name(id='z')], keywords=[]))

We first run the Hy-to-Python direction by providing hy-expro the hy-ast-exa value above and a logic variable (i.e. an “unknown”) for the Python AST term.

(setv rel-res (run 1 [py-ast] (hy-py-asto hy-ast-exa py-ast)))
(setv ast-res (get rel-res 0 0))
ast-res

HyExpression([
  HySymbol('ast.Expr'),
  HyKeyword('value'),
  HyExpression([
    HySymbol('ast.Call'),
    HyKeyword('func'),
    HyExpression([
      HySymbol('ast.Name'),
      HyKeyword('id'),
      'print',
      HyKeyword('ctx'),
      HyExpression([
        HySymbol('ast.Load')])]),
    HyKeyword('args'),
    HyExpression([
      HyExpression([
        HySymbol('ast.Name'),
        HyKeyword('id'),
        'x',
        HyKeyword('ctx'),
        HyExpression([
          HySymbol('ast.Load')])]),
      HyExpression([
        HySymbol('ast.Name'),
        HyKeyword('id'),
        'y',
        HyKeyword('ctx'),
        HyExpression([
          HySymbol('ast.Load')])]),
      HyExpression([
        HySymbol('ast.Name'),
        HyKeyword('id'),
        'z',
        HyKeyword('ctx'),
        HyExpression([
          HySymbol('ast.Load')])])]),
    HyKeyword('keywords'),
    HyList()])])

And, now, the other direction (i.e. known Python AST, unknown Hy AST).

(setv rel-res (run 1 [hy-ast] (hy-py-asto hy-ast py-ast-exa)))
(setv ast-res (get rel-res 0 0))
ast-res

[HyExpression([
  HySymbol('HySymbol'),
  'print']), HyExpression([
  HySymbol('HySymbol'),
  'x']), HyExpression([
  HySymbol('HySymbol'),
  'y']), HyExpression([
  HySymbol('HySymbol'),
  'z'])]