Fri Jan 14 13:09:28 EST 2011

Combining code generation with memoization: Not without monads?

So from the previous post it seems that while it is possible to
recover simple mathematical operations from _black box_ definitions,
the sharing information is lost.

data Expr = Value Integer
          | Signum Expr
          | Abs Expr
          | Add Expr Expr
          | Mul Expr Expr
            deriving (Show, Eq)

instance Num Expr where
    a + b = Add a b
    a * b = Mul a b
    signum = Signum
    abs = Abs
    fromInteger = Value

f1 x y = x * y
f2 x y = x + y

-- f1 1 2 :: Expr  =>  Mul (Value 1) (Value 2)
-- f1 1 2 :: Expr  =>  Add (Value 1) (Value 2)

f3 x y = a * a where
    a = x + y

-- f3 1 2 :: Expr  =>  Mul (Add (Value 1) (Value 2)) (Add (Value 1) (Value 2))

The subexpression (Add (Value 1) (Value 2)) appears twice.  What we
would like to do is to make this sharing explicit by introducing named
nodes, i.e. transforming the tree into a directed a-cyclic graph,
which can be represented as a dictionary of nodes of primitive
operations, where the operands of each node only point "backwards" in

It seems impossible to write a Num class instance that recovers this
sharing structure.  It is possible to work around this solving a
_different_ problem using the Eq relation on nodes.  That effectively
implements common subexpression elimination and is an expensive
operation.  "True" node equality that was present in the source
program is lost forever.  This is a _feature_ of the language called
referential transparency[2] : expressions represent values and
_nothing else_.

To contrast this, I've implemented a similar code generator in Scheme,
and its impure features of strict specified evaluation order, global
shared state and object identity make it trivial to translate
expressions to dictionary form, preserving the sharing.  The meaning
of the generated output program (the specification of the exact
operation sequence) mirrors the observable side effect of the Scheme
code evaluation.

In a pure language that simply does not work as all this execution
machinery does not influence the computations, so you have to

   * Introduce these side effects or evaluation order explicitly,
     i.e. through monads or combinators.  This is the approach used
     ``Jacques Carette and Oleg Kiselyov Multi-stage programming with
     functors and monads: eliminating abstraction overhead from
     generic code.'' (as accepted but unrevised) Science of Computer
     Programming [1].

     Essentially this writes the `let' operation as a function.

   * Work around them in other ways (i.e. forgetting about sharing and
     using common subexpression elimination to convert the exploded
     trees back to a DAG).

     This has the advantage of being able to recover expressions the
     programmer missed, but it completely ignores any sharing
     expressed in the source code.

     I've used this approach here[3].

So let's bite the bullet: implement the sharing Monad in Haskell.

[1] http://www.cas.mcmaster.ca/~carette/publications/scp_metamonads.pdf
[2] http://en.wikipedia.org/wiki/Referential_transparency_%28computer_science%29
[3] http://zwizwa.be/darcs/meta/ai
[4] https://agora.cs.illinois.edu/download/attachments/16883/emsoft04.pdf