[<<][meta][>>][..]
Fri Jan 14 13:09:28 EST 2011

## Combining code generation with memoization: Not without monads?

So from the previous post it seems that while it is possible to
recover simple mathematical operations from _black box_ definitions,
the sharing information is lost.

data Expr = Value Integer
| Signum Expr
| Abs Expr
| Mul Expr Expr
deriving (Show, Eq)

instance Num Expr where
a + b = Add a b
a * b = Mul a b
signum = Signum
abs = Abs
fromInteger = Value

f1 x y = x * y
f2 x y = x + y

-- f1 1 2 :: Expr  =>  Mul (Value 1) (Value 2)
-- f1 1 2 :: Expr  =>  Add (Value 1) (Value 2)

f3 x y = a * a where
a = x + y

-- f3 1 2 :: Expr  =>  Mul (Add (Value 1) (Value 2)) (Add (Value 1) (Value 2))

The subexpression (Add (Value 1) (Value 2)) appears twice.  What we
would like to do is to make this sharing explicit by introducing named
nodes, i.e. transforming the tree into a directed a-cyclic graph,
which can be represented as a dictionary of nodes of primitive
operations, where the operands of each node only point "backwards" in
time.

It seems impossible to write a Num class instance that recovers this
sharing structure.  It is possible to work around this solving a
_different_ problem using the Eq relation on nodes.  That effectively
implements common subexpression elimination and is an expensive
operation.  "True" node equality that was present in the source
program is lost forever.  This is a _feature_ of the language called
referential transparency[2] : expressions represent values and
_nothing else_.

To contrast this, I've implemented a similar code generator in Scheme,
and its impure features of strict specified evaluation order, global
shared state and object identity make it trivial to translate
expressions to dictionary form, preserving the sharing.  The meaning
of the generated output program (the specification of the exact
operation sequence) mirrors the observable side effect of the Scheme
code evaluation.

In a pure language that simply does not work as all this execution
machinery does not influence the computations, so you have to

* Introduce these side effects or evaluation order explicitly,
i.e. through monads or combinators.  This is the approach used
in:

``Jacques Carette and Oleg Kiselyov Multi-stage programming with
generic code.'' (as accepted but unrevised) Science of Computer
Programming [1].

Essentially this writes the `let' operation as a function.

* Work around them in other ways (i.e. forgetting about sharing and
using common subexpression elimination to convert the exploded
trees back to a DAG).

This has the advantage of being able to recover expressions the
programmer missed, but it completely ignores any sharing
expressed in the source code.

I've used this approach here[3].