Sun Jul 17 10:36:56 CEST 2011

Mixing expressions (trees) and sharing (directed, acyclic graphs)

( So much effort for something seemingly so trivial.  Granted, the
  effort was largely due to ignorance.  I understand the state-cont
  monad better now, and playing with it a bit the structure seems
  quite powerful. )

Can we now write such monadic functions as (returnN . sin) to obey a
type class such as Num?  No.

The reason for this is that Num allows nesting s.a. sin(sin(sin x))
and a + (b + c).  Such tree-structured expressions with intermediate
nodes are exactly what we're trying to avoid.  It is the bind
operation (the arrow in do-notation) that introduces order of
evaluation so sorely needed to incrementally build the data flow graph
incrementally as memoized values.

So, what do we have here?  A do-notation that resembles ordinary SSA
form and a collection of functions with types:

  a -> M a
  a -> a -> M a
  a -> a -> a -> M a

Now, this notation sucks.  Mostly because it's a pain in the ass to
use at points where it would be straightforward to infer from simple

What about using a middle ground here?  Allow nested expressions
(simply num class) but use the monad only when sharing needs to be

Let's try this.

The prototypical function would be square.  I've noticed a quaint
difference here between sharing the input or output.  See the
functions below and their output.

square x = do
  vx <- returnN x
  return $ vx * vx
f x = do
  x2 <- square (1 + x)
  x4 <- square (1 + x2)
  square x4

*Main Control.Monad> toSSA $ f $ var "a"
R0 = (add 1 a)
R1 = (add 1 (mul R0 R0))
R2 = (mul R1 R1)
return (mul R2 R2)

square' x =
  returnN $ x * x

f' x = do
  x2 <- square' (1 + x)
  x4 <- square' (1 + x2)
  square x4

*Main Control.Monad> toSSA $ f' $ var "a"
R0 = (mul (add 1 a) (add 1 a))
R1 = (mul (add 1 R0) (add 1 R0))
R2 = R1
return (mul R2 R2)

I did not realize this subtlety, but in retrospect it is obvious:
returnN introduces the variable, so calling it on the input of square
or the output of square' is quite a difference.  In square'
duplication happens because the input is used twice, and the input
might be an expression.

Simply judging from use, it seems best to let functions themselves
decide whether they should create a shared node for their inputs or
not: whenever it's duplicated in the function body you share,
otherwise not.

It seems also best to introduce a different name for returnN,
i.e. "node" or "name" or "var" or a shorter variant of those.  Btw, is
that the same trick as the "i" in [1] which embeds any monad in tge
CPS monad?

                   Moral: If you DUP, name it.

The drawback in Haskell is that you have to do this explicitly: it's
easy to use a variable more than once.  So yes, it's possible to
express this but it still feels a bit clumsy.

The interesting thing about a concatenative language is that the DUP
is always explicit, so you don't need to invent separate notation.

[1] http://blog.sigfpe.com/2009/01/rewriting-monadic-expressions-with.html