Sat Sep 27 10:30:24 CEST 2008

A MetaOcaml offshoring example

This post contains some examples of metaprogramming based on
MetaOcaml's staging operations and Ocaml->C offshoring.  By
representing a subset of C as a subset of Ocaml, basic Ocaml->Ocaml
metaprogramming can be used, avoiding manual generation of C code.

MetaOcaml:    http://www.metaocaml.org/
terminology:  http://web.cecs.pdx.edu/~sheard/staged.html

See the following paper for an introduction by the authors:
``Implicitly Heterogeneous Multi-Stage Programming''

The idea is that heterogenous metaprogramming can be implemented in a
homogeneous staged language by directly mapping a particular subset of
a representation language to an as large as possible subset of a
target language.

This translation is different from ordinary compilation in that

1. Translation is DIRECT.  In order to give full control over
   generated code, there is no optimization as there is in an ordinary
   language compiler.

2. The aim is at full coverage of the TARGET language, not the source
   language as happens in ordinary language compilation.  Support as
   many of C's language features as possible, but do not attempt to
   translate all of Ocaml's features, like higher order functions,
   because they have no equivalent target representation.

The staging operations used are:

.< e >.            bracket
.~ e               escape
.! e               run
.!{Trx.run_gcc} e  offshored run, using gcc backend


Ocaml code that can be translated to C needs to have the following
properties: Functions need to be first order, and can use only the C
base types char,int,double and 1/2 dimensional arrays of those base
types as function arguments.  Functions can return only base types.

C function that take multiple arguments correspond to first-order
Ocaml functions that take a single tuple as input.

# .!{Trx.run_gcc}(.< fun (a, b) -> a + b >.) ;;
- : int * int -> int = <fun>

This produces a piece of C code:

  int procedure(int a_1 , int b_2 )
    return a_1 + b_2;

The other code in the C file is necessary to run within Ocaml, but is
guarded by an #ifdef OCAML_COMPILE to make sure the source can be
included as-is into different projects.

The following defines a higher order (curried) function, which cannot
be translated.

# .!{Trx.run_gcc}(.< fun a b -> a + b >.) ;;
Can't translate function return type in  <int -> int>
Can't run sth that is not a function
   or is not of int type; you passed <int -> int -> int>
Fatal error: exception Typecore.Error_Translation_type_check_subset

Replacing + by +. (Ocaml's numeric operations are not polymorphic
between ints and floats) generates floating point code:

# .!{Trx.run_gcc}(.< fun (a, b) -> a +. b >.) ;;
- : float * float -> float = <fun>

  double procedure(double a_1 , double b_2 )
    return a_1 + b_2;

Arrays are supported: Ocaml's arrays map to C arrays.

# .!{Trx.run_gcc}.<fun x -> x.(0) + 123>. ;;
- : int array -> int = <fun>

  int procedure(int *  x_1 )
    return x_1[0] + 123;

The looping construct supported is Ocaml's "for" loop.  Note that
functions cannot return void (unit) so this includes a dummy "0"
expression after the side-effecting array element assignment loop.

# .!{Trx.run_gcc}.<fun x -> for i=0 to 99 do x.(i) <- 123 done ; 0>. ;;
- : int array -> int = <fun>

  int procedure(int *  x_1 )
    int i_2 ;
    for(i_2 = 0; i_2 <= 99; i_2++)
      x_1[i_2] = 123;
    return 0;

Options can be passed to gcc like this:

# ( .!{Trx.run_gcc with compiler_flags="-g -O2";
                  .<fun x -> x + x>.) 2;;
 - : int = 4

The following examples use a file called /tmp/test.ml which when
loaded generates test.c as a side effect.

  #use "/tmp/test.ml" ;;

This assumes the .ml file contains a line

  let p = .!{Trx.run_gcc}procedure

at the end, where "procedure" is bound to the code object that needs
to be offshored.


Creating parameterized code can be done using two mechanisms:
explicit excape (unquote) and cross-stage persistence.

The simplest example I can think of is a specialized addition

  let make_adder c = .< fun x -> x + .~c >.
  let procedure = make_adder .< 123 >.

The following C code is produced from the code above:

  int procedure(int x_1 )
    return x_1 + 123;

This is an example of explicit escape: the variable "c" is bound to a
piece of code that is spliced into the body of the function.

The same example using a mechanism called "cross-stage persistence"
(csp) looks like this:

  let make_adder n = .< fun x -> x + n >.
  let procedure = make_adder 123

Here one of the operands "n" is taken from the lexical environment
outside the quoted code.  This variable lives at an earlier stage than
the code fragment, and so needs to be carried over to the next stage.

Carrying over such values to the next stage is no problem as long as
one stays inside Ocaml, where csp values can remain abstract.  In
order for csp values to work for gcc offshoring, they must be
representable as concrete C literals.  In the example above this is no
problem since "n" refers to a numeric value.


Code expansion without reduction (partial evaluation) is quite
limited.  When generating code, often some kind of specialization is
desired which eliminates unnecessary run-time computations.

In the following paper the technique of abstract interpretation is
employed to generate FFT code:

Instead of working with abstract code objects directly as produced by
the bracket operation, and consumed by the run and escape operations,
one works with types that allow the representation of extra
information when it is available.  The idea is that code represented
by such objects can be "processed" without having to look inside any
abstract code objects.  Deconstructing code objects could violate
guarantees provided by the type system and is not possible in

I.e. an integer could be modeled as One | Other, indicating that we
know whether the code value is 1 or otherwise do not have more
information and will model it as an abstract integer code object.
This enables the definition of an abstract interpreting multiplication
operator operating on this type, that can eliminate multiplication by
1 at generation time, but defers the multiplication operation to the
next stage by generating multiplication code when there is no
information about the value of the code.