\documentclass[12pt]{article}
\begin{document}

These are originally from PDP, but they belong more in the Chicken
framework. Mostly about keeping PDP alive, and building some proper
design for multimedia apps.


\section{Mon Oct 31 16:21:36 EST 2005}


I start to feel the itch. It's PDP reappreciation day!
After a long journey through forth land i come to the conclusion that it is better
to decouple PDP and PF. For several reasons:
\begin{enumerate}
\item[STABILITY] 
It will take a while to stabilize the design of PF. Currently, a lot is possible,
but I'm afraid I won't stop wanting to change things. There has been a significant
lack of direction in the past. This will probably prevail.
PDP is relatively stable. I am still using PDP for my own performances.
\item[SPEED] 
PDP is simple and and it is fast.
Contrary to my goals, PF does not
reach the speed it is supposed to reach.
This is the libtile fiasco.
\item[API] 
PDP is effectively frozen: people are using it,
writing extensions for it, and there have been no updates to the API since quite a while. 
\item[VM]
Integration between PD and PF is really bad. This is a design error. Reasons
are mostly that PD uses the C stack, while PF uses its own return stack and a single
shared resource: the PF virtual machine. This necessitates context switching between
PD and PF, which makes PF objects expensive and clumsy. PF is really a standalone system,
despite it's original goal as a plugin system and extension language.
\end{enumerate}

A new direction could be to fix PDP in its current state \verb+version 0.12.4+ and incorporate
changes from PF in the form of \verb+C+ files. First task is to make it build. Second task is
to repair 3DP.

\section{Tue Nov  1 11:17:26 EST 2005}
Some important questions. Most of them not for now, but need to think about this
when redesigning everything.
\begin{itemize}
\item[CORE]
Is it possible, and how much work would it be, to separate out the image processing core
(again) so it fits into old style PDP and PF?
\item[GC]
Does it make sense to implement mark/sweep garbage collection for pairs together with reference counts?
Where exactly does this solve the problem of fast reuse of on--stack data packets, assuming
the data stack itself is a chained pair list. The packets would be atoms.
\item[TYPE]
How to efficiently implement type dispatching and generic functions?
\end{itemize}

What i'm saying here is that I'd like to implement a forth in lisp, but in a way that the
fast reuse of data on the stack is still possible. Now, suppose the stack itself is
an object which has a reference count, these counts do not need to be propagated to
the things contained in the stack. The way to implement this is to make the contents
of the stack \emph{unobservable}. 

Does this at all make sense? Maybe yes. Take the analogy with Quantum Mechanics.
If the garbage collected scripting language (i.e. a lisp) is the ``classical universe'',
and the forth is the ``quantum universe'', then observing a quantum state will change it.
Basicly this means that observing the state of the forth will effectively copy data from
forth to ``outside''. Maybe i can use quantum information theory for this?

I did effectively implement this already: in the \verb+guile<->pf+ bindings there
is exactly this mechanism. As long as stuff stays in the forth, it will be fast reused.
When it is pulled out into scheme to put it in advanced data structures, there is
an \emph{observation} which involves copying of state. A similar approach is used
in SuperCollider.

Writing this out, we can find the following modules. The server side will consist of
\begin{itemize}
\item[FORTH] Central processing unit with type system. Purely functional with tree state.
\item[I/O]   Low level I/O to hardware and OO libraries, directly connected to the processing unit. This contains
both synchronous (audio/video) and asynchronous (keyboard, mouse) I/O.
\item[MEM]   Low level RC memory management.
\end{itemize}
The client can be all kinds of things, with the protocol between the two
being single or full duplex serialized forth.
\begin{itemize}
\item[LOGIC]  High level data structure and control flow. Gui like PD or scripting language like Scheme, Python.
\item[I/O]    High level I/O, like serialization for FORTH state etc.
\item[MEM]    Full GC.
\end{itemize}

The key element here is the type system and memory management features of the core FORTH. 
This is where the real choices are to be made. For example. Typed versus blind. Blind is more
efficient CPU wize, but typed will severely improve client/server communication and programming. Then
memory management: linear or tree? We cannot go to net (circular structure) since we cannot
afford a mark/sweep GC in the core, but probably linear memory is a bit too raw.

The lisp client will never look ``inside'' the data packets. Can FORTH look inside?
Probably yes. This enables some low--level hacks, without having to resort to C
all the time. 

We need to look at an object system for FORTH. 
\begin{itemize}
\item Every data item is an object, accessible through its id. This makes the image relocatable.
Each object has an RC. This makes it collectable.
\item An object is either a pair, a grid or an atom. 
Type is a property of atom, not of cell in container as in pf.
This is the same structure as lisp/scheme,
to make binding easier. Grids are necessary optimizations of a subset of structures 
pairs can represent. Grids contain only a single type.
This is enough to construct any composite data type. Circular references are allowed as exceptions, 
but will not be collected when orphaned. Only a save/load operation will eliminate those.
\item The entire state is an object and can be serialized.
\item The ports system is not serializable as pure data, it does have \emph{boot code},
which will restore its state to the current one, or a compatible state.
\end{itemize}

This fixes the data structures, except we need to define types and generic functions.
The control flow needs a polling loop. Do we allow for parallel processing? The latter
is probably easier to solve using multiple kernels.
The regular composite data types have independent components. They need to
be serialized somehow, so we have to make a choice about how to do that.
An example. An 3 component image could be a rank 4 bit grid (float, x, y, c), 
or it could be a list containing 3 rank 3 grids (float, x, y). 
The choice between float and int should be made early on for example.

An atom is a grid or a packed object. The main difference is that a packed object
cannot be processed using general (tensored) methods, while grids can. A packed object
can be of variable length, and can contain boot code for instance.

I've been loosing a lot of time by not acknowledging this symmetry beforehand.
\emph{A grid can be transposed AND a grid can be represented by pairs.}
The type system could be summarized as
\begin{verbatim}
object = atom | pair
atom = grid | packed
grid = type x length x width ...
\end{verbatim}

Now the big question is, how to solve the 2 problems?
The isomorphism between pairs of grids and tensor products of grid spaces should
be solved automaticly: it is about generating iterators.
An intermediate solution could be the \emph{grid of grids}. A pair could be
mapped into a grid of grids. And a grid can be mapped to a grid of grids if
we allow \emph{subgrids}.

\def\iso{\leftrightarrow}

Let's call $G$ the space of all grids, $GG$ the space of grids of grids,
$TG$ the space of tree
s of grids and and $PTG$ the space of proper trees
of grids. Assume that a grid can be zero--dimensional,
representing a scalar, and ignore the base type for a while. Introducing
this can be done by tensoring with the type space.
We ignore transposition too, and assume the dimensions of a grid are ordered.
Transpose can be introduced as permutations in all spaces except for $T$, which
includes non--proper trees of grids.
There is a map $PTG \to TG$ which chooses only the proper trees.
Assuming an ordering in the dimensions of the grids, we have $PTG \iso GG \iso G$.

Practically, we have to assume this ordering, since it is implemented in the
way the grid data is serialized in memory. The possible permutations are reflected
in \emph{permutations of iteraton} and \emph{permutator code}. For efficiency reasons,
we need to unroll say 2 dimensions of grid space.

Memory reuse. Whenever we use functional maps, and no in--place processing, it is
always clear when we \emph{can} perform in--place processing. For example, the
basic unitary and binary operations on grids can be done in--place depending on
the reference status of the buffers. The only point where this has impact is at
the level of register usage: doing an operation in--place requires less registers
for iteration. 

Another remark. Using the trees of grids representation will allow me to keep
compatibility with PDP, as long as it is implemented as subgrids of a packed
grid. Such a beast would look like a 3--tuple of (non--equal) 2--grids to the
new core, and as a YCrCb packet to old PDP. 

\section{Wed Nov  2 12:03:55 EST 2005}

Something I wrote up yesterday before going to the movies. About light cones.
How can we classify the causal relations between elements of a tensor, and how
does this relate to tiling? 

First, \emph{EXECUTION = SERIALIZATION}. This means loop structures can be coded
as permutation and natural unfolding. What does
this mean? For example, if we have a second order $256 \times 256$ image, it can be transposed
into a fourth order $16 \times 16 \times 16 \times 16$ tensor, which then onfolds from
left to right: row, column, rowtile, columtile. The higher order dimensions inherit the
causal structure. So wich causal structures can we distinguish between source and destination
of a calculation?

\begin{itemize}
\item[LOCAL]        Complete elementwize independence. (i.e. mixing)
\item[COMPACT]      Along some dimensions there is a local dependence (i.e. separable and non--separable convolution)
\item[NON-COMPACT]  An entire subtensor is mapped. (i.e. arbitrary resampling)
\item[CAUSAL]       Simplified version of non--compact, where there is a left $\to$ right causal relation along some dimension.
\end{itemize}

A permutation needs to respect these causal relations. Then, a tensor could be encoded
from inside to outside as $16 \times 16$ grid, $16 \times 16$ grid of grid, and on top of that
a list or tree. Iterators for the inner grids could be hardcoded, while iterators for the outer
grids can be dynamic. 
These two together is a lot of structure to build a generic theory of iterators
around.
I don't need to implement it because i can use BLAS, but this works for matrix multiplication for instance,
which is way more complicated due to the unequal non--compact subtensors mapped to a single element.


\section{Thu Jan 12 10:34:54 CET 2006}

So without looking back, the most important language feature of PF is polymorphy and
object support. These are basicly the same: some words have different symantics depending
on what is on top of the stack. This is similar to LISP generic functions.
While for most tasks an OO model is not necessary, it surely helps for
things like configuring video cameras. Better to do \verb+mycamera grab+ than \verb+mycamera v4l:grab+.

The standard way of solving this is to have a pointer from the \verb+mycamera+ object to its class,
which in turn points again to a dictionary from which grab can be looked up. In PF, the word \verb+grab+
itself contains pointers to \verb+v4l:grab+ and other objects, which decide themselves to run or not
run (by emitting a type error).

The real questions here are: how to speed up searches, and how to easy definition of polywords
like that. An important issue is binding time. Will we bind at runtime or compile time? The whole
polyword business is actually about late binding, while the general philosophy in FORTH seems
to be to bind as early as possible (for speed issues).

So to summarize over another axis: i ran into two big differences between LISP and FORTH:
garbage collected containers vs. stacks and refcounts, and late binding vs. early binding.

The first difference means FORTH cannot have \verb+cons+ (cannot have GCd pointers), 
while the second means the type system needs to be thought out a lot better.


\end{document}