Musings about using Scheme, PLT, and the zwizwa/lib toolbox.

Library contents:

  * zwizwa/lib/mfile: Scheme objects with a 1-1 correspondence to
    filesystem objects. Convenient for representing unix utilities
    that operate on files as scheme functions and garbage-collected
    objects.

  * zwizwa/lib/rv: Reactive Values. A simple lazy update, strict
    invalidate reactive programming abstraction.

The lib contains more as zwizwa/lib/x/* but those are considered
unstable or private (only used in other zwizwa/* packages). If you
find one of those useful, inform me so I can promote it to a stable
public API.

Entry: Sequences, generators and lazy lists
Date: Thu Mar 26 17:02:12 CET 2009

Looking for a standard lazy list implementation that will work well
as an intermediate step to transform generators -> sequences. The
problem is in these conflicting requirements:

  * 'make-do-sequence needs to look ahead in the sequence: know if
    there's an element before the next one is generated. This needs a
    1-level deep lookahead buffer when translating generators to
    sequences.

  * generators are easily built using prompt/control but can't know
    end-of-sequence beforehand.

Maybe this is best replaced with the lazy list structure in lazy
scheme: it uses delayed constructors (not delayed tails). The code
then becomes:

  ;; Convert generator to lazy list.
  (define (generator->lazy-list g [done? not])
    (let next ()
      (delay
        (let ((item (g)))
          (if (done? item)
              '()
              (cons item (next)))))))

  ;; Convert lazy list to sequence.
  (define (in-lazy-list ll)
    (make-do-sequence
     (lambda ()
       (define (ll-car x)   (car  (force x)))
       (define (ll-cdr x)   (cdr  (force x)))
       (define (ll-more? x) (pair? (force x)))
       (values ll-car ll-cdr ll ll-more? void void))))

  ;; Composition.
  (define (in-generator g [done? not])
    (in-lazy-list (generator->lazy-list g done?)))

The deeper conclusion is this: lazy lists (static) are better behaved
than generators (dynamic).
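A quick usage sketch, assuming the definitions above; `count-to-3' is
a throwaway generator thunk made up here, yielding 1, 2, 3 and then
#f (the default `done?' sentinel).

  ;; A generator thunk: returns the next element, or #f when done.
  (define count-to-3
    (let ((n 0))
      (lambda ()
        (and (< n 3)
             (begin (set! n (+ n 1)) n)))))

  (for ((x (in-generator count-to-3)))
    (display x))   ;; prints 123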
Entry: Using control/prompt
Date: Thu Mar 26 20:46:22 CET 2009

Using prompt and control it's possible to turn a data traversal into
an element generator. For example, iterate over all elements in an
xml tree, and produce a sequence of (list path element):

  ;; Generate all elements + (reverse) path.
  (define (element-generator top-e)
    (define (next)
      (let down ((e top-e) (p '()))
        (control k (set! next k) (list p e))
        (for ((e (element-content e)))
          (when (element? e)
            (down e (cons (element-name e) p))))
        #f))
    (lambda () (prompt (next))))

Now I wonder, can we create a sequence directly, without mutation and
without going from generator -> lazy list? The (element . k) pair can
probably be instantiated as a data item.

  ;; A glist is either a pair of (element . generator-thunk) or '().
  ;; The thunk generates the tail of the list.
  (define (element->glist top-e)
    (prompt
     (let down ((e top-e) (p '()))
       (control k (cons (list p e)
                        (lambda () (prompt (k)))))
       (for ((e (element-content e)))
         (when (element? e)
           (down e (cons (element-name e) p))))
       '())))

  (define glist-null? null?)
  (define glist-car car)
  (define (glist-cdr gl) ((cdr gl)))

This can be combined with memoization by wrapping the continuation in
a promise instead of a thunk.

  ;; Directly convert a traversal program into a lazy list, a memoized
  ;; version of the glist above.
  (define (element->ll top-e)
    (prompt
     (let down ((e top-e) (p '()))
       (control k (delay (cons (list p e) (prompt (k)))))
       (for ((e (element-content e)))
         (when (element? e)
           (down e (cons (element-name e) p)))))
     (delay '())))

Abstracted with macros this becomes:

  (define-syntax-rule (ll-yield x)
    (control k (delay (cons x (prompt (k))))))

  (define-syntax-rule (ll-begin . body)
    (prompt (begin . body) (delay '())))

  (define (element->ll top-e)
    (ll-begin
     (let down ((e top-e) (p '()))
       (ll-yield (list p e))
       (for ((e (element-content e)))
         (when (element? e)
           (down e (cons (element-name e) p)))))))

This allows construction of lazy lists from lazy list comprehensions,
giving a simple composition mechanism:

  (ll-begin
   (for ((x (in-lazy-list other-ll)))
     (ll-yield (+ 1 x))))

Entry: Zipper
Date: Fri Mar 27 12:16:42 CET 2009

Using lazy lists as in the previous example only allows the
consumption of a data structure. How can we use this to update/create
new trees? Starting from a map function instead of a for-each
function, we create the zipper data structure.

  http://okmij.org/ftp/Scheme/zipper-in-scheme.txt

  (define-struct zipper (element yield))
  (define (collection->zipper map collection)
    (reset (map (lambda (el) (shift k (make-zipper el k)))
                collection)))

So, is it fair to say a zipper is a symmetric (bi-directional)
coroutine and a lazy list an asymmetric (uni-directional) coroutine?
At the traversal side, the other side is represented by the mapped
procedure, while on the zipper side the iterator is represented by
the delimited continuation.
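A usage sketch for the zipper above (shift/reset come from
scheme/control, as the code already assumes; `zip-map+1' is a name
made up here): rebuild a list with 1 added to each element, one
zipper step at a time. The final (non-zipper) value that falls out of
the last continuation invocation is the rebuilt collection.

  (define (zip-map+1 z)
    (if (zipper? z)
        ;; resume the traversal, supplying the replacement element
        (zip-map+1 ((zipper-yield z) (+ 1 (zipper-element z))))
        ;; no more zipper: this is the finished collection
        z))

  (zip-map+1 (collection->zipper map '(1 2 3)))  ;; => '(2 3 4)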
Entry: Union types
Date: Mon Mar 30 10:05:36 CEST 2009

Are union types possible in PLT scheme outside of typed scheme? No:
elements of a union are types in themselves, and exist independently
of the union.

Entry: Structs
Date: Tue Mar 31 12:40:52 CEST 2009

Is it possible to create a derived struct given a prototype?

  (define-struct a (x))
  (define-struct (b a) (y))

Can we make an instance of struct b without knowing a? This probably
needs delegation (a 'super field) instead of subclassing.

Entry: Visitor
Date: Sun Apr 12 16:27:15 CEST 2009

http://calculist.blogspot.com/2008/05/functional-visitor-pattern.html

  This simple traversal language instructs the fold either to
  continue (Cont), to skip all descendants of the current node
  (Skip), or to abort the traversal entirely (Stop).

Entry: Generator end of sequence
Date: Fri Apr 17 10:04:50 CEST 2009

Generators need a unique value to indicate end-of-stream. #f is
convenient but probably not a good idea.

Entry: Multiple values and sequences
Date: Fri Apr 17 11:09:46 CEST 2009

For lazy lists, multiple values are not possible (by their
definition: linked cons cells contain only a single value in the CAR
position). PLT's sequences (which are generators) do support them. As
mentioned before: the problem with multiple values is channeling the
end-of-sequence value and the multiple values through the same
channel. The sequence API has different functions for these.

The simplest solution seems to be to:

  - stick with lazy lists as the basic abstraction.
  - use an "unpacking" iterator: (in-lazy-list ll values)

I'm adding a form "in-producer" which couples a multiple values
generating body in terms of (yield . args) with a multiple values
producing sequence.

Entry: Zipper and sequences
Date: Fri Apr 17 11:25:34 EST 2009

Sequences are uni-directional, the zipper is bi-directional. However,
is there a way to unify both in the same construct? Something like
for/list or for/fold?

This is not the right question. The right question is: how to relate
processes that communicate through channels with zippers and lazy
data structures.

Entry: Using fold vs. explicit tail recursion
Date: Sun Apr 19 00:49:38 CEST 2009

I often find myself writing tail recursive loops, accumulating
results in (multiple) stacks, then reversing them in the end. Is this
pattern necessary? I guess what I'm looking for is an "unpacked"
fold, where individually named state objects get passed during
iteration. This makes sense for a left fold. Does it also work for a
right fold?
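For the left fold case, PLT's `for/fold' already behaves like this
"unpacked" fold: each piece of loop state is individually named. A
small sketch:

  ;; Two named accumulators, no explicit loop plumbing.
  (for/fold ((evens '()) (odds '())) ((x '(1 2 3 4)))
    (if (even? x)
        (values (cons x evens) odds)
        (values evens (cons x odds))))
  ;; => (values '(4 2) '(3 1))  ;; still reversed, as with manual loops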
Entry: Hygiene: syntax-case & introduced identifiers
Date: Fri Apr 24 09:27:24 CEST 2009

A good thing about syntax-case is that it can introduce identifiers
whenever a quoted syntax object does not refer to a pattern variable.
However, if such a symbol is in a reference position, you will find
out only at expansion time of the macro that the identifier is not
defined. Why is this? Why can't this be known at macro compile time
(i.e. the for-template includes should have it, no?) Is this because
the compiler can't know whether the identifier is in a binding
position (in which case it is legal) or a reference position (in
which case it needs to be defined in the context in which the macro
is expanded)?

Entry: Imperative struct update
Date: Sun Apr 26 13:14:18 CEST 2009

I've been looking for this for a while.. Using introspection it
should be possible to create a macro that updates a struct
functionally, but with field accessors.. I think it's in machine.ss

Entry: Immutable cyclic structures
Date: Sat May 2 23:15:35 CEST 2009

Self-reference with immutable structures is only possible with
laziness. However, PLT scheme contains the 'shared form, much like
'let but for data structure bindings. As long as references are
guarded by list/struct constructors, this form can create cyclic
structures.

  (shared ((ones (cons 1 ones))) ones)

This works with the (immutable) native data types and mutable
structs.

Entry: Immutable hash tables
Date: Tue May 5 10:22:50 CEST 2009

From the PLT Scheme reference manual, 3.13 Hash Tables:

  "A hash table (or simply hash) maps each of its keys to a single
  value. For a given hash table, keys are equivalent via equal?,
  eqv?, or eq?, and keys are retained either strongly or weakly (see
  Weak Boxes). A hash table is also either mutable or immutable;
  immutable tables support constant-time functional update."

I didn't know about the immutable constant-time functional update.
That's pretty cool! I wonder how it is implemented. Red-black trees
[1] [2].

[1] http://groups.google.com/group/plt-scheme/browse_thread/thread/d6da66d2bb3855e6/445f23588930fc90?lnk=raot
[2] http://en.wikipedia.org/wiki/Red-black_tree
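The functional update in miniature, using the standard hash
operations (a sketch):

  (define h0 (make-immutable-hash '((a . 1))))
  (define h1 (hash-set h0 'b 2))   ;; returns a new table
  (hash-ref h0 'b #f)              ;; => #f  (h0 is unchanged)
  (hash-ref h1 'b)                 ;; => 2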
Entry: Code vs. Data (abstract or concrete instructions?)
Date: Thu Jun 11 12:21:17 CEST 2009

In Scheme, I am tempted to use code instead of data as the result of
functions. This is akin to an actor[4] style of programming. Instead
of letting functions return data that needs to be interpreted by the
caller, have them return behaviour: abstract entities that only leave
the freedom of execution time, not freedom of interpretation.

This can lead to some very concise code, but when used over-zealously
it can complicate matters. Sometimes intermediate data structures
that require (centralized, explicit) interpretation are easier to
understand than objects that delegate functionality behind the
scenes. This seems to be about two conflicting types of abstraction:
message passing (GOTO with arguments/continuations) vs. static
descriptions, i.e. a concrete representation based approach.

This popped up in the construction of a dataflow language (DFL)
compiler, which transforms a parallel description of nodes and
processes into a serialized runnable Scheme form. I was throwing
abstract (lambda () ...) all over the place until I realized that
maybe an abstract data structure to be interpreted later was a
simpler approach.

Often however, the abstract approach is better as it allows you to
specify some functionality locally to where the decision needs to be
made, instead of in a central place. This looks very much related to
the "dependency injection" principle. If you _can_ do it concretely,
please do, as this is often easier to understand. However, sometimes
things need to be abstracted out and decided elsewhere. Late binding
can be useful (and when used as a central organizing principle[5] it
can be quite powerful). But I'd like to say in general: don't make it
too flexible!

Note that abstraction can be a (resource use) optimization. I.e. in
Forth, explicit data structures are difficult to handle due to lack
of local namespace support and lack of automatic memory management.
Implementing those would be a prohibitive complexity explosion, going
against the grain of the "simplicity" motto. Representing things as
behaviour instead of data structures is usually a valuable and
efficient option. In Forth, memory is usually best managed where it
is used, which means stacks or dynamic variables (which are
essentially stacks). This management is then easily hidden behind a
procedure. This is "don't set a flag, set the data" from Leo Brodie's
"Thinking Forth"[1]. Or is this quote from Dijkstra[2][6]? I can't
seem to find the original reference.

This is related to Chuck Moore's early binding[3] philosophy. Aiming
for minimalism, the best strategy is to not leave things open for
(re)interpretation. Minimalism makes rewriting easier: so you aim for
full re-implementation instead of trying to anticipate future
behavioural modifications.

[1] http://prdownloads.sourceforge.net/thinking-forth/thinking-forth-color.pdf?download
[2] http://dereksorensen.com/?p=45
[3] http://www.ultratechnology.com/forththoughts.htm
[4] http://en.wikipedia.org/wiki/Actor_model
[5] http://en.wikipedia.org/wiki/Smalltalk
[6] http://c2.com/cgi/wiki?EwDijkstraQuotes

Entry: Abstract interpretation ('eval in macros)
Date: Thu Jun 11 13:51:03 CEST 2009

In Scheme low-level macro programming, most of the time
'syntax->datum or 'syntax-case will be enough to allow for the
compile-time computations necessary to construct output syntax.
However, sometimes you need to actually "bring the input syntax to
life" once before deciding how it will be transformed finally. In
general this is called abstract interpretation [1]. When this
interpretation process is similar to interpreting Scheme (i.e. it
includes dealing with lexical variables to construct graphs) it can
be easier to just invoke 'eval directly, and encode the desired
alternative semantics as macros. The dataflow pattern of such a
syntax transformer is:

                           scheme code
  input syntax           -------------> modified syntax

                           eval
  modified syntax        -------------> value

                           scheme code
  input syntax + value   -------------> transformed syntax

In PLT Scheme this kind of trickery can be performed by using
(anchored) namespace objects passed to 'eval. Namespace objects map
identifiers to semantics. In normal macro programming you don't need
to use explicit namespace objects since everything has only one
meaning. When you want _alternative_ interpretations, you need to
either perform interpretation manually (i.e. using 'syntax-case or
'syntax->datum) or use the Scheme interpreter as a whole by invoking
'eval in a specially constructed namespace.

This problem popped up when trying to compile a parallel program in a
dataflow language (DFL) to a statically scheduled sequential Scheme
program. In order to construct a properly sequenced code list, the
dataflow graph needs to be constructed and analysed for dependencies.
However, I implemented graph building on top of Scheme's lexical
scoping mechanism using macros, which means that the graph data
structure itself is only available _after_ expansion. This allows me
to avoid having to construct the graph manually. However, building a
sequential program from this requires a two pass algorithm: build the
graph and interpret it abstractly to find a serialized code sequence,
then compile everything to plain scheme in the right order. The first
pass can then be performed using 'eval on an alternative
interpretation which yields the proper code sequence. This can then
be used together with the original syntax to construct a compilation
to sequential Scheme code.

In Staapl I did run into this problem a couple of times before. One
other case is the automatic wrapping of Scheme functions, which needs
information about their arity which is only available after executing
the functions. Another occurrence happened when trying to compile the
dependency graph of the cyclic forth bootstrapper, which involved
several alternative interpretations of the code: as code running on a
threaded target interpreter, and as code running inside the host
compiler on a different VM.

[1] http://en.wikipedia.org/wiki/Abstract_interpretation

Entry: Unix utilities
Date: Mon Jun 15 17:10:52 CEST 2009

I'm looking for a way to easily abstract unix utilities as Scheme
functions, in a way that abstracts the use of the filesystem. The
idea is simple:

  * keep a 1-1 map of objects managed in the Scheme memory model, and
    data stored on-disk.

  * provide a mechanism for external programs to access the files:
    external programs are modeled as "directory transformers".

  * tie into the Scheme GC using 'register-finalizer (a sketch
    follows at the end of this entry).

  * add a storage size monitor to trigger global GC.

The way to do this is to manage all files in temporary storage, and
move files around into temporary directories that are the "context"
of the external programs. In general, files are owned by Scheme, and
essentially invisible to the external world, until they are made
available during a "procedure call". Note that this mechanism doesn't
work for files that are accessible from the outside. (However,
filesystem reference counts / hard links can be used for this.)

Since files are now tied into the global garbage collection
mechanism, they are subject to long life if collection doesn't get
triggered. Therefore on each operation we might want to keep track of
the size of the files, and trigger a collection whenever a preset
limit is reached.
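A toy sketch of the GC tie-in mentioned above (register-finalizer
comes from scheme/foreign after (unsafe!); this `make-mfile' is
illustrative only, not the mfile.ss API):

  (require scheme/foreign)
  (unsafe!)

  (define (make-mfile path)
    (define mf (box path))
    ;; When mf becomes unreachable, a later GC deletes the file.
    (register-finalizer mf (lambda (mf) (delete-file (unbox mf))))
    mf)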
Entry: Mfile cleanup
Date: Tue Jun 16 11:06:25 CEST 2009

- mfile is now abstract. It supports conversion into input/output
  ports and can be passed to external programs.

- mdir is a hash table of mfiles.

- I've included latex.ss as an example. This code is factored into 2
  parts: command wrappers that call format/system (fsystem), and
  filesystem wrappers that use these together with the mfile and mdir
  abstractions.

- with-mdir now looks like a let-expression, which resembles its
  semantics. However, it still returns a directory object (external
  programs are mdir->mdir maps) which can then be queried using
  mdir-refs:

    (define (tex->dvi tf)
      (mdir-refs
       (with-mdir `(("doc.tex" . ,tf))
         (lambda () (latex "doc.tex")))
       "doc.dvi"))

- added mfile->bytes and bytes->mfile

Entry: Weak References
Date: Thu Jun 18 13:56:58 CEST 2009

Next: add weak references[1] to accelerate mfile->bytes.

[1] http://docs.plt-scheme.org/reference/weakbox.html
[2] http://docs.plt-scheme.org/reference/ephemerons.html
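Weak boxes in miniature (API from [1]):

  (define b (make-weak-box (list 1 2 3)))
  (weak-box-value b)  ;; => '(1 2 3), or #f once the list is collected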
Entry: Apache logfile parsing
Date: Sun Jul 12 11:19:31 CEST 2009

After figuring out that minimal (non-greedy) regexp matching is a lot
faster than what I had before, it's time to work on incremental
updates. I suppose the regexp problem was due to the date matcher
causing excessive backtracking: I used

  "(.*)/(.*)/(.*):(.*):(.*):(.*)"

instead of

  "(.*?)/(.*?)/(.*?):(.*?):(.*?):(.*?)"

In "share.ss", allow for the dictionaries to be passed in so they can
be updated.

Entry: About 'eval at expansion time
Date: Sun Jul 19 09:39:53 CEST 2009

(to plt-scheme@list.cs.brown.edu)

Dear list,

I've read a number of posts on this list about replacing the need for
'eval by appropriate use of macros. Overall, I've found the way
macros are used in PLT Scheme fascinating, and it has been a good
guide to clarify some practical problems I've been struggling with in
the past.

However, recently I've run into a pattern that confuses me. It has to
do with the observation that 'eval seems to sometimes be the simplest
way to turn source code into an informative data structure at macro
expansion time, such that this can then be used as extra input to a
syntax transformation. (Abstract Interpretation)

While this feels like a necessary use, since 'eval actually does
something that can only be mimicked by implementing something like
'eval, it also feels like a dirty hack in a way I can't really
describe.

Concretely, this is about a macro that compiles a dataflow language
to Scheme. The input syntax is interpreted twice: once to construct a
simplified interpretation as a dependency graph, which is then used
to sort a list of subforms in the original syntax so it can be
expanded a second time into a serial Scheme program. Concrete code
for the two stages can be found here [1][2]. I use 'eval together
with namespace anchors to implement this.

I guess I'm looking for a name for this pattern or a reason to not
use 'eval. It seems that any alternative would involve higher order
macros in a way I don't quite grasp..

Ideas welcome,
Tom

[1] http://zwizwa.be/darcs/staapl/staapl/machine/dfl.ss
[2] http://zwizwa.be/darcs/staapl/staapl/machine/dfl-compose.ss

Entry: Re: About 'eval at expansion time
Date: Sun Jul 19 14:28:13 CEST 2009

Reply from Matthew Flatt[1]. Also, see at the end of this post. The
trick is to expand to `let-syntax'. I feel I understand the basic
idea, but I have to think a bit more about this though.

I read the reply, slept over it, and now I'm going to try to explain
it. The macro that does the magic is:

  (define-syntax (show stx)
    (syntax-case stx ()
      [(_ e)
       #'(let-syntax ([exp (lambda (stx)
                             (with-syntax ([v e])                    ;; (1)
                               #`(printf "~s yields ~s\n" 'e 'v)))]) ;; (2)
           (exp))]))

The `let-syntax' form[2] is quite straightforward. It binds an
identifier to syntax. The `with-syntax' form[3] is like `syntax-case'
but matches all patterns. No magic there. The magic happens by using
`e' in two different positions: one that makes it verbatim into the
eventual expansion of the macro (2), and one that will be evaluated
at expansion time (1).

Or: (1) is behind one `syntax' and (2) is behind two, where one level
of syntax quoting is removed by the expansion, which serves the role
of `eval' at expansion time of `show'. A neat trick indeed!

[1] http://list.cs.brown.edu/pipermail/plt-scheme/2009-July/034623.html
[2] http://docs.plt-scheme.org/reference/let.html#%28form._%28%28lib._scheme/private/letstx-scheme..ss%29._let-syntax%29%29
[3] http://docs.plt-scheme.org/reference/stx-patterns.html#%28form._%28%28lib._scheme/private/stxcase-scheme..ss%29._with-syntax%29%29

======================================================================
Matthew's full reply:

If I understand, then the following little program (in two modules)
is analogous to yours. It defines a `show' form that takes an
expression `e' to evaluate at compile time, and it prints the quoted
form of `e' followed by its compile-time result. The compile-time
form can use a `plus' binding, which stands for any sort of binding
that you might like to use during compile-time evaluation:

  ----------------------------------------
  ;; "arith-ct.ss"
  #lang scheme/base

  (define (plus a b) (+ a b))

  (define-namespace-anchor a)
  (define (show* e)
    (eval e (namespace-anchor->namespace a)))

  (provide show*)

  ;; use-arith.ss:
  #lang scheme
  (require (for-syntax "arith-ct.ss"))

  (define-syntax (show stx)
    (syntax-case stx ()
      [(_ e) (with-syntax ([v (show* #'e)])
               #`(printf "~s yields ~s\n" 'e 'v))]))

  (show (plus 1 2)) ;; expands to (printf "~s yields ~s\n" '(plus 1 2) '3)
  ----------------------------------------

Here's how I'd implement the `show' form, instead:

  ----------------------------------------
  ;; "arith.ss"
  #lang scheme

  (define-for-syntax (plus a b) (+ a b))

  (define-syntax (show stx)
    (syntax-case stx ()
      [(_ e)
       #'(let-syntax ([exp (lambda (stx)
                             (with-syntax ([v e])
                               #`(printf "~s yields ~s\n" 'e 'v)))])
           (exp))]))

  (show (plus 1 2)) ;; expands to (printf "~s yields ~s\n" '(plus 1 2) '3)

  ;; To use `show' outside this module, we need to
  ;; export `for-syntax' any bindings intended
  ;; to be used within `show' expressions:
  (provide show (for-syntax plus))
  ----------------------------------------

This second implementation expands (show e) to

  (let-syntax ([(exp) .... e ....]) (exp))

where `e' is in an expression position in the right-hand side of the
`let-syntax', so it gets evaluated at expansion time. The main trick
is to invent a local name (in this case `exp') to house the
expand-time expression and trigger its evaluation in an expression
position. As you suggest, this approach requires a macro-generating
`show', which is in some sense a "higher order macro".

For a problem like this, I'd avoid `eval' because the other approach
composes better. Consider a module that imports `show', but also adds
its own compile-time extension `times':

  #lang scheme
  (require "arith.ss")

  (define-for-syntax (times a b) (* a b))

  (show (plus 1 (times 2 3)))

This wouldn't work if "arith.ss" were replaced by "use-arith.ss",
because the `eval' in "arith-ct.ss" wouldn't know where to find the
module that has the `times' binding. The problem is that the module
is in the process of being compiled, and so it hasn't been registered
for reflective operations like `eval'.

Your example doesn't seem to involve any bindings like `plus' or
`times', and, offhand, I can't think of another concrete reason that
"arith.ss" is better than "arith-ct.ss" plus "use-arith.ss". Less
concretely, though, the reflection in the latter seems to me more
difficult to reason about.
(As it turns out, I correctly predicted that the `times' example
would not work with "use-arith.ss", but I mispredicted the specific
reason.)

I hope I've understood your actual problem well enough that the above
examples are relevant.

Matthew

Entry: Double fork
Date: Mon Jul 20 14:06:51 CEST 2009

The idea is this: whenever you spawn a process from mzscheme using
the ``subprocess'' function[1], and the child process dies, a zombie
process is created. A zombie is essentially a tiny wrapper around the
process' exit() value[2]. Until the mzscheme process calls
`subprocess-wait' to collect this value, the zombie sticks around.

The way this is usually solved in a unix environment is to make sure
that the parent of the fork() call dies _before_ the child. In that
case the child is inherited by the ``init'' process, which will
collect its return value when it dies, preventing it from becoming a
zombie. In most cases however you want the spawning process to stay
alive. To solve this one can use a ``double fork''. An intermediate
process (i.e. a `spawner') is created which spawns the child process
and dies, so the spawner's parent can collect the spawner's exit
value.

  [mzscheme]--fork()-+-------wait()-----------------+--------->
                     |                              |
                     +--cleanup()--fork()-+--exit()-+
                                          |
                                          +--exec()-[child]--->

Currently it is:

  [mzscheme]--fork()-+---------------------------------------->
                     |
                     +--cleanup()------------exec()-[child]--->

The code seems to be in plt/src/mzscheme/src/port.c

Original thread[3].

EDIT: problem is now solved by handling SIGCHLD

[1] http://docs.plt-scheme.org/reference/subprocess.html#%28def._%28%28quote._~23~25kernel%29._subprocess%29%29
[2] http://en.wikipedia.org/wiki/Zombie_process
[3] http://list.cs.brown.edu/pipermail/plt-scheme/2008-July/025614.html

Entry: Avoiding macros with 'eval
Date: Wed Jul 22 09:17:34 CEST 2009

Let's abstract the pattern in [1] since it seems quite useful in
general. What I'm interested in is to trigger evaluation of input
syntax to guide expansion. The basic form that does what I need is:

  (define-syntax (foo stx)
    (syntax-case stx ()
      ((_ e)
       #`(let-syntax
             ((m (lambda (stx)
                   (let ((v e))        ;; (1)
                     (if (zero? v)
                         #'zero
                         #`e)))))      ;; (2)
           (m)))))

The `foo' form expands to a form that defines a local macro using
`let-syntax' and triggers its expansion. This macro has the original
syntax object, represented by the `e' syntax variable, occurring in
two positions: (1) in an expression position valid at the macro's run
time, and (2) in a quoted form that will make it into the output
syntax of that macro. The net result is that both a syntax object and
its evaluated form are available to a macro body.

This can be captured in a `let-staged' form:

  (define-syntax (foo stx)
    (define-syntax-rule (let-staged ((n v) ...) body ...)
      #`(let-syntax ((m (lambda (stx)
                          (let ((n v) ...) body ...))))
          (m)))
    (syntax-case stx ()
      ((_ e)
       (let-staged ((e-eval e))
         (if (zero? e-eval) #'zero #`e)))))

Basically, `let-staged' behaves as `let' with the exception that the
`v' forms are made up of syntax that will be evaluated. This is
useful because these forms can refer to syntax variables in the
enclosing `syntax-case' context.

[1] entry://20090719-142813
Entry: Phase namespace problems
Date: Wed Jul 22 11:21:27 CEST 2009

It's about time I understand these things:

  /home/tom/staapl/staapl/tools/stx.ss:125:5: compile: unbound
  identifier (and no #%app syntax transformer is bound) at:
  let-syntax in: ...

There are two sides of the coin: the _importer_ needs to make sure
that the identifiers it wants to use at different phases are loaded
correctly. The _exporter_ needs to make sure that all identifiers
referenced at all stages are available at the definition site.

What I forgot was a (require (for-template scheme/base)) at the
definition site, which made the `let-syntax' identifier unknown.

TODO: find a set of rules to debug such statements. Look here[1] at
the for-meta form. Turn this into a question: why is a
"(require (for-template scheme/base))" necessary if the use point has
scheme/base already imported?

[1] http://docs.plt-scheme.org/reference/require.html#%28form._%28%28lib._scheme/base..ss%29._require%29%29

Entry: Parsing mpeg4 files
Date: Thu Jul 30 13:02:20 CEST 2009

see entry://../davinci/20090730-105349

Entry: Trees and paths
Date: Fri Jul 31 12:23:28 CEST 2009

I'm looking for a nice abstraction that combines tree structures
(filesystems, XML, quicktime/mpeg4, ...) with zipper-like iteration.
Frankly I'm confused again by slight nuances between different tree
forms. The operation I want is mostly path access (single item) or
path globbing (list).

[1] http://en.wikipedia.org/wiki/Xpath
[2] http://en.wikipedia.org/wiki/XQuery
[3] http://en.wikipedia.org/wiki/XSLT

Entry: Direct staging vs. the 'eval trick
Date: Fri Sep 4 12:46:50 CEST 2009

In [1] and related posts I described a trick to express the following
code staging flow:

  stx -+-> value ---\
       |            |
       |            v
       \--------> modified-stx

I.e.: the original syntax `stx' is first interpreted to yield
`value', which is then combined with the original `stx' to yield
`modified-stx'. The reason for doing this was to be able to use
Scheme's binding operations to create a network (the value) which can
then be used to process `stx'. (For the data flow language DFL.)

In the staged constraint propagation language (CPL) I'm taking a more
traditional direct staging approach:

  stx ---> modified-stx

by interpreting `stx' manually: the network is built up sequentially
from a set of nodes and a set of rules.

I wonder if there is some more fundamental reason I ``shouldn't'' use
the former approach. One part of me says the former approach is just
a way to implement the latter (the cross-stage mixing performed using
a ``hygienic eval'' implemented with a staged macro doesn't have any
side-effects). Anyways.. Unresolved, but probably not important. I
have the feeling this is quite an arbitrary artifact that is a
consequence of implementing some functionality as macros, and other
parts as functions operating on syntax..

[1] entry://20090722-091734

Entry: Pure hash tables
Date: Sun Sep 6 10:36:12 CEST 2009

I need an updatable (pure) finite function for identifier -> value
mapping, to make it easier to add backtracking to a constraint
propagation algorithm. I'm looking for `hash-update' described in the
PLT reference manual[1].

[1] http://docs.plt-scheme.org/reference/hashtables.html#%28def._%28%28lib._scheme/private/more-scheme..ss%29._hash-update%29%29
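`hash-update' in miniature (a sketch; on an immutable table it
returns a new table, and the last argument supplies the value to
update when the key is absent):

  (define d (make-immutable-hasheq '((a . 1))))
  (hash-update d 'a add1)    ;; => #hasheq((a . 2))
  (hash-update d 'b add1 0)  ;; => #hasheq((a . 1) (b . 1))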
Entry: Param -> pure functions using delimited continuations
Date: Sun Sep 6 12:07:52 CEST 2009

For the general principle see [1]. This post is more concerned with
finding a nice abstraction. Leaving out some steps I arrive at:

  (define-syntax-rule (shift/parameters parameters k . expr)
    (let ((ps (list . parameters)))
      (apply
       (lambda (x . pvs)
         (for ((p ps) (pv pvs)) (p pv))
         x)
       (let ((pvs (for/list ((p ps)) (p))))
         (shift _k
                (let ((k (lambda (x) (_k (cons x pvs)))))
                  . expr))))))

This saves all parameters from dynamic to lexical context before
continuation capture, and restores them at invocation.

[1] entry://../compsci/20090906-105507

Entry: Prefab Structures
Date: Sat Sep 19 12:28:21 CEST 2009

Apparently it's possible to use structures without defining them
beforehand, using the "#s(" syntax.

Entry: dynamic-data and caches
Date: Tue Sep 29 10:30:19 CEST 2009

There is a need for aggregation of cached objects, which requires a
logical composition of the `dirty?' flag.

Entry: Hash tables with custom equality test
Date: Thu Oct 8 17:30:18 CEST 2009

Q: Why do hash tables in PLT scheme not have a parameterizable
equality operation?

Answer: there is a `make-custom-hash'[1] function. However, the
hashing operation depends on the equality operation, so this needs
more than just an equality op.

Q: Is there a hash in terms of `bound-identifier=?' ?

Entry: Naive Reactive Program
Date: Sat Mar 27 16:08:26 CET 2010

I want to implement a reactive programming system to serve as a
website cache, i.e. `make' combined with input events.

  - PUSH: do not propagate invalid inputs past invalid nodes.
  - PULL: only compute what is necessary.

Network invariant: a valid node can never depend on an invalid one.

The PULL part is easy: it's essentially a lazy value (delay/force); a
sketch follows at the end of this entry. The PUSH part is more
difficult in practice. There are 2 problems:

  - reverse the linking to enable the PUSH propagation
  - unlink when a value gets collected (finalizers).

Setting up the dependencies in the first place seems to be the
hardest problem. Let's focus on that first. It seems it's best to
solve registration at node creation time. A node is a computation
that is parameterized by a set of nodes. Let's curry the procedure so
it deals with all non-node (constant) arguments. The form then
becomes:

  (rp-app fn n1 n2 ...)

This takes a strict function fn and a number of nodes. The nodes are
evaluated and the strict function is applied to them. Due to lexical
scope, a node does not need an explicit store of its parents: this is
available in the rp-app function. For the rp-register-child!
function we need weak references. It seems a weak hash table is the
easiest way to implement this. Apparently finalizers aren't
necessary: the weak hash table automatically removes references that
are no longer valid.
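The PULL half sketched, as promised above (a toy, not the rv.ss
code): a node pairs its thunk with a promise, and invalidation simply
re-arms the promise.

  (define-struct node (thunk (promise #:mutable)))

  (define (make-rnode th)
    (make-node th (delay (th))))
  (define (node-pull n)
    (force (node-promise n)))         ;; compute once, then cached
  (define (node-invalidate! n)
    (let ((th (node-thunk n)))
      (set-node-promise! n (delay (th)))))

  (define n (make-rnode (lambda () (printf "computing\n") 42)))
  (node-pull n)         ;; prints "computing", => 42
  (node-pull n)         ;; cached, => 42
  (node-invalidate! n)
  (node-pull n)         ;; recomputes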
Entry: RP: race conditions
Date: Sat Mar 27 18:00:37 CET 2010

So, what happens when a pull and a push interfere? Or can't they? It
seems they can; i.e. invalidation can take a ``different branch''
compared to a parallel evaluation step. Suppose the computation needs
to evaluate

  a + b

which both depend on a common value c which is about to be
invalidated, but the invalidation is interleaved with the evaluation
like this:

  force: cache a      x
  inval: clear a
  inval: clear b
  force: compute b    x

then the values a, b seen by the force correspond to different
histories, leading to possibly inconsistent data.

Another problem is that concurrent evaluations might interfere, i.e.
two threads might start re-evaluations in parallel. The same goes for
concurrent invalidation.

How to solve these problems? It seems like synchronization on both
sides separately isn't much of a problem: node locking should be
enough. However, for push/pull sync I don't see how to do this except
for locking the whole network.

Entry: Resist the urge to use lazy lists
Date: Sun Apr 11 13:12:22 EDT 2010

For day-to-day small dataset conversions, it is probably best to use
strict lists in Scheme. Lazy lists need some machinery to work well
in Scheme, and they complicate programs. So, note to self: don't use
streams unless the data sets are really large. PLT scheme's list
comprehensions are probably simpler to use.

Entry: FAM and finalizers
Date: Fri Apr 16 10:54:49 EDT 2010

Problem: I want files wrapped in reactive objects that are GC-able.

Problem: The FAM keeps a reference to all files monitored, but there
is no direct correspondence from wrapped reactive object to filename,
i.e. two distinct ROs can point to the same filename. Looks like this
needs a level of indirection + resource management.

Solution: keep a *paths* function that maps filenames to a set of
reactive objects. The set of ROs is a weak hash.

Entry: Finalizers are bad
Date: Fri Apr 16 12:03:05 EDT 2010

Why? Because they are hard to get right. However, it is possible to
build correct (low-level) abstractions on top of finalized objects,
in case some layer of management is needed. In general however, it
seems that this only works well if the resource that is managed
really behaves like memory, meaning it is not too scarce. Otherwise
an explicit link is necessary from the depletion event to
`collect-garbage'.

Entry: Files and GC
Date: Fri Apr 16 12:23:05 EDT 2010

With rv-file.ss there is now a nice collection of file interfaces
that make it possible to integrate functional reactive scheme code
into a stateful unix world.

  rv.ss       reactive values & computations
  rv-file.ss  monitored files as RVs
  mfile.ss    managed tempfiles as scheme values

These should now completely replace the dynamic.ss abstraction. Let's
start changing sweb. Best approach seems to be to do this in
parallel.

Entry: Should rv-force be private?
Date: Fri Apr 16 15:33:04 EDT 2010

I.e. it is not allowed to ever "unpack" a reactive value, except at
the top-level = exit point / end point of the DAG. I'm not sure how
to express that yet, but what I do know is that an unfold of an RV
should be a list of RVs. Reactive lists?

What I'm really looking for is a way to construct a collection of
computations that all depend on a single value: parse a file into a
list of articles, but make sure that when the file gets updated the
list gets re-computed. The list can never be represented concretely
(its size is not known). The best way to access it is abstractly
through index functions.

This is an interesting problem. Let's make it more concrete: how to
abstract a list of reactive values by a finite function that performs
a lookup. The index will be rebuilt whenever the file changes.
Additionally, each result depends on the input. It seems simplest to
abstract reactive collections as dictionaries (finite functions).

  (define (rv-lookup rv-dict)
    (lambda (key)
      (let ((node (rv-delay (dict-ref (rv-force rv-dict) key))))
        (rv-register-child! rv-dict node)
        node)))

One problem: the result of a lookup is a freshly computed rv. Maybe
it's best to add caching here too, so the rvs are shared. This needs
a memo-function abstraction.

Seems to work, but how does this extra persistence behave in the face
of errors? RVs that were once defined but now trigger errors will not
be collected. I.e. this doesn't work for transient data.
Entry: Reactive Dataflow & GC
Date: Sun Apr 18 20:21:43 EDT 2010

Is it possible to have intermediate values disappear when all their
dependencies have read the value? This is PDP! Funny.. Went full
circle there ;)

How do the computation thunks reference the reactive values?

  (define (rv-apply fn parents)
    (let ((node (rv-delay (apply fn (map rv-force parents)))))
      (for ((p parents))
        (rv-register-child! p node))
      node))

Essentially through `parents' in the `rv-apply' function. The thunk
is the expression in the `rv-delay' form. The right question seems to
be: is it possible to let go of the reference to the parent nodes'
value slots, but keep track of the node references? I.e. keep the
connectivity alive, but only keep values as long as they are needed.

A possible answer could be to keep a semaphore as a reference count:
every time a node is created, initialize the semaphore to the number
of clients the node has. On each read of the node through an
rv-apply, decrement the semaphore. When zero, delete the node. It
feels a bit ad-hoc to do that though.. Also I can't really see
whether that introduces problems elsewhere..

Edit: it's indeed not that simple. Suppose we have a value y = a + b,
and a gets invalidated, invalidating y. When y gets pulled, a is
recomputed, but b is pulled from the cache. Point being: even if
there is only a single dependency b -> y, it might still be useful to
keep the value of b around after y is computed.

Entry: Experience with using the cache abstraction (rv.ss)
Date: Tue Apr 20 15:04:26 EDT 2010

It seems to work quite well, allowing this workflow:

  1. focus on solving the problem using structs and pure functions,
     without worrying about storage, i.e. build a vocabulary.
  2. identify nodes in the computation graph that would benefit from
     cached/lazy operation.
  3. write functions that relate those nodes.
  4. lift the latter functions into the reactive value domain using
     `rv-app'.
  5. use `rv-force' at the toplevel to pull values from the network.

What I miss though after the initial Haskell brainwash (how quickly
the world owes me something..) is a type system that tells me whether
a computation is pure or not. I.e. you can't see what's behind an
#<rv> without applying `rv-force'. The idea of "lifting" pure
computations into the effectful world is a very powerful one. A type
system that can understand the differences between pure and lifted
values gives some pretty direct feedback while manipulating code.
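Steps 3-5 in miniature (a sketch only: `rv-app' and `rv-force' as
named above; `rv-file' and `parse-log' are hypothetical stand-ins for
a source node and a pure function):

  (define log-rv    (rv-file "access.log"))     ;; source node
  (define parsed-rv (rv-app parse-log log-rv))  ;; lifted pure function
  (rv-force parsed-rv)                          ;; pull at the toplevel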
Entry: Syntax for cached values
Date: Tue Apr 20 15:19:38 EDT 2010

Something like:

  (lambda/rv (a b (rv: c) d (rv: e)) ...)

which then would translate to:

  (lambda (a b c d e)
    (rv-apply (lambda (c e) ...)
              (list c e)))

Useful for other kinds of value lifting.

Entry: How to collaborate on a PLaneT package?
Date: Sat Oct 2 17:36:41 CEST 2010

I'd like to see if I can help out Dave to make some fixes to his
c.plt package. He's been so kind as to create an archive at github:

  git clone git://github.com/dherman/c.plt

So how can I test this? Essentially you need to install a local
version. I use this in the Staapl Makefile[1]. Modified a bit, but it
seems to work now. Is there a way to do this scaffolding cleaner in
scheme?

Anyways, it seems Dave did fix some bugs but didn't release them yet.
This doesn't fail any longer:

  typedef int (*fn) (int*);

Nope, that's fixed in (3 2). The following isn't:

  void foo(void *x, void *y) { int i = foo(x, y); }
  void bar(void *x, void *y) { int i = bar(x, y); }

So, let's see, what do I need:

  - Parse files in libprim
  - Pretty-printing compilable C syntax

[1] http://zwizwa.be/darcs/staapl/Makefile.planet

Entry: Release of zwizwa/plt lib?
Date: Sat Oct 2 20:12:40 CEST 2010

Maybe it's time to publish (I'm going to need it for c.plt). Let's
see what needs to be done first.

  * name: zwizwa/plt is not a good name. Renaming to zwizwa/lib is
    probably best.
  * contents:
      - mfile: wrap tempfiles in variables
      - rv: reactive values

Entry: Debian ready for Racket?
Date: Tue Dec 28 14:54:17 EST 2010

Hmm.. Maybe not yet.

[1] http://ftp-master.debian.org/new/racket_5.0.2-1.html

Entry: Switched to Racket 5.1
Date: Sun Mar 20 10:18:19 EDT 2011

I'm using the 5.1 branch in [1]. This is still a work in progress:
the 5.1+dfsg1 version isn't ready yet. To generate it do:

  git archive origin/dfsg | gzip -9 > ../racket_5.1+dfsg1.orig.tar.gz

See here[2] for how to use gitpkg.

[1] http://git.debian.org/?p=collab-maint/racket.git
[2] entry://../pool/20110306-122548

Entry: Racket FFI
Date: Sat Mar 26 10:38:43 EDT 2011

I'm having some trouble getting the libusb.ss code back online after
the upgrade to Racket 5.1. Basically, I forgot everything ;) I've
reconstructed the following basic example:

  #lang scheme/base
  (require scheme/foreign)
  (unsafe!)
  (define-cstruct _a ((next _a-pointer/null)))

The thing to note is that recursive data structures always need the
"-pointer" or "-pointer/null" suffix. So I wonder, how did this work
before? Oki, I see the problem:

  (define-cstruct _usb-device
    ([next         _usb-device-pointer/null]
     [prev         _usb-device-pointer/null]
     [filename     _path-type]
     [bus          _usb-bus-pointer-dummy]
     [descriptor   _usb-device-descriptor]
     [config       (_cpointer _usb-config-descriptor)]
     [dev          _pointer]
     [devnum       _uint8]
     [num_children _uint8]
     [children     (_cpointer _usb-device-pointer)]))

I get the message that the identifier `usb-device-config' is already
defined. The reason is that `define-cstruct' now defines accessors
for the struct members, and from the "descriptor" field it generates
the clashing name.

Entry: Racket FFI : packed structs
Date: Sat Mar 26 11:15:44 EDT 2011

How does the FFI distinguish between packed and unpacked structs?

Entry: regexp -> pregexp
Date: Tue Nov 1 17:41:21 EDT 2011

The Apache log parser broke. The trouble seems to be related to the
use of `regexp'. Using `pregexp' at least some things work.. Overall
the thing is that this is really too brittle and hard to debug. Let's
just make something that reads one item at a time in a more automatic
way. It seems quite straightforward to dispatch on the first
character:

  - "  string
  - [  date
  - ?  space-separated word

It works pretty well with just "read", but as I remembered that was
horribly slow, which was the main reason why I used regexps. So let's
stick to that decision and find a way to debug better. What I really
want is composable regular expressions with named variable binding
(not position mapping).

Anyways, I fixed the bug in the match string and went on to use the
(test) routine, which filled up memory after 4M lines, using
apparently about 1k per line. That's a bit over the top.. The total
number of lines is 4080622, which takes a couple of minutes to parse.
Estimate about 25k lines/second. That's fairly reasonable. Now what
about hashing it, so the main table can be dumped out as a table of
indices and we don't need to let the database do this.

  tom@zoo:~/plt/lib/x$ time mzscheme -it apachelog.ss -e '(begin(test)(exit))'
  tom@zoo:~/plt/lib/x$ /opt/apache-logs/sorted | mzscheme -t apachelog.ss -e '(test)'
  848597
  date: (#(struct:exn:fail "find-secs: non-existent date (inputs: 0 17 2 13 3 2011)" #)
         #"00" #"17" #"02" #"13" #"Mar" #"2011")
  tom@zoo:~/plt/lib/x$

Looks like a bug:

  (find-seconds 1 17 2 13 3 2011)
  find-secs: non-existent date (inputs: 1 17 2 13 3 2011)

Let's see if it's in the current snapshot. Still there. Sent email to
the list. See next post.

EDIT: was a daylight saving thing..
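The dispatch-on-first-character idea from this entry, sketched with
regexps over the input port (the patterns are illustrative only; a
match against a port comes back as a byte string, and there is no eof
handling here):

  (define (next-field ip)
    (define (grab rx) (car (regexp-match rx ip)))
    (case (peek-char ip)
      ((#\") (grab #px"\"(?:\\\\.|[^\\\\\"])*\""))  ; "..." with escapes
      ((#\[) (grab #px"\\[[^\\]]*\\]"))             ; [date]
      (else  (grab #px"[^ ]+"))))                   ; bare word

  (next-field (open-input-string "[26/Aug/2011:00:18:08 +0200] rest"))
  ;; => #"[26/Aug/2011:00:18:08 +0200]"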
Entry: What happened on March 13 2011?
Date: Wed Nov 2 00:16:57 EDT 2011

Hello,

I found this while parsing some log files:

  Welcome to Racket v5.1.3.
  > (require racket/date)
  > (find-seconds 1 17 2 13 3 2011)
  find-secs: non-existent date (inputs: 1 17 2 13 3 2011)

  === context ===
  /home/tom/racket/collects/racket/private/misc.rkt:85:7 (

Or is it an obscure Grateful Dead reference?
http://en.wikipedia.org/wiki/March_13 ;)

Entry: Status
Date: Wed Nov 2 10:41:34 EDT 2011

Got parsing working. Parses 700 megs worth of logs (4M lines) in a
couple of minutes. Got sharing working also, feeding the logs in as a
generator instead of a list.

It's still getting quite large even with sharing, so I wonder if
that's actually working as it should. Maybe hashes are using eq?
instead of equal?

  438016 -> 294m virtual

  (hash-equal? (make-hash)) => #t

Doesn't look like it... It does seem to flatten a bit:

  Lines   -> virtual
  438016  -> 294m
  808300  -> 446m
  970609  -> 542m
  1473112 -> 736m
  1859149 -> 880m
  2420298 -> 1180m

Seems to be +- 50% memory savings, which is not as much as I hoped.
It also seriously slows down at this point. I see it also stores
backreferences which maybe are not necessary.. Anyways, the basic
idea does seem to work. Let's try it on a subset of files and get it
to spit out an SQL database. There are 2 roads:

  - put it in a standard MySQL / SQLite database and use SQL queries
  - keep everything in memory and write a small query language in
    Scheme

Entry: Hashing the strings?
Date: Wed Nov 2 11:01:49 EDT 2011

Given that the actual string values are not so interesting by
themselves, what about using a hash that has a relatively low
collision rate, and performing the inverse lookup only when
necessary? This would avoid keeping track of the large strings which
are the bulk of the memory. What about just interning them as
symbols? Right there is a hashing mechanism that's already quite
useful..

Anyways.. Let's just focus on making it work for a smaller dataset,
then running it on the big one. Once it's converted, it can be
updated in-place.
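Interning in miniature: equal strings map to the same (eq?) symbol,
so only one copy is retained.

  (eq? (string->symbol "GET /index.html")
       (string->symbol (string-append "GET /" "index.html")))  ;; => #t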
Entry: Next: apachelog
Date: Wed Nov 2 15:57:14 EDT 2011

* Persistence: the most important requirement. The parsing step needs
  to be cached.

* Query hacks & representation. Do I really care? It's probably best
  to get the damn data into a simple database and play with that a
  bit.

Next: replace

Entry: Table size
Date: Wed Nov 2 16:18:09 EDT 2011

Reading 1/16 of the data set I get the following numbers:

  rows:   219121
  ip:       5901
  date:   193871
  req:     69166
  status:     14
  ref:      2228
  client:   2466

So apart from date it makes sense to collect the values of the fields
in separate tables, as there are indeed quite some dupes. I wonder
why req is so high though.. I also wonder if it's not easier to just
pipe this straight to mysql as SQL syntax and be done with it. Let it
do the ID generation[1].

[1] http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html

Entry: Streaming
Date: Thu Nov 3 11:59:31 EDT 2011

Something that's probably important is to stream out the data as soon
as it is available, instead of dumping out the whole hash at the end.

Entry: TODO
Date: Thu Nov 3 12:47:40 EDT 2011

  - dump main table
  - sanitize strings (pff...)
  - don't keep so much state
  - stream everything?

Entry: Eliminate "share.ss" hash stuff
Date: Thu Nov 3 13:49:30 EDT 2011

Might be simpler to do it in a more straightforward way. Done. Got it
to stream now. Worked around the sanitizing, so it's not 100%
correct. It looks like mysql can keep up just fine. The main hog is
racket:

  22844 tom    20  0  89180  60m 3784 R   98  1.1  0:43.41 racket
  12044 mysql  20  0   229m  28m 6360 S   18  0.5  1:24.60 mysqld
  22848 root   20  0  33604 2556 1988 S    7  0.0  0:03.05 mysql

The fully streamed version:

  $ time (/opt/apache-logs/sorted | racket ~/plt/lib/x/apachelog-dump.ss | tee /tmp/access.sql | sudo mysql -D apachelogs)

Entry: TODO apachelogs
Date: Thu Nov 3 14:54:39 EDT 2011

- fix date: currently still hashed

- fix the string parser: currently not correct as it doesn't parse
  escapes correctly. See [1]:

    \"(\\.|[^\"])*\"

[1] http://stackoverflow.com/questions/249791/regex-for-quoted-string-with-escaping-quotes

Entry: Parsing strings
Date: Thu Nov 3 17:30:17 EDT 2011

All these hacking attempts are really hard to parse! This one doesn't
work for the string matcher:

  "giebrok.zwizwa.be:80 83.101.57.157 - - [26/Aug/2011:00:18:08 +0200] \"GET /WebID/IISWebAgentIF.dll?postdata=\\\"> HTTP/1.1\" 302 417 \"-\" \"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)\""

Or in unquoted form:

  giebrok.zwizwa.be:80 83.101.57.157 - - [26/Aug/2011:00:18:08 +0200] "GET /WebID/IISWebAgentIF.dll?postdata=\"> HTTP/1.1" 302 417 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)"

  (define q "\"((\\.|[^\"])*)\"")  ;; quoted string (quotes not included)

Looks like that regexp is wrong, some quoting is missing:

  (define q "\"((\\.|[^\\\"])*)\"")

Another problem: this one looks corrupt, probably a cut-off write:

  zwizwa.fartit.com:80 81.52.143.34 - - [24/Nov/2010:22:44:06 +0100] "GET zwizwa.fartit.com:80 208.115.111.244 - - [25/Nov/2010:07:05:50 +0100] "GET /robots.txt HTTP/1.1" 302 360 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"

This parses, but it shouldn't. How to fix that?

  INSERT INTO req (id_req, req) values (0, "GET zwizwa.fartit.com:80 208.115.111.244 - - [25/Nov/2010:07:05:50 +0100] "GET /robots.txt HTTP/1.1");
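For reference, the escape-aware pattern from [1] in the previous
entry, written as a pregexp with the Scheme string escaping spelled
out (a quick sanity check, not the parser itself):

  (define q #px"\"(?:\\\\.|[^\\\\\"])*\"")
  (regexp-match q "x \"a\\\"b\" y")
  ;; => '("\"a\\\"b\"")  ;; the whole quoted chunk, escapes included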
Entry: still crashing
Date: Thu Nov 3 19:28:06 EDT 2011

  tom@zoo:~/plt$ time (/opt/apache-logs/sorted | racket ~/plt/lib/x/apachelog-dump.ss | tee /tmp/access.sql | sudo mysql -D apachelogs)
  1145249
  ERROR 1064 (42000) at line 2190300: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'INSERT INTO entry (id_agent, id_date, id_req, id_ip, id_stat, id_ref) values (1650, 897' at line 2
  1145308
  error writing to stream port (Broken pipe; errno=32)

  real    9m43.920s
  user    8m43.313s
  sys     1m26.421s
  tom@zoo:~/plt$

Offending line:

  INSERT INTO date (id_date, date) values (897508, #f);

It seems to be due to this one:

  WARNING: can't handle date (4350104328545217515939634383792078420781148110824235358325269992011008006050 44 22 24 11 2010 3600)

Which I assume is this one, another botched line:

  213.133.113.86 - - [24/Nov/2010:22:59:35 +0100] "GET / HTTP/1.1" 200 497 "-" "Hetzner System Monitorin213.133.113.84 - - [25/Nov/2010:03:05:53 +0100] "GET / HTTP/1.1" 200 497 "-" "Hetzner System Monitoring"

Maybe I should start anchoring lines and throw out matches that don't
work that way..

It runs to completion:

  $ time (/opt/apache-logs/sorted | racket ~/plt/lib/x/apachelog-dump.ss | tee /tmp/access.sql | sudo mysql -D apachelogs)
  1145249
  WARNING: can't handle date (4350104328545217515939634383792078420781148110824235358325269992011008006050 44 22 24 11 2010 3600)
  4080617

  real    49m46.360s
  user    45m55.024s
  sys     5m20.604s

Only 63981 unique IPs. That's a surprise.

So I fixed up the date format to generate MySQL syntax. That makes it
a lot faster too!

  $ time (/opt/apache-logs/sorted | racket ~/plt/lib/x/apachelog-dump.ss | tee /tmp/access.sql | sudo mysql -D apachelogs)
  4080617

  real    11m7.764s
  user    11m25.983s
  sys     0m48.835s

Entry: Next: PLT MySQL bindings
Date: Fri Nov 4 19:01:31 EDT 2011

Next is to get at the data. First set perms on the database:

  GRANT ALL PRIVILEGES ON apachelogs.* TO <user> IDENTIFIED BY <password> WITH GRANT OPTION;

Maybe look here[1] to work without passwords, using GPG auth.

[1] http://dev.mysql.com/doc/refman/5.0/en/checking-rpm-signature.html

Entry: Writing opendocument spreadsheets
Date: Thu Feb 16 19:17:53 EST 2012

While it's not too hard to write CSV (it's not too easy either), that
still leaves some manual steps at import, like selecting separators,
column types, date formats etc.. Isn't there a simple way to have
typed columns, and put them in a spreadsheet? I guess that's called
SQL, but it's not 100% standard either..

Anyways. I'm happy I got my data into racket by mining a GnuCash XML
file. Now the next step is to put it into a spreadsheet. So I'm
wondering about open document format[1] ods spreadsheets. There's
content.xml in the zip which seems straightforward, but I don't
really want to wade through all the rest. Is there a simpler format
that has typed columns? I don't need so many bells and whistles..

[1] http://en.wikipedia.org/wiki/OpenDocument
[2] http://www.openoffice.org/xml/general.html
[3] http://stackoverflow.com/search?q=ods+xml&submit=search

Entry: syntax-case
Date: Fri Feb 22 13:44:42 CET 2013

Apparently, it's not a good idea to mix literals and macros. I.e. if
`loop' is defined as syntax, the following doesn't work properly:

  (syntax-case stx (loop)
    ((loop a) #'a))

Entry: racklog
Date: Sat Feb 23 19:28:50 CET 2013

Trying out racklog.

  (%which (x) (%= (list x 1) '(0 1)))

The problem I have is evaluating a number of constraints over a set
of nodes. So what I have is a list of:

  (nodes, node-predicate)

The outcome should be a binding of all the nodes to the values
mentioned in the predicates through %==. The problem I have seems to
be a level problem:

  - I have a list of nodes, and a list of predicates (or meta:
    functions that generate predicates).
  - The %which form takes identifier syntax, not nodes.

Does this need eval or a macro? In other words: the number of
variables in my query is problem-dependent.

[1] http://docs.racket-lang.org/racklog/unification.html

Entry: geiser
Date: Tue Apr 9 13:18:21 EDT 2013

Trying out geiser[1]. Some nice things:

  - ,enter : switch namespaces (modules) in the repl
  - geiser-mode: identifiers are annotated in the minibuffer
  - C-c C-e RET : edit module

[1] http://www.nongnu.org/geiser

Entry: enter! fix
Date: Wed May 8 09:40:27 EDT 2013

This needs a nightly build. Works ok with
plt-5.3.4.7-bin-x86_64-linux-debian-squeeze.sh

Entry: Shorter compile cycle
Date: Wed May 8 09:41:04 EDT 2013

I need this:

  - reload module
  - execute command

bound to 1 emacs key, or possibly run automatically whenever
something changes. How to send the command to the repl?

Entry: raco pkg git repository
Date: Sat Sep 28 09:27:18 EDT 2013

Some questions:

  - how do import names and git repository structures correspond?
  - where to find a list of libraries?

It seems that the info.rkt goes into a subdirectory? So a git
repository contains a subdirectory with the import name, and that
directory contains the info.rkt?

See: http://docs.racket-lang.org/pkg/how-to-create.html

Entry: Edit a database in a browser
Date: Mon Dec 16 12:26:09 EST 2013

I'd like to solve a simple problem that I've been struggling with for
a long time: provide a UI for database editing that is better than a
plain text interface + CSV files. For my purpose, a database is an
_optimization_, i.e. something to run read-only queries on. All the
data is easier to maintain if it is placed in a simpler form such as
CSV.

Entry: RacketCon
Date: Tue Dec 17 22:21:28 EST 2013

http://con.racket-lang.org/2013/

Entry: Racket Package System - Jay McCarthy
Date: Wed Dec 18 01:00:01 EST 2013

  - no backwards incompatible changes. break API = create new package
  - breaking includes changing documented behaviour
  - when the version number goes up, you can add stuff
  - we're exposing what the core does, to everybody
  - it's possible to binary-compile packages (see also build-deps)
  - it's possible to maintain compatibility when breaking up a
    package
  - it's possible to undo mistakes by reverting
  - there's a catalog[2]

[1] https://www.youtube.com/watch?v=jnPf6S0_6Xw
[2] http://pkg.racket-lang.org

Entry: Racket Database Connectivity
Date: Mon Dec 23 10:30:43 EST 2013

Looks like the MySQL wire protocol is now supported. It does not try
to capture query syntax; instead it uses parameterized queries that
can be "prepared". The main point is to get relations out of the db,
as lists/iterators of vectors, embedding the values in scheme.

[1] http://docs.racket-lang.org/db/
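A minimal sketch against the db API in [1] (connection details are
placeholders; the entry table and id_stat column are the ones
generated by the apachelog dump above):

  (require db)
  (define c (mysql-connect #:user "tom" #:database "apachelogs"))
  (query-rows c "SELECT id_stat, count(*) FROM entry GROUP BY id_stat")
  ;; => a list of vectors, one per row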
Entry: Parsing HTML
Date: Tue Dec 31 12:35:06 EST 2013

Which is better, html or xml?

  (require html)
  (define x (read-html-as-xml (open-input-file "...")))
  (define h (read-html (open-input-file "...")))

I want to look for an element matching a particular pattern. How to
query XML in racket?

  http://docs.racket-lang.org/xml/#%28part._.Simple_.X-expression_.Path_.Queries%29

  (se-path*/list '(table) x)

Ok, this is a bit of a mess: xml, html, xexpr. What's the problem?
The file is not xhtml, so it doesn't seem to be easily handled in
racket, i.e. I'll need to do manual traversal on the html data
structure. How to do better?

  - use an external tool to convert to xhtml, then use the xml /
    xexpr tools in racket
  - use the racket html datastructure anyway

-> See next post.

Entry: Neil's html-parsing lib
Date: Tue Dec 31 18:57:32 EST 2013

  (require (planet neil/html-parsing:2:0))
  (define x (html->xexp (open-input-file "...")))

http://planet.racket-lang.org/display.ss?package=html-parsing.plt&owner=neil

Noticed a difference between

  (td (@ (class "gh-td")) ...)

as produced by html->xexp, and

  (td ((class "gh-td")) ...)

as used by other Racket x-expr tools. What is this? Ok, these are not
the same! Two versions:

  - Racket's xexpr
  - Oleg Kiselyov's SXML

http://lists.racket-lang.org/users/archive/2011-February/044456.html

So the xml/path functions are not compatible with SXML. More here:
http://www.neilvandyke.org/racket-xexp/

The sxml stuff is nice!
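The difference in miniature (a sketch; html->xexp from the planet
package above, which also accepts a string as input):

  (require (planet neil/html-parsing:2:0))
  (html->xexp "<td class=\"gh-td\">x</td>")
  ;; => roughly (*TOP* (td (@ (class "gh-td")) "x"))  ;; SXML style
  ;; a Racket xexpr would spell it: (td ((class "gh-td")) "x")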