Wed Nov 2 10:41:34 EDT 2011


Got parsing working.  Parses 700 megs worth of logs (4M lines) in a
couple of minutes.  Got sharing working also, feeding the logs in as a
generator instead of a list.
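The lazy feed can be sketched like this in Racket; parse-line is a
hypothetical stand-in for the real parser, and in-lines does the
generator's job of handing over one line at a time:

```scheme
#lang racket

;; Stand-in for the real log-line parser.
(define (parse-line line)
  (string-split line " "))

;; Read the log lazily: in-lines pulls one line at a time from the
;; port, so the whole 700M file never has to sit in memory as a list.
;; Returns the number of lines processed.
(define (parse-log path)
  (with-input-from-file path
    (lambda ()
      (for/fold ([n 0]) ([line (in-lines)])
        (parse-line line)
        (add1 n)))))
```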

It's still getting quite large even with sharing, so I wonder whether
the sharing is actually working as it should.  Maybe the hashes are
using eq? instead of equal?

438016 -> 294m virtual

(hash-equal? (make-hash))  => #t

Doesn't look like it...  It does seem to flatten a bit:
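To see why eq? keys would have killed the sharing: two equal log
fields parsed from different lines are distinct string objects, so an
eq?-keyed hash stores both while an equal?-keyed hash coalesces them.
A quick check (string-copy forces distinct objects):

```scheme
#lang racket

;; Two equal but not eq? strings, like the same field parsed twice.
(define s1 (string-copy "GET /index.html"))
(define s2 (string-copy "GET /index.html"))

(define h-equal (make-hash))   ; keys compared with equal?
(define h-eq    (make-hasheq)) ; keys compared with eq?

(hash-set! h-equal s1 #t)
(hash-set! h-equal s2 #t)
(hash-set! h-eq s1 #t)
(hash-set! h-eq s2 #t)

(hash-count h-equal) ; => 1, duplicates share one entry
(hash-count h-eq)    ; => 2, every parsed copy is a fresh key
```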

  Lines -> virtual

 438016 -> 294m
 808300 -> 446m
 970609 -> 542m
1473112 -> 736m
1859149 -> 880m
2420298 -> 1180m

Seems to be roughly a 50% memory saving, which is not as much as I
hoped.  It also slows down seriously at this point.

I see it also stores backreferences, which may not be necessary...

Anyway, the basic idea does seem to work.  Let's try it on a subset
of files and get it to spit out an SQL database.  There are two roads:

- put it in a standard MySQL / SQLite database and use SQL queries

- keep everything in memory and write a small query language in Scheme
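The first road is short in Racket thanks to the db library; a sketch,
where the schema (ts, level, msg) and the sample row are just
placeholder assumptions:

```scheme
#lang racket
(require db)

;; Open (or create) an SQLite database and load parsed log rows into
;; it.  Real code would loop over the parser's output; one hard-coded
;; row stands in for that here.
(define (logs->sqlite path)
  (define conn (sqlite3-connect #:database path #:mode 'create))
  (query-exec conn
    "CREATE TABLE IF NOT EXISTS log (ts TEXT, level TEXT, msg TEXT)")
  (query-exec conn
    "INSERT INTO log VALUES (?, ?, ?)"
    "2011-11-02 10:41:34" "INFO" "example line")
  conn)

;; After that the queries come for free, e.g.:
;; (query-rows conn "SELECT level, count(*) FROM log GROUP BY level")
```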