Wed Nov 2 10:41:34 EDT 2011
Got parsing working. Parses 700 megs worth of logs (4M lines) in a
couple of minutes. Got sharing working also, feeding the logs in as a
generator instead of a list.
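The sharing step could look something like this (a minimal sketch, not the actual code; names are illustrative): intern every parsed field through an equal?-based hash so identical strings point at one shared object instead of one copy per line.

```scheme
;; Sketch: hash-consing parsed fields through an equal?-based table.
(define interned (make-hash))          ; make-hash compares keys with equal?

(define (intern v)
  (hash-ref! interned v v))            ; store on first sight, reuse after

;; e.g. while parsing:
;;   (map intern (string-split line " "))
```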
It's still getting quite large even with sharing, so I wonder if
that's actually working as it should. Maybe the hash tables are using
eq? instead of equal?
438016 -> 294m virtual
(hash-equal? (make-hash)) => #t
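For reference, this is the failure mode the check rules out (a sketch with made-up keys): an eq?-keyed table compares by pointer identity, so two equal strings would never collide and nothing would be shared.

```scheme
(define h (make-hasheq))                        ; eq? = pointer identity
(hash-set! h (string-copy "GET /") 'seen)
(hash-ref h (string-copy "GET /") 'missing)     ; => 'missing, no sharing

(define h2 (make-hash))                         ; equal? = structural
(hash-set! h2 (string-copy "GET /") 'seen)
(hash-ref h2 (string-copy "GET /") 'missing)    ; => 'seen
```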
Doesn't look like it... It does seem to flatten a bit:
Lines -> virtual
438016 -> 294m
808300 -> 446m
970609 -> 542m
1473112 -> 736m
1859149 -> 880m
2420298 -> 1180m
Seems to give roughly 50% memory savings, which is not as much as I hoped.
It also seriously slows down at this point.
I see it also stores back-references, which may not be necessary..
Anyway, the basic idea does seem to work. Let's try it on a subset
of files and get it to spit out an SQL database. There are two roads:
- put it in a standard MySQL / SQLite database and use SQL queries
- keep everything in memory and write a small query language in Scheme
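The first road could be sketched with Racket's db library like this (schema and accessors such as entry-ts are hypothetical placeholders, just to show the shape):

```scheme
(require db)

;; Open (or create) an SQLite file and define a toy log table.
(define conn (sqlite3-connect #:database "logs.db" #:mode 'create))
(query-exec conn
  "CREATE TABLE IF NOT EXISTS log (ts TEXT, host TEXT, msg TEXT)")

;; Feed rows straight from the log generator; next-entry and the
;; entry-* accessors are assumptions standing in for the real parser.
(for ([entry (in-producer next-entry eof)])
  (query-exec conn "INSERT INTO log VALUES (?, ?, ?)"
              (entry-ts entry) (entry-host entry) (entry-msg entry)))
```

Wrapping the inserts in a transaction would likely matter at 4M rows, since SQLite commits per statement by default.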