Thu Oct 1 13:10:36 CEST 2009

Apache log mining

I'd like to get this stuff under control.

  1. automatically gather logs

Currently I use an MD5-indexed archive to make the rsync-based mirror
from server to local a bit easier.  This needs to be fine-tuned so that
the ``big pool of data'' only ever grows.
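
A minimal sketch of the grow-only pool idea, in Python rather than
Scheme and not the actual archive script.  It assumes each log file is
stored under its MD5 digest; the names md5_of and add_to_pool are
hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def add_to_pool(src: Path, pool: Path) -> bool:
    """Copy src into pool under its digest; return True if it was new.

    Existing entries are never overwritten or deleted, so the pool
    can only grow.  (Assumed layout: one file per digest.)
    """
    pool.mkdir(parents=True, exist_ok=True)
    dest = pool / md5_of(src)
    if dest.exists():
        return False
    shutil.copy2(src, dest)
    return True
```

Running add_to_pool over a freshly mirrored directory is idempotent:
files already in the pool are skipped, and a file that was rotated away
on the server stays in the pool.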

  2. write indexing in PLT Scheme

I have a parser that's relatively fast.  However, since the log data
never changes once archived, the index files could be cached instead of
being rebuilt on every run.
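
One way the caching could work, sketched in Python and not in PLT
Scheme: key each cached index by the MD5 of the log file it was built
from, so an unchanged file is never parsed twice.  The build_index
function is a hypothetical stand-in for the real parser (it only counts
requests per client), and the JSON cache layout is an assumption.

```python
import hashlib
import json
from pathlib import Path


def build_index(log_bytes: bytes) -> dict:
    """Hypothetical stand-in for the real parser: requests per client."""
    counts = {}
    for line in log_bytes.splitlines():
        if line.strip():
            # First whitespace-separated field is the client host.
            host = line.split(None, 1)[0].decode("ascii", "replace")
            counts[host] = counts.get(host, 0) + 1
    return counts


def cached_index(log_path: Path, cache_dir: Path) -> dict:
    """Return the index for log_path, parsing only on a cache miss."""
    data = log_path.read_bytes()
    key = hashlib.md5(data).hexdigest()
    cache_file = cache_dir / (key + ".json")
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    index = build_index(data)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(index))
    return index
```

Because the key is derived from the content, the cache never needs
invalidating: a changed file simply gets a different key.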

  3. make a query language

Some preprocessing steps are necessary to remove junk before querying.
Bots account for the bulk of the requests.
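
A rough sketch of such a preprocessing step, in Python.  It assumes the
logs are in Apache's combined format, where the User-Agent is the last
quoted field, and it uses a small hypothetical list of substrings to
spot bots.  A real filter would need a longer list, and this one does
not handle escaped quotes inside the User-Agent.

```python
import re

# In the combined log format the User-Agent is the last quoted field.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')

# Hypothetical markers; matched case-insensitively against the agent.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_bot(line: str) -> bool:
    """Guess whether a log line comes from a bot, by User-Agent only."""
    m = LAST_QUOTED.search(line)
    if m is None:
        return False  # no quoted field at the end: leave the line alone
    agent = m.group(1).lower()
    return any(marker in agent for marker in BOT_MARKERS)


def without_bots(lines):
    """Return the lines whose User-Agent does not look like a bot."""
    return [line for line in lines if not is_bot(line)]
```

This only catches bots that identify themselves; anything that sends a
browser-like User-Agent passes through and would need other heuristics.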