Thu Oct 1 13:10:36 CEST 2009
Apache log mining
I'd like to get this stuff under control.
1. automatically gather logs
Currently I use an MD5-indexed archive to make the rsync-based mirror
from server to local a bit easier. This needs to be fine-tuned so that
the ``big pool of data'' only ever grows: files get added, never
overwritten or deleted.
2. write indexing in PLT Scheme
I have a parser that's relatively fast. However, since a log file no
longer changes once it is archived, its index only needs to be computed
once and can be cached on disk.
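
A minimal sketch of such a cache, written in Python rather than PLT
Scheme and keyed on the MD5 of the log contents. The index shape (client
address mapped to line numbers), the JSON cache files, and the names
build_index and cached_index are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def build_index(log_text: str) -> dict:
    """Hypothetical index: first field of each line -> list of line numbers."""
    index = {}
    for n, line in enumerate(log_text.splitlines()):
        if line.strip():
            index.setdefault(line.split()[0], []).append(n)
    return index


def cached_index(log_path: Path, cache_dir: Path) -> dict:
    """Return the index for log_path, building it only on a cache miss."""
    data = log_path.read_bytes()
    key = hashlib.md5(data).hexdigest()
    cache_file = cache_dir / (key + ".json")
    if cache_file.exists():
        # Same content seen before: reuse the stored index.
        return json.loads(cache_file.read_text())
    index = build_index(data.decode("utf-8", errors="replace"))
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(index))
    return index
```

Keying on the content hash means a cache entry can never go stale: if a
file's bytes differ, it simply gets a different cache file.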
3. make a query language
Some preprocessing steps are necessary to remove junk before querying.
Bots account for the bulk of the requests.
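
A rough Python sketch of one such preprocessing step, assuming the logs
are in Apache's combined format (where the user agent is the last quoted
field). The BOT_HINTS list is a hypothetical heuristic, not a complete
bot detector: it catches only crawlers that identify themselves.

```python
import re

# Apache combined log format: host ident user [time] "request" status size
# "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical substrings that self-identifying crawlers tend to use.
BOT_HINTS = ("bot", "crawler", "spider", "slurp")


def is_bot(line: str) -> bool:
    """True if the line parses and its user agent contains a bot hint."""
    m = COMBINED.match(line)
    if m is None:
        return False  # unparseable lines are left for a separate junk pass
    agent = m.group("agent").lower()
    return any(hint in agent for hint in BOT_HINTS)


def drop_bots(lines):
    """Return only the lines that do not look like bot requests."""
    return [line for line in lines if not is_bot(line)]
```

Bots that fake a browser user agent slip through this filter, so it is
only a first pass.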