Thu Nov 3 17:30:17 EDT 2011

Parsing strings

All these hacking attempts are really hard to parse!

This one doesn't work for the string matcher:


"giebrok.zwizwa.be:80 83.101.57.157 - - [26/Aug/2011:00:18:08 +0200] \"GET /WebID/IISWebAgentIF.dll?postdata=\\\"><script>foo</script> HTTP/1.1\" 302 417 \"-\" \"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)\""

Or in unquoted form:
giebrok.zwizwa.be:80 83.101.57.157 - - [26/Aug/2011:00:18:08 +0200] "GET /WebID/IISWebAgentIF.dll?postdata=\"><script>foo</script> HTTP/1.1" 302 417 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)"

(define q "\"((\\.|[^\"])*)\"")  ;; quoted string (quotes not included)

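That regexp is wrong, some quoting is missing.  After the string reader
removes one level of backslashes, \\. becomes the regexp \. (a literal dot)
and [^\"] becomes [^"], which also accepts a backslash.  The match then
ends at the escaped quote in postdata=\".  Both the escape branch and the
character class need one more level:

(define q "\"((\\\\.|[^\\\\\"])*)\"")  ;; regexp: "((\\.|[^\\"])*)"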
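
A quick way to check a candidate pattern is to run it on a shortened
request field.  This is only a sketch, assuming Racket's regexp-match with
string patterns.  q-original is the first pattern above; q-escaped (a
variant with one more level of backslashes in both the escape branch and
the character class), sample and quoted-body are names made up for this
check, and the /x path is a shortened stand-in for the real request.

```racket
#lang racket/base
;; q-original is the first pattern from the notes; q-escaped, sample and
;; quoted-body are hypothetical names introduced for this check.
(define q-original "\"((\\.|[^\"])*)\"")         ; regexp: "((\.|[^"])*)"
(define q-escaped  "\"((\\\\.|[^\\\\\"])*)\"")   ; regexp: "((\\.|[^\\"])*)"

;; Shortened request field containing an escaped quote, as in the log line.
(define sample "\"GET /x?postdata=\\\"><script> HTTP/1.1\" 302 417")

;; Body of the first quoted string, without the surrounding quotes.
(define (quoted-body q str)
  (cadr (regexp-match q str)))

(quoted-body q-original sample)  ; stops at the escaped quote
(quoted-body q-escaped sample)   ; runs to the real closing quote
```

The first call returns only the text up to the backslash, the second
returns the whole request including the escaped quote.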

Another problem: looks like this one is corrupt:

zwizwa.fartit.com:80 81.52.143.34 - - [24/Nov/2010:22:44:06 +0100] "GET zwizwa.fartit.com:80 208.115.111.244 - - [25/Nov/2010:07:05:50 +0100] "GET /robots.txt HTTP/1.1" 302 360 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"

Probably a cut-off write.  The two records were meant to be:

zwizwa.fartit.com:80 81.52.143.34 - - [24/Nov/2010:22:44:06 +0100] "GET <CUTOFF>
zwizwa.fartit.com:80 208.115.111.244 - - [25/Nov/2010:07:05:50 +0100] "GET /robots.txt HTTP/1.1" 302 360 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"

The corrupt line parses, but it shouldn't: the request field swallows the
start of the next record, unescaped quote included.  How to fix that?

INSERT INTO req (id_req, req) values (0, "GET zwizwa.fartit.com:80 208.115.111.244 - - [25/Nov/2010:07:05:50 +0100] "GET /robots.txt HTTP/1.1");
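
One possible check, sketched under the assumption that the logger writes
every literal quote inside a field as \" (as in the postdata example):
reject a request field that still contains a bare double quote before it
reaches the database.  well-formed-req? is a hypothetical helper, not part
of the existing parser.

```racket
#lang racket/base
;; Hypothetical helper: accept a request field only if every double quote
;; in it is preceded by a backslash.
;; Regexp after string unescaping: ^(\\.|[^\\"])*$
(define (well-formed-req? req)
  (regexp-match? "^(\\\\.|[^\\\\\"])*$" req))

;; A normal request passes.
(well-formed-req? "GET /robots.txt HTTP/1.1")

;; The field from the corrupt record has a bare quote and is rejected.
(well-formed-req?
 "GET zwizwa.fartit.com:80 208.115.111.244 - - [25/Nov/2010:07:05:50 +0100] \"GET /robots.txt HTTP/1.1")
```

This only catches truncations that leave a bare quote behind.  A cut-off
that happens to produce a well-quoted line would still get through.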



