[xml
tom@zwizwa.be**20140103204150
Ignore-this: 33e19eef65c22575da82846fed948db9
] hunk ./plt.txt 1926
+
+Entry: Parsing HTML
+Date: Tue Dec 31 12:35:06 EST 2013
+
+Which is better, html or xml?
+
+(require html)
+(define x (read-html-as-xml (open-input-file "...")))
+(define h (read-html (open-input-file "...")))
+
+I want to look for an element that matches this:
+
+
+How to query XML in Racket?
+
+http://docs.racket-lang.org/xml/#%28part._.Simple_.X-expression_.Path_.Queries%29
+
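+(require xml xml/path)  ; se-path*/list wants x-expressions, not the xml structs in x
+(se-path*/list '(table) `(root ,@(map xml->xexpr x)))  ; root is just a made-up wrapper tag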
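+
+A minimal sketch of what se-path*/list does, on a made-up x-expression
+(the page literal is hypothetical, not the file from above). As far as
+I can tell, the query returns the children of every matching element,
+not the elements themselves:
+
+```racket
+#lang racket/base
+(require xml/path)
+
+;; Hypothetical x-expression standing in for a parsed page.
+(define page
+  '(html (body (table (tr (td ((class "gh-td")) "cell"))))))
+
+;; Children of every table element found anywhere in the tree.
+(define rows (se-path*/list '(table) page))
+
+;; A keyword at the end of the path selects an attribute value
+;; of the first match instead.
+(define cls (se-path* '(td #:class) page))
+```
+
+Here rows holds the tr element and cls holds the class string of the
+first td.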
+
+
+OK, this is a bit of a mess: three representations are in play
+(xml, html, xexpr).
+
+
+What's the problem? The file is not XHTML, so it doesn't seem to be
+easily handled in Racket, i.e. I'll need to traverse the html data
+structure manually.
+
+How to do better?
+- use an external tool to convert to XHTML, then use the xml / xexpr tools in Racket
+- use the Racket html data structure anyway
+
+
+-> See next post
+
+Entry: Neil's html-parsing lib
+Date: Tue Dec 31 18:57:32 EST 2013
+
+(require (planet neil/html-parsing:2:0))
+(define x (html->xexp (open-input-file "...")))
+
+
+http://planet.racket-lang.org/display.ss?package=html-parsing.plt&owner=neil
+
+
+Noticed a difference between
+
+(td (@ (class "gh-td")) ...)
+
+as produced by html->xexp and
+
+(td ((class "gh-td")) ...)
+
+as used by the other Racket xexpr tools.
+
+What is this?
+
+OK, these are not the same! There are two formats:
+- Racket's xexpr, with attributes as ((name "value") ...)
+- Oleg Kiselyov's SXML, with attributes as (@ (name "value") ...)
+
+
+
+http://lists.racket-lang.org/users/archive/2011-February/044456.html
+
+So the xml/path functions, which expect Racket xexprs, are not
+compatible with SXML. More here:
+http://www.neilvandyke.org/racket-xexp/
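+
+One way to bridge the two formats is to rewrite the attribute lists. A
+rough sketch, not taken from either library: sxml->xexpr is a
+hypothetical helper, and it deliberately ignores SXML's special nodes
+(*TOP*, *PI*, *COMMENT*) and attributes without a value:
+
+```racket
+#lang racket/base
+(require racket/match)
+
+;; Hypothetical helper: turn (tag (@ (name "value") ...) child ...)
+;; into (tag ((name "value") ...) child ...), recursively.
+(define (sxml->xexpr node)
+  (match node
+    ;; Element whose first child is an SXML attribute list.
+    [(list (? symbol? tag) (cons '@ attrs) children ...)
+     `(,tag ,attrs ,@(map sxml->xexpr children))]
+    ;; Element without attributes.
+    [(list (? symbol? tag) children ...)
+     `(,tag ,@(map sxml->xexpr children))]
+    ;; Strings and anything else pass through unchanged.
+    [_ node]))
+
+(define converted (sxml->xexpr '(td (@ (class "gh-td")) "cell")))
+```
+
+With the attribute list rewritten, converted has the
+(td ((class "gh-td")) ...) shape the xml/path functions expect.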
+
+sxml stuff is nice!
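+
+A small sketch of the kind of query this allows, assuming the sxml
+package is installed (raco pkg install sxml). The document literal is
+made up, roughly in the shape html->xexp produces:
+
+```racket
+#lang racket/base
+(require sxml)
+
+;; Hypothetical SXML document with an attribute list in (@ ...) form.
+(define doc
+  '(*TOP* (html (body (table (tr (td (@ (class "gh-td")) "cell")))))))
+
+;; sxpath compiles a path into a procedure; // means "at any depth".
+(define cells ((sxpath '(// td)) doc))
+```
+
+Unlike se-path*/list, this returns the matching td elements
+themselves, attributes included.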
+
+