Experiments with the PLT Scheme webserver. The stable part of this code supports the http://zwizwa.be/ramblings log file formatter, used for most of my code projects. It is based on a simple graph based document model with lazy parsing. Additionally, I'm thinking about and experimenting with a "PHP for Scheme" model for low-level parametric page generation, memory image based web application development (without database, only using store for snapshot backups) and what the role of objects are in a continuation based web server without database. Darcs archive: http://zwizwa.be/darcs/sweb Entry: fun with the plt web server Date: Sat Sep 8 14:59:18 CEST 2007 the stuff i got running now is a website generated from a single xml file with content. this works pretty well, but doesn't really require any dynamic stuff. next: ramblings.txt parser this uses streams.ss, a copy paste from the brood project. Entry: ramblings Date: Wed Sep 12 01:47:46 CEST 2007 parsing a ramblings file seems to be pretty straightforward. need to fix some small organization problems for 'wikifying' my syntax. i guess i've been able to avoid most parsing problems using lisp and forth.. time to read some theory. anyways. one important node for indexing is to make the article url stable, such that if i edit the original ramblings file, all references stay unless the article is deleted. the only thing that's reliable is the date. since i'm using standard 'date' output, this can be parsed back into some simpler date rep. just was watching http://video.google.com/videoplay?docid=2159021324062223592 and one of the remarks david weinberger is making is that it is simply too expensive to delete everything. he also mentioned some 'meta-wikipedia' where experts link to a subset of frozen articles that they can endorse. made me thing of these ramblings files. instead of cleaning them up, i can just make a decent article here and there and archive the lot. then if i want to link to some historic thought, i just can.. that's why i do need a proper indexing system. the exact dates are probably good enough. i don't even need a project name. in fact, i can index everything i run into by standard date encoding.. maybe that's the 'pool' i've been thinking about? Entry: ramblings ok Date: Sun Sep 16 14:26:55 CEST 2007 i think i got most of it working now, based on - simple http:// link highlighting - indexing through date string (using ad hoc parser for 'date' cmd output) - nodes stored in a hash in memory: left-right linked + indexed. looks +- elegant, except for the use of some global vars. next: - put this online, maybe add another level of abstraction: per project ramblings - make a parser for pawfaliki format - create new site about purrr docs remember: the idea is to have a STABLE link structure. i prefer never to change the linking, which will be: http://zwizwa.be//.... Entry: scheme! Date: Mon Sep 17 14:49:32 CEST 2007 the funny thing is that now i'm used to pure functions, working with mutation looks terribly dirty. i run into this with the web server's data structure. building it seems strange: explicitly dealing with inconsistent intermediate states is something you don't really miss. and persistence is so convenient.. the most conventient structure for me is a hierarchical structure of hash tables (file system), which is further cross-referenced according to need. 
so: - hierarchical tree -> everything is reachable - cross referenced graph -> some things can be accessed fast this brings me to the following: why not stick with functional programming? the reason is of course that functional data structures are not trivial once you go beyond simple lists. but. is it really so hard to have an efficient hash? and, given the problem, isn't a mere association list enough? let's go back to a simplistic view. the things i need to aim for are concurrency and persistence. both are a lot easier when functional data structures are used. so the sloganesk view: - functional data structures for concurrency and persistence (data includes code here.. what is data other than dumb code?) - all mutation is hidden in caching mechanisms or declarative abstractions. Entry: real parser Date: Thu Sep 20 23:12:27 CEST 2007 first thing to find out is that an ad-hoc syntax likely requires separate parser/lexers. for example: i'm using a header/body separation in ramblings.txt which have completely different lexemes. so... i can leave the preprocessor as is. it sort of works, but i get some errors i don't understand. time to call it a day. Entry: pawfaliki Date: Fri Sep 21 15:53:06 CEST 2007 it looks like the wiki grammar is regular, so i could do with a lexer only. i need to be careful though not to turn this into a big pile of hacks.. it's already a small pile. about parsing: i've been largely able to ignore this because 1. forth is regular 2. all other ad-hoc things i've used were also regular 3. the only other thing i use is s-expressions (with a simple recursive decent parser) need to read a bit more about LR grammers. currently i don't understand how to resovle the shift/reduce conflict in the ramblings parser. Entry: graphs Date: Mon Sep 24 16:27:20 CEST 2007 plt scheme has a graph.ss library. there is some talk about it on the plt list: http://groups.google.com/group/plt-scheme/browse_thread/thread/391b57f756c75678 hmm.. it's mrlib. Entry: a new wiki standard Date: Fri Sep 28 14:01:20 CEST 2007 basicly what i want is just one format to concentrate on for development of high--quality texts. i already have that, and it's called latex. especially together with tex2page, which includes scheme as a meta--langage, there the sky is the limit. Entry: web server debug Date: Sat Oct 6 17:28:15 CEST 2007 i need to change some things so debugging becomes easier. - data structure reload without explicit sync (plt file date) Entry: parsing Date: Sun Oct 7 13:51:40 CEST 2007 hmm.. it's quite a mess, because the syntax is so ad-hoc. the thing is i need several lexers and parsers, instead of one. Entry: speedup Date: Sun Jan 27 17:01:40 CET 2008 i don't know exactly what the problem is, but it's REALLY slow. it looks like parsing the articles can better be postponed until when they are actually accessed. fixed: adding a delay around the text->xexpr translation does the trick. the index loads pretty fast now. TODO: hide this delay in object accessors. Entry: forth as markup Date: Wed Jan 30 11:41:28 CET 2008 hmm... in the ramblings body parser i had made a reservation for the ( and ) characters to be able to handle s-expressions. but, given that it's text, it might be a lot easier to use a forth-style approach: words separated by spaces, most words are just text, but some reserved words are used for formatting.. Entry: @chunk broken Date: Thu Jan 31 14:40:49 CET 2008 i took that out of brood.. so i need to find a replacement. 
the reason for taking out @chunk in brood was the move to delimited parsing, which means no "put-back" or no "peek". Entry: move to plt v4 Date: Tue Feb 19 16:33:00 CET 2008 EDIT: just changing plt-web-server-text to plt-web-server in the sweb/start script did the trick. i don't know what ghosts i saw last week that made it fail.. some things changed in the implementation of the webserver: the old sweb version doesn't work any more. a good occasion to fix some things. what i'd like to have is a repl attached to the server to inspect data structures while it's running. but first, start with a working server that's started entirely from a script. there's /start.ss as a first attempt. (enter! "start.ss") loads the start module when in a repl. the first questions: * how to create a servlet dispatcher? * what to do with the config dir? i.e. the ordinary file server? i'm reading a bit on the history of PLT scheme, and it looks like it's probably best i switch to learning mode. get it to know a bit better.. after briefly reading about delimited continuations and mixins, i do wonder how to use them to implement the current graph-based web-app structure a bit better. continuations OR objects, or continuations AND objects? time to get practical though.. not ready for theory atm. the procedure -> dispatch lifter http://pre.plt-scheme.org/docs/html/web-server/dispatch-lift_ss.html looks like that's what i need if i don't want servlets. Entry: plt 4 problems Date: Wed Feb 20 14:25:59 CET 2008 tom@giebrok:~/sweb$ ./start /usr/local/bin/plt-web-server -p 8181 -f configuration-table "../web-config-unit.ss" broke the contract (-> path-string? #:listen-ip (or/c false/c string?) #:make-servlet-namespace (->* () (#:additional-specs (listof any/c)) namespace?) #:port (or/c false/c number?) unit?) on configuration-table->web-config@; expected a procedure that accepts 1 ordinary argument and the mandatory keywords #:listen-ip #:make-servlet-namespace #:port, given: #web-config@> === context === /usr/local/mz-3.99.0.12/collects/scheme/private/contract-guts.ss:200:0: raise-contract-error on zzz with plt-3.99.0.12 it doesn't give this error. maybe plt vs. mz packaging? i sent a mail to the plt list. Entry: /last servlet Date: Wed Feb 20 18:27:32 CET 2008 http://zwizwa.be/last/ is a redirect point for cached dynamically generated files, such ad .pdf files from .tex documents, and tarballs from the darcs source projects. the main goal is to have a fixed entry point (service) that provides these documents. ideally, this would be demand-driven, but probably best to just use cron jobs now. next action: figure out how to redirect in scheme. http://pre.plt-scheme.org/docs/html/web-server/response-structs_ss.html Entry: roll your own stream processing library Date: Wed Feb 20 20:28:39 CET 2008 i'm wondering if this is actually a good idea.. in brood, it's a core component (syntax streams), but maybe it's better to use something standard? the reason is: i was apparently the first one running into a bug with SRFI-45.. so what's the popular solution? is there a canonical one + operations on those streams?? Entry: sweb contains data Date: Thu Feb 21 16:13:40 CET 2008 the idea was to separate data from functionality, but i guess there's no point, since funcitonality is so specialized. what might happen later is that i spin off some scheme web facade library for other projects (including the stream stuff..) but that's not for now. 
CONVENTION: data used for dynamic content specific to the site sits in the db/ directory Entry: darcs meta? Date: Thu Feb 21 16:15:56 CET 2008 what about serving the darcs archives through sweb, and providing some standardized frontend + making all projects comply to a standard building process? Entry: separating out library code Date: Thu Feb 21 17:07:23 CET 2008 instead of copy-pasting or ad-hoc linking of scheme files between different projects it might be better to start working on my "personal" scheme code library, merely as a practical way of organizing different projects. atm, there is: [BROOD] -> [LIB] <- [SWEB] let's call it zwizwa-plt and have it live in collects/ Entry: web server debugging Date: Thu Feb 21 18:30:47 CET 2008 http://groups.google.com/group/plt-scheme/browse_thread/thread/8cab8d9142780cb9 Entry: classes in sweb Date: Fri Feb 22 11:02:56 CET 2008 what about moving the 'graph' thing to a class? if i'm to use objects, a page should clearly be an object. Entry: zwizwa cache cleanup Date: Fri Feb 22 11:05:22 CET 2008 let's try to move to automate the generation of cached objects: * papers * darcs src + bin tarballs first, the papers. what needs to be done is: 1. check consistency of document 2. if not consistent, initiate a compilation (on different host!) 3. retreive the result security-wise, the webhost can only send messages to the compilation host. the compilation host responds to this by placing a file in a cache directory (or exporting one?) toplevel archive organization. is the builder part of sweb? yes. so it's best to put it in one archive. let's standardize the builder as a Makefile. builder takes archives from giebrok. the builder runs parameterized with the source host's name, fix it in the makefile for now. Entry: caches Date: Tue Feb 26 17:08:06 CET 2008 the thing about call-by-need and makefiles is probably best seen as a filesystem based object cache. i already have something like that: cached-file-object let's put it in a separate module, since it might be useful. so.. this is not completely the same as 'make': since Makefiles have rules that are opaque, the only way to figure out wether something needs to be recompiled, is to re-run make. what might be interesting, is a sort of rate-limited make: call make only if last pull was >> than the time it takes to run make. it looks like these things are better solved in the archives themselves. an 'update' target in the toplevel makefile should perform all the necessary operations to sync from central darcs and recompile. also generally useful. ok.. i separate it out in 2 targets: update: sync with authoritative zwizwa.be archive all: build everything both are polling operations. one wonders wether they can be made synchronous. another problem that pops up is for latex: building latex requires multiple iterations per file to ensure convergence. how to do that automatically? -> fixed (see latex-bibtex in darcs/papers) Entry: ssh + scheme? Date: Tue Feb 26 20:42:35 CET 2008 maybe it's time to write a dispatcher for ssh logins. rpc over SSH. this can then be used to send messages between remote hosts over secure but limited channels. this needs only parsing of arguments + environment variables: getenv putenv current-command-line-arguments ok. it's pretty simple. making sure to use the right identity file: ssh -i -o "IdentitiesOnly yes" .ssh/authorized_keys can be configured to dispatch the identity to a program which interprets the environment variable SSH_ORIGINAL_COMMAND, a string without zero bytes. 
so glueing together scheme programs over ssh is really simple. this basicly solves the problem of building stuff on klimop: only allow a single command (i.e. 'build') that produces a tar file with the result. no other functionality is needed, other than the trust in klimop. Entry: no objects? Date: Wed Feb 27 16:52:50 CET 2008 i cleaned up the code a bit, and hid the forcing/cache triggering in the node/graph object. now it looks reasonably nice, with the main operations on the data struture being 'n!' and 'n@'. adding nodes is possible using 'delay' or 'dynamic'. (the latter produces just a tagged thunk). all promises/thunks are forced whenever a node is accessed. maybe this graph is good enough as data structure? sweb is just a linked frontend to a couple of dynamic or cached pages. in fact. that's really most of an object structure, where object methods do not take parameters. (a 'button' object). Entry: continuation experiments Date: Wed Feb 27 19:51:57 CET 2008 /usr/local/bin/plt-web-server -p 8181 -f configuration-table Servlet (@ /servlets/ct) exception: "current-continuation-marks: no corresponding prompt in the continuation: #" === context === /usr/local/mz-3.99.0.13/collects/web-server/lang/abort-resume.ss:132:0: send/suspend ...r/private/servlet.ss:32:19 ...r/private/servlet.ss:32:19 i suspect this has to do with the expiration based manager, see the note here: http://pre.plt-scheme.org/docs/html/web-server/none_ss.html#(def~20((collects~20web-server~20managers~20none..ss)~20create-none-manager)) "if you are considering using this manager, also consider using the Web Language. (See Web Language Servlets.)" that's not it. it looks as if the default loader doesn't do web language servlets.. ha, i got something working here. after looking at the source mz/collects/web-server/dispatchers/dispatch-lang.ss i found that a path contains 2 values: a path and a search path list, so i made my fake path generator generate an empty list as 2nd value. now the dispatcher works. apparently web-serverl/lang/stuff-url.ss stores the continuation in a file in ~/.urls looks like this stuff is still quite experimental.. it does work, though i'm not sure why these continuations need extra data. thought there was no server state? Entry: scheme/control Date: Thu Feb 28 12:24:14 CET 2008 using 'prompt' and 'abort' for 404 escapes. currently i use call/ec + a parameter. using a specific prompt tag should simplify this. indeed. works perfectly. Entry: wowri + sweb update Date: Sat Mar 29 12:34:30 EDT 2008 sweb: link to current scat rambligns wowri: figure out where to put the data. sweb code should be on giebrok, but all the site data, managed by different people, should go on kurk. big choice: use sweb or just use wordpress or other cms? since this is not just for melissa, it's probably best to stick to something popular to limit the support cost.. what's important in a wiki - themes + customizable design - data backend for backups maybe i should go for a ruby on rails wiki, to get a chance to jump on that train? i'd like to do stuff in the scheme webserver, but that creates more dependencies.. maybe check on planet for a wiki? checked with the boss: she's the only one updating, so i can go on experimenting. was thinking about using scribble instead of xml.. 
Entry: 2 servers Date: Sun Mar 30 10:32:51 EDT 2008 going to try something with 2 servers * giebrok: has sweb running * kurk: has data what i need is a way for giebrok to get data from kurk, but with modification time so it can rebuild its caches appropriately. i tried nfs readonly but i can't get that to work (mount gives permission denied). http objects have mtime, so i need a way to access that from scheme. maybe it's easier to just roll a quick file server that gives mtime data. ok.. got something working. but do i really need tcp servers? probably easier to use ssh + a single script that reads arguments from the SSH_ORIGINAL_COMMAND. what solves what problem? ssh: authentication + security program: data I/O it's best to have a daemon to eliminate startup time. unix socket daemon with socat? Entry: server channels Date: Sun Mar 30 14:16:24 EDT 2008 it's best to go back to point-to-point message qeueus. those have the simplest properties to integrate. all the nuts and bolts can be solved on system level (ssh/socat). send binary data as efficient as possible. this means to prefix it with a header. i've had enough trouble with quoting/unquoting strings between different lisps now. cons atom a atom b cons atom a cons atom b null cons binary 5 xxxxx null made the simple server thing, split into 3 parts: * tcp server * function -> interpreter * path access probably the path access is better with a guard. currently, it's a bit limited. Entry: browser events Date: Mon Jun 2 14:20:14 CEST 2008 an annoying problem is two way synchronization of generated documents and a browser: * browser should be notified whenever a source document changes so that it can refresh the doc. * whenver a browser refreshes / requests a doc, the server checks if the cache (compilation) is still valid. i'd like to solve this problem for serving Scribble documentation. it can probably be solved with a bit of javascript. the main question is how to move events from server -> client? something i never understood properly... also: ramblings needs rss support. Entry: taking out stream.ss Date: Sun Aug 10 19:36:58 CEST 2008 The stream lib seems to complicated. It's only used in parsing the ramblings files into a stream, while a list would do just fine. The lazy part is implemented using an explicit delay around the article parser anyway. Going to try a local fork that simply deletes the stream.ss files and works up from there. Deleted split.ss Then moved from there. Most adaptation was in parse-ramblings.ss but quite straightforward to solve. Entry: images and attachments Date: Sun Aug 10 22:40:53 CEST 2008 I'd like to add images and other attachments. The main problem is where to put them. Currently, all is contained in a single ramblings file, which is quite convenient. All ramblings files are part of a darcs archive, maybe they should use just relative directories? Relative is good. Part of darcs archive is too. Entry: blogging software.. Date: Sun Aug 10 23:01:17 CEST 2008 ai ai ai.. she's not happy with Wordpress any more! So.. What is necessary? * login * posts + reply * spam filtering * rss I'd like to get rid of MySQL and PHP too.. 
So an alternatives using sqlite: platform python+django http://en.wikipedia.org/wiki/Byteflow perl http://en.wikipedia.org/wiki/Movable_Type python/zope http://en.wikipedia.org/wiki/Plone_%28software%29 python/django http://en.wikipedia.org/wiki/PyLucid ruby/rails http://en.wikipedia.org/wiki/Radiant_%28software%29 ruby/rails http://en.wikipedia.org/wiki/Typo_%28software%29 ntry: apache logs Date: Mon Aug 11 08:48:37 CEST 2008 Apparently the standard debian install has a 'read' able apache log. Wrote some stuff on top of it, to get unique ips and user agents. It is rather slow.. Maybe find a better way to represent it? So, what to do with it? Add some filters, like bots. I'm not really interested in those. Let's just grep for 'bot' first. Entry: forms + login Date: Wed Aug 13 20:09:47 CEST 2008 Let's try to figure out how to do this. Login using some special cookie (no passwords) and edit a site's content. Entry: instaweb Date: Wed Aug 13 20:38:47 CEST 2008 Maybe switch to instaweb, since I'm only interested in running a single servlet. On the other hand, it might not be necessary. What does it bring? Simplified configuration + 'root' servlets. It's easy to add next time I add something: #lang scheme/base (require (planet schematics/instaweb/instaweb)) (instaweb #:servlet-path "servlets/ramblings" with some minor tweaks in the url handling code. EDIT: this won't work. the approach i take is to make a single webserver for all networked applications, and run it behind apache to expose only public servlets. Entry: disorganized Date: Sun Aug 17 15:45:12 CEST 2008 I'd like to make some 'stuff' available through my intranet. What this 'stuff' is mostly about location of files and terminals, and trying to organize it all a bit better. Entry: Electric project pages Date: Sun Aug 17 16:00:26 CEST 2008 For example, the Staapl homepage could use some parametricity. What I need is a mechanism to format pages from darcs projects. Let's call it prj-www, and let's use a Scribble based frontend. It's quite straightforward. However, a non-interpreted language using only scheme/reader is still tedious: funcitonality needs to be provided through an interpreter. Let's use one of the scribble languages that expands to 'doc. The approach I'm trying now is 'executable xhtml' : the servlet frontend will load the module into a namespace and strip all the scribble markings, leaving just the s-expressions generated by the code. This leaves the responsability to generate proper xhtml at the servlet side. It's a bit raw ``PHP style'', but somehow makes sense for quick & dirty apps. Ok. Formatting one document (staapl homepage) it's pretty clear that this is a bit too much hassle. Better switch to the structure imposed by scribble, and render it. Entry: continuations Date: Tue Aug 19 11:22:25 CEST 2008 Ok, it seems quite straightforward. Leaving the continuation management at the default, here's an example: #lang scheme/base ;; -*- scheme -*- (require scheme/pretty web-server/servlet web-server/servlet/web) ;; plt servlet interface (provide interface-version timeout start) (define interface-version 'v1) (define timeout +inf.0) (define (start req) (for ((i (in-naturals))) (send/suspend (page i)))) (define ((page i) url) (printf "~a\n" url) `(xhtml () ,(format "Page ~a. " i) (a ((href ,url)) "[next]"))) So, when to use continuations, and when to use objects? I.e. a shopping cart is an object: it should collect items from different parallel threads. 
Entry: web server tutorial Date: Wed Aug 20 19:48:51 CEST 2008 http://docs.plt-scheme.org/continue/index.html Entry: databases Date: Fri Aug 29 22:09:20 CEST 2008 I think I'm simplifying things too much.. A significant service a database provides is a guarantee of consistency: A database transaction, by definition, must be atomic, consistent, isolated and durable. These properties of database transactions are often referred to by the acronym ACID. http://en.wikipedia.org/wiki/Database_transaction In relation to persistence, this is not trivial. Entry: snooze Date: Tue Sep 2 19:56:09 CEST 2008 http://planet.plt-scheme.org/display.ss?package=snooze.plt&owner=untyped http://planet.plt-scheme.org/package-source/untyped/snooze.plt/2/1/planet-docs/snooze/index.html tom@sornfit:~/sweb$ mzscheme lib/db.ss date-test.ss:67:43: compile: unbound variable in module in: make-srfi:date setup-plt: error: during making for /untyped/unlib.plt/2/5 (unlib) setup-plt: date-test.ss:67:43: compile: unbound variable in module in: make-srfi:date Entry: OpenDocument Date: Fri Sep 5 15:57:31 CEST 2008 Trying to typeset Marycela's book. Making some routines to process .fodt XML files. Load and save work. This could save as a nice frontend for Melissa's writer website. Instead of asking people to create HTML files, maybe they should just stick with wordprocessors? This looks like it's doable, if the style used is a bit restricted. Marycela uses a lot of `space and tab' formatting, which is difficult to recover properly. Let's try to artificially create some documents and see how they look in XML. Hmm.. Coming from the structured documents side, it seems odd that text is not structured. Text is a sequence of paragraphs, where some paragraphs might be tagged with a "Heading" style. This is then collected to make table-of-contents etc.. .odt is a JAR (ZIP) file. Apparently the flat XML is not the standard one. And paragraphs and headers are separated: 'text:p and 'text:h tags. ZIP is supported in PLT Scheme through dherman/zip http://planet.plt-scheme.org/display.ss?package=zip.plt&owner=dherman Works like a charm: (define (load-odt filename) (unzip-entry filename (read-zip-directory filename) #"content.xml" (lambda (name dir port) (xml->xexpr (document-element (read-xml port)))))) Entry: guessterpreter Date: Mon Sep 8 01:04:26 CEST 2008 I'm done hacking my way through the xml file to convert it to something with minimal markup. However, current structure allows only for one tag / style, while a style should be modeled as an expression transformer. OK. Works. Next step: figure out how to retreive indentation information + add a parsing step to recover stanzas. 
Entry: 4.1.4 Date: Mon Mar 9 12:36:53 CET 2009 Startup breaks again with update to 4.1.4 This time it's a strange error procedure application: expected procedure, given: #f; arguments were: "/servlets/ramblings" #f #f === context === /usr/local/plt/collects/scheme/private/contract-arrow.ss:1347:3 /usr/local/plt/collects/scheme/private/contract-arrow.ss:1347:3 /usr/local/plt/collects/scheme/private/contract-arrow.ss:1347:3 /usr/local/plt/collects/scheme/private/contract-arrow.ss:1347:3 /usr/local/plt/collects/web-server/dispatchers/dispatch-passwords.ss:33:2 /usr/local/plt/collects/scheme/private/contract-arrow.ss:1347:3 /usr/local/plt/collects/scheme/private/contract-arrow.ss:1347:3 /usr/local/plt/collects/scheme/private/more-scheme.ss:175:6: loop /usr/local/plt/collects/scheme/private/contract-arrow.ss:1347:3 /usr/local/plt/collects/scheme/private/contract-arrow.ss:1347:3 /usr/local/plt/collects/scheme/private/more-scheme.ss:175:6: loop /usr/local/plt/collects/scheme/private/more-scheme.ss:175:6: loop /usr/local/plt/collects/scheme/private/contract-arrow.ss:1347:3 /usr/local/plt/collects/scheme/private/contract-arrow.ss:1347:3 /usr/local/plt/collects/scheme/private/contract-arrow.ss:1347:3 /usr/local/plt/collects/web-server/private/dispatch-server-unit.ss:62:2: connection-loop Maybe it's time to move to a single-servlet server? I don't understand the red tape, and I currently don't need anything fancy.. Let's try to build it anew from the documentation. OK. got something that works: start.ss: #lang scheme (require web-server/servlet web-server/servlet-env (rename-in "servlets/ramblings" (start servlet-start))) (serve/servlet servlet-start #:port 8181 #:command-line? #t #:servlet-path "/servlets/ramblings" #:servlet-regexp (regexp "^/servlets/ramblings") ) Entry: aggregation Date: Mon Mar 30 14:45:41 CEST 2009 I'd like to add some aggregation mechanism for navigating all different log entries to track my time usage. Having all this time available again after working out of the house on a full time basis leads to chaos. Most of my projects are linked in some way and insights in one lead to changes in others.. I can't keep track any more. Entry: parsing Date: Tue Mar 31 17:40:16 CEST 2009 I think I'm just going to remove the lex/yacc parser. Anything I want to do with syntax in ramblings files is going to be completely ad-hoc: datamining based on existing text files instead of any kind of sane structure. So let's just use a direct tokenizer. What the current lexer produces is a list of strings and xhtml elements in xexpr form. OK: replaced with whitespace/workd tokenizer state machine + individual word matcher. Entry: implementing aggregation Date: Wed Apr 1 11:17:52 CEST 2009 problems: - update: currently when file changes it gets recomputed - mutation is used to index the list of articles. solutions: - separate indexing (the node datastructure) from a pure sequential list of articles OK.. Using the straightforward approach creates problems. I'm not sure which evaluations get triggered again and again.. TODO: eliminate whole-datastructure traversals. Entry: links Date: Thu Apr 2 14:44:55 CEST 2009 Things like [1] ... [1] http:// Now that I have the body parser in a separate file this should be straightforward to implement. But, how to distinguish declaration and reference? An FIR style filter would be nice: one that slides a function over a window of the stream and builds a new stream. (define (ll-fir ll fn) ...) 
Got reference registration + evaluation working: now figure out how to bind urls. Entry: databases Date: Sun Apr 12 21:37:47 CEST 2009 So, what's the conclusion for the ramblings? It costs too much. It's probably best to separate data storage and website logic after all: the data isn't so dynamic, and the site is quite slow when updating from the large text files. Suggested split: * data cached in a database (SQLite probably, with untyped/snooze) * offline compiler to convert text -> database. Entry: what to use a db for? Date: Thu Apr 16 16:10:07 CEST 2009 This might seem a stupid question, but I'm thinking about using a database as a data cache only. Is this a good idea? Only if data flows unidirectional from a different store format (ramblings.txt) to the db, and never in the other direction (i.e. post comments). I'm biased towards text files: "source code" for data. I don't have a good feel for dbs due to lack of experience. How to design tables? In snooze [1] [2], is it ok to sometimes add tables or should we use a different table for things that might associated to objects? Let's stick to the real problems: * ramblings should be sped-up. reload is too slow with huge text files. * I'd like to add some kind of comment posts facility to the ramblings posts. [1] http://planet.plt-scheme.org/display.ss?package=snooze.plt&owner=untyped [2] http://planet.plt-scheme.org/package-source/untyped/snooze.plt/2/6/planet-docs/snooze/index.html Entry: indexing ramblings files Date: Thu Apr 16 16:16:50 CEST 2009 The ramblings files don't tend to change much. Let's invent an indexing format which stores post offsets. Entry: back to multiple servlets Date: Tue May 5 09:52:56 CEST 2009 Time to get some more functionality behind sweb. Basicly I'd like to turn it into an OS for all kinds of network applications with redundancy. Now, I did make an apache log analyzer before. Where dit it go? Ha! it's right here in lib/apache.ss Entry: Data structure of the day: reference counter. Date: Tue May 5 13:15:41 CEST 2009 This macro builds a data structure for reference counting and histogram building. (define-struct entry-ref (object refs) #:mutable) (define (add-ref! hash object entry) (let ((er (hash-ref hash object (lambda () (let ((er (make-entry-ref object '()))) (hash-set! hash object er) er))))) (set-entry-ref-refs! er (cons entry (entry-ref-refs er))) er)) (define-syntax (define-ref-struct stx) (define (fmt fmt-string . a) (datum->syntax (car a) (string->symbol (apply format fmt-string (map syntax->datum a))))) (syntax-case stx () ((_ name (fieldname ...)) (let ((fieldnames (syntax->list #'(fieldname ...)))) (syntax-case (list* (fmt "make-~s" #'name) (fmt "make-rc-~s" #'name) (for/list ((f fieldnames)) (list (fmt "set-~s-~s!" #'name f) (fmt "~s-ref" f)))) () ((make-instance make-rc-instance (set-field! field-param) ...) #`(begin (define-struct name (fieldname ...) #:mutable) (define field-param (make-parameter (make-hash))) ... (define (make-rc-instance fieldname ...) (let ((instance (make-instance fieldname ...))) (set-field! instance (add-ref! (field-param) fieldname instance)) ... instance))))))))) So I removed this completely (the names complicate it greatly) and created this function to work on tables instead: (define (table-share table) ;; Register the object and put a shared instance in the vector. (define ((register! vec) hash object i) (vector-set! vec i (reflist-object (add-ref! 
hash object vec)))) (let* ((n (vector-length (car table))) (hashes (build-vector n (lambda _ (make-hash))))) (values (for/list ((entry table)) (let ((v (make-vector n))) (for ((column entry) (hash hashes) (i (in-naturals))) ((register! v) hash column i)) v)) hashes))) Now I've updated this to generated a memoized expression by wrapping it in a 'let expression that produces the table with shared data when 'eval ed. This might come in handy somewhere else.. (aka memoization aka common subexpression elimination). box> (table->let (list (vector "foo" "a") (vector "foo" "b") (vector "foo" "c"))) (let ((0:0 '"foo") (1:2 '"c") (1:0 '"a") (1:1 '"b")) (list (vector 0:0 1:0) (vector 0:0 1:1) (vector 0:0 1:2))) Entry: log analyzer Date: Tue May 5 17:55:16 CEST 2009 main features: - bot filter - cross-referencing unique identifiers i do wonder if this is better than putting everything in a database. let's see if the cross-referencer can be abstracted better without so much mutable state. What is it? Converts a table of object relations by wrapping each object in a table entry in a reference list that links it to other table entries in which it occurs. So, this is better abstracted as a map: table -> table w. shared data set of hashes in fact, the hashes are a bit of a side-effect of introducing sharing. Entry: servlets Date: Wed May 6 12:23:29 CEST 2009 Goal: - run all scheme web apps in a single server (on giebrok + redundant on zwizwa and zzz) - solve data dependencies of dotp: data is on kurk only. Entry: databases Date: Wed May 6 16:25:18 CEST 2009 2 things to do: * apache logs * ramblings Let's try apache logs first since they can be easily separated. Now installing untyped/snooze. Let's have a look at its dependencies first: cobbe/contract-utils jaymccarthy/sqlite access to sqlite databases ryanc/require library indirection schematics/sake building automation schematics/schemeunit unit testing soegaard/galore functional data structures untyped/unlib misc untyped cce/scheme scheme programming utilities Entry: faster ramblings parsing Date: Thu May 7 10:51:41 CEST 2009 It might help to work on byte files instead of character files. Ok. This works fine: read the whole file as a byte string, then perform a regexp-split to segment articles, then lines, then parse the attributes and lazily parse the body. Entry: speeding up inner parser Date: Thu May 7 14:24:32 CEST 2009 http://localhost:8181/servlets/ramblings/staapl/20090212-184818 cpu time: 1228 real time: 1588 gc time: 108 cpu time: 2056 real time: 2455 gc time: 52 Now that's simply ridiculous. Let's see.. I think I need to have a proper look at the regexp syntax instead of writing all these adhoc tokenizers. Ok. All representation uses bytes now + _compiled_ regular expressions. This is what i get: cpu time: 19 real time: 20 gc time: 0 cpu time: 12 real time: 12 gc time: 0 A little better ;) Using strings instead of bytes makes it not more than twice as expensive. Entry: index Date: Thu May 7 18:03:52 CEST 2009 Now.. what about indexing the files? Ok. I've added machinery to perform registration of links and words. Care needs to be taken however to properly transport them across promises, since they are parameters. Then, I wanted to force articles in a separate thread to build an index while booting up. But apparently there is no built-in way to synchronize on a promise being forced.. Entry: composing regexps Date: Thu May 7 19:45:12 CEST 2009 It has to be possible to use some kind of abstraction to compose regular expressions. 
Let's have a look at the reference manual again.. TEST: (foo) (http://foo) <- parens are valid in http:// urls 'http://foo' "http://foo" {http://foo} http://zwizwa.be Entry: forcing + sync Date: Fri May 8 15:33:55 CEST 2009 How to make sure that an expression that needs an update won't get updated twice by two different threads? Let's do this for force-dynamic instead of force. A node can be in these states: * not accessed * cache check * cache update It's probably simplest to use a semaphore for this. syn Entry: viewing scribble docs Date: Sat May 16 01:02:12 CEST 2009 one of the things i keep finding annoying is to have to press refresh in a browser window. how to fix? i'm not versed in this xmlrpc stuff, but is it possible for a server to send a message to the client without it polling for new data? probably. Entry: scribble Date: Tue Jun 2 15:37:14 CEST 2009 Maybe the blog parser should be able to parse scribble docs? They are fairly readable as text in case a parser is not available.. Given a string representing a module, how do you evaluate it into xhtml that can be straigth embedded? Entry: library Date: Fri Jun 12 10:24:37 CEST 2009 The problem: my growing collection of locally cached electronics papers and books is getting quite large. I'd like to construct an interface for it + solve the problem of making sure it is available everywhere. Rationale for locally cached library: * Not all content is available on the web. * I'm not always online. * The total size is managable. Rationale for specific storage structure: * Data is read-only * The only operations are add and delete. * Want to avoid (exact) duplicates. * Not all machines are always on. (reason for distributed system) Ideally I'd like this to be a reference pool so it's easier to add references to papers. This is something that can grow however. It's best to get meta-data automatically from the web, and focus on caching the data, and linking the metadata. Some problems: * A web interface * Meta data format? * Organization + Search? Practical problems and solutions: * Only single files are indexed. This works best for ps.gz, pdf and djvu. Multiple files, use tar.bz2 archive + figure out how to unpack this in the viewer. For storage, maybe check here[1]. I'd like to move to an implementation where each file is indexed as an MD5 file. This would make it possible to tap into MD5 content hash databases. [1] http://en.wikipedia.org/wiki/Content-addressable_storage Entry: Datasheets Date: Fri Jun 12 12:58:41 CEST 2009 Testing my new library referencing in sweb[1]. The PIC18F252 datasheet[2]. [1] http://zwizwa.be/darcs/sweb [2] lib://85708a5accb49b829264d8556bcc853e Entry: library update Date: Fri Jun 12 14:31:45 CEST 2009 what works now: - "library" package with some scripts to modify and view the store - papers/md5.txt for description of md5-addressed content - an "About" button that googles for the ID of a ramblings post todo: an easy way to view a file in the local library cache. since this is actually allowing the execution of a command on the local machine it needs to be done with a bit of care.. the simplest way is to define a file type that can be passed to firefox. or a protocol handler. i can't get those to work though.. i don't understand firefox: it's gotten quite closed down over the years. Ok. Editing the mimetypes.rdf file in my firefox config tree, disabling the system defaults and telling it to ask made it work. Apparently this is a bug[1]. 
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=428658 Entry: multiple tags Date: Sat Jun 13 13:39:04 CEST 2009 It might be best to allow both the standard 8-digit date based indexing, and any of ISBN or MD5 hash indexing. Because of different NB digits these won't clash. Entry: Ramblings format features Date: Sat Jun 13 13:44:05 CEST 2009 - WYSIWYG: stick to fixed-width text which allows ad-hoc formatting (useful talking about code) and is easy to use in emacs. - Add a reference mechanism to allow URIs to be tucked away at the bottom of a post, not disturbing the flow of text. However, they are still visible instead of hidden as in HTML. There is too much information in the visual representation of a URI to not show it. - A ramblings file is a sorted list of entries. "^Entry:" is the split tag, and it is not allowed to be part of a post text. There is no escape mechanism, so you can't quote the description of an entry within the body of an entry. The header of an entry is a collection of "^: $" pairs. These have some meaning in the parser and mainly direct indexing. - Ad-hoc body parsing. There is room for extension, but the main idea is that standard URIs take precedence. The '{' and '}' tokens are still free for hierarchical grouping, but they are currently not used. The unit of parsing is the word. This means that URIs with spaces are not supported (replace them with %20). For extensions, it's probably best to use reserved words that don't clash with english or any code that would be present in a post body. Entry: LaTeX Date: Sat Jun 13 16:40:04 CEST 2009 I'd like to revive my math blog, but in a way that preserves the LaTeX layout. I.e. to have the convenience of the ramblings format, but with latex formatting. Let's see about some math blogs on the web. One is the unapologetic mathematician[1]. I'd like to do it in the following way: have a default www output, but allow for the destillation of a PDF. Hmm.. I tried several applications (html2tex, tth, tex2ht, latex2rtf) but none of them give proper output and they are slow. I'm thinking it's best to go back to the basics: latex + dvipng. (dvipng is a separate debian package). dvipng foo.dvi -o foo.png -T bbox Conclusion: with a bit of massaging this could be made into a simple way to display formatted tex on a page. Basicly, what you want is a single long page, and slice it into segments so browsers will display it properly. Then put the .tex source also on the page so it can be indexed properly. [1] http://unapologetic.wordpress.com/ Entry: Image test Date: Sat Jun 13 20:35:42 CEST 2009 Inline images now work. img://mathworld.wolfram.com/images/gifs/Rule110Big.jpg Next: how to display the .tex stuff? It would be nice to do this as a service, but this will lead into url length limits. So.. Maybe try to turn the latex|dvipng command into a script and integrate it with sweb for inline image generation. Let's split it in two: allow to extract a post as a plain file, then create a transformer for such extranctions (restricting all conversions to locally defined content to prevent use/abuse). Ok.. I didn't use the ? thingy in the url yet.. The subdirectory structure is nice for giving context to the requests, but per post there can be different operations tagged to this ? construct.. Let's have a look at the scheme docs. Ok. got queries as: (require web-server/http/request-structs) (require net/url-structs) (url-query (request-uri req)))) Next: capture dependencies + store intermediates.. 
The question is probably, do we store the .png in the memory image, or leave it on disk? In general, the problem is a disk-caching strategy: it's ok to have a live program manage dependencies between data, but the data itself can be stored externally. The same could be done for the parsed form of the ramblings file. Instead of keeping the parsed intermediates in memory, place it in disk storage. Also, it would be interesting to be able to split the problem in two: external programs take data from memory, and place them back into memory, but the actual location of the files could be made hidden. So. it's not just a cache, it's also a database of (mime-tagged) binary data. In short: * write a transparent storage mechanism that uses scheme objects on one side, and a directory+file structure on the other side. * this database should be accessible over http (without sweb running) meaning it needs a fully consistent on-disk representation for forced data, and a simple way to display promises. Entry: juggling binary objects -> caching continuations Date: Mon Jun 15 12:18:59 CEST 2009 Using the latex + dvipng the problem isn't really computation time. Rendering is fast. So why cache it? What about this: create an abstraction that map the operation of creating a directory with files onto the creation of a list of objects. I.e. a hash table. Then, in sweb when these files get transferred to the client, they can be garbage-collected. It would even be better if the .dvi could be cached in memory (since it's not that large), but the dvi2png can extract individual pages. Actually, this completely solves the problem. The hairy part is the fact that a .dvi or .tex represents a _collection_ of pages, but http requests are always about individual items.. So in modeling data structure, you need to think about dependencies and then apply memoization there. Simply put: dvipng can use indexed addressing easily. Then the 2nd problem: doing it this way the filesystem storage used as a scratchpad during the execution of a program can be abstracted completely. This is what makes things a whole lot less messy. Conclusion: * HTTP requests are about objects. This should be reflected in the in-memory model. * Some documents have a _logical_ hierarchical structure. I.e. html + embedded images. This can be reflected in the in-memory model using dependencies. * Intermediates can produce multiple objects which are requested asynchronously through http. The problem is the asynchronous nature of http. Because it has no concept of containment (which sucks if you ask me) this containement needs to be modeled elsewhere. Because of the production of multiple objects, some memoization is a good idea. As long as the memoized data isn't too large (in this case .dvi files are only slightly larger than .tex files) it can be kept around in memory. If not, some disk caching strategy might be necessary. The real insidiuous problem here is that you can't really garbage-collect anything: the client might request sub-documents, or might not. This is the central problem in server-side continuation management for instance.. You really want to transfer all this information to the client. Hey.. Can this be done for intermediate data also? I.e. instead of keeping the memoized .dvi around, can't we just dump the .dvi in its entirety to the client, then ask the client to give us the .dvi it wants rendered? It's the same thing as you'd want to do with continuation storage. 
The problem is really that continuations themselves tend to be large, and passing them back and forth between client and server is not a good idea.. So, caching is in order, and that is where the problems start. So essentially to solve the web continuation problem you just need a caching approach that works. That's all. But definitely not trivial as it's hard to define what a good caching strategy is.. [1] entry://../compsci/20090615-131905 Entry: calling latex and dvipng Date: Mon Jun 15 13:47:07 CEST 2009 Problem: given binary data represented in-image, call a script that processes it (possibly using local filesystem cache) and transfer the resulting file(s) back into image. Core idea: file systems do not support garbage collection (by design, since references to files cannot be tracked due to symbolic representation / lack of type information) so they need to be abstracted as local state[1]. So what's the essential abstraction? A 1-1 map between scheme objects and filesystem object. Whenever the object disappears, the filesystem object will be deleted. In PLT Scheme this is handled by finalizers[2][3]. > (require scheme/foreign) > (define (finalize it) (printf "finalizing: ~a\n" it)) > (register-finalizer (cons 1 2) finalize) > (collect-garbage) finalizing: (1 . 2) NEXT: find an api to make this easy. I.e. define a standard file system interface and some scheme functions to create and reference objects. [1] entry://../compsci/20090615-134849 [2] http://groups.google.com/group/plt-scheme/browse_thread/thread/9ae6c5a6c331431b [3] http://www.cs.brown.edu/pipermail/plt-scheme/2005-May/008898.html [4] http://download.plt-scheme.org/doc/4.1.5/html/foreign/foreign_pointer-funcs.html#(def._((lib._scribblings/foreign/unsafe-foreign..ss)._register-finalizer)) Entry: Flat view of ramblings Date: Mon Jun 15 16:03:26 CEST 2009 Shouldn't be too difficult to implement. Add a virtual "aggregate" topic which can 1. access all posts by simply searching them. 2. create a table of contents, sorted by key. Ok. 1. is implemented. 2. shouldn't be so difficult but is for another time. Now I'm wondering, maybe this should really be the default? Get rid of sections in the data representation, but use them only for the index + in-topic navigation. The constraint (make sure keys are unique) doesn't seem too restrictive. In-topic navigation could be implemented differently. Since this is just a filtered list an interpretation should be simpler. This however requires to move from a graph to a tree structure: posts no longer have a unique previous/next post... But.. From a human p.o.v. keeping sections is a good idea.. If not only to separate the sane from the unsane, the edited from the raw. One interesting property is that posts can move without breaking outside references, by writing aggregate indexing as a http redirect. Entry: tex -> png Date: Tue Jun 16 14:03:56 CEST 2009 Implemented. Currently images get rendered on demand. It's probably better to render them all at once so a wrapper xhtml file can be generated (it needs nb of pages). Entry: pdf Date: Wed Jun 17 11:21:43 CEST 2009 So.. now that -> png works for quick display, let's add a PDF button too, so google can index it. Entry: Making web links explicit Date: Wed Jun 17 11:27:54 CEST 2009 The problem: images and other embedded objects are handled separately by the server. Is there a way to somehow unify this view? I.e. construct an object that can be traversed by the server to pick out individual objects? 
The idea is that the server's addressing mechanism is of no concern to the document/node compiler. So: * document compiler buils graph structure * server queries graph structure. Server needs to provide addressing information to the graph compiler so proper addresses can be embedded in the documents. Entry: Hash Tables Of Closures Date: Wed Jun 17 11:50:38 CEST 2009 I'm trying to make explicit the design used in sweb. The document structure is a graph, where each node (accessed by a symbol) represents either a server response object or something else. It is a combination of the following principles: * late binding * lazyness + caching (= "non-functional" lazyness) * protype-based programming This is a common[1][2] pattern in web programming. Message sending happens by abstracting both nodes and messages as objects (hash tables). [1] http://www.paulgraham.com/noop.html [2] http://lispy.wordpress.com/2007/07/09/closures-hash-tables-as-much-oop-as-youll-ever-need/ Entry: Latex Rendering Date: Wed Jun 17 16:31:45 CEST 2009 Type: tex \LaTeX $ $ rendering is now impelemented, including references which are fished out of the comments. %[1] http://zwizwa.be/darcs/sweb Entry: Latex Multiple Pages Date: Thu Jun 18 17:03:48 CEST 2009 It might be handy to split a latex post into multiple (html) pages: * there is no easy way around formatting quirks when concatenating images top to bottom. * 2-column display might be feasible. * .tex file needs a single namespace, so splitting it into different posts doesn't work well. * for large papers, loading all the images is wasteful Entry: the md5:// links in firefox Date: Sat Jun 20 10:50:23 CEST 2009 Currently it's necessary to edit mimetypes.rdf manually. ... In addition you need to add it in the "about:config" panel. network.protocol-handler.external.md5 = true network.protocol-handler.expose.md5 = true network.protocol-handler.app.md5 = ... Doesn't work.. What a piece of opaque crap. Ok I got it to go again but in a frustrated manner so I don't know exactly what happened, but at this moment the configuration above, with alwaysAsk set to true did work after I made sure that the script was executable. I don't know if the about:config stuff is still necessary. It's filled in my current config, but didn't have any effect when I filled it in. [1] https://bugzilla.mozilla.org/show_bug.cgi?id=428658 Entry: Split .txt files Date: Sat Jun 20 15:58:14 CEST 2009 Make it possible to gather a single ramblings section from multiple text files, to separate "constant" from "editable" data. Currently it takes too long to update large files (around 5000 lines it becomes problematic on my 1.8Ghz Pentium-M). Entry: auto-discover ramblings Date: Sat Jun 20 16:00:59 CEST 2009 Make ramblings files auto-discoverable: take the first line of the file as the description and re-read the directory on each request. Funny how this seems all quite trivial, but is really about learning how to manage events. Entry: separate indexing from storage Date: Wed Jun 24 15:14:15 CEST 2009 Before it's possible to create a single store for all the posts, it's probably better to separate the indexing from the post store. post -> section -> index -> next/prev Solved with a "toc" object that can perform tag indexing + relative offset. Entry: removed random link generator Date: Wed Jun 24 16:12:51 CEST 2009 ;; Add a random page generator (dynamic node) (n! 
page-links 'random (dynamic (list-ref-random node-list))) (define (list-ref-random lst) (list-ref lst (inexact->exact (floor (* (random) (length lst)))))) Entry: moving toward a single pool Date: Thu Jun 25 10:28:14 CEST 2009 What's necessary is to separate indexing from posts. However, in order to be able to navigate from one post to another using any indexing mechanism, the posts need a link to the index. First: replace nodelist->grap with nodelist->toc Next: separate out index building. done. Entry: atexit Date: Sat Jun 27 10:42:50 CEST 2009 http://list.cs.brown.edu/pipermail/plt-scheme/2009-March/031530.html looks like there is no simple solution, so it might be best to just stick with the current one. however, running a script in tail positiion in atext() could be the simplest solution. Entry: apachelogs Date: Fri Jul 3 09:26:12 CEST 2009 65.55.106.157 - - [12/Apr/2009:00:47:02 +0200] "GET /shop_content.php?coID=47&XTCsid=v6ldqlfhml4a9vv90igsmbj6sis9n77u HTTP/1.0" 404 16 "-" "msnbot/2.0b" The current parser chokes on this input, and I don't understand why. Probably best to use regular expressions. Ok, that seems to work fine. I've created 2 parsers, for combined + common log formats. parsing takes _way_ too long.. Ok, i tried to convert it to scheme reader format and it's even worse. I guess that's why one would use databases: the data is parsed. So, how does one go about managing such datasets? One thing is that these logs are very redundant. Simply indexing every field using 32bit integers would already do wonders. This would allow the graph structure to be represented separately from the node data. What am I interested in? - date - ip - referrer - agent (for bot filtering) - request Entry: logfile to graph Date: Sun Jul 5 11:27:50 CEST 2009 I wrote some code for this before. Let's have a look at it. It's in refs.ss Moving it to plt.ss Entry: apache logs Date: Thu Jul 9 12:24:37 CEST 2009 So.. This is the first thing I've dealt with in a long time where performance really matters. Reading the database from disk in text format, either directly from the Apache logfile syntax or preprocessed into scheme syntax is really too slow. Some other database storage mechanism needs to be used. Now.. Can standard methods be used? Using an sql database with proper shared data might really be enough. The basic idea is to have the graph structure represented in a form that is easily accessed. In the end, it is nothing but a bunch of integers. So I wonder: does using standard methods have _any_ advantage here? One would be external tools could use the database. Let's get the numbers right first. How many table entries for a year's worth of logs? 291 logfiles 1065149 entries 206M uncompressed 17.7M bzip2, one file 25.6M gzipped, one file / multiple files What strikes me is that zcat is very fast reading the files. tom@zni:/opt/apache-logs/access$ time bash -c 'zcat * | wc -l' 1065149 real 0m1.147s user 0m1.072s sys 0m0.072s Why is parsing in the current implementation so incredibly slow? I guess I'm seriously underestimating regex complexity. Anyways, what's needed is: bits date 32 ip 32 request 32 referrer 32 agent 32 20 bytes per row that's about 20MB of indexed data = quite close to the compressed sizes! Ok. Regex. Maybe the main problem is that my expressions are not anchored? No.. I feed it lines. Hmm.. Let's start encoding it into something that loads fast. It looks like a specific structure for representing the logfile is going to be way more interesting than a dumb regexp based approach. 
Entry: git for deployment Date: Thu Jul 9 19:54:33 CEST 2009 I've been using darcs for a while now, and I like it. I don't use many advanced features, but it's mostly invisible, except for not intelligently merging my ramblings files (for which it is not really to blame). But.. It's slow. I'd like to give git a try, but only for deploying the zwizwa website, to see if fast rollback is possible: if it doesn't work, roll back immediately and possibly automatically. Entry: dotp Date: Thu Jul 9 22:30:46 CEST 2009 Re-included dotp logic in sweb so i don't need two instances running. Entry: regexp matching Date: Sun Jul 12 10:02:45 CEST 2009 Changing the regexps so all matches are minimal matches dramatically sped up the search. Entry: Databases Date: Wed Jul 15 09:55:57 CEST 2009 I don't understand databases[1]: In the database world, developers are sometimes tempted to bypass the RDBMS, for example by storing everything in one big table with two columns labeled key and value. While this allows the developer to break out from the rigid structure imposed by a relational database, it loses out on all the benefits, since all of the work that could be done efficiently by the RDBMS is forced onto the application instead. Queries become much more convoluted, the indexes and query optimizer can no longer work effectively, and data validity constraints are not enforced. This is exactly what I would do. Some terminology[2]: - a table represents a relation (subset of a product space) - a functional dependency A -> B means that given parameter A, there is a unique B (the relation is a function of A). - a candidate key is a minimal superkey. a superkey is a collection of attributes that uniquely determines a row (there is a functional dependency key -> row). About performance[3]: In [4] it is mentioned that, first, you should start from a normalized[2] design and optimize it; then you can start using indexing for columns that are frequently used as important selection criteria, sort criteria, and/or used in joins. [1] http://en.wikipedia.org/wiki/Inner-platform_effect [2] http://en.wikipedia.org/wiki/Database_normalization [3] http://en.wikipedia.org/wiki/Index_%28database%29 [4] http://www.15seconds.com/Issue/040115.htm [5] http://www.troubleshooters.com/littstip/ltnorm.html Entry: Passwords Date: Wed Jul 15 11:11:09 CEST 2009 How to handle authentication in the PLT webserver? Entry: firefox on unix sockets Date: Thu Jul 16 12:53:46 CEST 2009 http://support.mozilla.com/tiki-view_forum_thread.php?locale=cs&comments_parentId=97358&forumId=1 Yes.. It would make a lot of authentication problems a lot simpler.. Entry: tired of bad quality scans Date: Thu Jul 16 13:50:35 CEST 2009 I want to make a document viewer with custom image postprocessing in a web browser (serving .png files). The only part that's missing right now is random page access to pdf and djvu files. What I wonder is how pdfs that encapsulate bitmaps can be convinced to expose these bitmaps in non-resampled form. All pdf rendering i've seen asks for a DPI setting. Entry: md5 -> isbn index Date: Thu Jul 16 14:31:11 CEST 2009 move all books in the library to an isbn index. better still, use a single protocol handler to reference all kinds of extensions to firefox. i.e. stuff://isbn/123...
these could then be replaced by somewhat helpful links for the external view, and directly link to functionality in the internal view. basic idea is that it's a pain to add protocol handlers or type handlers to firefox, so doing it once for all types is maybe best. Entry: check-syntax Date: Thu Jul 16 17:56:49 CEST 2009 Is it possible atm (i mean: with little work) to convert a bunch of source files to a cross-linked datastructure or html document? Entry: Databases Date: Fri Jul 24 11:51:03 CEST 2009 Persistence. Even though I like the idea of using ad-hoc data structures to represent data, it does seem that SQL-based storage is not going anywhere, so I'm going to spend a couple of hours getting something set up in a standard way. Probably best to start with a simple interface to reduce db ignorance before using the more involved untyped ORM `snooze'. I'm using [1] as a guide. PLT Scheme + SQLite (require (planet "sqlite.ss" ("jaymccarthy" "sqlite.plt")) (planet "sqlite.ss" ("soegaard" "sqlite.plt" 1 0))) Mini SQLite cheatsheet: .help .tables .dump select * from
; Ok... This should be enough to try something with the apache db. It looks like it is a quite straightforward bridge between SQLite's SQL dialect and Scheme. Now I just need to learn SQL. [1] http://scheme.dk/blog/2007/01/introduction-to-web-development-with.html [2] http://www.sqlite.org/sqlite.html Entry: PLaneT Date: Fri Jul 24 12:01:36 CEST 2009 I'd like to figure out how to: - install offline - find dependencies From the sqlite deps, what is sake: build tool (like `make') Entry: Reverse parsing of ramblings files Date: Sun Aug 30 15:24:56 CEST 2009 A lot of code would be simpler to specify lazily if only the ramblings files would have the more recent articles at the top. The application is really only interested in recent entries, with the tails computed on-demand. However, maybe this isn't so important as the initial indexing can be performed quite fast in a strict way: a simple regexp search for "^Entry:" will do. Entry: Removed database files Date: Tue Sep 29 09:27:08 CEST 2009 hunk ./web-root/lib/db.ss 1 -#lang scheme -(require (planet untyped/snooze:2:6) - (planet untyped/snooze:2:6/sqlite3/sqlite3)) - -(define-snooze-interface - (make-snooze (make-database (string->path "/home/tom/test.db")))) ; TODO: arguments...))) - [_$_] -(provide (all-from-out (planet untyped/snooze:2:6)) - (snooze-interface-out) - (all-defined-out)) - - [_$_] rmfile ./web-root/lib/db.ss hunk ./web-root/lib/ramblings-dispatch.ss 58 - (with-input-from-file path - (lambda () (read-line)))) + (with-handlers ((exn:fail:filesystem? false)) + (with-input-from-file path + (lambda () (read-line))))) hunk ./web-root/lib/ramblings-dispatch.ss 68 - (b blurbs)) + (b blurbs) + #:when b) hunk ./web-root/lib/sqlite.ss 1 -#lang scheme/base -(require - (planet "sqlite.ss" ("jaymccarthy" "sqlite.plt")) ;; McCarthy + Welsh - (planet "sqlite.ss" ("soegaard" "sqlite.plt"))) - - -(define db-file "/tmp/test.sqlite") - -(define db #f) - -(define (create-db!) [_$_] - (begin - (with-handlers ((void void)) (delete-file db-file)) - (set! db (open (string->path db-file))))) - -(define (create-table-entries) - (exec/ignore - db - #< ALTER TABLE post ADD date;" - -(define-persistent-struct post [_$_] - ((title type:string) - (date type:time-utc) - (body type:string) - )) - [_$_] -(define (all) - (let-alias ((P post)) - (db (find-all (sql:select #:from P))))) - -(all) rmfile ./web-root/lib/test-db.ss Entry: Ramblings File Representation Date: Tue Sep 29 09:32:44 CEST 2009 Instead of building the index page statically, it might be better to generate it from an enumerator/stream. Ok, aggregation itself isn't so difficult, the problem is with propagating cache refresh signals up: the cache mechanism needs a compose operation. Hmm.. I suspect this is probably more a problem with my representation. It would be nice to figure out a way to detect structural changes (i.e. add/delete article) as opposed to content changes, and then simply updating this structure instead of regenerating it from scratch. What I really want is each article to be a separate entity. Can this be abstracted in that way? Alternatively, content could be kept completely on-disk, with expensive operations (like tex->png) cached on top of that, i.e. based on some hash of the text. Roadmap: abstract ramblings files as objects with the following interface: - header - node-list - node ref (symbol -> key.value hash) where data always comes directly from disk, and internal indexing is regenerated when the file changes. 
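As a sketch of that roadmap (the names are made up here, and parse-ramblings stands in for whatever parser ends up doing the segmentation): a ramblings file abstracted as a closure that re-reads and re-indexes itself when the file's modification time changes.

(define (make-ramblings-file path)
  (let ((stamp  #f)    ;; last seen modification time
        (header #f)
        (index  #f))   ;; symbol -> key/value hash
    (define (refresh!)
      (let ((t (file-or-directory-modify-seconds path)))
        (unless (equal? t stamp)
          (set! stamp t)
          (let-values (((h ix) (parse-ramblings path)))  ;; assumed parser
            (set! header h)
            (set! index ix)))))
    (lambda (msg . args)
      (refresh!)
      (case msg
        ((header)    header)
        ((node-list) (hash-map index (lambda (k v) k)))
        ((node)      (hash-ref index (car args) #f))))))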
Entry: SXML planet library Date: Tue Oct 20 15:27:34 CEST 2009 [1] http://planet.plt-scheme.org/package-source/lizorkin/sxml.plt/1/1/doc.txt Entry: restructuring Date: Thu Dec 24 10:21:08 CET 2009 I'd like to restructure the app as a functional program: i.e. eliminate the graph structure of links, and make this implicit. Most code that actually does something is already structured as memoized/lazy/cached data. Entry: restructuring cont.. Date: Wed Jan 6 08:23:03 CET 2010 This is not simple. Essentially, this requires getting rid of the n! node assignment. Conceptually the design is ok, but it should be made clear which links are static (single assignment), and which are not. The model I'd like to use is still that of a graph, but constructed using lexical variables, so that at least the binding information is known at run-time. Currently it's just a bunch of nodes -- quite unstructured. The feeling I'm trying to obtain is that I can make an isolated object (an article) that can be represented without being linked to the rest. So one has: - raw article - article views (html, rendered .tex) - list of articles (a ramblings.txt file's default order) - index - navigation Entry: more restructuring.. Date: Sat Jan 9 09:03:21 CET 2010 I don't want to kill the current ramblings implementation. I wonder if a full rewrite following a different model isn't a better approach. The graph n@ n! code is embedded quite deeply. The problems ramblings.ss actually solves: - article parsing - article views (html, rendered .tex) - list of articles (a ramblings.txt file's default order) - index - navigation - data management (memoization or dependency management) I also miss a decent emacs interface, and a definition of a single aggregated context object (i.e. text + embedded non-text objects), and a coupling to version control. Also, authentication would be nice. I need an object model and a concrete data store. The store should be simple to update, and hopefully human-readable. Large text files are nice, but not quite manageable. So, what is an article? Essentially an object with different views. Wait.. Can the current representation be converted to single-assignment form? I like the memoization & cache approach, but a static name structure would be simpler. Also, indexing needs to be separated from construction. Maybe that's the place to start first? Add one layer of indirection there? Yes, this is really too complicated. The cross-linked objects model seems to work in practice. However, I can't prevent myself from thinking it's a ball of mud. You can't cleanly separate out a part, but maybe that's the point? This is really for later.. I need a clearly motivated goal to tackle this. Entry: Relational Lenses: Updatable Views Date: Thu Jan 21 15:35:10 CET 2010 [1] http://www.cis.upenn.edu/~bcpierce/papers/dblenses-pods.pdf Entry: sweb and emacs Date: Sat Feb 27 15:10:09 CET 2010 I'd like to tie sweb to emacs. In emacs: find the first occurrence of 'Date:' and pass it to sweb, which will turn it into an ID. This needs: * index by ID only (not section/id). * index by `date` string Entry: Hmm.. not happy Date: Sun Mar 7 10:10:20 CET 2010 This graph business doesn't seem right. Isn't there a way to represent the code such that it is purely functional? Entry: The graph business. Date: Sun Mar 21 12:01:13 CET 2010 I've been whining about it for a while now. Time to get it over with. What is the problem? I'm building a full graph (pages linked together) but I would prefer data to take the form of a tree or directed acyclic graph.
I.e. to structure most of the data management code as a function (a compiler) instead of a database. What does the sweb site actually do? - parses ramblings files - creates indexes - wraps a tex->png/pdf renderer What are the problems? The graph/node implementation is too stateful. Components can't be isolated easily. Solution: get rid of the graph/node implementation. Roadmap: start with ramblings parsing, and propagate the graph/node code dependencies up the module hierarchy. Starting point: parse-ramblings-fast.ss depends on graph.ss Concrete strategy: create a module entry.ss that abstracts all the concrete structure in a ramblings file. This seems to work: I can separate function and data (parsing and representation of content) from UI presentation (web page structure). Entry: Tex rendering Date: Sun Mar 21 14:06:35 CEST 2010 The low-level parsing works. Now let's make the tex parsing into a functional compilation step. Currently the tex formatting uses self-access of attributes, i.e. OO style. A tex file is created from a source file and produces: dvi, png, pdf. There are a few complications: * dvi is an intermediate for both pdf and png * There are multiple png and xhtml files; they need to be represented and cross-linked. * where to put the laziness and caching (recompile) ? Laziness can be embedded in the function. Caching is probably best handled on a higher level. The remaining problem is to mesh external references with internal representations. This is a problem that's best handled generally since I can foresee it popping up elsewhere (I've seen it in the past). So the goal is simple: create a tex->png/pdf renderer independent of the external access methods: LINKER (INDEXER) COMPILER (RENDERER) The compiler needs to be parameterized by an index method. It produces html that refers to other html and png files. Currently this is handled in a woven way: see `node-query->link'. Entry: Linking Date: Sun Mar 21 17:22:58 CEST 2010 So, how do you compose a compiler that constructs a graph, and a link mechanism for that graph? ( Actually, there are 2 problems: external link mechanism + internal storage management ) More concretely. The result is a collection of XHTML files with holes, and a collection of leaf nodes (i.e. png images), where the holes represent link elements. Higher order syntax? Let's unify nodes. Each object has a unique id (both xhtml and opaque nodes are represented as type-tagged binary blobs). The result is then: * repr :: id -> (mime, stream) -- provided by compiler * link :: id -> url -- provided by linker Constraint: the urls need to be permanent (don't break external links) and pretty/meaningful. Currently the tex->html/png compiler has: math/20090619-105812?page=4 math/20090619-105812?png=4 Entry: Stateful crap! Date: Sun Mar 21 19:54:21 CET 2010 Hmm. I can't get it back to work. Something is wrong with the permissions or path or whatever. The latex hangs, probably waiting for input, but when I start it manually all just works fine. There is also nothing in the logs. Ok. it works manually; running ../start as sweb user. Not through runit though: all works fine except latex hangs. How to get at its stdout? It seems the problem is that the output is not read. That's weird. Something wrong with the logger? The runit/log/run file had wrong permissions. Weird.. Who touched it? Entry: Lazy vs. Reactive Date: Tue Mar 23 15:45:20 CET 2010 So.. What do we have? 3 kinds of nodes: - lazy evaluation: eval once when needed - reactive: re-eval when needed (dependency-based) - thunk: re-eval always
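A minimal sketch of those three kinds, not the sweb code, just to pin down the distinction:

(define (thunk-node f)        ;; re-eval always
  (lambda () (f)))

(define (lazy-node f)         ;; eval once when needed
  (let ((p (delay (f))))
    (lambda () (force p))))

(define (reactive-node f)     ;; re-eval when needed: cached, but externally invalidatable
  (let ((cache #f)
        (valid? #f))
    (lambda (msg)
      (case msg
        ((get) (unless valid?
                 (set! cache (f))
                 (set! valid? #t))
               cache)
        ((invalidate!) (set! valid? #f))))))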
The lazy scheme works quite well. The dependency-based caching however is a mess. It currently doesn't compose; it should propagate requests to its leaves and trigger recompile if necessary. How to make it composable? It is a simple issue of making dependencies explicit. ( The main point for sweb is that you can't place a reactive component inside a lazy one. ) Functional reactive programming (FRP) usually takes one of 2 approaches: - pull: each query propagates to primitive sources and recomputes as necessary. - push: each event pushes through the dependency network to create new high-level events. Since the web is a pull architecture, I'm not sure if push is so interesting. However, if push events are available, one could re-compute to lower latency on the pull side. Ideally, at an exit node, you want to know _all_ its dependencies and check if it needs to be recomputed. What you want is for dependency information to travel separately from the value path. Specifying it explicitly is tedious. How to automate this in Scheme? Probably having some dynamic environment to record the evaluation trace is a good approach. Can we assume the trace to be static? That way it can be computed the first time a full evaluation is made. Alternatively: can it be done in syntax? (fully static dependency). [1] http://en.wikipedia.org/wiki/Functional_reactive_programming [2] http://en.wikipedia.org/wiki/Incremental_computing Entry: Pull FRP Date: Fri Mar 26 08:36:58 CET 2010 What about a special form that expands to a dynamic "trigger", i.e. check types at run time and recompute values if necessary. Problem: make the up-to-date check 1-shot. To avoid exponential explosion re-checking dependencies down the tree, in a single evaluation, only check once for every node + cache the result. This means it should be clear what a "pull" event is: it's a global concept. In other terms: a pull, and all its child pulls, happen at a single instant in time. Given a reactive network N, and a set of output nodes O, compute the value of the output nodes by recomputing each node in the network at most once. The question seems to be: how to make the input nodes transactional? I.e. if the pull happens at T0, any changes that happen after T0 do not affect the value of the output. The problem is that this information is not completely available; i.e. the network is not transactional, but there is a definite order of events (i.e. updates of the file system). So, let's summarize: * SHARING is important: you do not want to update a node multiple times, as this can lead to exponential complexity. * How do you know if a node is up-to-date? I.e. an input could change during a network computation, and be queried both before and after the update. This suggests you need some kind of TRANSACTION or logical time. I suppose in the `make' utility it is assumed that the inputs do not change during the evaluation of the network. So the question seems to be, how can sweb be instrumented to guarantee glitch-free operation? In [1] it is suggested to combine - push: limited-range invalidation; push invalid only when valid. - pull: lazy evaluation. This requires access to the input events. [1] http://en.wikipedia.org/wiki/Reactive_programming#Evaluation_models_of_Reactive_Programming Entry: Reactive programming Date: Sat Mar 27 17:52:59 CET 2010 So, the code in rp.ss [1] seems complete. It implements a generic dataflow network with bi-directional linking (one direction for pull functional dependencies and the other for push invalidation propagation). The main idea to make this embed well in Scheme seems to be: use ordinary lexical scope for the functional dependencies, and weak hashes for the inverse dependencies.
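A rough sketch of that shape (hypothetical names, not the actual rp.ss code): each node caches the value of a thunk and keeps weak back-references to its dependents, so evaluation pulls downward while invalidation pushes upward.

(define-struct rv (thunk value valid? dependents) #:mutable)

(define (rv-node thunk)
  (make-rv thunk #f #f (make-weak-hasheq)))

;; pull: recompute if invalid; a node pulling its input passes itself as
;; `parent' so the inverse dependency gets recorded in the weak hash.
(define (rv-pull r [parent #f])
  (when parent (hash-set! (rv-dependents r) parent #t))
  (unless (rv-valid? r)
    (set-rv-value! r ((rv-thunk r)))
    (set-rv-valid?! r #t))
  (rv-value r))

;; push: invalidate r and everything that (transitively) depends on it.
(define (rv-invalidate! r)
  (when (rv-valid? r)
    (set-rv-valid?! r #f)
    (hash-for-each (rv-dependents r)
                   (lambda (dep _) (rv-invalidate! dep)))))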
[1] entry://../plt/20100327-160826 Entry: Moving the code to reactive implementation Date: Sat Mar 27 21:08:35 CET 2010 This means: - all functionality should have a purely functional core, including the indexing. OK - reactivity is added to the mix to provide the stateful web app. the only state is cache. OK - root objects should have file system notification so invalidation can propagate. OK - abstract url generation and dispatching (tree linking) - navigation (cross-linking) In essence: trees + (lazy) cross-references. It seems that the itch I have is due to the cross-referencing in the current implementation. It's all about binding; the rest is simple rewriting. How to do the file system notifications? Use jao's mzfam[1]. So what about indexing? It is just a function. Care needs to be taken to still perform a partial parsing step: parse article headers and construct a suspended body parse. The index then depends only on the headers. Url generation is also simple. Storage is a tree of data structures. Cross linking (navigation: next/prev) is separate. This can be delegated to the dispatcher: if it has an index, it can add re-directs for all symbolic (non-permanent) references. [1] http://planet.plt-scheme.org/display.ss?package=mzfam.plt&owner=jao Entry: FAM & Swindle Date: Sun Mar 28 10:42:11 CEST 2010 Hmm.. mzfam[1] doesn't work on current plt.. (it's from 2007). Ok, replacing (lib "swindle" "swindle.ss") with `swindle' seems to do the trick. In fam-base.ss: (module fam-base swindle (provide (all-from swindle swindle/misc)) (require (lib "async-channel.ss") swindle/misc) Nope.. then `mappend!' can't be found.. adding the swindle/misc doesn't seem to help. I'm confused and tired, for next time.. Ok, this is the mutable lists problem.. Shall I fix it? Replaced mappend! -> mappend and sort! -> sort. Patch sent to jao. [edit: Jao fixed it; new version in planet.] [1] http://planet.plt-scheme.org/display.ss?package=mzfam.plt&owner=jao Entry: Cross linking Date: Sun Mar 28 11:59:41 CEST 2010 It seems that most of the core problems are easy to solve for a tree-based datastructure. The missing component is cross-linking. The problem is that this violates hierarchical composition: a network no longer has a recursive "collection of lower level entities" semantics; it's a big ball of mud instead. Can the linking be orchestrated by a single global linker entity? linker :: doctree -> graph That doesn't work as the doctree itself has (open) embedded crosslinks. So, essentially we need to view a doctree as an open term. linker :: docenv -> doctree -> graph So, how to represent open terms? Two things need to be unified: 1. the original source has some representation of links (i.e. standard http:// links or custom entry:// links) that needs to be mapped to the identifiers used by the naming scheme. 2. identifiers need to be mapped to raw links. Let's start the lib/link.ss module to implement this behaviour. The first modification is in the article body parser. Following the remark above we need to distinguish the ramblings syntax from the link mechanism we want to abstract.
Essentially, the ramblings syntax has two kinds of links: external ones (standard http or transformed links like isbn or img) and internal cross links. The latter ones are important to capture. It seems this is not really necessary for the ramblings docs. The references used are already compatible with a tree + up-dir representation. So maybe this is only for indexing? Side note: see [1] for links about representing graphs in functional languages. The first one is a zipper rep. [1] http://lambda-the-ultimate.org/node/3195 Entry: Cross-linking re-take: ".." introduces graph structure Date: Mon Mar 29 19:59:16 CEST 2010 It's a strange problem, especially since you don't really notice it with all these urls floating around in documents. I.e. why is something like entry://../foo/123 not a good idea? Can we count ".." as a tree reference, or is it a graph reference? This is used so much that you really don't think about it. It's definitely a graph (check i.e ../plt/../plt/../ etc). It breaks encapsulation. I.e. it assumes the current context is part of some larger context. So, the conclusion is: compiling to a structure that has ".." references is possible (i.e. a set of html files). But, since this is _already_ a graph structure, it might be wise to abstract also those kinds of links generally, and fall back on the ".." links for off-line compilation. Entry: Scaling (why jango sucks) Date: Mon Mar 29 22:42:04 CEST 2010 Lots of babble, but this is what seems interesting: - normalization is good, denormalization sometimes makes queries faster. automate the denorm (cached db views?) - sessions: stick it in a cookie (you really don't want to keep track of user state for big apps) - "SELECT *" is fast - JOIN doesn't scale [1] http://www.youtube.com/watch?v=i6Fr65PFqfk Entry: Cheap thrills Date: Thu Apr 8 10:32:44 EDT 2010 Something to kick-start the intrinsic motivation; something to have a taste of control.. Some practical issues: PLT scheme and server restarts. Entry: Representing links Date: Thu Apr 8 12:58:58 EDT 2010 Is there anything useful to know about graphs in general from my perspective? I.e. I don't really care about structure that much, more about representation and direct connectivity. i.e. neighbours of one page, not neighbours of neighbours. It seems that the basic idea is closure. If a document contains a link, this can either be pointing into a closable neighborhood, or to an outside resource. We'd like to keep track of both: - represent neighborhood references such that wrong links can be detected statically. - gather open links to enable link checking and queries over links Let's have a look at this[1] zipper-based rep. It mentions that in a functional representation, for every cycle there has to be a point of decoupling which can be solved either by mutable cells or a finite map combined with unique identifiers. I suppose this is true if the graph can't be constructed in a single recursive `let'. [1] http://www.cs.tufts.edu/~nr/pubs/zipcfg-abstract.html Entry: To SQL or not to SQL? Date: Sun Apr 11 17:07:50 EDT 2010 I have no good intuition about dealing with databases. Let's look at the design space when using databases: 1. query language (SQL, Scheme, ...) 2. storage medium (memory, disk, server, ...) 3. update vs. query patterns Where 1. isn't a real issue (as long as you can express what you want to know it's fine) and 2. is pure implementation. However, 3. is quite a constraint. 
If there are no updates, data representation can be heavily optimized (compiled) to fit the needs of the queries (like in ramblings.ss). If there are updates, consistency becomes a serious problem and needs to be solved properly, i.e. the ACID[1] principle, while caches are usually too complicated to keep up-to-date. The reason you want to squeeze things into a single SQL query is exactly that. So that's one axis: mutable vs. immutable (where ramblings.ss is mutable but the representation uses a cache). Another axis is relational vs. non-relational. See next post. [1] http://en.wikipedia.org/wiki/ACID Entry: NoSQL? Date: Sun Apr 11 18:04:52 EDT 2010 From [1]: NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases. These data stores may not require fixed table schemas, usually avoid join operations and typically scale horizontally. Academics and papers typically refer to these databases as structured storage. From [2]: So a lot of this NoSQL movement can be boiled down to 'avoid schemas that require joins'. It seems there is no real benefit to using NoSQL if there is no scaling involved, except for the ``sloppiness'' it allows. The main issue, as in all _scalability_ issues, is locality of reference (stacks: cache recent access & streams: use predictable access). Joins are not local. [1] http://en.wikipedia.org/wiki/NoSQL [2] http://www.reddit.com/r/programming/comments/b7b1c/ask_proggit_why_the_movement_away_from_rdbms/ Entry: Reactive Ramblings Date: Fri Apr 16 07:12:12 EDT 2010 When does this work? To re-iterate one of the previous posts: this approach is about _caching_. * If there is _significant computation_ between the model data and the query results, and this data can be re-used, caching (lazy evaluation) is a valuable option. * If in addition the model data changes _infrequently_, the lazy evaluation can be combined with a reverse-dependent invalidation step to get a _reactive_ program. Note that when updates are frequent wrt. queries, it probably makes little sense to keep an intermediate (cached) representation. Let's move on with the reactive model for the ramblings.txt display. The basic idea is this: given a set of files as input, and a functional (data-flow) network that eventually ends in server queries, construct a servlet that manages this structure. Separate concerns: - abstraction of reactive network (naive pull/push implementation: plt/rp) - abstraction of functional dependencies - file alteration monitor - controller for setup Problem: what to do with create/delete? Current assumptions are that the dependency network is static. Is there a way to propagate "not-found" upwards? Maybe invalidation is enough + handling of "open" errors by the lazy eval. Problem: how to encode the 2-step parsing? I.e. I'm splitting a file into a collection of atoms, with each atom depending on the original file. How to express this dependency? The problem is that it is never allowed to "unpack" a reactive variable. Similar to monads.. So rv-force should be private! The solution is indeed to 1. only unpack at the toplevel/output, which counts as _outside_ of the reactive network in the same way that file alteration events are, and 2. represent a collection as a finite function / dictionary. Entry: Storage Date: Fri Apr 16 17:17:18 EDT 2010 An advantage of the current representation (functional dictionaries) is that they can be serialized to disk.
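A minimal sketch of what that could look like, assuming the dictionary values are plain readable data (no closures):

(define (save-dict! path dict)
  (with-output-to-file path
    (lambda () (write (hash-map dict cons)))
    #:exists 'replace))

(define (load-dict path)
  (make-immutable-hash (with-input-from-file path read)))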
This opens up the possibility to reduce the memory image and use the file system for bulk node data, keeping only the connectivity in the image. However, is this data storage really a problem? Good question. How much space do these data structures take? Apparently it's currently dwarfed by the code image. Entry: Next Date: Fri Apr 16 17:42:07 EDT 2010 Anyways.. What's next? Fanout. OK, so let's abstract the fold method which is necessary to build the index. Maybe it's best to create the accessor and fold at the same time? Entry: Separating pure functions from reactive computations Date: Sat Apr 17 13:10:15 EDT 2010 From actual usage it is really ok to just have `rv-app' or `rv-apply' and build abstractions on top of those. Having to abstract the pure part into a single value also helps with separating the side effects (reactive value nodes) from the pure computations that connect them. For one, this allows the pure parts to be tested and reused independently. Conclusion: at this point it seems useful to use the reactive value abstraction to explicitly structure caching and laziness in a functional program. The obligation to distinguish strict and reactive computations isn't necessarily bad; however, it is best to separate functionality from evaluation strategy and write reactive modules in terms of strict, pure functional modules that contain all the real work. Entry: Next Date: Sat Apr 17 13:13:37 EDT 2010 - abstract external link structure (higher order syntax). - implement tex + default xhtml rendering in the reactive approach. - connect it all and replace the current graph-based implementation Entry: No representation Date: Sat Apr 17 13:30:19 EDT 2010 Seems there's a conflict. 1. Things like latex rendering definitely need to be cached. Can't have this be re-computed on every access. However, it should be possible to invalidate the cache also by other means. 2. Other data might be represented only as computations. Why not simply index the raw ramblings text files and have all formatting computed on the fly? Maybe it's not intuitive enough yet. Maybe what is needed is a way to "point and shoot" nodes that should be cached, and have the rest be lifted automatically. That would probably require a type system though.. Entry: Overdesigned Date: Sun Apr 18 12:50:09 EDT 2010 Anyways... This is getting a bit overdesigned, as is usual for a labor of love. The patterns are interesting though. Caching and representation are, in practice, either not important at all (and code complexity can be minimized, focussing on solving the main problem) or a big problem, where code structure needs to incorporate representation as a cross-cutting concern that is difficult to isolate and can easily dominate the structure of the overall solution. Important conclusions: * Concrete intermediate data structures (as opposed to functional representation or thunks) are an optimization. It's probably best to keep the design functional and representations abstract. * Data indexing can easily be the most complicated step if the representation is not well thought-out, leading to a necessity to keep an intermediate representation to get decent performance. It might help to tailor the basic representation such that this step can be performed at "edit time" to provide for a finer incremental update footprint. I.e.
for the ramblings format: having a single flat file as the basic representation is easy for editing in a text editor, but a pain to navigate, as the index has to be completely rebuilt on every edit. The reactive dependencies make this pretty clear. Entry: Caching for ramblings.ss Date: Mon Apr 19 09:49:15 EDT 2010 There seem to be only 2 important parts to cache: - the initial parse (indexing). - the tex rendering In diagram form, only 3 kinds of nodes. [ TXT ] -> [ ARTICLES ] -> out -> [ PNG ] -> out TXT: The file node, invalidated by the FAM daemon ARTICLES: Indexed (segmented) file. Can serve raw text to output: we don't cache normal xhtml results. PNG: Tex rendering takes some time and produces multiple pages, so it's also cached. This caching strategy should be abstracted in a single module: rv-ramblings.ss It looks like implementing this, once it is made explicit like this, is quite straightforward. However, in order to get the whole thing to work, the representation (what _is_ an article?) needs to be fleshed out. For simple text it's straightforward (an xhtml document); for rendered tex it is more than that (a key-value store). Caching xhtml output does simplify the architecture; that way the pipeline becomes: [ TXT ] -> [ ARTICLES ] -> [ query -> RESPONSE ] Entry: Full circle Date: Tue Apr 20 16:07:32 EDT 2010 So, with the caching structure fleshed out as 2 levels: txt -> articles -> (key -> response) The only remaining problem is to make explicit what a query is. Currently that's a bit problematic. It might be simplest to stick to a very raw representation of a query, i.e. a dictionary of name.value pairs, where names are symbols and values are strings. To obtain an assoc list like this use: (url-query (request-uri req)) The `req' can be obtained from dispatch-rules, which will deconstruct a url and pass the req as the first parameter. So there's no problem here: a symbolic key (the entry ID) and a dictionary of optional parameters is all that's necessary for a query. The problem is memoization though: we need to extract keys to construct the final (req -> response) memoization, where response is an rv. Entry: Finishing the RV formatter Date: Sun Oct 3 11:49:47 CEST 2010 I broke it off abruptly half a year ago, so I have a bit of trouble knowing what was actually done. It seems that the formatting didn't work yet, so let's build that first. Can the formatting be shared between old and new code? The file render-ramblings.ss only uses node query, not storage, so that should work. * abstract a node as a finite function. * abstract toc access (prev/next) as a function Entry: Lights go back on Date: Sun Oct 3 14:07:21 CEST 2010 So I think I start to see the idea again: everything is a function operating on raw data, and in the toplevel you define the caching structure by wrapping values into dataflow nodes. Entry: Weird bug Date: Sun Oct 3 14:29:07 CEST 2010 Somehow the fam events don't propagate correctly when running inside the webserver. Running the module in the snot sandbox works fine. Maybe it runs multiple instances? rv-invalidate is executed, but something goes wrong.. maybe it's struct types? this is weird.. something behind the scenes that kills state or so.. i tried to use a different value for the #rv-invalid tag but that doesn't work either.. it really is not set in the value that is accessed by the fam, and set in the other one, so there have to be two instances. let's protect all RVs with a single semaphore: the one that guards the whole network update.
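A minimal sketch of that single guard (with-rv-lock and the node names are assumed; rv-force and rv-invalidate are the existing operations): every thread that touches the reactive network goes through the same lock, so the FAM handler and the servlet threads cannot interleave updates.

(define rv-semaphore (make-semaphore 1))

(define (with-rv-lock thunk)
  (call-with-semaphore rv-semaphore thunk))

;; e.g. (with-rv-lock (lambda () (rv-invalidate file-node)))
;;      (with-rv-lock (lambda () (rv-force article-node)))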
Ok, that fixes the problem. Now can I understand why please?? Entry: Hashing intermediates Date: Sun Oct 3 17:14:37 CEST 2010 So what is cached? * Segmentation: text file -> dictionary What needs to be cached? * latex + dvipng This means that there needs to be a hash table of rvs to implement the cache, and possibly some thread that periodically clears the cache. Entry: Done except.. Date: Mon Oct 4 13:47:01 CEST 2010 - topics should be trivial, copy from other and remove any stateful code. maybe addition: allow directory change notifications for reaload of topics? - navigation (prev / next) it's simple to add this to the index generation, but i'd like to also create nav for combinations of different sources. maybe it's still best to put it in the index then. Entry: Racket update Date: Fri Mar 18 12:23:56 EDT 2011 facade.ss: - make-response/full + response/full Then also the contract seems to have changed: Servlet (@ /ramblings.ss/topics) exception: self-contract violation: expected , given: (html (pre " " " " " " " " " " " " " " "[" (a ((href "about\n")) "about") "] " "About the http://zwizwa.be web logs." "\n" " " " " " " " " " " " " " " ...)) contract on start from /home/tom/pub/darcs/sweb/web-root/./htdocs/./ramblings.ss blaming /home/tom/pub/darcs/sweb/web-root/./htdocs/./ramblings.ss contract: (-> request? can-be-response?) at: /home/tom/pub/darcs/sweb/web-root/./htdocs/./ramblings.ss Entry: Flat namespace Date: Tue Aug 9 13:09:32 CEST 2011 I want to do the following: - get a sorted list of all article IDs by date. should work with cache: don't update this list if the underlying files didn't change. this probably can be done by storing the ID list in the contents node, next to the index proper. - query article using ID only. this needs an ID -> section map and requires all IDs to be unique. To get at the list I had to: - Change `format-index' in entry-index.ss to collect (Index . ID) pairs and store them in the index node under IndexIDs. - Create a function get-index-ids: ;; Get all Index->ID maps. (Index is parsed from Date in parse-article) (define (get-index-ids [sections *sections*]) (apply append (for/list (((section get-article) (in-dict sections))) (dict-ref (rv-force (get-article 'contents)) 'IndexIDs)))) Now, how to use: - Don't put it on the front page. Use a separate link, i.e. "recent" to get a chronological page with entries like: 20110809-130932 [sweb] Flat namespace - Use a redirect for section mismatch. Now, to get at the article titles without forcing body compile, it's necessary to fish out that information in the first parse, because a query for 'Entry on the rv value will trigger a full body compile, including latex etc. So, I'm moving the index-gathering step to make-index. Entry: dotfiles Date: Fri Dec 14 15:48:41 EST 2012 I'd like to hide some files from the index. Let's use a dotfile approach, such that .xxx.txt still maps to xxx but is not included in the index. Hmm... doesn't look like a good idea. Let's just add a banlist. Entry: How to do style sheet syntax in s-expressions? Date: Fri May 24 12:18:20 EDT 2013 pre { padding: 0 3px 2px; font-family: Menlo, Monaco, Consolas, "Courier New", monospace; font-size: 12px; color: #333333; -webkit-border-radius: 3px; -moz-border-radius: 3px; border-radius: 3px; } Entry: Testing raw xhtml Date: Fri May 24 12:53:03 EDT 2013 Type: xhtml

Testing raw XHTML

Another paragraph with a bullet list

  • with
  • bullets
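Coming back to the style sheet question above: one possible s-expression encoding of that pre rule, with a tiny printer back to CSS text (a sketch of the idea only; nothing in sweb implements this):

(define pre-style
  '(pre (padding     "0 3px 2px")
        (font-family "Menlo, Monaco, Consolas, \"Courier New\", monospace")
        (font-size   "12px")
        (color       "#333333")
        (-webkit-border-radius "3px")
        (-moz-border-radius    "3px")
        (border-radius         "3px")))

(define (style->css style)
  (string-append
   (symbol->string (car style)) " {\n"
   (apply string-append
          (map (lambda (prop) (format "  ~a: ~a;\n" (car prop) (cadr prop)))
               (cdr style)))
   "}\n"))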
Entry: Replies Date: Sun Aug 11 11:03:16 EDT 2013 I like this email-oriented blog: http://www.acooke.org/cute/WhyandHowW0.html Entry: sweb is dead Date: Fri Jan 3 01:37:28 EST 2014 Basically, I would like to add features, but I don't really like the architecture. The reactive thingy is cute, but it is not necessary, and requires too much infrastructure. This is the age of static blog generators and git deployment ;) Requirements: - static generator - latex -> mathml http://math.etsu.edu/LaTeXMathML/ - better indexing - rss / atom - comments Entry: zwizwa.sty not found Date: Wed Apr 30 19:22:52 EDT 2014 problem was sh -> dash ( after upgrade? ) the TEXINPUTS variable didn't propagate -> nope Something is messed up in conjunction with runit.. Currently starting manually from screen Entry: How does the txt file list update? Date: Wed Apr 29 09:55:30 EDT 2020 I'm dumping a new set of txt links in web-root/txt, but I don't see anything showing up. I completely forgot how this all works... First: where are the logs? The runit log is here: /var/log/sweb/current But there is another log. root@tomweb:/var/log/sweb# ls -l /proc/122/fd ... lr-x------ 1 sweb sweb 64 Apr 29 16:01 11 -> /home/tom/pub/darcs/sweb/web-root/banlist l-wx------ 1 sweb sweb 64 Apr 29 16:01 12 -> /home/tom/pub/darcs/sweb/web-root/log ... Looks like I have two trees: root@tomweb:/home/tom/pub/darcs/sweb/web-root# readlink -f . /home/tom/pub/darcs/sweb/web-root root@tomweb:/home/tom/pub/darcs/sweb/web-root# readlink -f /var/www/zwizwa.be/darcs/sweb/ /var/www/zwizwa.be/darcs/sweb And it's starting the one in /home/tom So let's change that with a symlink.