Musings about using Scheme, PLT, and the zwizwa/lib toolbox.

Library contents:

  * zwizwa/lib/mfile: Scheme objects with a 1-1 correspondence to
    filesystem objects. Convenient for representing unix utilities
    that operate on files as scheme functions and garbage-collected
    objects.

  * zwizwa/lib/rv: Reactive Values. A simple lazy update, strict
    invalidate reactive programming abstraction.

The lib contains more as zwizwa/lib/x/* but those are considered
unstable or private (only used in other zwizwa/* packages). If you
find one of those useful, inform me so I can promote it to a stable
public API.

Entry: Sequences, generators and lazy lists
Date: Thu Mar 26 17:02:12 CET 2009

Looking for a standard lazy list implementation that will work well
as an intermediate step to transform generators -> sequences. The
problem is in these conflicting requirements:

  * 'make-do-sequence needs to look ahead in the sequence: know if
    there's an element before the next one is generated. This needs a
    1-level deep lookahead buffer when translating generators to
    sequences.

  * generators are easily built using prompt/control but can't know
    end-of-sequence beforehand.

Maybe this is best replaced with the lazy list structure in lazy
scheme: it uses delayed constructors (not delayed tails). The code
then becomes:

  ;; Convert generator to lazy list.
  (define (generator->lazy-list g [done? not])
    (let next ()
      (delay
        (let ((item (g)))
          (if (done? item)
              '()
              (cons item (next)))))))

  ;; Convert lazy list to sequence.
  (define (in-lazy-list ll)
    (make-do-sequence
     (lambda ()
       (define (ll-car x)   (car  (force x)))
       (define (ll-cdr x)   (cdr  (force x)))
       (define (ll-more? x) (pair? (force x)))
       (values ll-car ll-cdr ll ll-more? void void))))

  ;; Composition.
  (define (in-generator g [done? not])
    (in-lazy-list (generator->lazy-list g done?)))

The deeper conclusion is this: lazy lists (static) are better behaved
than generators (dynamic).
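A quick usage sketch, assuming the definitions above; `count-to-3' is
a throwaway generator thunk made up here, yielding 1, 2, 3 and then
#f (the default `done?' sentinel).

  ;; A generator thunk: returns the next element, or #f when done.
  (define count-to-3
    (let ((n 0))
      (lambda ()
        (and (< n 3)
             (begin (set! n (+ n 1)) n)))))

  (for ((x (in-generator count-to-3)))
    (display x))   ;; prints 123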
Entry: Using control/prompt
Date: Thu Mar 26 20:46:22 CET 2009

Using prompt and control it's possible to turn a data traversal into
an element generator. For example, iterate over all elements in an
xml tree, and produce a sequence of (list path element):

  ;; Generate all elements + (reverse) path.
  (define (element-generator top-e)
    (define (next)
      (let down ((e top-e) (p '()))
        (control k (set! next k) (list p e))
        (for ((e (element-content e)))
          (when (element? e)
            (down e (cons (element-name e) p))))
        #f))
    (lambda () (prompt (next))))

Now I wonder, can we create a sequence directly, without mutation and
without going from generator -> lazy list? The (element . k) pair can
probably be instantiated as a data item.

  ;; A glist is either a pair of (element . generator-thunk) or '().
  ;; The thunk generates the tail of the list.
  (define (element->glist top-e)
    (prompt
     (let down ((e top-e) (p '()))
       (control k (cons (list p e)
                        (lambda () (prompt (k)))))
       (for ((e (element-content e)))
         (when (element? e)
           (down e (cons (element-name e) p))))
       '())))

  (define glist-null? null?)
  (define glist-car car)
  (define (glist-cdr gl) ((cdr gl)))

This can be combined with memoization by wrapping the continuation in
a promise instead of a thunk.

  ;; Directly convert a traversal program into a lazy list, a memoized
  ;; version of the glist above.
  (define (element->ll top-e)
    (prompt
     (let down ((e top-e) (p '()))
       (control k (delay (cons (list p e) (prompt (k)))))
       (for ((e (element-content e)))
         (when (element? e)
           (down e (cons (element-name e) p)))))
     (delay '())))

Abstracted with macros this becomes:

  (define-syntax-rule (ll-yield x)
    (control k (delay (cons x (prompt (k))))))

  (define-syntax-rule (ll-begin . body)
    (prompt (begin . body) (delay '())))

  (define (element->ll top-e)
    (ll-begin
     (let down ((e top-e) (p '()))
       (ll-yield (list p e))
       (for ((e (element-content e)))
         (when (element? e)
           (down e (cons (element-name e) p)))))))

This allows construction of lazy lists from lazy list comprehensions,
giving a simple composition mechanism:

  (ll-begin
   (for ((x (in-lazy-list other-ll)))
     (ll-yield (+ 1 x))))

Entry: Zipper
Date: Fri Mar 27 12:16:42 CET 2009

Using lazy lists as in the previous example only allows the
consumption of a data structure. How can we use this to update/create
new trees? Starting from a map function instead of a for-each
function, we create the zipper data structure.

  http://okmij.org/ftp/Scheme/zipper-in-scheme.txt

  (define-struct zipper (element yield))
  (define (collection->zipper map collection)
    (reset (map (lambda (el) (shift k (make-zipper el k)))
                collection)))

So, is it fair to say a zipper is a symmetric (bi-directional)
coroutine and a lazy list an asymmetric (uni-directional) coroutine?
At the traversal side, the other side is represented by the mapped
procedure, while on the zipper side the iterator is represented by
the delimited continuation.
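A usage sketch for the zipper above (shift/reset come from
scheme/control, as the code already assumes; `zip-map+1' is a name
made up here): rebuild a list with 1 added to each element, one
zipper step at a time. The final (non-zipper) value that falls out of
the last continuation invocation is the rebuilt collection.

  (define (zip-map+1 z)
    (if (zipper? z)
        ;; resume the traversal, supplying the replacement element
        (zip-map+1 ((zipper-yield z) (+ 1 (zipper-element z))))
        ;; no more zipper: this is the finished collection
        z))

  (zip-map+1 (collection->zipper map '(1 2 3)))  ;; => '(2 3 4)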
Entry: Union types
Date: Mon Mar 30 10:05:36 CEST 2009

Are union types possible in PLT scheme outside of typed scheme? No:
elements of a union are types in themselves, and exist independently
of the union.

Entry: Structs
Date: Tue Mar 31 12:40:52 CEST 2009

Is it possible to create a derived struct given a prototype?

  (define-struct a (x))
  (define-struct (b a) (y))

Can we make an instance of struct b without knowing a? This probably
needs delegation (a 'super field) instead of subclassing.

Entry: Visitor
Date: Sun Apr 12 16:27:15 CEST 2009

http://calculist.blogspot.com/2008/05/functional-visitor-pattern.html

  This simple traversal language instructs the fold either to
  continue (Cont), to skip all descendants of the current node
  (Skip), or to abort the traversal entirely (Stop).

Entry: Generator end of sequence
Date: Fri Apr 17 10:04:50 CEST 2009

Generators need a unique value to indicate end-of-stream. #f is
convenient but probably not a good idea.

Entry: Multiple values and sequences
Date: Fri Apr 17 11:09:46 CEST 2009

For lazy lists, multiple values are not possible (by their
definition: linked cons cells contain only a single value in the CAR
position). PLT's sequences (which are generators) do support them. As
mentioned before: the problem with multiple values is channeling the
end-of-sequence value and the multiple values through the same
channel. The sequence API has different functions for these.

The simplest solution seems to be to:

  - stick with lazy lists as the basic abstraction.
  - use an "unpacking" iterator: (in-lazy-list ll values)

I'm adding a form "in-producer" which couples a multiple values
generating body in terms of (yield . args) with a multiple values
producing sequence.

Entry: Zipper and sequences
Date: Fri Apr 17 11:25:34 EST 2009

Sequences are uni-directional, the zipper is bi-directional. However,
is there a way to unify both in the same construct? Something like
for/list or for/fold?

This is not the right question. The right question is: how to relate
processes that communicate through channels with zippers and lazy
data structures.

Entry: Using fold vs. explicit tail recursion
Date: Sun Apr 19 00:49:38 CEST 2009

I often find myself writing tail recursive loops, accumulating
results in (multiple) stacks, then reversing them in the end. Is this
pattern necessary? I guess what I'm looking for is an "unpacked"
fold, where individually named state objects get passed during
iteration. This makes sense for a left fold. Does it also work for a
right fold?
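For the left fold case, PLT's `for/fold' already behaves like this
"unpacked" fold: each piece of loop state is individually named. A
small sketch:

  ;; Two named accumulators, no explicit loop plumbing.
  (for/fold ((evens '()) (odds '())) ((x '(1 2 3 4)))
    (if (even? x)
        (values (cons x evens) odds)
        (values evens (cons x odds))))
  ;; => (values '(4 2) '(3 1))  ;; still reversed, as with manual loops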
Entry: Hygiene: syntax-case & introduced identifiers
Date: Fri Apr 24 09:27:24 CEST 2009

A good thing about syntax-case is that it can introduce identifiers
whenever a quoted syntax object does not refer to a pattern variable.
However, if such a symbol is in a reference position, you will find
out only at expansion time of the macro that the identifier is not
defined. Why is this? Why can't this be known at macro compile time
(i.e. the for-template includes should have it, no?) Is this because
the compiler can't know whether the identifier is in a binding
position (in which case it is legal) or a reference position (in
which case it needs to be defined in the context in which the macro
is expanded)?

Entry: Imperative struct update
Date: Sun Apr 26 13:14:18 CEST 2009

I've been looking for this for a while.. Using introspection it
should be possible to create a macro that updates a struct
functionally, but with field accessors.. I think it's in machine.ss

Entry: Immutable cyclic structures
Date: Sat May 2 23:15:35 CEST 2009

Self-reference with immutable structures is only possible with
laziness. However, PLT scheme contains the 'shared form, much like
'let but for data structure bindings. As long as references are
guarded by list/struct constructors, this form can create cyclic
structures.

  (shared ((ones (cons 1 ones))) ones)

This works with the (immutable) native data types and mutable
structs.

Entry: Immutable hash tables
Date: Tue May 5 10:22:50 CEST 2009

From the PLT Scheme reference manual, 3.13 Hash Tables:

  "A hash table (or simply hash) maps each of its keys to a single
  value. For a given hash table, keys are equivalent via equal?,
  eqv?, or eq?, and keys are retained either strongly or weakly (see
  Weak Boxes). A hash table is also either mutable or immutable;
  immutable tables support constant-time functional update."

I didn't know about the immutable constant-time functional update.
That's pretty cool! I wonder how it is implemented. Red-black trees
[1] [2].

[1] http://groups.google.com/group/plt-scheme/browse_thread/thread/d6da66d2bb3855e6/445f23588930fc90?lnk=raot
[2] http://en.wikipedia.org/wiki/Red-black_tree
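The functional update in miniature, using the standard hash
operations (a sketch):

  (define h0 (make-immutable-hash '((a . 1))))
  (define h1 (hash-set h0 'b 2))   ;; returns a new table
  (hash-ref h0 'b #f)              ;; => #f  (h0 is unchanged)
  (hash-ref h1 'b)                 ;; => 2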
Entry: Code vs. Data (abstract or concrete instructions?)
Date: Thu Jun 11 12:21:17 CEST 2009

In Scheme, I am tempted to use code instead of data as the result of
functions. This is akin to an actor[4] style of programming. Instead
of letting functions return data that needs to be interpreted by the
caller, have them return behaviour: abstract entities that only leave
the freedom of execution time, not freedom of interpretation.

This can lead to some very concise code, but when used over-zealously
it can complicate matters. Sometimes intermediate data structures
that require (centralized, explicit) interpretation are easier to
understand than objects that delegate functionality behind the
scenes. This seems to be about two conflicting types of abstraction:
message passing (GOTO with arguments/continuations) vs. static
descriptions, i.e. a concrete representation based approach.

This popped up in the construction of a dataflow language (DFL)
compiler, which transforms a parallel description of nodes and
processes into a serialized runnable Scheme form. I was throwing
abstract (lambda () ...) all over the place until I realized that
maybe an abstract data structure to be interpreted later was a
simpler approach.

Often however, the abstract approach is better as it allows you to
specify some functionality locally to where the decision needs to be
made, instead of in a central place. This looks very much related to
the "dependency injection" principle. If you _can_ do it concretely,
please do, as this is often easier to understand. However, sometimes
things need to be abstracted out and decided elsewhere. Late binding
can be useful (and when used as a central organizing principle[5] it
can be quite powerful). But I'd like to say in general: don't make it
too flexible!

Note that abstraction can be a (resource use) optimization. I.e. in
Forth, explicit data structures are difficult to handle due to lack
of local namespace support and lack of automatic memory management.
Implementing those would be a prohibitive complexity explosion, going
against the grain of the "simplicity" motto. Representing things as
behaviour instead of data structures is usually a valuable and
efficient option. In Forth, memory is usually best managed where it
is used, which means stacks or dynamic variables (which are
essentially stacks). This management is then easily hidden behind a
procedure. This is "don't set a flag, set the data" from Leo Brodie's
"Thinking Forth"[1]. Or is this quote from Dijkstra[2][6]? I can't
seem to find the original reference.

This is related to Chuck Moore's early binding[3] philosophy. Aiming
for minimalism, the best strategy is to not leave things open for
(re)interpretation. Minimalism makes rewriting easier: so you aim for
full re-implementation instead of trying to anticipate future
behavioural modifications.

[1] http://prdownloads.sourceforge.net/thinking-forth/thinking-forth-color.pdf?download
[2] http://dereksorensen.com/?p=45
[3] http://www.ultratechnology.com/forththoughts.htm
[4] http://en.wikipedia.org/wiki/Actor_model
[5] http://en.wikipedia.org/wiki/Smalltalk
[6] http://c2.com/cgi/wiki?EwDijkstraQuotes

Entry: Abstract interpretation ('eval in macros)
Date: Thu Jun 11 13:51:03 CEST 2009

In Scheme low-level macro programming, most of the time
'syntax->datum or 'syntax-case will be enough to allow for the
compile-time computations necessary to construct output syntax.
However, sometimes you need to actually "bring the input syntax to
life" once before deciding how it will be transformed finally. In
general this is called abstract interpretation [1]. When this
interpretation process is similar to interpreting Scheme (i.e. it
includes dealing with lexical variables to construct graphs) it can
be easier to just invoke 'eval directly, and encode the desired
alternative semantics as macros. The dataflow pattern of such a
syntax transformer is:

                           scheme code
  input syntax           -------------> modified syntax

                           eval
  modified syntax        -------------> value

                           scheme code
  input syntax + value   -------------> transformed syntax

In PLT Scheme this kind of trickery can be performed by using
(anchored) namespace objects passed to 'eval. Namespace objects map
identifiers to semantics. In normal macro programming you don't need
to use explicit namespace objects since everything has only one
meaning. When you want _alternative_ interpretations, you need to
either perform interpretation manually (i.e. using 'syntax-case or
'syntax->datum) or use the Scheme interpreter as a whole by invoking
'eval in a specially constructed namespace.

This problem popped up when trying to compile a parallel program in a
dataflow language (DFL) to a statically scheduled sequential Scheme
program. In order to construct a properly sequenced code list, the
dataflow graph needs to be constructed and analysed for dependencies.
However, I implemented graph building on top of Scheme's lexical
scoping mechanism using macros, which means that the graph data
structure itself is only available _after_ expansion. This allows me
to avoid having to construct the graph manually. However, building a
sequential program from this requires a two pass algorithm: build the
graph and interpret it abstractly to find a serialized code sequence,
then compile everything to plain scheme in the right order. The first
pass can then be performed using 'eval on an alternative
interpretation which yields the proper code sequence. This can then
be used together with the original syntax to construct a compilation
to sequential Scheme code.

In Staapl I did run into this problem a couple of times before. One
other case is the automatic wrapping of Scheme functions, which needs
information about their arity which is only available after executing
the functions. Another occurrence happened when trying to compile the
dependency graph of the cyclic forth bootstrapper, which involved
several alternative interpretations of the code: as code running on a
threaded target interpreter, and as code running inside the host
compiler on a different VM.

[1] http://en.wikipedia.org/wiki/Abstract_interpretation

Entry: Unix utilities
Date: Mon Jun 15 17:10:52 CEST 2009

I'm looking for a way to easily abstract unix utilities as Scheme
functions, in a way that abstracts the use of the filesystem. The
idea is simple:

  * keep a 1-1 map of objects managed in the Scheme memory model, and
    data stored on-disk.

  * provide a mechanism for external programs to access the files:
    external programs are modeled as "directory transformers".

  * tie into the Scheme GC using 'register-finalizer (a sketch
    follows at the end of this entry).

  * add a storage size monitor to trigger global GC.

The way to do this is to manage all files in temporary storage, and
move files around into temporary directories that are the "context"
of the external programs. In general, files are owned by Scheme, and
essentially invisible to the external world, until they are made
available during a "procedure call". Note that this mechanism doesn't
work for files that are accessible from the outside. (However,
filesystem reference counts / hard links can be used for this.)

Since files are now tied into the global garbage collection
mechanism, they are subject to long life if collection doesn't get
triggered. Therefore on each operation we might want to keep track of
the size of the files, and trigger a collection whenever a preset
limit is reached.
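A toy sketch of the GC tie-in mentioned above (register-finalizer
comes from scheme/foreign after (unsafe!); this `make-mfile' is
illustrative only, not the mfile.ss API):

  (require scheme/foreign)
  (unsafe!)

  (define (make-mfile path)
    (define mf (box path))
    ;; When mf becomes unreachable, a later GC deletes the file.
    (register-finalizer mf (lambda (mf) (delete-file (unbox mf))))
    mf)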
Entry: Mfile cleanup
Date: Tue Jun 16 11:06:25 CEST 2009

- mfile is now abstract. It supports conversion into input/output
  ports and can be passed to external programs.

- mdir is a hash table of mfiles.

- I've included latex.ss as an example. This code is factored into 2
  parts: command wrappers that call format/system (fsystem), and
  filesystem wrappers that use these together with the mfile and mdir
  abstractions.

- with-mdir now looks like a let-expression, which resembles its
  semantics. However, it still returns a directory object (external
  programs are mdir->mdir maps) which can then be queried using
  mdir-refs:

    (define (tex->dvi tf)
      (mdir-refs
       (with-mdir `(("doc.tex" . ,tf))
         (lambda () (latex "doc.tex")))
       "doc.dvi"))

- added mfile->bytes and bytes->mfile

Entry: Weak References
Date: Thu Jun 18 13:56:58 CEST 2009

Next: add weak references[1] to accelerate mfile->bytes.

[1] http://docs.plt-scheme.org/reference/weakbox.html
[2] http://docs.plt-scheme.org/reference/ephemerons.html
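Weak boxes in miniature (API from [1]):

  (define b (make-weak-box (list 1 2 3)))
  (weak-box-value b)  ;; => '(1 2 3), or #f once the list is collected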
Entry: Apache logfile parsing
Date: Sun Jul 12 11:19:31 CEST 2009

After figuring out that minimal (non-greedy) regexp matching is a lot
faster than what I had before, it's time to work on incremental
updates. I suppose the regexp problem was due to the date matcher
causing excessive backtracking: I used

  "(.*)/(.*)/(.*):(.*):(.*):(.*)"

instead of

  "(.*?)/(.*?)/(.*?):(.*?):(.*?):(.*?)"

In "share.ss", allow for the dictionaries to be passed in so they can
be updated.

Entry: About 'eval at expansion time
Date: Sun Jul 19 09:39:53 CEST 2009

(to plt-scheme@list.cs.brown.edu)

Dear list,

I've read a number of posts on this list about replacing the need for
'eval by appropriate use of macros. Overall, I've found the way
macros are used in PLT Scheme fascinating, and it has been a good
guide to clarify some practical problems I've been struggling with in
the past.

However, recently I've run into a pattern that confuses me. It has to
do with the observation that 'eval seems to sometimes be the simplest
way to turn source code into an informative data structure at macro
expansion time, such that this can then be used as extra input to a
syntax transformation. (Abstract Interpretation)

While this feels like a necessary use, since 'eval actually does
something that can only be mimicked by implementing something like
'eval, it also feels like a dirty hack in a way I can't really
describe.

Concretely, this is about a macro that compiles a dataflow language
to Scheme. The input syntax is interpreted twice: once to construct a
simplified interpretation as a dependency graph, which is then used
to sort a list of subforms in the original syntax so it can be
expanded a second time into a serial Scheme program. Concrete code
for the two stages can be found here [1][2]. I use 'eval together
with namespace anchors to implement this.

I guess I'm looking for a name for this pattern or a reason to not
use 'eval. It seems that any alternative would involve higher order
macros in a way I don't quite grasp..

Ideas welcome,
Tom

[1] http://zwizwa.be/darcs/staapl/staapl/machine/dfl.ss
[2] http://zwizwa.be/darcs/staapl/staapl/machine/dfl-compose.ss

Entry: Re: About 'eval at expansion time
Date: Sun Jul 19 14:28:13 CEST 2009

Reply from Matthew Flatt[1]. Also, see at the end of this post. The
trick is to expand to `let-syntax'. I feel I understand the basic
idea, but I have to think a bit more about this though.

I read the reply, slept over it, and now I'm going to try to explain
it. The macro that does the magic is:

  (define-syntax (show stx)
    (syntax-case stx ()
      [(_ e)
       #'(let-syntax ([exp (lambda (stx)
                             (with-syntax ([v e])                    ;; (1)
                               #`(printf "~s yields ~s\n" 'e 'v)))]) ;; (2)
           (exp))]))

The `let-syntax' form[2] is quite straightforward. It binds an
identifier to syntax. The `with-syntax' form[3] is like `syntax-case'
but matches all patterns. No magic there. The magic happens by using
`e' in two different positions: one that makes it verbatim into the
eventual expansion of the macro (2), and one that will be evaluated
at expansion time (1).

Or: (1) is behind one `syntax' and (2) is behind two, where one level
of syntax quoting is removed by the expansion, which serves the role
of `eval' at expansion time of `show'. A neat trick indeed!

[1] http://list.cs.brown.edu/pipermail/plt-scheme/2009-July/034623.html
[2] http://docs.plt-scheme.org/reference/let.html#%28form._%28%28lib._scheme/private/letstx-scheme..ss%29._let-syntax%29%29
[3] http://docs.plt-scheme.org/reference/stx-patterns.html#%28form._%28%28lib._scheme/private/stxcase-scheme..ss%29._with-syntax%29%29

======================================================================
Matthew's full reply:

If I understand, then the following little program (in two modules)
is analogous to yours. It defines a `show' form that takes an
expression `e' to evaluate at compile time, and it prints the quoted
form of `e' followed by its compile-time result. The compile-time
form can use a `plus' binding, which stands for any sort of binding
that you might like to use during compile-time evaluation:

  ----------------------------------------
  ;; "arith-ct.ss"
  #lang scheme/base

  (define (plus a b) (+ a b))

  (define-namespace-anchor a)
  (define (show* e)
    (eval e (namespace-anchor->namespace a)))

  (provide show*)

  ;; use-arith.ss:
  #lang scheme
  (require (for-syntax "arith-ct.ss"))

  (define-syntax (show stx)
    (syntax-case stx ()
      [(_ e) (with-syntax ([v (show* #'e)])
               #`(printf "~s yields ~s\n" 'e 'v))]))

  (show (plus 1 2)) ;; expands to (printf "~s yields ~s\n" '(plus 1 2) '3)
  ----------------------------------------

Here's how I'd implement the `show' form, instead:

  ----------------------------------------
  ;; "arith.ss"
  #lang scheme

  (define-for-syntax (plus a b) (+ a b))

  (define-syntax (show stx)
    (syntax-case stx ()
      [(_ e)
       #'(let-syntax ([exp (lambda (stx)
                             (with-syntax ([v e])
                               #`(printf "~s yields ~s\n" 'e 'v)))])
           (exp))]))

  (show (plus 1 2)) ;; expands to (printf "~s yields ~s\n" '(plus 1 2) '3)

  ;; To use `show' outside this module, we need to
  ;; export `for-syntax' any bindings intended
  ;; to be used within `show' expressions:
  (provide show (for-syntax plus))
  ----------------------------------------

This second implementation expands (show e) to

  (let-syntax ([(exp) .... e ....]) (exp))

where `e' is in an expression position in the right-hand side of the
`let-syntax', so it gets evaluated at expansion time. The main trick
is to invent a local name (in this case `exp') to house the
expand-time expression and trigger its evaluation in an expression
position. As you suggest, this approach requires a macro-generating
`show', which is in some sense a "higher order macro".

For a problem like this, I'd avoid `eval' because the other approach
composes better. Consider a module that imports `show', but also adds
its own compile-time extension `times':

  #lang scheme
  (require "arith.ss")

  (define-for-syntax (times a b) (* a b))

  (show (plus 1 (times 2 3)))

This wouldn't work if "arith.ss" were replaced by "use-arith.ss",
because the `eval' in "arith-ct.ss" wouldn't know where to find the
module that has the `times' binding. The problem is that the module
is in the process of being compiled, and so it hasn't been registered
for reflective operations like `eval'.

Your example doesn't seem to involve any bindings like `plus' or
`times', and, offhand, I can't think of another concrete reason that
"arith.ss" is better than "arith-ct.ss" plus "use-arith.ss". Less
concretely, though, the reflection in the latter seems to me more
difficult to reason about.
(As it turns out, I correctly predicted that the `times' example
would not work with "use-arith.ss", but I mispredicted the specific
reason.)

I hope I've understood your actual problem well enough that the above
examples are relevant.

Matthew

Entry: Double fork
Date: Mon Jul 20 14:06:51 CEST 2009

The idea is this: whenever you spawn a process from mzscheme using
the ``subprocess'' function[1], and the child process dies, a zombie
process is created. A zombie is essentially a tiny wrapper around the
process' exit() value[2]. Until the mzscheme process calls
`subprocess-wait' to collect this value, the zombie sticks around.

The way this is usually solved in a unix environment is to make sure
that the parent of the fork() call dies _before_ the child. In that
case the child is inherited by the ``init'' process, which will
collect its return value when it dies, preventing it from becoming a
zombie. In most cases however you want the spawning process to stay
alive. To solve this one can use a ``double fork''. An intermediate
process (i.e. a `spawner') is created which spawns the child process
and dies, so the spawner's parent can collect the spawner's exit
value.

  [mzscheme]--fork()-+-------wait()-----------------+--------->
                     |                              |
                     +--cleanup()--fork()-+--exit()-+
                                          |
                                          +--exec()-[child]--->

Currently it is:

  [mzscheme]--fork()-+---------------------------------------->
                     |
                     +--cleanup()------------exec()-[child]--->

The code seems to be in plt/src/mzscheme/src/port.c

Original thread[3].

EDIT: problem is now solved by handling SIGCHLD

[1] http://docs.plt-scheme.org/reference/subprocess.html#%28def._%28%28quote._~23~25kernel%29._subprocess%29%29
[2] http://en.wikipedia.org/wiki/Zombie_process
[3] http://list.cs.brown.edu/pipermail/plt-scheme/2008-July/025614.html

Entry: Avoiding macros with 'eval
Date: Wed Jul 22 09:17:34 CEST 2009

Let's abstract the pattern in [1] since it seems quite useful in
general. What I'm interested in is to trigger evaluation of input
syntax to guide expansion. The basic form that does what I need is:

  (define-syntax (foo stx)
    (syntax-case stx ()
      ((_ e)
       #`(let-syntax
             ((m (lambda (stx)
                   (let ((v e))        ;; (1)
                     (if (zero? v)
                         #'zero
                         #`e)))))      ;; (2)
           (m)))))

The `foo' form expands to a form that defines a local macro using
`let-syntax' and triggers its expansion. This macro has the original
syntax object, represented by the `e' syntax variable, occurring in
two positions: (1) in an expression position valid at the macro's run
time, and (2) in a quoted form that will make it into the output
syntax of that macro. The net result is that both a syntax object and
its evaluated form are available to a macro body.

This can be captured in a `let-staged' form:

  (define-syntax (foo stx)
    (define-syntax-rule (let-staged ((n v) ...) body ...)
      #`(let-syntax ((m (lambda (stx)
                          (let ((n v) ...) body ...))))
          (m)))
    (syntax-case stx ()
      ((_ e)
       (let-staged ((e-eval e))
         (if (zero? e-eval) #'zero #`e)))))

Basically, `let-staged' behaves as `let' with the exception that the
`v' forms are made up of syntax that will be evaluated. This is
useful because these forms can refer to syntax variables in the
enclosing `syntax-case' context.

[1] entry://20090719-142813
Entry: Phase namespace problems
Date: Wed Jul 22 11:21:27 CEST 2009

It's about time I understand these things:

  /home/tom/staapl/staapl/tools/stx.ss:125:5: compile: unbound
  identifier (and no #%app syntax transformer is bound) at:
  let-syntax in: ...

There are two sides of the coin: the _importer_ needs to make sure
that the identifiers it wants to use at different phases are loaded
correctly. The _exporter_ needs to make sure that all identifiers
referenced at all stages are available at the definition site.

What I forgot was a (require (for-template scheme/base)) at the
definition site, which made the `let-syntax' identifier unknown.

TODO: find a set of rules to debug such statements. Look here[1] at
the for-meta form. Turn this into a question: why is a
"(require (for-template scheme/base))" necessary if the use point has
scheme/base already imported?

[1] http://docs.plt-scheme.org/reference/require.html#%28form._%28%28lib._scheme/base..ss%29._require%29%29

Entry: Parsing mpeg4 files
Date: Thu Jul 30 13:02:20 CEST 2009

see entry://../davinci/20090730-105349

Entry: Trees and paths
Date: Fri Jul 31 12:23:28 CEST 2009

I'm looking for a nice abstraction that combines tree structures
(filesystems, XML, quicktime/mpeg4, ...) with zipper-like iteration.
Frankly I'm confused again by slight nuances between different tree
forms. The operation I want is mostly path access (single item) or
path globbing (list).

[1] http://en.wikipedia.org/wiki/Xpath
[2] http://en.wikipedia.org/wiki/XQuery
[3] http://en.wikipedia.org/wiki/XSLT

Entry: Direct staging vs. the 'eval trick
Date: Fri Sep 4 12:46:50 CEST 2009

In [1] and related posts I described a trick to express the following
code staging flow:

  stx -+-> value ---\
       |            |
       |            v
       \--------> modified-stx

I.e.: the original syntax `stx' is first interpreted to yield
`value', which is then combined with the original `stx' to yield
`modified-stx'. The reason for doing this was to be able to use
Scheme's binding operations to create a network (the value) which can
then be used to process `stx'. (For the data flow language DFL.)

In the staged constraint propagation language (CPL) I'm taking a more
traditional direct staging approach:

  stx ---> modified-stx

by interpreting `stx' manually: the network is built up sequentially
from a set of nodes and a set of rules.

I wonder if there is some more fundamental reason I ``shouldn't'' use
the former approach. One part of me says the former approach is just
a way to implement the latter (the cross-stage mixing performed using
a ``hygienic eval'' implemented with a staged macro doesn't have any
side-effects). Anyways.. Unresolved, but probably not important. I
have the feeling this is quite an arbitrary artifact that is a
consequence of implementing some functionality as macros, and other
parts as functions operating on syntax..

[1] entry://20090722-091734

Entry: Pure hash tables
Date: Sun Sep 6 10:36:12 CEST 2009

I need an updatable (pure) finite function for identifier -> value
mapping, to make it easier to add backtracking to a constraint
propagation algorithm. I'm looking for `hash-update' described in the
PLT reference manual[1].

[1] http://docs.plt-scheme.org/reference/hashtables.html#%28def._%28%28lib._scheme/private/more-scheme..ss%29._hash-update%29%29
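`hash-update' in miniature (a sketch; on an immutable table it
returns a new table, and the last argument supplies the value to
update when the key is absent):

  (define d (make-immutable-hasheq '((a . 1))))
  (hash-update d 'a add1)    ;; => #hasheq((a . 2))
  (hash-update d 'b add1 0)  ;; => #hasheq((a . 1) (b . 1))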
Entry: Param -> pure functions using delimited continuations
Date: Sun Sep 6 12:07:52 CEST 2009

For the general principle see [1]. This post is more concerned with
finding a nice abstraction. Leaving out some steps I arrive at:

  (define-syntax-rule (shift/parameters parameters k . expr)
    (let ((ps (list . parameters)))
      (apply
       (lambda (x . pvs)
         (for ((p ps) (pv pvs)) (p pv))
         x)
       (let ((pvs (for/list ((p ps)) (p))))
         (shift _k
                (let ((k (lambda (x) (_k (cons x pvs)))))
                  . expr))))))

This saves all parameters from dynamic to lexical context before
continuation capture, and restores them at invocation.

[1] entry://../compsci/20090906-105507

Entry: Prefab Structures
Date: Sat Sep 19 12:28:21 CEST 2009

Apparently it's possible to use structures without defining them
beforehand, using the "#s(" syntax.

Entry: dynamic-data and caches
Date: Tue Sep 29 10:30:19 CEST 2009

There is a need for aggregation of cached objects, which requires a
logical composition of the `dirty?' flag.

Entry: Hash tables with custom equality test
Date: Thu Oct 8 17:30:18 CEST 2009

Q: Why do hash tables in PLT scheme not have a parameterizable
equality operation?

Answer: there is a `make-custom-hash'[1] function. However, the
hashing operation depends on the equality operation, so this needs
more than just an equality op.

Q: Is there a hash in terms of `bound-identifier=?' ?

Entry: Naive Reactive Program
Date: Sat Mar 27 16:08:26 CET 2010

I want to implement a reactive programming system to serve as a
website cache, i.e. `make' combined with input events.

  - PUSH: do not propagate invalid inputs past invalid nodes.
  - PULL: only compute what is necessary.

Network invariant: a valid node can never depend on an invalid one.

The PULL part is easy: it's essentially a lazy value (delay/force); a
sketch follows at the end of this entry. The PUSH part is more
difficult in practice. There are 2 problems:

  - reverse the linking to enable the PUSH propagation
  - unlink when a value gets collected (finalizers).

Setting up the dependencies in the first place seems to be the
hardest problem. Let's focus on that first. It seems it's best to
solve registration at node creation time. A node is a computation
that is parameterized by a set of nodes. Let's curry the procedure so
it deals with all non-node (constant) arguments. The form then
becomes:

  (rp-app fn n1 n2 ...)

This takes a strict function fn and a number of nodes. The nodes are
evaluated and the strict function is applied to them. Due to lexical
scope, a node does not need an explicit store of its parents: this is
available in the rp-app function. For the rp-register-child!
function we need weak references. It seems a weak hash table is the
easiest way to implement this. Apparently finalizers aren't
necessary: the weak hash table automatically removes references that
are no longer valid.
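The PULL half sketched, as promised above (a toy, not the rv.ss
code): a node pairs its thunk with a promise, and invalidation simply
re-arms the promise.

  (define-struct node (thunk (promise #:mutable)))

  (define (make-rnode th)
    (make-node th (delay (th))))
  (define (node-pull n)
    (force (node-promise n)))         ;; compute once, then cached
  (define (node-invalidate! n)
    (let ((th (node-thunk n)))
      (set-node-promise! n (delay (th)))))

  (define n (make-rnode (lambda () (printf "computing\n") 42)))
  (node-pull n)         ;; prints "computing", => 42
  (node-pull n)         ;; cached, => 42
  (node-invalidate! n)
  (node-pull n)         ;; recomputes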
Entry: RP: race conditions
Date: Sat Mar 27 18:00:37 CET 2010

So, what happens when a pull and a push interfere? Or can't they? It
seems they can; i.e. invalidation can take a ``different branch''
compared to a parallel evaluation step. Suppose the computation needs
to evaluate

  a + b

which both depend on a common value c which is about to be
invalidated, but the invalidation is interleaved with the evaluation
like this:

  force: cache a      x
  inval: clear a
  inval: clear b
  force: compute b    x

then the values a, b seen by the force correspond to different
histories, leading to possibly inconsistent data.

Another problem is that concurrent evaluations might interfere, i.e.
two threads might start re-evaluations in parallel. The same goes for
concurrent invalidation.

How to solve these problems? It seems like synchronization on both
sides separately isn't much of a problem: node locking should be
enough. However, for push/pull sync I don't see how to do this except
for locking the whole network.

Entry: Resist the urge to use lazy lists
Date: Sun Apr 11 13:12:22 EDT 2010

For day-to-day small dataset conversions, it is probably best to use
strict lists in Scheme. Lazy lists need some machinery to work well
in Scheme, and they complicate programs. So, note to self: don't use
streams unless the data sets are really large. PLT scheme's list
comprehensions are probably simpler to use.

Entry: FAM and finalizers
Date: Fri Apr 16 10:54:49 EDT 2010

Problem: I want files wrapped in reactive objects that are GC-able.

Problem: The FAM keeps a reference to all files monitored, but there
is no direct correspondence from wrapped reactive object to filename,
i.e. two distinct ROs can point to the same filename. Looks like this
needs a level of indirection + resource management.

Solution: keep a *paths* function that maps filenames to a set of
reactive objects. The set of ROs is a weak hash.

Entry: Finalizers are bad
Date: Fri Apr 16 12:03:05 EDT 2010

Why? Because they are hard to get right. However, it is possible to
build correct (low-level) abstractions on top of finalized objects,
in case some layer of management is needed. In general however, it
seems that this only works well if the resource that is managed
really behaves like memory, meaning it is not too scarce. Otherwise
an explicit link is necessary from the depletion event to
`collect-garbage'.

Entry: Files and GC
Date: Fri Apr 16 12:23:05 EDT 2010

With rv-file.ss there is now a nice collection of file interfaces
that make it possible to integrate functional reactive scheme code
into a stateful unix world.

  rv.ss       reactive values & computations
  rv-file.ss  monitored files as RVs
  mfile.ss    managed tempfiles as scheme values

These should now completely replace the dynamic.ss abstraction. Let's
start changing sweb. Best approach seems to be to do this in
parallel.

Entry: Should rv-force be private?
Date: Fri Apr 16 15:33:04 EDT 2010

I.e. it is not allowed to ever "unpack" a reactive value, except at
the top-level = exit point / end point of the DAG. I'm not sure how
to express that yet, but what I do know is that an unfold of an RV
should be a list of RVs. Reactive lists?

What I'm really looking for is a way to construct a collection of
computations that all depend on a single value: parse a file into a
list of articles, but make sure that when the file gets updated the
list gets re-computed. The list can never be represented concretely
(its size is not known). The best way to access it is abstractly
through index functions.

This is an interesting problem. Let's make it more concrete: how to
abstract a list of reactive values by a finite function that performs
a lookup. The index will be rebuilt whenever the file changes.
Additionally, each result depends on the input. It seems simplest to
abstract reactive collections as dictionaries (finite functions).

  (define (rv-lookup rv-dict)
    (lambda (key)
      (let ((node (rv-delay (dict-ref (rv-force rv-dict) key))))
        (rv-register-child! rv-dict node)
        node)))

One problem: the result of a lookup is a freshly computed rv. Maybe
it's best to add caching here too, so the rvs are shared. This needs
a memo-function abstraction.

Seems to work, but how does this extra persistence behave in the face
of errors? RVs that were once defined but now trigger errors will not
be collected. I.e. this doesn't work for transient data.
Entry: Reactive Dataflow & GC
Date: Sun Apr 18 20:21:43 EDT 2010

Is it possible to have intermediate values disappear when all their
dependencies have read the value? This is PDP! Funny.. Went full
circle there ;)

How do the computation thunks reference the reactive values?

  (define (rv-apply fn parents)
    (let ((node (rv-delay (apply fn (map rv-force parents)))))
      (for ((p parents))
        (rv-register-child! p node))
      node))

Essentially through `parents' in the `rv-apply' function. The thunk
is the expression in the `rv-delay' form. The right question seems to
be: is it possible to let go of the reference to the parent nodes'
value slots, but keep track of the node references? I.e. keep the
connectivity alive, but only keep values as long as they are needed.

A possible answer could be to keep a semaphore as a reference count:
every time a node is created, initialize the semaphore to the number
of clients the node has. On each read of the node through an
rv-apply, decrement the semaphore. When zero, delete the node. It
feels a bit ad-hoc to do that though.. Also I can't really see
whether that introduces problems elsewhere..

Edit: it's indeed not that simple. Suppose we have a value y = a + b,
and a gets invalidated, invalidating y. When y gets pulled, a is
recomputed, but b is pulled from the cache. Point being: even if
there is only a single dependency b -> y, it might still be useful to
keep the value of b around after y is computed.

Entry: Experience with using the cache abstraction (rv.ss)
Date: Tue Apr 20 15:04:26 EDT 2010

It seems to work quite well, allowing this workflow:

  1. focus on solving the problem using structs and pure functions,
     without worrying about storage, i.e. build a vocabulary.
  2. identify nodes in the computation graph that would benefit from
     cached/lazy operation.
  3. write functions that relate those nodes.
  4. lift the latter functions into the reactive value domain using
     `rv-app'.
  5. use `rv-force' at the toplevel to pull values from the network.

What I miss though after the initial Haskell brainwash (how quickly
the world owes me something..) is a type system that tells me whether
a computation is pure or not. I.e. you can't see what's behind an
#<rv> without applying `rv-force'. The idea of "lifting" pure
computations into the effectful world is a very powerful one. A type
system that can understand the differences between pure and lifted
values gives some pretty direct feedback while manipulating code.
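Steps 3-5 in miniature (a sketch only: `rv-app' and `rv-force' as
named above; `rv-file' and `parse-log' are hypothetical stand-ins for
a source node and a pure function):

  (define log-rv    (rv-file "access.log"))     ;; source node
  (define parsed-rv (rv-app parse-log log-rv))  ;; lifted pure function
  (rv-force parsed-rv)                          ;; pull at the toplevel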
Entry: Syntax for cached values
Date: Tue Apr 20 15:19:38 EDT 2010

Something like:

  (lambda/rv (a b (rv: c) d (rv: e)) ...)

which then would translate to:

  (lambda (a b c d e)
    (rv-apply (lambda (c e) ...)
              (list c e)))

Useful for other kinds of value lifting.

Entry: How to collaborate on a PLaneT package?
Date: Sat Oct 2 17:36:41 CEST 2010

I'd like to see if I can help out Dave to make some fixes to his
c.plt package. He's been so kind as to create an archive at github:

  git clone git://github.com/dherman/c.plt

So how can I test this? Essentially you need to install a local
version. I use this in the Staapl Makefile[1]. Modified a bit, but it
seems to work now. Is there a way to do this scaffolding cleaner in
scheme?

Anyways, it seems Dave did fix some bugs but didn't release them yet.
This doesn't fail any longer:

  typedef int (*fn) (int*);

Nope, that's fixed in (3 2). The following isn't:

  void foo(void *x, void *y) { int i = foo(x, y); }
  void bar(void *x, void *y) { int i = bar(x, y); }

So, let's see, what do I need:

  - Parse files in libprim
  - Pretty-printing compilable C syntax

[1] http://zwizwa.be/darcs/staapl/Makefile.planet

Entry: Release of zwizwa/plt lib?
Date: Sat Oct 2 20:12:40 CEST 2010

Maybe it's time to publish (I'm going to need it for c.plt). Let's
see what needs to be done first.

  * name: zwizwa/plt is not a good name. Renaming to zwizwa/lib is
    probably best.
  * contents:
      - mfile: wrap tempfiles in variables
      - rv: reactive values

Entry: Debian ready for Racket?
Date: Tue Dec 28 14:54:17 EST 2010

Hmm.. Maybe not yet.

[1] http://ftp-master.debian.org/new/racket_5.0.2-1.html

Entry: Switched to Racket 5.1
Date: Sun Mar 20 10:18:19 EDT 2011

I'm using the 5.1 branch in [1]. This is still a work in progress:
the 5.1+dfsg1 version isn't ready yet. To generate it do:

  git archive origin/dfsg | gzip -9 > ../racket_5.1+dfsg1.orig.tar.gz

See here[2] for how to use gitpkg.

[1] http://git.debian.org/?p=collab-maint/racket.git
[2] entry://../pool/20110306-122548

Entry: Racket FFI
Date: Sat Mar 26 10:38:43 EDT 2011

I'm having some trouble getting the libusb.ss code back online after
the upgrade to Racket 5.1. Basically, I forgot everything ;) I've
reconstructed the following basic example:

  #lang scheme/base
  (require scheme/foreign)
  (unsafe!)
  (define-cstruct _a ((next _a-pointer/null)))

The thing to note is that recursive data structures always need the
"-pointer" or "-pointer/null" suffix. So I wonder, how did this work
before? Oki, I see the problem:

  (define-cstruct _usb-device
    ([next         _usb-device-pointer/null]
     [prev         _usb-device-pointer/null]
     [filename     _path-type]
     [bus          _usb-bus-pointer-dummy]
     [descriptor   _usb-device-descriptor]
     [config       (_cpointer _usb-config-descriptor)]
     [dev          _pointer]
     [devnum       _uint8]
     [num_children _uint8]
     [children     (_cpointer _usb-device-pointer)]))

I get the message that the identifier `usb-device-config' is already
defined. The reason is that `define-cstruct' now defines accessors
for the struct members, and from the "descriptor" field it generates
the clashing name.

Entry: Racket FFI : packed structs
Date: Sat Mar 26 11:15:44 EDT 2011

How does the FFI distinguish between packed and unpacked structs?

Entry: regexp -> pregexp
Date: Tue Nov 1 17:41:21 EDT 2011

The Apache log parser broke. The trouble seems to be related to the
use of `regexp'. Using `pregexp' at least some things work.. Overall
the thing is that this is really too brittle and hard to debug. Let's
just make something that reads one item at a time in a more automatic
way. It seems quite straightforward to dispatch on the first
character:

  - "  string
  - [  date
  - ?  space-separated word

It works pretty well with just "read", but as I remembered that was
horribly slow, which was the main reason why I used regexps. So let's
stick to that decision and find a way to debug better. What I really
want is composable regular expressions with named variable binding
(not position mapping).

Anyways, I fixed the bug in the match string and went on to use the
(test) routine, which filled up memory after 4M lines, using
apparently about 1k per line. That's a bit over the top.. The total
number of lines is 4080622, which takes a couple of minutes to parse.
Estimate about 25k lines/second. That's fairly reasonable. Now what
about hashing it, so the main table can be dumped out as a table of
indices and we don't need to let the database do this.

  tom@zoo:~/plt/lib/x$ time mzscheme -it apachelog.ss -e '(begin(test)(exit))'
  tom@zoo:~/plt/lib/x$ /opt/apache-logs/sorted | mzscheme -t apachelog.ss -e '(test)'
  848597
  date: (#(struct:exn:fail "find-secs: non-existent date (inputs: 0 17 2 13 3 2011)" #)
         #"00" #"17" #"02" #"13" #"Mar" #"2011")
  tom@zoo:~/plt/lib/x$

Looks like a bug:

  (find-seconds 1 17 2 13 3 2011)
  find-secs: non-existent date (inputs: 1 17 2 13 3 2011)

Let's see if it's in the current snapshot. Still there. Sent email to
the list. See next post.

EDIT: was a daylight saving thing..
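The dispatch-on-first-character idea from this entry, sketched with
regexps over the input port (the patterns are illustrative only; a
match against a port comes back as a byte string, and there is no eof
handling here):

  (define (next-field ip)
    (define (grab rx) (car (regexp-match rx ip)))
    (case (peek-char ip)
      ((#\") (grab #px"\"(?:\\\\.|[^\\\\\"])*\""))  ; "..." with escapes
      ((#\[) (grab #px"\\[[^\\]]*\\]"))             ; [date]
      (else  (grab #px"[^ ]+"))))                   ; bare word

  (next-field (open-input-string "[26/Aug/2011:00:18:08 +0200] rest"))
  ;; => #"[26/Aug/2011:00:18:08 +0200]"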
Entry: What happened on March 13 2011?
Date: Wed Nov 2 00:16:57 EDT 2011

Hello,

I found this while parsing some log files:

  Welcome to Racket v5.1.3.
  > (require racket/date)
  > (find-seconds 1 17 2 13 3 2011)
  find-secs: non-existent date (inputs: 1 17 2 13 3 2011)

  === context ===
  /home/tom/racket/collects/racket/private/misc.rkt:85:7 (

Or is it an obscure Grateful Dead reference?
http://en.wikipedia.org/wiki/March_13 ;)

Entry: Status
Date: Wed Nov 2 10:41:34 EDT 2011

Got parsing working. Parses 700 megs worth of logs (4M lines) in a
couple of minutes. Got sharing working also, feeding the logs in as a
generator instead of a list.

It's still getting quite large even with sharing, so I wonder if
that's actually working as it should. Maybe hashes are using eq?
instead of equal?

  438016 -> 294m virtual

  (hash-equal? (make-hash)) => #t

Doesn't look like it... It does seem to flatten a bit:

  Lines   -> virtual
  438016  -> 294m
  808300  -> 446m
  970609  -> 542m
  1473112 -> 736m
  1859149 -> 880m
  2420298 -> 1180m

Seems to be +- 50% memory savings, which is not as much as I hoped.
It also seriously slows down at this point. I see it also stores
backreferences which maybe are not necessary.. Anyways, the basic
idea does seem to work. Let's try it on a subset of files and get it
to spit out an SQL database. There are 2 roads:

  - put it in a standard MySQL / SQLite database and use SQL queries
  - keep everything in memory and write a small query language in
    Scheme

Entry: Hashing the strings?
Date: Wed Nov 2 11:01:49 EDT 2011

Given that the actual string values are not so interesting by
themselves, what about using a hash that has a relatively low
collision rate, and performing the inverse lookup only when
necessary? This would avoid keeping track of the large strings which
are the bulk of the memory. What about just interning them as
symbols? Right there is a hashing mechanism that's already quite
useful..

Anyways.. Let's just focus on making it work for a smaller dataset,
then running it on the big one. Once it's converted, it can be
updated in-place.
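Interning in miniature: equal strings map to the same (eq?) symbol,
so only one copy is retained.

  (eq? (string->symbol "GET /index.html")
       (string->symbol (string-append "GET /" "index.html")))  ;; => #t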
Entry: Next: apachelog
Date: Wed Nov 2 15:57:14 EDT 2011

* Persistence: the most important requirement. The parsing step needs
  to be cached.

* Query hacks & representation. Do I really care? It's probably best
  to get the damn data into a simple database and play with that a
  bit.

Next: replace

Entry: Table size
Date: Wed Nov 2 16:18:09 EDT 2011

Reading 1/16 of the data set I get the following numbers:

  rows:   219121
  ip:       5901
  date:   193871
  req:     69166
  status:     14
  ref:      2228
  client:   2466

So apart from date it makes sense to collect the values of the fields
in separate tables, as there are indeed quite some dupes. I wonder
why req is so high though.. I also wonder if it's not easier to just
pipe this straight to mysql as SQL syntax and be done with it. Let it
do the ID generation[1].

[1] http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html

Entry: Streaming
Date: Thu Nov 3 11:59:31 EDT 2011

Something that's probably important is to stream out the data as soon
as it is available, instead of dumping out the whole hash at the end.

Entry: TODO
Date: Thu Nov 3 12:47:40 EDT 2011

  - dump main table
  - sanitize strings (pff...)
  - don't keep so much state
  - stream everything?

Entry: Eliminate "share.ss" hash stuff
Date: Thu Nov 3 13:49:30 EDT 2011

Might be simpler to do it in a more straightforward way. Done. Got it
to stream now. Worked around the sanitizing, so it's not 100%
correct. It looks like mysql can keep up just fine. The main hog is
racket:

  22844 tom    20  0  89180  60m 3784 R   98  1.1  0:43.41 racket
  12044 mysql  20  0   229m  28m 6360 S   18  0.5  1:24.60 mysqld
  22848 root   20  0  33604 2556 1988 S    7  0.0  0:03.05 mysql

The fully streamed version:

  $ time (/opt/apache-logs/sorted | racket ~/plt/lib/x/apachelog-dump.ss | tee /tmp/access.sql | sudo mysql -D apachelogs)

Entry: TODO apachelogs
Date: Thu Nov 3 14:54:39 EDT 2011

- fix date: currently still hashed

- fix the string parser: currently not correct as it doesn't parse
  escapes correctly. See [1]:

    \"(\\.|[^\"])*\"

[1] http://stackoverflow.com/questions/249791/regex-for-quoted-string-with-escaping-quotes

Entry: Parsing strings
Date: Thu Nov 3 17:30:17 EDT 2011

All these hacking attempts are really hard to parse! This one doesn't
work for the string matcher:

  "giebrok.zwizwa.be:80 83.101.57.157 - - [26/Aug/2011:00:18:08 +0200] \"GET /WebID/IISWebAgentIF.dll?postdata=\\\"> HTTP/1.1\" 302 417 \"-\" \"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)\""

Or in unquoted form:

  giebrok.zwizwa.be:80 83.101.57.157 - - [26/Aug/2011:00:18:08 +0200] "GET /WebID/IISWebAgentIF.dll?postdata=\"> HTTP/1.1" 302 417 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)"

  (define q "\"((\\.|[^\"])*)\"")  ;; quoted string (quotes not included)

Looks like that regexp is wrong, some quoting is missing:

  (define q "\"((\\.|[^\\\"])*)\"")

Another problem: this one looks corrupt, probably a cut-off write:

  zwizwa.fartit.com:80 81.52.143.34 - - [24/Nov/2010:22:44:06 +0100] "GET zwizwa.fartit.com:80 208.115.111.244 - - [25/Nov/2010:07:05:50 +0100] "GET /robots.txt HTTP/1.1" 302 360 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"

This parses, but it shouldn't. How to fix that?

  INSERT INTO req (id_req, req) values (0, "GET zwizwa.fartit.com:80 208.115.111.244 - - [25/Nov/2010:07:05:50 +0100] "GET /robots.txt HTTP/1.1");
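For reference, the escape-aware pattern from [1] in the previous
entry, written as a pregexp with the Scheme string escaping spelled
out (a quick sanity check, not the parser itself):

  (define q #px"\"(?:\\\\.|[^\\\\\"])*\"")
  (regexp-match q "x \"a\\\"b\" y")
  ;; => '("\"a\\\"b\"")  ;; the whole quoted chunk, escapes included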
Entry: still crashing
Date: Thu Nov 3 19:28:06 EDT 2011

  tom@zoo:~/plt$ time (/opt/apache-logs/sorted | racket ~/plt/lib/x/apachelog-dump.ss | tee /tmp/access.sql | sudo mysql -D apachelogs)
  1145249
  ERROR 1064 (42000) at line 2190300: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'INSERT INTO entry (id_agent, id_date, id_req, id_ip, id_stat, id_ref) values (1650, 897' at line 2
  1145308
  error writing to stream port (Broken pipe; errno=32)

  real    9m43.920s
  user    8m43.313s
  sys     1m26.421s
  tom@zoo:~/plt$

Offending line:

  INSERT INTO date (id_date, date) values (897508, #f);

It seems to be due to this one:

  WARNING: can't handle date (4350104328545217515939634383792078420781148110824235358325269992011008006050 44 22 24 11 2010 3600)

Which I assume is this one, another botched line:

  213.133.113.86 - - [24/Nov/2010:22:59:35 +0100] "GET / HTTP/1.1" 200 497 "-" "Hetzner System Monitorin213.133.113.84 - - [25/Nov/2010:03:05:53 +0100] "GET / HTTP/1.1" 200 497 "-" "Hetzner System Monitoring"

Maybe I should start anchoring lines and throw out matches that don't
work that way..

It runs to completion:

  $ time (/opt/apache-logs/sorted | racket ~/plt/lib/x/apachelog-dump.ss | tee /tmp/access.sql | sudo mysql -D apachelogs)
  1145249
  WARNING: can't handle date (4350104328545217515939634383792078420781148110824235358325269992011008006050 44 22 24 11 2010 3600)
  4080617

  real    49m46.360s
  user    45m55.024s
  sys     5m20.604s

Only 63981 unique IPs. That's a surprise.

So I fixed up the date format to generate MySQL syntax. That makes it
a lot faster too!

  $ time (/opt/apache-logs/sorted | racket ~/plt/lib/x/apachelog-dump.ss | tee /tmp/access.sql | sudo mysql -D apachelogs)
  4080617

  real    11m7.764s
  user    11m25.983s
  sys     0m48.835s

Entry: Next: PLT MySQL bindings
Date: Fri Nov 4 19:01:31 EDT 2011

Next is to get at the data. First set perms on the database:

  GRANT ALL PRIVILEGES ON apachelogs.* TO <user> IDENTIFIED BY <password> WITH GRANT OPTION;

Maybe look here[1] to work without passwords, using GPG auth.

[1] http://dev.mysql.com/doc/refman/5.0/en/checking-rpm-signature.html

Entry: Writing opendocument spreadsheets
Date: Thu Feb 16 19:17:53 EST 2012

While it's not too hard to write CSV (it's not too easy either), that
still leaves some manual steps at import, like selecting separators,
column types, date formats etc.. Isn't there a simple way to have
typed columns, and put them in a spreadsheet? I guess that's called
SQL, but it's not 100% standard either..

Anyways. I'm happy I got my data into racket by mining a GnuCash XML
file. Now the next step is to put it into a spreadsheet. So I'm
wondering about open document format[1] ods spreadsheets. There's
content.xml in the zip which seems straightforward, but I don't
really want to wade through all the rest. Is there a simpler format
that has typed columns? I don't need so many bells and whistles..

[1] http://en.wikipedia.org/wiki/OpenDocument
[2] http://www.openoffice.org/xml/general.html
[3] http://stackoverflow.com/search?q=ods+xml&submit=search

Entry: syntax-case
Date: Fri Feb 22 13:44:42 CET 2013

Apparently, it's not a good idea to mix literals and macros. I.e. if
`loop' is defined as syntax, the following doesn't work properly:

  (syntax-case stx (loop)
    ((loop a) #'a))

Entry: racklog
Date: Sat Feb 23 19:28:50 CET 2013

Trying out racklog.

  (%which (x) (%= (list x 1) '(0 1)))

The problem I have is evaluating a number of constraints over a set
of nodes. So what I have is a list of:

  (nodes, node-predicate)

The outcome should be a binding of all the nodes to the values
mentioned in the predicates through %==. The problem I have seems to
be a level problem:

  - I have a list of nodes, and a list of predicates (or meta:
    functions that generate predicates).
  - The %which form takes identifier syntax, not nodes.

Does this need eval or a macro? In other words: the number of
variables in my query is problem-dependent.

[1] http://docs.racket-lang.org/racklog/unification.html

Entry: geiser
Date: Tue Apr 9 13:18:21 EDT 2013

Trying out geiser[1]. Some nice things:

  - ,enter : switch namespaces (modules) in the repl
  - geiser-mode: identifiers are annotated in the minibuffer
  - C-c C-e RET : edit module

[1] http://www.nongnu.org/geiser

Entry: enter! fix
Date: Wed May 8 09:40:27 EDT 2013

This needs a nightly build. Works ok with
plt-5.3.4.7-bin-x86_64-linux-debian-squeeze.sh

Entry: Shorter compile cycle
Date: Wed May 8 09:41:04 EDT 2013

I need this:

  - reload module
  - execute command

bound to 1 emacs key, or possibly run automatically whenever
something changes. How to send the command to the repl?

Entry: raco pkg git repository
Date: Sat Sep 28 09:27:18 EDT 2013

Some questions:

  - how do import names and git repository structures correspond?
  - where to find a list of libraries?

It seems that the info.rkt goes into a subdirectory? So a git
repository contains a subdirectory with the import name, and that
directory contains the info.rkt?

See: http://docs.racket-lang.org/pkg/how-to-create.html

Entry: Edit a database in a browser
Date: Mon Dec 16 12:26:09 EST 2013

I'd like to solve a simple problem that I've been struggling with for
a long time: provide a UI for database editing that is better than a
plain text interface + CSV files. For my purpose, a database is an
_optimization_, i.e. something to run read-only queries on. All the
data is easier to maintain if it is placed in a simpler form such as
CSV.

Entry: RacketCon
Date: Tue Dec 17 22:21:28 EST 2013

http://con.racket-lang.org/2013/

Entry: Racket Package System - Jay McCarthy
Date: Wed Dec 18 01:00:01 EST 2013

  - no backwards incompatible changes. break API = create new package
  - breaking includes changing documented behaviour
  - when the version number goes up, you can add stuff
  - we're exposing what the core does, to everybody
  - it's possible to binary-compile packages (see also build-deps)
  - it's possible to maintain compatibility when breaking up a
    package
  - it's possible to undo mistakes by reverting
  - there's a catalog[2]

[1] https://www.youtube.com/watch?v=jnPf6S0_6Xw
[2] http://pkg.racket-lang.org

Entry: Racket Database Connectivity
Date: Mon Dec 23 10:30:43 EST 2013

Looks like the MySQL wire protocol is now supported. It does not try
to capture query syntax; instead it uses parameterized queries that
can be "prepared". The main point is to get relations out of the db,
as lists/iterators of vectors, embedding the values in scheme.

[1] http://docs.racket-lang.org/db/
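A minimal sketch against the db API in [1] (connection details are
placeholders; the entry table and id_stat column are the ones
generated by the apachelog dump above):

  (require db)
  (define c (mysql-connect #:user "tom" #:database "apachelogs"))
  (query-rows c "SELECT id_stat, count(*) FROM entry GROUP BY id_stat")
  ;; => a list of vectors, one per row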
Entry: Parsing HTML
Date: Tue Dec 31 12:35:06 EST 2013

Which is better, html or xml?

  (require html)
  (define x (read-html-as-xml (open-input-file "...")))
  (define h (read-html (open-input-file "...")))

I want to look for an element matching a particular pattern. How to
query XML in racket?

  http://docs.racket-lang.org/xml/#%28part._.Simple_.X-expression_.Path_.Queries%29

  (se-path*/list '(table) x)

Ok, this is a bit of a mess: xml, html, xexpr. What's the problem?
The file is not xhtml, so it doesn't seem to be easily handled in
racket, i.e. I'll need to do manual traversal on the html data
structure. How to do better?

  - use an external tool to convert to xhtml, then use the xml /
    xexpr tools in racket
  - use the racket html datastructure anyway

-> See next post.

Entry: Neil's html-parsing lib
Date: Tue Dec 31 18:57:32 EST 2013

  (require (planet neil/html-parsing:2:0))
  (define x (html->xexp (open-input-file "...")))

http://planet.racket-lang.org/display.ss?package=html-parsing.plt&owner=neil

Noticed a difference between

  (td (@ (class "gh-td")) ...)

as produced by html->xexp, and

  (td ((class "gh-td")) ...)

as used by other Racket x-expr tools. What is this? Ok, these are not
the same! Two versions:

  - Racket's xexpr
  - Oleg Kiselyov's SXML

http://lists.racket-lang.org/users/archive/2011-February/044456.html

So the xml/path functions are not compatible with SXML. More here:
http://www.neilvandyke.org/racket-xexp/

The sxml stuff is nice!
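The difference in miniature (a sketch; html->xexp from the planet
package above, which also accepts a string as input):

  (require (planet neil/html-parsing:2:0))
  (html->xexp "<td class=\"gh-td\">x</td>")
  ;; => roughly (*TOP* (td (@ (class "gh-td")) "x"))  ;; SXML style
  ;; a Racket xexpr would spell it: (td ((class "gh-td")) "x")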