Fri Jun 12 10:24:37 CEST 2009


The problem: my growing collection of locally cached electronics
papers and books is getting quite large.  I'd like to construct an
interface for it and solve the problem of making sure it is available.

Rationale for locally cached library:

  * Not all content is available on the web.
  * I'm not always online.
  * The total size is manageable.

Rationale for specific storage structure:

  * Data is read-only.
  * The only operations are add and delete.
  * Want to avoid (exact) duplicates.
  * Not all machines are always on (the reason for a distributed system).

Ideally I'd like this to be a reference pool, so it's easier to add
references to papers.  That part can grow over time, however.  It's
best to get metadata automatically from the web, and to focus on
caching the data and linking the metadata.

Some problems:

  * A web interface
  * Metadata format?
  * Organization + Search?

Practical problems and solutions:

  * Only single files are indexed.  This works best for ps.gz, pdf and
    djvu.  For documents made of multiple files, use a tar.bz2 archive
    and figure out how to unpack it in the viewer.
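
One way the unpacking step could look, as a rough sketch in Python.
The function name unpack_for_viewing and the "library-" prefix are
made up for illustration, and the archive is assumed to come from my
own collection (trusted content).  How the result is handed to the
viewer is left open.

```python
import tarfile
import tempfile
from pathlib import Path


def unpack_for_viewing(archive_path):
    """Unpack a tar.bz2 archive into a fresh temporary directory.

    Returns the directory and a sorted list of the extracted files.
    Assumption: the archive is trusted, so member paths are not checked.
    """
    dest = Path(tempfile.mkdtemp(prefix="library-"))
    with tarfile.open(archive_path, "r:bz2") as archive:
        archive.extractall(dest)
    files = sorted(p for p in dest.rglob("*") if p.is_file())
    return dest, files
```

The caller is responsible for removing the temporary directory once
the viewer is closed.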

For storage, see content-addressable storage[1].  I'd like to move to
an implementation where each file is indexed by its MD5 hash.  This
would make it possible to tap into MD5 content hash databases.
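
A minimal sketch of such a store in Python, covering only the two
operations listed earlier (add and delete).  The names Store and
md5_of and the flat one-file-per-hash layout are assumptions for
illustration, not a settled design.  Exact duplicates collapse onto
the same entry because they produce the same hash.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class Store:
    """Hypothetical content-addressed store: one file per MD5 hash."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def add(self, path):
        """Copy a file into the store and return its key (the MD5 hash).

        If the same content is already stored, nothing is copied.
        """
        key = md5_of(path)
        target = self.root / key
        if not target.exists():
            shutil.copyfile(path, target)
        return key

    def delete(self, key):
        """Remove the entry stored under the given key."""
        (self.root / key).unlink()

    def path(self, key):
        """Return the location of the stored file for a key."""
        return self.root / key
```

Since the stored name carries no extension, the original file name
and type would have to live in the metadata.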

[1] http://en.wikipedia.org/wiki/Content-addressable_storage