Fri Jun 12 10:24:37 CEST 2009
The problem: my growing collection of locally cached electronics
papers and books is getting quite large. I'd like to construct an
interface for it and solve the problem of making sure it is available.
Rationale for locally cached library:
* Not all content is available on the web.
* I'm not always online.
* The total size is manageable.
Rationale for specific storage structure:
* Data is read-only.
* The only operations are add and delete.
* Want to avoid (exact) duplicates.
* Not all machines are always on. (reason for distributed system)
Ideally I'd like this to be a reference pool, so it's easier to add
references to papers. That part can grow over time, however. It's
best to get metadata automatically from the web, and to focus on
caching the data and linking the metadata.
* A web interface
* Metadata format?
* Organization + Search?
Practical problems and solutions:
* Only single files are indexed. This works best for ps.gz, pdf and
djvu. For multiple files, use a tar.bz2 archive and figure out how to
unpack it in the viewer.
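
For the multi-file case, here is a minimal sketch of pulling one member
out of a tar.bz2 archive without unpacking it to disk, assuming Python
and its standard tarfile module. The names list_members and read_member
are made up for illustration; how the viewer would consume the bytes is
still open.

```python
import tarfile


def list_members(archive_path):
    """Return the names of the regular files inside a tar.bz2 archive."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def read_member(archive_path, name):
    """Return the bytes of one archive member, read directly from the archive."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        # extractfile raises KeyError for an unknown name and returns
        # None when the member is not a regular file (e.g. a directory).
        f = tar.extractfile(name)
        if f is None:
            raise ValueError("not a regular file: %s" % name)
        return f.read()
```

A viewer could call list_members to show the contents and read_member
to hand a single pdf or djvu to the reader program.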
For storage, maybe check here. I'd like to move to an
implementation where each file is indexed by its MD5 hash. This would
make it possible to tap into MD5 content hash databases.
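
A minimal sketch of such a store, assuming Python and a flat directory
in which each file is named after the MD5 hex digest of its content.
The names md5_of, add and delete are hypothetical, not an existing
tool. It covers the two operations listed above, and adding the same
content twice yields the same name, so exact duplicates collapse into
one stored file.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def add(store, path):
    """Copy a file into the store under its MD5 digest and return the digest."""
    store = Path(store)
    store.mkdir(parents=True, exist_ok=True)
    digest = md5_of(path)
    target = store / digest
    # Same content means same name, so an exact duplicate is skipped.
    if not target.exists():
        shutil.copyfile(path, target)
    return digest


def delete(store, digest):
    """Remove a file from the store; return True if it was present."""
    target = Path(store) / digest
    if target.exists():
        target.unlink()
        return True
    return False
```

Since stored files are never modified, syncing two machines reduces to
comparing the sets of digest names and copying whatever is missing.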