Sat Mar 22 05:59:51 EDT 2014

Thinking about organizing data

Problem: some data is:
- large
- read-only
- private

Most of this is pictures, movies and scanned documents.
How to organize it better?

Stuff like this should not be part of a "home directory" under linux.
It should be in an abstract store.

A home directory is a working directory.
It should not be an archive.

Another part is deduplication: pictures, for example, are often
copied between my home dir and my wife's.
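
A minimal sketch of how such duplicates could be found, assuming that
two files count as "the same picture" when their bytes are identical.
The names file_digest and find_duplicates are made up for this note,
and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Group files under the given directories by content digest.

    Returns only the groups holding more than one path, i.e. the
    files that exist as byte-identical copies.
    """
    by_digest = defaultdict(list)
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                by_digest[file_digest(path)].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Calling find_duplicates on the two home directories would list the
copies. It would not catch a picture that was resized or re-encoded,
since that changes the bytes.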

As for the storage mechanism, one distinction between "working data"
and "archive data" is that working data is usually small, so it makes
sense to put it on an SSD.

Another thing about archive data is that it is not tied to a
particular machine.

So, some properties:

- unique global singleton
- large
- read only / add only  (includes git and darcs!)
- indispensable
- infrequent access
- often private

home, system:
- per machine
- small
- read / write
- dispensable
- frequent access
- often public

The global nature makes it possible to answer the important question:
how much data do I need to carry around?

As for implementation, an archive can be just a directory structure on
an abstract disk.  However it should be indexed for consistency, and
have some kind of history.

It seems best not to re-invent the wheel here, though I am tempted.
There is something to be said for a couple of local file systems that
are somehow synced to perform "eventually consistent" file operations.

Seems there is a lot of recent activity in this field due to cloud


The main idea seems to be that a file store needs to be:
- centralized conceptually
- distributed for access speed and fault tolerance

One is named after my dog: