Wed Nov 2 11:01:49 EDT 2011
Hashing the strings?
Given that the actual string values are not very interesting by
themselves, what about storing a hash with a relatively low
collision rate instead, and performing the inverse lookup (hash back
to string) only when necessary? This would avoid keeping the large
strings around, which account for the bulk of the memory.
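
A rough sketch of the idea in Python, just to make it concrete. The
names key_of and find_string are made up for illustration, and the
64-bit BLAKE2b digest is an assumption, not something decided here;
any hash with a low enough collision rate would do.

```python
import hashlib

def key_of(s: str) -> int:
    """Hypothetical helper: reduce a string to a 64-bit integer key."""
    digest = hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def find_string(key: int, candidates):
    """Inverse lookup: rescan the original strings only when needed."""
    for s in candidates:
        if key_of(s) == key:
            return s
    return None

# Illustrative data only; equal strings map to equal keys.
strings = ["alpha/long/path", "beta/long/path", "alpha/long/path"]
keys = [key_of(s) for s in strings]
```

Only the integer keys stay in memory; the strings themselves can stay
in the source data and be rescanned on the rare occasion that the
actual value is needed.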
What about just interning them as symbols? That gives a hashing
mechanism right there, which is already quite useful.
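
As a stand-in for symbols, here is what interning looks like in
Python using sys.intern. The load_names helper and the sample data
are made up for illustration. Note that interning still keeps one
copy of each distinct string, so it only saves memory when the same
strings repeat.

```python
import sys

def load_names(raw_fields):
    """Return the fields with each distinct string stored only once."""
    return [sys.intern(field) for field in raw_fields]

# Build equal strings at run time so they start out as separate objects.
raw = ["/".join(["usr", "lib", str(n % 2)]) for n in range(4)]
names = load_names(raw)
```

After interning, equal strings are the same object, so comparing them
is an identity check rather than a character-by-character comparison.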
Anyway, let's just focus on making it work for a smaller dataset,
then running it on the big one. Once it's converted, it can be