30 Jan 2012

The phenomenon of NoSQL databases, like Cassandra, BigTable, SimpleDB, etc, is odd in that they don't don't seem to have come about due to any new ideas. The main differentiator of NoSQL databases vs classic SQL databases is simply a reduction of features. This isn't a bad thing. Einstein said "Make everything as simple as possible, but not simpler." NoSQL took the parts of databases that were slow and just got rid of them. They didn't try to make them faster, they just made it so you couldn't do slow things, or if you wanted to you'd have to code them yourself on top of the database layer.

For example, NoSQL databases don't allow joins. Joins are slow so get rid of them. Locks are also slow, so get rid of those. Databases like Netezza get around the locking problem by associating a creation and deletion time stamp with rows. You don't ever modify a row, you simply set the deletion time on the row, and create a new row. If another query runs concurrently, it checks the creation time and deletion time to ensure the row it is looking at existed at the time of the start of the query.

Iterating through data looking for values that match a where clause is also slow, so just use a key-value store. Cassandra is just a hash of hashes. By making the database so simple, sharding becomes a lot easier for the database to do automatically.