To quote Gerald Weinberg, author of The Psychology of Computer Programming, “a system is never finished being developed until it ceases to be used.” A wave of innovative ideas throughout the 70’s and 80’s (see Codd’s 12 rules) culminated in the early 90’s into a set of exemplary patterns and practices that caused a dramatic shift in the way big business managed data, spurring on a rapid digitization of documents that would traditionally have lived in a filing cabinet and require several manual steps to be usable by anyone (or in a proprietary data store on a proprietary mainframe running a proprietary operating system that’s accessible by few). Without this automation the “big data” that’s driving the modern web would simply be unavailable, or at least require a lot more work to produce. The very success of these patterns and practices that have become deeply ingrained in modern relational database management systems (RDBMS) now presents an obstacle to the necessary evolution of how companies handle their data. I’m not advocating that everyone needs to rethink what’s working for them, but that for many companies substantial infrastructure is being committed to patching systems that are fundamentally broken for their use-cases.
Much of the data on the web is pretty darn flat, which presented an opportunity for the development of emerging database management systems that were built around a review of new approaches, while also taking notes from some of the early work at places like Xerox PARC. Google took such a leap in designing their BigTable non-rational data store that powers much of Google Earth/Maps, Blogger.com, Gmail, and several other aspects of their infrastructure. Likewise, data on the web tends to follow a logically distributed structure, which lends itself to the brilliant work by the Amazon Engineers on the fully distributed Dynamo key-value store; which among other technologies, implemented novel variations on Token Rings, Partitioning, Quorum Replication and Merkle Trees. The Cassandra database platform powering Youneeq’s analytics and real-time content personalization incorporates a hybridized evolution of these two platforms.
When we started down the road of real-time content personalization we were firmly imbedded in the world of relational databases. The first internal iteration a little ad hoc, but the next more strictly adhering to popular standards such as 4NF (fourth normal form), while adopting common patterns used to avoid the obvious (and not so obvious) performance pitfalls. This worked great when we deployed our alpha version product on small-to-medium traffic sites, but our first high-traffic test site became a painful experience in watching the system systematically fail over the course of several minutes. A possible option around our bottlenecks would be to cache everything in memory and hope the power never went out, and that our replication was sufficient to avoid stale info while being not so traffic intense between servers as to clog up the network. There are numerous in-memory databases, with the option even making its way into newer version of MySQL, but we also had to consider the management headaches associated with this approach and the replication issue was non-trivial if we truly wanted to service our clients with consistent and reliable results. Instead we took a risk on the ATS key-value store Microsoft provides in their Azure platform, while also utilizing an intermediate level of nodes for maintaining a coherent cache on frequently requested data.
Azure Table Storage (ATS) is a well thought-through basic NoSQL option for content that can be driven through many small parallelized requests, particularly when highly idempotent access patterns are involved. The system is metered such that table and account level performance is tuned to work well if you leverage the key structure efficiently and penalize requests and accounts that don’t. This one-two combo of highly parallelized requests and intermediate caching allowed us to scale an implementation onto large sites, while driving down the cost of committing shared access points for smaller clients. Not ideal but it meant we were back in the game of supporting very large sites on the web with real-time content personalization that adapts for every individual with every content item they view.
We subsequently made a series of changes to our underlying system that improved our flexibility and detection of performance constraints, and actively tested and reviewed database management platforms for consideration in replacing ATS. MongoDB was our next leap into new back-end territory, providing us with greater control of how we built out our data cluster, greater extensibility in the query model, and internal support for MapReduce. While this solution worked well for a while, larger sets of data began to degrade performance and the built-in master-slave clustering was clearly showing drops in scalability past a few database nodes.
This leads to our current Cassandra infrastructure, which involved the most planning and testing of all the solutions, but also the greatest reward for our efforts. Under testing we found the intermediate caching was redundant as the latency of the node between our web front-end servers and our data nodes was typically higher than the time it took Cassandra to service a request. One less piece in the architecture, but would we truly be faster by sticking less in local memory? After some early hiccups, and some assistance from the friendly folks at DataStax (who manage the commercial side of Cassandra) the answer is a resounding yes; we’re leaner, meaner and faster than we’ve ever been! Netflix has demonstrated that a 288 node Cassandra cluster can scale with similar efficiency to just a few nodes, supporting over 1 million write operations per second and latency to the client averaging around 11ms (see Benchmarking Cassandra Scalability on AWS – Over a million writes per second). I wouldn’t advocate this solution for everyone; for instance, although we’re producing similar efficiency to the stats on best of breed integrations it took months of design hours to get us to that level. Transaction support is also in its early stages, so if you work at a financial institution it’s likely not the best career move to advocate for Cassandra in the financial back-end…
A big thanks to the DataStax guys for helping to get us to where we are today, and for any database admin or developer finding themselves patching legacy systems more and more….consider that the times may be a changin.