I changed jobs and companies a very fast 90 days ago, moving from a huge (I like to say that like Bernie Sanders says it, starting with a “Y” – I’m from New York after all) technology company to the much smaller VoltDB. In my previous role, I worked on data warehousing. The challenges we solved were about Big Data, but really mostly about Volume and Variety. Other parts of the company’s vast portfolio tried to address the Velocity part, but our group helped customers store massive amounts of data and scan it quickly. That was analytics. We used all sorts of tricks, such as trickle feeds and micro-batch ETLs, to make our systems faster and closer to “real-time” and operational, but our main focus was on the historical: look at old data and try to predict the future. “Actionable insights” – that was what we were trying to produce. The problem was, they seldom led to any “action”.
VoltDB is different on a number of levels. First, it is transactional. I’ll have much more to say about that aspect in a future blog.
Second, it is in-memory. That means you aren’t going to store petabytes or even hundreds of terabytes in it – and for the main use cases you don’t need to or want to.
Third, it is operational. This is partly due to its transactional focus, but VoltDB does its thing as part of everyday business operations. It is used for authorization, authentication, billing, fraud detection – functions that companies make money on (or count money with). And, as they say, time is money. VoltDB needs to be as fast as the speed of business, and that is getting faster all the time. Think about how even a relatively short delay in loading a web page causes you to leave a page in disgust, or how upset you get if your cell phone call doesn’t connect quickly enough – so much so that you give great consideration to changing carriers.
Its chief advantage is speed; speed and simplicity … simplicity and speed… our two advantages are simplicity and speed…. and fully ACID transactions … and an almost fanatical devotion to correct data … our three advantages are – well, you get the idea.
As I have gathered information about our customers’ uses of VoltDB to get a clearer understanding of their architectures and application capabilities “before” and “after” implementing VoltDB, I’ve seen some common things bubble to the top. One that was obvious was performance. Since VoltDB is in-memory, distributed and designed from the ground up for high-performance transactions, that wasn’t a surprise. The degree of the performance improvements was a bit surprising, particularly given that many of our customers reduced their hardware footprint in moving to VoltDB – like reducing a 100-node MySQL cluster down to seven nodes running VoltDB and still getting significantly faster performance. That hinted at another commonality – increased simplicity (or if you are a glass-half-empty type, decreased complexity). Replacing one product with VoltDB allows a reduction in the number of servers. But typically, especially when open source is the chosen strategy, VoltDB also replaces multiple products/projects with a single product, on top of the overall reduction in the number of servers.
Consider someone building a streaming analytics solution with open source. They might use several components – Kafka for keeping track of the data streaming in, the required Zookeeper for coordination, Storm for in-memory operations, and Cassandra for long-term, distributed storage/persistence. Each of these might require a cluster of anywhere from three to five nodes, so you are looking at a minimum of 12 nodes. Then there is glue code between the different components. You might be able to find just the right Kafka/Storm integration code out there in the open-source user community, but maybe your Storm/Cassandra coordination is a bit unusual so you have to write it yourself. And maintain it through new releases.
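If you want to see where that node count comes from, here is a back-of-the-envelope sketch in Python. It assumes each of the four components gets its own cluster and uses the rough three-to-five range mentioned above; the function and variable names are mine, purely for illustration, and real deployments will size things differently.

```python
# Components from the open-source stack described above.
COMPONENTS = ["Kafka", "Zookeeper", "Storm", "Cassandra"]

# Rough per-cluster sizes from the paragraph above (an estimate, not a sizing rule).
MIN_NODES_PER_CLUSTER = 3
MAX_NODES_PER_CLUSTER = 5


def node_range(components, low, high):
    """Return (fewest, most) total nodes if every component runs its own cluster."""
    count = len(components)
    return count * low, count * high


fewest, most = node_range(COMPONENTS, MIN_NODES_PER_CLUSTER, MAX_NODES_PER_CLUSTER)
print(f"{len(COMPONENTS)} clusters: {fewest} to {most} nodes")
```

With four clusters, the low end works out to 12 nodes and the high end to 20, and that is before counting any of the glue code you have to run and maintain.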
So whereas a typical open source “cluster collection” might look like this:
An equivalent solution using VoltDB could look more like this:
Much simpler. VoltDB runs on commodity servers just like most of the Apache open-source and other similar products but typically is much more efficient. Fewer servers, less complexity, lower cost, less maintenance – and more functionality. We have one customer who had been trying very hard to get around transactional “challenges” using Cassandra or HBase with Trident for a real-time customer billing application; they found a perfect solution in VoltDB.
Simplicity is also an aspect of doing business with VoltDB that some of our customers truly appreciate, especially those coming to us from Oracle. As I mentioned, we’ve had customers replace MySQL with VoltDB but we’ve also replaced Oracle TimesTen and even some core Oracle RDBMSs for applications that weren’t able to scale or to perform within required SLAs.
I’ll have more to say about simplicity in my next blog.
So ended my first/fast 90 days. With all of the new market interest in fast data solutions, it’s looking like the next 90 will be even faster.