First post in a series on features of VoltDB Version 9.2.
Version 9.2 of VoltDB introduces two new features that are highly relevant to people developing streaming applications. This is part one of a series of three blog posts:
- Introducing MIGRATE
- Introducing CREATE TASK
- An example of an application that uses MIGRATE and CREATE TASK.
This blog post will focus on MIGRATE. Use the link above to find a link to CREATE TASK and stay tuned for an upcoming post on an example of an application that uses MIGRATE and CREATE TASK.
VoltDB has long had a concept of an Export Stream. It’s defined in DDL and looks like a table. From a SQL perspective one can INSERT into it, but not UPDATE or DELETE it. While it looks like a table, it is in fact a reliable ‘at least once queue’ to a destination of your choice. VoltDB allows you to route rows inserted into an Export Stream to a variety of preexisting destinations and also exposes an API so you can implement new ones.
“MIGRATE” has been created to make working with Export Streams easier.
MIGRATE reduces the amount of code needed.
In practice we see a very common use case where as rows become old in a VoltDB table we want to send them somewhere using an Export Stream. While it was perfectly possible to do, it involved reading a row, unloading all the columns, using the unloaded columns to populate an insert into the export stream and then deleting the row we started with. MIGRATE does this in one step using a WHERE clause:
MIGRATE FROM drone_locations WHERE drone_id = ? AND event_timestamp <= ? AND NOT MIGRATING;
Note the “NOT MIGRATING” clause – as we discuss below it allows us to avoid repeatedly marking something for migration, and also choose whether we wish to see migrating records in SELECT statements.
MIGRATE only completes when we know the record exists somewhere else.
Export Streams are a highly efficient abstraction of an ‘at least once queue’, but operate at a totally different speed to the rest of VoltDB. Inside a VoltDB procedure we measure time in nanoseconds, but most third party queue destinations (such as Kafka or JDBC) think in terms of milliseconds or even seconds. Waiting for such destinations to finish writing a record would do horrible things to latency, and if we require acknowledgement from something outside VoltDB to finish a transaction we cease to be highly performant. So writing to an Export Stream doesn’t wait for the downstream destination to confirm receipt. Instead, it just confirms that you have queued a request. This means that just because you’ve finished inserting something into an Export Stream, it doesn’t mean that it has already reached its destination, any more than pressing ‘send’ on an email means the recipient instantly receives it. Note that this being VoltDB, once you’ve queued a request it will remain in the queue even if a node in the cluster dies – delivery is reliable. The worst case scenario is that the downstream destination will receive the record twice.
This asynchronous behavior ensured our transactions ran quickly, but created problems for VoltDB users who wanted to keep a live copy of something either in VoltDB or in another datastore. If they inserted into the stream and deleted the original from VoltDB, there would be a period of time where the record wasn’t visible anywhere – it would be in transit in the export stream plumbing.
Our implementation of MIGRATE solves this problem. MIGRATED rows don’t disappear immediately, but only when the downstream system has acknowledged their receipt. We’ve also added a SQL function so you can tell what’s currently Migrating:
SELECT drone_location FROM drone_locations WHERE drone_id = ? AND NOT MIGRATING;
You can also identify individual rows as Migrating using a CASE statement:
SELECT drone_location, case when migrating then 'Y' else 'N' end FROM drone_locations WHERE drone_id = ?;
Without this capability you run the risk of enthusiastically requesting migration multiple times for the same record. If a row has the status of Migrating there are no guarantees as to how long something will take to migrate, but once it does the record will be deleted.
Migrate allows us to reduce the work needed to move aging records out of VoltDB into much slower remote systems, without having to worry about data being ‘missing’ in the plumbing between VoltDB and the downstream system. Learn more about Version 9.2 of VoltDB for yourself and download a trial copy.