Traditional pub/sub systems such as Apache Kafka (and their numerous ever-evolving stream processing avatars like Apache Samza, Apache Storm, Apache Flink, etc.) work fine for simple use cases that call for simple decisions, speed, scale and stability. However, for most real-life business scenarios in industries such as finance, AdTech, healthcare IoT, telecom and others, your flavor-of-the-day Apache solution just doesn’t suffice. Let me explain why…
Operationalizing Complex Machine Learning Algorithms
A real-time decisioning engine is essential for the success of modern apps. While traditional stream processing platforms can do basic machine learning (ML) and pattern recognition, they lack the ability to:
- Make complex decisions using hundreds to thousands of variables with contextual state in milliseconds.
- Dynamically train and update ML models based on not just data from the stream, but also historical big data from a data lake/data warehouse.
In-memory NewSQL relational database management systems (RDBMS) were architected from the very first line of code for complex fast data processing. With core RDBMS functionality – such as User-Defined Functions (UDFs) and Stored Procedures – complex ML models customized for your business can be embedded in-database to drive real-time actions and decisions on streaming data. PMML models are automatically converted into an executable UDF and deployed to production. Only a NewSQL in-memory RDBMS can offer the combination of complex actionable decisions, low latency and high throughput required for modern web-scale apps.
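To make the "model as a UDF" idea concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in for an in-memory SQL engine; it is not VoltDB's API, and the `fraud_score` function, its weights and the `events` table are all hypothetical names invented for illustration.

```python
import math
import sqlite3

# Hypothetical logistic-regression weights, hard-coded for illustration only.
WEIGHTS = {"bias": -4.0, "amount": 0.002, "txn_last_hour": 0.5}

def fraud_score(amount, txn_last_hour):
    """Return a score in (0, 1) for a single event."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["amount"] * amount
         + WEIGHTS["txn_last_hour"] * txn_last_hour)
    return 1.0 / (1.0 + math.exp(-z))

# SQLite stands in for the in-memory SQL engine (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.create_function("fraud_score", 2, fraud_score)  # register the model as a UDF

conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, amount REAL, txn_last_hour INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, 25.0, 1), (2, 1800.0, 6), (3, 90.0, 0)],
)

# The decision is made inside the query, next to the data.
flagged = conn.execute(
    "SELECT id FROM events "
    "WHERE fraud_score(amount, txn_last_hour) > 0.5 ORDER BY id"
).fetchall()
print(flagged)
```

The point of the design is that the scoring logic runs where the data lives, so the app issues one query instead of pulling rows out, scoring them elsewhere and writing the result back.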
SQL is Still King of the Hill
SQL is a proven, long-established standard for querying data. Attempts to replace SQL with either MapReduce in the Apache Hadoop framework – or one of the many NoSQL tools – have failed miserably. Ironically, many “NoSQL” tools have pivoted back to adding SQL or “SQL-like” query languages. The Apache Kafka ecosystem jumped on the SQL bandwagon with “KSQL” for stream processing. KSQL, however, is far from standard SQL.
VoltDB, on the other hand, was born as a NewSQL RDBMS; it offers fully ANSI-compliant SQL on streaming data, enabling a much wider range of queries and complex event processing. Additionally, ANSI-compliant NewSQL provides developers with the familiarity, flexibility and standardization necessary to build apps that rely on fast stream processing of data. NewSQL provides the same scalable performance as NoSQL systems for Hybrid Transactional & Analytical Processing (HTAP) workloads without compromising on ACID guarantees.
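As a rough illustration of what plain, standard SQL buys you on event data, the sketch below joins a stream of click events to a reference table and aggregates over a recent time window. SQLite (via Python's sqlite3 module) is used only as a stand-in engine, and the `users` and `clicks` tables, the integer timestamps and the 60-tick window are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, not VoltDB
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, tier TEXT);
CREATE TABLE clicks (user_id INTEGER, ts INTEGER, cost REAL);
""")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "gold"), (2, "free"), (3, "gold")],
)
# Hypothetical incoming events: (user_id, timestamp, cost).
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [(1, 100, 0.5), (1, 130, 0.25), (1, 150, 0.25),
     (2, 140, 1.0),
     (3, 20, 2.0), (3, 145, 0.5)],
)

now, window = 160, 60  # only events with ts > now - window are considered

# Ordinary JOIN / GROUP BY / HAVING: users with 2+ clicks in the window.
rows = conn.execute(
    """
    SELECT u.user_id, u.tier, COUNT(*) AS n, SUM(c.cost) AS total
    FROM clicks AS c
    JOIN users AS u ON u.user_id = c.user_id
    WHERE c.ts > ? - ?
    GROUP BY u.user_id, u.tier
    HAVING COUNT(*) >= 2
    ORDER BY u.user_id
    """,
    (now, window),
).fetchall()
print(rows)
```

Nothing in the query is specific to one vendor's dialect, which is the familiarity argument in a nutshell: the same statement a developer would write against any relational database.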
A NewSQL RDBMS is not just fast, scalable, and easy to deploy; its strict ACID guarantees can also accelerate and simplify app development. From an app dev perspective, ACID translates to:
- Simplified querying – A complicated single-statement query can be split into several simpler statements wrapped in one transaction, with the same effect as the original query. Strict ACID properties ensure that these statements take effect together or not at all, and that no other transaction sees the intermediate state.
- Easier testing – Each test can be a transaction that makes database changes, which can be rolled back at the end of each test. Isolation ensures that tests can run in parallel without unintentional side effects.
- Faster concurrency – Concurrent apps are faster to build and run when the database guarantees the isolation of each transaction. App developers do not need to write their own code for ordering or locking concurrent actions within the app.
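The first two points can be sketched in a few lines of Python. Again, sqlite3 is only a stand-in for a transactional SQL engine, and the `accounts` table and the `transfer` helper are hypothetical. The sketch uses a single connection, so it shows atomicity and rollback but does not demonstrate isolation between concurrent transactions.

```python
import sqlite3

def make_db():
    # isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    return conn

def transfer(conn, src, dst, amount):
    """Two simple statements that take effect together or not at all."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # undoes the credit as well as the failed debit
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))

conn = make_db()
ok = transfer(conn, 1, 2, 30)        # both updates commit
failed = transfer(conn, 1, 2, 500)   # debit violates the CHECK; nothing is kept
after = balances(conn)

# Test-style usage: change data inside a transaction, then roll it back.
conn.execute("BEGIN")
conn.execute("DELETE FROM accounts")
during_test = balances(conn)
conn.execute("ROLLBACK")
after_test = balances(conn)
print(ok, failed, after, during_test, after_test)
```

The failed transfer leaves no half-applied credit behind, and the "test" that deletes every row leaves the database exactly as it found it, which is what makes transaction-per-test setups cheap to reset.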
Traditional streaming technologies simply can’t match the ACID guarantees that a NewSQL RDBMS offers for stream processing; most embed a NoSQL datastore that can offer eventual consistency at best. Your data – and the insights and actions derived from it – are far too valuable for your business to compromise on accuracy and consistency. Data integrity issues could cost millions in losses and, more importantly, tarnish your brand’s public image.
To learn more about smart stream processing with NewSQL, check out this white paper: The Evolution of the Smart Stream Processing Architecture.