Editor’s Note: This post originally appeared on November 1, 2017 on the InfoWorld blog. It is republished with permission.
Even Google and Amazon can’t process data instantly—here’s how to combat latency in your real-time application
Despite all the advances we’ve seen in data processing and database technology, there is no escaping data’s Public Enemy No. 1: latency, the time delay before a response is generated and returned. Even Gartner’s definition of a zero-latency enterprise acknowledges that latency can never actually be zero because computers need time to “think.”
While you may never truly achieve zero latency, the goal is always to deliver information in the shortest amount of time possible, so ensuring predictable, low latency processing is key when building a real-time application. Often the hardest part, though, is identifying the sources of latency in your application and subsequently eliminating them. If you can’t remove them entirely, there are steps you can take to reduce or manage their consequences.
Before, during, and after computing the response, there are number of areas that can add unwanted latency. Below are some common sources and tips for minimizing their impact.
Most applications use the network in some manner, whether between the client application and the server or between server-side processes and applications. The important thing to know here is that distance matters—the closer your client is to the server, the lower the network latency. For instance, round-trip latency between nodes within the same datacenter can cost 500 microseconds, while it can be an additional 50 milliseconds for nodes in California and New York.
What to do:
- Use faster networking, such as better network interface cards and drivers and 10GigE networking.
- Eliminate network hops. In clustered queuing and storage systems, data can be horizontally scaled across many host machines, which can help you avoid extra network round-trip connections.
- Keep client and server processes close together, ideally within the same datacenter and on the same physical network switch.
- If your application is running in the cloud, keep all processing in one availability zone.
Many real-time applications are data intensive, requiring some sort of database to service the real-time request. Databases, even in-memory databases, make data durable by storing it to persistent storage, but for high-velocity real-time applications, making data durable can add significant unwanted latency, and disk I/O, like network I/O, is costly. Accessing memory can be upwards of 10,000x faster than a single disk seek.
What to do:
- Avoid writing to disk. Instead use write-through caches or in-memory databases or grids (modern in-memory data stores are optimized for low latency and high read/write performance).
- If you do need to write to disk, combine writes where possible, especially when using fsync. The goal is to optimize algorithms to minimize the impact of disk I/O. Consider asynchronous durability as a way to avoid stalling main line processing with blocking I/O.
- Use fast storage systems, such as SSDs or spinning disks with battery-backed caches.
The operating environment
The operating environment in which you run your real-time application—on shared hardware, in containers, in virtual machines, or in the cloud—can significantly impact latency.
What to do:
- Run your application on dedicated hardware so other applications can’t inadvertently consume system resources and impact your application’s performance.
- Be wary of virtualization—even on your own dedicated hardware, hypervisors impose a layer of code between your application and the operating system. When configured properly, performance degradation can be minimized, but your application is still running in a shared environment and may be impacted by other applications on the physical hardware.
- Understand the nuances of the programming language environment used by your application. Some languages, such as Java and Go, use automatic memory management and use periodic garbage collection to reclaim unused memory. The processing impact of garbage collection can be unpredictable and introduce unwanted latency at seemingly random times.
When it comes to coding, there are some common core functionalities that can pose barriers to speed.
What to do:
- Inefficient algorithms are the most obvious sources of latency in code. When possible, look for unnecessary loops or nested expensive operations in code—restructuring loops and caching expensive computation results usually help.
- Multithreaded locks stall processing and thus introduce latency. Use design patterns that avoid locking, especially when writing server-side applications.
- Blocking operations cause long wait times, so use an asynchronous (nonblocking) programming model to better utilize hardware resources, such as network and disk I/O.
- Unbounded queues may sound counterintuitive, but these lead to unbounded hardware resource usage, which no computer has. Limiting the queue depths and providing back pressure typically lead to less wait time in your code resulting in more predictable latencies.
Combating the enemy
Building real-time applications requires that the application developer not only write efficient code, but to also understand the operating environment and hardware constraints of the systems on which your application will be deployed. Provisioning the fastest networking equipment and the fastest CPUs won’t singularly solve your real-time latency requirements.
Thoughtful application architecture, efficient software algorithms and optimal hardware operating environment are all key considerations for fighting latency.