Weighing the Benefits: Hardware vs. Software

In the pursuit of new ways to eliminate latency from the market data distribution and trading processes, vendors have invested in hardware-acceleration technologies, such as FPGAs. But with commodity chips now giving specialist hardware a run for its money, has that investment been wasted, or does each still have a role to play?

For the past 15 years, trading firms have poured money into the race for zero latency, investing millions in low-latency fiber routes, wireless connectivity, direct exchange datafeeds, and components of trading architectures that promise incremental improvements in processing time. One of these has been the introduction of hardware acceleration for certain parts of a rapid market data workflow, such as field-programmable gate arrays (FPGAs) that speed up processing time for repetitive tasks like feed handling and book-building.

When trading firms agonize over microseconds of latency—the delay incurred in moving data from one point to another—finding a new technology that reduces that delay can be a game-changer. By the same token, betting on a slower technology can undermine the solutions provided to clients, impair those clients’ ability to trade efficiently, and waste development effort.

In the search for lower latency and greater efficiency, market data and technology provider Activ Financial adopted hardware acceleration in 2007, creating a market data appliance based on FPGAs to complement its existing software platforms. But while fintech vendors invested in FPGAs, the regular microchip manufacturers made giant leaps forward.

“We began to question our use of FPGAs when Intel released its Sandy Bridge processors [in 2011]. It no longer seemed clear whether FPGAs still had a performance edge,” says Activ COO Jim Bomer. “We have recently tested our solution on some old Sandy Bridge boxes and found that it outperforms the contemporary FPGA solution as far back as 2012.”

As a result, Activ has switched back to a pure software infrastructure—though it will still offer the hardware appliance if customers want it. So now, the question is, should other fintech vendors follow suit? Their rivals aren’t so sure.

FPGAs: The Latency Penicillin

FPGAs have been used to power some aspects of capital markets technology for almost 15 years—St. Louis, Mo.-based ticker plant and feed handler appliance vendor Exegy was the first to introduce a market data platform based on FPGAs in 2006—and have earned a reputation for being very fast, but tedious and expensive to work with. The industry may now be facing an inflection point: Some providers are doubling down on FPGAs for data processing, while others are turning back to pure software solutions, claiming that software and the latest iterations of commodity hardware can deliver sufficient—and in some cases, FPGA-comparable—performance.

Latency-sensitive firms face a constant challenge to decide what technologies will deliver the best results. But the options aren’t always clear.

One option for firms seeking clarity is to turn to technology testing and benchmarking body STAC Research, which runs two benchmarks that measure network input/output (I/O) latency. The first, STAC-N1, measures data latency using timestamps taken in software. The second, STAC-T0, is a more accurate measurement that determines latency on the wire; it can be applied to both software and hardware systems, though to date it has been used to measure FPGA-based systems.
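
To illustrate the distinction, software timestamping means the measurement itself runs on the same CPU as the workload. A minimal C++ sketch of the idea—not STAC’s actual harness, and with a hypothetical process_message() standing in for the system under test—might look like this:

```cpp
// Illustrative software timestamping, conceptually similar to what a
// software-timestamp benchmark does. Not STAC's harness: process_message()
// is a hypothetical stand-in for the system under test.
#include <chrono>
#include <cstdio>

static void process_message() { /* ... system under test ... */ }

int main() {
    using clock_type = std::chrono::steady_clock;

    const auto t0 = clock_type::now(); // first timestamp, taken in software
    process_message();
    const auto t1 = clock_type::now(); // second timestamp, also in software

    // The reading includes the cost of the clock calls themselves, plus any
    // OS scheduling jitter on the measuring host.
    const long long ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    std::printf("observed latency: %lld ns\n", ns);
    return 0;
}
```

An on-the-wire measurement instead observes packets on the network link itself, so the clock overhead and scheduling jitter of the host never contaminate the reading.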

Peter Lankford, founder and director of STAC, says the company is seeing demand to perform tests on both CPU-based and FPGA-based platforms. “There are a lot of firms using FPGAs, but a lot of applications still using CPUs,” he says.

In addition, the competing claims made by vendors about their differing strategies can cause confusion. Industry association FISD’s World Financial Information Conference last October saw a flurry of announcements on both sides of the fence from datafeed technology vendors. Feed handler and datafeed provider Vela—with hardware partner Enyx—and Exegy both announced latency gains from new hardware-accelerated services, while Activ Financial unveiled a software-only version of its ticker plant.

Despite different approaches, each believes it will deliver better performance and value for its client base.

For example, by integrating Enyx’s nxFeed FPGA network interface card into servers running its software Ticker Plant Appliance, Vela can deliver greater capacity using just one server, compared to the two needed to run the software-only version, which the vendor will continue to offer. Vela ships the server to clients, who plug into its network ports and write to the API, while the vendor manages the box remotely on their behalf.

The combination enables Vela to divide tasks between the most appropriate method for each. So, for instance, line-handling and book-building can be performed using the Enyx FPGA card, while tasks like normalization and downstream distribution are handled using Vela’s software.
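
For readers outside the feed-handler world, “book-building” simply means maintaining sorted bid and ask price levels from a stream of updates. A minimal, illustrative C++ sketch—hypothetical types, not Vela’s or Enyx’s implementation—shows the core of the task:

```cpp
// A minimal order book: two sorted maps of price level -> quantity.
// Hypothetical types; a real feed handler decodes exchange-specific
// messages long before this point.
#include <cstdint>
#include <functional>
#include <map>
#include <optional>

struct LevelUpdate {
    bool     is_bid;
    int64_t  price;    // fixed-point price (e.g., in 1/10000ths)
    uint64_t quantity; // 0 means delete the level
};

class OrderBook {
public:
    void apply(const LevelUpdate& u) {
        if (u.is_bid) apply_to(bids_, u);
        else          apply_to(asks_, u);
    }
    // std::map keeps keys sorted, so the best price is always the first entry.
    std::optional<int64_t> best_bid() const {
        if (bids_.empty()) return std::nullopt;
        return bids_.begin()->first; // highest bid price
    }
    std::optional<int64_t> best_ask() const {
        if (asks_.empty()) return std::nullopt;
        return asks_.begin()->first; // lowest ask price
    }

private:
    template <class Side>
    static void apply_to(Side& side, const LevelUpdate& u) {
        if (u.quantity == 0) side.erase(u.price);
        else                 side[u.price] = u.quantity;
    }

    std::map<int64_t, uint64_t, std::greater<int64_t>> bids_; // high -> low
    std::map<int64_t, uint64_t>                        asks_; // low -> high
};
```

The same logic implemented in an FPGA runs as fixed circuitry on the card, which is part of what makes its latency so deterministic.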

“So we’re leveraging each for what they’re best suited to,” says Ollie Cadman, chief product officer at Vela in London. “We’re very well established in the software space, and haven’t used hardware acceleration before. We’ve done a lot around kernel bypass to improve and contain market data throughput as data rates increased. But there are some instances where hardware delivers the next step in terms of footprint and determinism for the 99.99999th percentile.”

Cadman says using hardware and software together in this way allows the vendor to take advantage of the gains made possible by using FPGAs, while retaining the flexibility of software.

“One of our clients’ biggest focuses is the operational cost and size of the platform, and we’ve made investments over the years to optimize that. But we also wanted to build faster and to re-use tools going forward,” he says. “And we use multicast so we can publish to unlimited numbers of downstream subscribers and applications with a reasonably low-latency profile, as well as for use in less latency-sensitive applications—so you can use this to power your whole trading stack.”

Ready to ’Ware

Activ Financial, on the other hand, has decided to focus its efforts on a return to an all-software ticker plant stack. Activ’s original ticker plant was software-based, built in three tiers—feed handlers, databasing (a last-value cache), and fan-out to APIs. The vendor adopted FPGAs for the databasing tier to address latency and throughput, and to reduce the physical server footprint required to run the ticker plant, or as CEO Steve McNeany puts it, to do more with less.
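
A last-value cache is conceptually simple: a map from symbol to the most recent values, overwritten on every update, from which late-joining subscribers can request a snapshot. A minimal C++ sketch—illustrative field names, not Activ’s schema—captures the idea:

```cpp
// A last-value cache: one slot per symbol, overwritten on each update.
// Field names are illustrative, not Activ's schema.
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

struct Quote {
    double   bid  = 0;
    double   ask  = 0;
    double   last = 0;
    uint64_t seq  = 0; // sequence number of the most recent update
};

class LastValueCache {
public:
    void update(const std::string& symbol, const Quote& q) {
        cache_[symbol] = q; // only the latest value is kept
    }
    // Late-joining subscribers snapshot current state rather than
    // replaying the day's message history.
    std::optional<Quote> snapshot(const std::string& symbol) const {
        const auto it = cache_.find(symbol);
        if (it == cache_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::string, Quote> cache_;
};
```

The engineering challenge lies in doing this at full market data rates, which is what originally pushed Activ toward FPGAs for this tier.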

“Technology requirements change with update rates and user expectations, and we constantly re-evaluate the technical solution,” McNeany says. “Over time, CPUs have developed to a point where they are more efficient than FPGAs for our workloads—for example, due to faster, more efficient cores, higher core counts, and adjunct technologies such as kernel bypass networking.”

For example, he says, it’s easier, faster, and cheaper to deploy a software ticker plant infrastructure in the field, and to make changes to it over time. Additionally, the vendor has been able to exceed its benchmarks measured on FPGAs using commodity servers running the new generation of chips. Beyond the tests with old Sandy Bridge servers, “with more modern Intel devices, we’re seeing even bigger performance margins in favor of CPUs, and we are about to start testing with AMD’s new Rome processors,” Bomer says. “I would have liked to move down this path sooner, but at the time we had a working solution and many other priorities. On the other hand, I’m delighted that it is proving so effective now, as was demonstrated during the recent surge in Options Price Reporting Authority (Opra) traffic.”

Activ did experiment with porting its feed handler tier to run on FPGAs around 2010.

“We got it running, and saw good results, but it had a very high engineering cost, and feed handlers are the part that changes all the time. There are some use-cases that support that necessary engineering cost, but we don’t think that’s our space,” Bomer says, adding that the vendor serves a broader base of client types and business requirements than others who specialize only in low latency, and that software gives the vendor more agility to respond to client needs.

In addition, where financial firms perceive a distinct advantage to be gained, they often choose to build solutions in-house rather than buying off the shelf—at least, until the cost of development to achieve those incremental gains outweighs the benefit of the gains. For Bomer, chasing this fleeting demand isn’t a wise use of resources. “Why produce a vendor product when people on the leading edge of the low-latency spectrum will ‘roll their own’?” he says.

Cynics suggest that Activ’s decision to move away from FPGAs may be a result of Exegy suing the vendor for infringement of patents related to FPGA usage. But while unwilling to share specific figures, Activ insists that its decision was motivated purely by performance factors.

The ‘Hard’ Way

Meanwhile, Exegy is still a strong proponent of hardware for specific functions, though the vendor is blending that approach with software to govern and manage different hardware-based tasks. Its Xero trading platform is an FPGA-based network interface card that can be installed on servers running trading applications, or integrated with the vendor’s Exegy Ticker Plant and Exegy Market Data System. Xero was released last year and can execute a trading algorithm in less than a microsecond. David Taylor, Exegy’s CTO, says its continued focus on achieving the lowest latencies is a response to the needs of clients as they expand strategies across asset classes.

“What we’ve seen from conversations with clients, starting maybe 18 months ago, is that they are really focusing on speed, especially now in derivatives and futures markets, such as CME, Eurex, and Intercontinental Exchange, and in Asian markets,” Taylor says.

He adds that one trend the company is still seeing is that FPGA adoption in front-office technology goes through cycles.

“First, it’s introduced as an alpha-generator. Then the top firms do it as best practice. Then, in the mature stage, it becomes table stakes. For some markets, FPGAs are entering that third stage of being table stakes,” he says. “In the major developed equities markets and US equity options, I would guess the top market-makers are doing all that using FPGAs, but that the next tier of hedge fund, proprietary trading firms, and dark pool operators are using FPGAs to create a best bid and offer from direct feeds.”

Like Activ, Exegy recognizes that some firms will want to “roll their own” technology where they perceive it gives them an advantage. But Taylor believes the burdens of FPGA ownership are where many firms draw the line.

“We had some clients who recognized that they need this level of performance and were looking to go out and hire FPGA developers to make it happen, but realized that we were the experts. Other clients already had FPGA teams, but didn’t want to distract them from other projects,” he says. “Part of our secret sauce is where you draw the hardware line: what should happen in FPGAs versus in software, and whether you are hard-coding or making something ‘parameterable’—i.e., making it more flexible.”

Indeed, Exegy provides a software API for Xero so users can quickly and easily set parameters for what is performed in FPGAs, allowing latency-conscious firms to respond to changing market conditions. And it’s not just about speed of execution: sometimes a firm needs to exit a market just as swiftly.

So the API allows firms to set up a “cancel all” function: when Xero recognizes certain market conditions, it cancels the firm’s orders, giving it time to step back and evaluate how the market is moving without having those orders picked off.
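
As an illustration of the pattern—software setting the parameters, the fast path checking them on every update—here is a hypothetical C++ sketch. It is not Exegy’s Xero API; the types, names, and threshold logic are invented:

```cpp
// Hypothetical "cancel all" trigger: NOT Exegy's Xero API. Software tunes
// the threshold; the per-update check runs on the fast path (in Xero's
// case, in the FPGA) and pulls all resting orders when it fires.
#include <atomic>
#include <cstdint>
#include <cstdlib>
#include <functional>

struct TriggerParams {
    std::atomic<int64_t> max_move_ticks{10}; // largest tolerated tick-to-tick move
};

class CancelAllGuard {
public:
    CancelAllGuard(TriggerParams& params, std::function<void()> cancel_all)
        : params_(params), cancel_all_(std::move(cancel_all)) {}

    // Called on every market update.
    void on_price(int64_t price_ticks) {
        if (last_ != kUnset &&
            std::abs(price_ticks - last_) > params_.max_move_ticks.load()) {
            cancel_all_(); // step back before resting orders are picked off
        }
        last_ = price_ticks;
    }

private:
    static constexpr int64_t kUnset = INT64_MIN;
    TriggerParams&        params_;
    std::function<void()> cancel_all_;
    int64_t               last_ = kUnset;
};
```

On Xero itself the per-tick check would run in hardware; the point is that the trigger’s parameters live in, and are tuned from, software.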

“Our intent is that software will always be governing the hardware,” Taylor says. “You can be ‘pure hardware’ for part of a solution, but software at some level will always be a part of every solution.”

Software’s Shortfalls

That said, one of the challenges of performing latency-sensitive tasks in software is that whereas dedicated hardware robotically performs its assigned task, a general-purpose processor must also juggle other processes that can interrupt specific activities and functions, says Alex Wolcough, director of UK-based capital markets consultancy Greenbirch Group.

“The operating system can occasionally interrupt what you’re doing because it needs to do something else—for example, a Java Virtual Machine stops a program every so often to do ‘garbage collection.’ It only stops for a few [milliseconds], but it means that you’re not getting the data at the speed and the consistency that you want,” he says. “So one attraction of having something hardware-based is that you are entirely in control of the hardware and it won’t be interrupted unless someone literally turns it off.”

This challenge has not been lost on software engineers, says Vela’s Cadman. “We use C++ primarily because of performance and to avoid the interruptions that you get with other languages, such as Java—though we have a Java API. So in the last software version, we made some adjustments to reduce ‘garbage collection’ time, because we do still have clients using Java.”

Another feature of software solutions—notwithstanding the recent advances by chipmakers—is that once a firm has squeezed all the latency it can out of a process, it can appear that the only way to accelerate it further is hardware acceleration, says a former bank data technology executive who now works at a major hardware manufacturer.

“They say, ‘OK, we’ll take the hardware approach.’ But by the time they’ve hired and trained people to write hardware solutions, the chip manufacturers have caught up and found ways to get the same performance from chips,” the executive says. “We’re doing a lot with multi-core channel access. Most multi-core systems have bottlenecks around the amount of memory they can access. But you can architect the circuit boards around where you plug in processes, and design them so they can leverage large amounts of memory.”
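
One common software-side expression of that idea—distinct from the board-level design the executive describes—is pinning latency-critical threads to cores near the memory and network card they use. A minimal Linux/pthreads sketch, assuming a GNU toolchain (compile with -pthread):

```cpp
// Pinning a hot thread to a specific core (Linux/GNU; compile with -pthread).
// Illustrative only; in practice, choosing a core on the same NUMA node as
// the NIC keeps packet data in local cache and local DRAM.
#define _GNU_SOURCE // for pthread_setaffinity_np (g++ on Linux defines this already)
#include <pthread.h>
#include <sched.h>
#include <cstdio>

static void* hot_loop(void*) {
    // ... latency-critical work on core-local data ...
    return nullptr;
}

int main() {
    pthread_t t;
    pthread_create(&t, nullptr, hot_loop, nullptr);

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(2, &set); // pin to core 2; pick a core on the NIC's NUMA node
    if (pthread_setaffinity_np(t, sizeof(set), &set) != 0)
        std::fprintf(stderr, "failed to set thread affinity\n");

    pthread_join(t, nullptr);
    return 0;
}
```

Going further typically means NUMA-aware allocation, so the data a core touches is served from that socket’s local memory controller rather than across the interconnect.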

STAC’s Lankford concurs that advances in CPUs have made it possible for software platforms to handle drastically increased volumes of data, but says his tests demonstrate that FPGAs can operate with maximum latency measured in nanoseconds, compared to microseconds for systems that handle network I/O and business logic in software.

“Because FPGAs are so customizable and can start working on data as soon as it comes off the wire, they have an inherent advantage,” Lankford says. “To get network data into and out of a software application, it first comes into the network interface card (NIC), which transfers it to the memory of the computer. Instructions running on the CPU analyze the data and create outbound data, which is then transferred back to the NIC to be transmitted. So there are several links in the chain. If you could embed your trading logic into the NIC, think how much latency you could save. That’s effectively what an FPGA enables.”
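
Lankford’s chain of hops can be made concrete with a stripped-down receive-process-send loop over standard POSIX UDP sockets. This is illustrative only—production systems would use kernel-bypass stacks, which shorten but do not eliminate the chain:

```cpp
// The software path, hop by hop: standard POSIX UDP sockets, illustrative only.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    const int fd = socket(AF_INET, SOCK_DGRAM, 0);

    sockaddr_in addr{};
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(9000);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));

    char buf[2048];
    sockaddr_in peer{};
    socklen_t peer_len = sizeof(peer);

    for (;;) {
        // Hops 1-2: the NIC DMAs the packet to the kernel, which copies it here.
        const ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                                   reinterpret_cast<sockaddr*>(&peer), &peer_len);
        if (n <= 0) break;

        // Hop 3: CPU instructions analyze the data and create outbound data.
        buf[0] ^= 1; // placeholder for real trading logic

        // Hops 4-5: copy back into the kernel, then out through the NIC.
        sendto(fd, buf, static_cast<size_t>(n), 0,
               reinterpret_cast<sockaddr*>(&peer), peer_len);
    }
    close(fd);
    return 0;
}
```

Every hop flagged in the comments is work that trading logic embedded in the NIC itself never performs.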

And FPGAs don’t just finish faster—they get a jump-start on their software counterparts, since they can begin working on data as soon as the first byte of a packet arrives, instead of waiting for the whole packet, he adds.

Hardware’s Hitches

While FPGAs may speed up repetitive processes, they can slow down the ability to get those processes up and running, because hard-coding them into hardware is more time-consuming than writing software, Wolcough says. As the names imply, software is flexible; hardware is rigid.

“When you have something repeatable that operates well at low latency, the problem with things being coded into hardware is that it is less flexible,” he says. “So, for example, if an exchange makes a slight change to its protocol, a hardware solution may need its chip updated. So there is quite a high overhead from a maintenance perspective … because making a change in software configuration is easy, but changing something coded on an FPGA card is more involved.”

The high cost of maintenance is a concern shared by others, including Patrick Flannery, CEO of feed handler technology vendor MayStreet, who also questions the incremental value of hardware solutions.

“Our perspective is that hardware-based solutions have far too high a cost of ownership and a limited audience, and that their marginal value is very low,” Flannery says, adding that commodity microprocessors are delivering comparable performance within software to current FPGA solutions.

“We can build an order book in 2.1 microseconds. Hardware would be marginally faster, but it only provides meaningful advantage for a very small section of the industry, and that section is shrinking,” he says. “My opinion is that software will win in a big way within the next four years.”

However, FPGA manufacturers are addressing these concerns and making it easier for users to perform more complicated tasks on FPGAs, Lankford says—for example, Xilinx and Intel-owned Altera have developed high-level synthesis libraries that allow users to code logic onto FPGAs using C++.

Another example of inflexibility is determining where the output data from the hardware-based process goes. “If an FPGA wants to take data and send it to an algorithm running on the same or on the next card in a server, that’s OK. But if you want it to send data to a lot of other applications, you need to have some kind of messaging layer,” which would likely be software, Wolcough adds.

But that’s not the only thing that makes them harder to work with: Monitoring the performance of FPGA-based systems is also harder than for pure software platforms.

“If you’re market-making, you can receive data and price orders in hardware, but monitoring that, tuning risk parameters, and monitoring P&L—there’s no reason to build that in hardware. It’s more expensive and time-consuming,” says Exegy’s Taylor.

Guy Warren, CEO of systems technology provider ITRS Group, whose Geneos product monitors systems for performance, stability, and uptime, says it can easily support modern configurations of underlying hardware supporting software-based solutions, but that FPGAs present greater challenges.

“Clustered databases, clustered and dynamic containers using Kubernetes, or parallel web servers … don’t create a problem for monitoring, except we need to know how the many—and dynamically changing—parts make up the whole application we are watching. Previously, we got this from configuration given to us through the user interface or configuration management database. With dynamic environments, we have to collect this metadata from Kubernetes, which we can do,” Warren says.

However, he says ITRS has not been asked to monitor any FPGA solutions, and “could only do it if the application was reporting the necessary data on an API,” adding that FPGAs are “much harder to monitor.”

Seamless = Success

Activ’s McNeany says some clients have even expressed relief that the vendor’s platform is no longer reliant on FPGAs—though it will continue to offer the hardware-accelerated version. “They just want us to be flexible and responsive … and there’s a cost factor [to using FPGA-based platforms] as well,” he says.

The former bank data technology executive also highlights cost concerns, noting that—while costing more to deploy—FPGA solutions can save on operational spend by compressing a firm’s hardware footprint, especially when some vendor solutions can require multiple servers just to handle US options markets data. “The FPGA solution may be better, but is there real return on investment in spending that extra money?” he says.

If cost-conscious firms have not already concentrated their efforts on software platforms, the emerging technique of extreme ultraviolet (EUV) lithography could settle the performance debate for them. Recently highlighted by Goldman Sachs Research as a way to produce more powerful chips at lower cost, EUV prints transistors using shorter wavelengths of light. Shorter wavelengths allow chipmakers to build chips with smaller components, and pack more compute power into a smaller space. Samsung is already using the process to build its own processors, and Intel and AMD are both also using it to create their next-generation chips.

Not to be outdone, FPGAs are also evolving, Lankford notes, with new generations increasing the amount of memory available. FPGAs have another inherent advantage: CPU clock speeds—measured in gigahertz—were the main driver of processing speed before frequency gains stalled and the industry shifted to multi-core chips. “FPGAs started at a lower frequency, so they have further left to go,” he says.

In fact, both schools of technology are advancing, each trying to eat into the other’s strengths, driven by the need to increase both speed and intelligence of trading applications. “There is a tension between these, because applying more smarts generally takes more time. It also takes more code, and CPUs have traditionally been much easier to code than FPGAs,” Lankford says. “Broadly speaking, super-fast algos tend to be pretty simple, so slower firms can compete by making their algos smarter. But the playing field is constantly shifting, because both the CPU and FPGA ecosystems have been pushing further along their weaker axes. The CPU/software ecosystem has been chipping away at latencies, while the FPGA ecosystem is making it easier to put more smarts into FPGAs. As a result, we can expect to see smart algos getting faster and fast algos getting smarter.”

What’s most likely is that the industry will see a convergence of FPGAs and software for specific purposes. “The times are changing,” says Exegy’s Taylor. “Over the past 20 years, the top engineering schools have exposed students to hardware description languages, which are how you write circuits into FPGAs. It remains a fundamentally different discipline from software engineering. So, today, you have to be a good integrated circuit designer.”

But ultimately, clients shouldn’t need to be concerned whether a supplier’s system uses FPGAs or runs in pure software; they just need to know that it works and will deliver the speed, throughput, and capabilities they need. And whatever the underlying technology, it should be delivered as seamlessly as possible.

“Our clients push us towards the things they need to do,” Activ’s Bomer says. “They don’t tell us what technology to use to achieve them.”
