Man Group revamps data science platform to tackle data deluge

The London-based investment manager spent four “long and intense” years rewriting its data science platform, Arctic.

Around five years ago, London-based Man Group ran into a problem that buy-side firms are all too familiar with: its tech and data science platforms were unable to cope with the avalanche of data that was being generated internally and was also flowing into its front office from external sources. The firm’s trading desks and quants were struggling to consume this data and create trading strategies from it.

“Imagine a Microsoft Excel spreadsheet with 100,000 columns and 10 million rows. Now, imagine 1,000 such spreadsheets and trying to build a trading model based on them,” says Gary Collier, head of Alpha Technology at Man Group.

At the time, the firm had a data science platform named Arctic, which was developed in 2013 after Man Group grew unhappy with its tick provider and decided to build its own. Arctic’s first iteration was a time-series database constructed entirely in Python, as the firm’s front-office systems were Python-based.

“We thought we could do better ourselves and process tick data in volume across a high-performance computing cluster. That was the genesis of Arctic,” says Collier, who was speaking on the sidelines of the TradeTech conference in Paris in May.

By 2017, however, Collier—who has been with the company since 2001—realized that Arctic was in need of an overhaul. Man Group required a faster platform that could process complex, “industrial-sized” data frames—tabular forms of data such as thousands of Excel sheets. Corporate bond traders, for example, must make sense of hundreds of thousands of columns, and the tick history of a single security could encompass millions of rows of information.

Man Group has a 400-strong business-wide technology unit. Within it sits the Alpha Technology team, which Collier leads, comprising 175 engineers and five business managers. The team set out to rewrite Arctic from the ground up—this time in C++. Collier says it took four years of “intense effort” to complete the version that exists today.

During a fireside chat at TradeTech on data science platforms, Collier said that currently available third-party solutions lack the capacity to deal with large volumes of data cleanly, reliably, and quickly, which is why the firm builds internally.

Mark Stacey, head of business intelligence, data platform, and reporting applications at hedge fund Marshall Wace, agreed that vendor solutions are lagging behind. Speaking during the same fireside chat, Stacey said the processing speed and technical advancement of third-party technology are evolving much more slowly than the streams of data that investment managers must contend with.

He added that building or customizing more sophisticated data science tools internally is no longer seen as a “competitive advantage”. Rather, it has become the “table stakes” required just to be successful.

Man Group’s rewritten platform, dubbed Arctic Native, sits across the bulk of the asset manager’s front-to-back-office systems, and is designed to process large volumes of data. Most of the data volume running through it is generated internally, and stored, cataloged, and versioned within Arctic. The derived data is used to generate alpha signals, build portfolios, and run risk models.
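The article does not describe Arctic Native's internals, but the storing-and-versioning idea it mentions can be illustrated with a minimal, stdlib-only Python sketch. The class name and methods below are hypothetical, not Arctic's actual API: every write creates a new immutable version of a dataset, and any earlier version remains readable.

```python
import copy

class VersionedStore:
    """Minimal sketch of a versioned dataset store: each write appends a
    new immutable snapshot, and any prior version can still be read."""

    def __init__(self):
        self._versions = {}  # dataset name -> list of snapshots

    def write(self, name, data):
        # Store a deep copy so later mutations by the caller cannot
        # silently alter an already-committed version.
        self._versions.setdefault(name, []).append(copy.deepcopy(data))
        return len(self._versions[name]) - 1  # version number just written

    def read(self, name, version=-1):
        # Default to the latest version, mirroring "read current" semantics.
        return self._versions[name][version]

store = VersionedStore()
store.write("signals", {"momentum": 0.4})
store.write("signals", {"momentum": 0.7})
latest = store.read("signals")              # {"momentum": 0.7}
original = store.read("signals", version=0)  # {"momentum": 0.4}
```

Keeping every version immutable is what makes derived data reproducible: an alpha signal or risk model can always be rerun against exactly the inputs it originally saw.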

Any user across the business can access Arctic Native’s data, including traders, portfolio managers, risk teams, and compliance, simply by typing “http://codex/” into an in-house browser. This opens the Codex data catalog, which encompasses thousands of datasets that users can search by keyword and asset class, and where they can view permissions.

“Vendor data generally have strict licensing conditions, and internally generated data generates its own IP protection considerations. Once discovered, the data can then be served up into a [separate] Python notebook,” says Collier.
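The catalog workflow described above, keyword search gated by per-dataset permissions, can be sketched in a few lines of stdlib Python. The class, tags, and roles below are hypothetical illustrations, not the actual Codex implementation:

```python
class DataCatalog:
    """Sketch of a searchable data catalog with per-dataset access control."""

    def __init__(self):
        self._datasets = {}  # name -> {"tags": set of keywords, "allowed": set of roles}

    def register(self, name, tags, allowed_roles):
        self._datasets[name] = {"tags": set(tags), "allowed": set(allowed_roles)}

    def search(self, keyword, role):
        # Return only datasets that both match the keyword and that the
        # requesting role is permitted to see.
        return sorted(
            name for name, meta in self._datasets.items()
            if keyword in meta["tags"] and role in meta["allowed"]
        )

catalog = DataCatalog()
catalog.register("us_corp_bond_ticks", {"credit", "ticks"}, {"trader", "risk"})
catalog.register("equity_alpha_signals", {"equities", "alpha"}, {"trader"})
catalog.search("ticks", role="risk")  # -> ["us_corp_bond_ticks"]
```

Filtering at search time means a risk analyst never even discovers datasets whose vendor licence or IP restrictions exclude them, which is the concern Collier raises.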

He says the Alpha Technology team will continue to update and build upon the revamped Arctic platform. But he disagrees with the idea that technology platforms must be pigeonholed as either legacy or new systems, pointing out that tech solutions such as Arctic Native are usually made up of some components that are more advanced than others.

“Thinking in such a binary way is the wrong way of looking at it. Within technology, there are always a huge number of wavefronts of change at any point in time: the language and the language version in use, the operating system version, types of database technology, library dependencies, web frameworks, ensuring security patches are in place, and so on. A simplification such as ‘modern versus legacy’ trivializes one of the most challenging areas of technology management,” he says.

Learning from open source

Back in Arctic’s early days, when it was still just a time-series database, Collier was weighing up the benefits of open-sourcing the platform. He saw a great deal of value in open source for three reasons. Firstly, external technologists could contribute to the code, and his team could learn new ways of improving the tech from contributors. Secondly, it would drive name recognition. And thirdly, it could attract talented engineers to the firm.

He proposed the idea to senior management in 2015. They were initially taken aback by the idea of giving away the firm’s IP for free and took some convincing. But they came around to the idea of open source, and the first version of Arctic was published on GitHub in 2015. The version made public was written in Python and leveraged pandas, the language’s widely used data-analysis library.

The value of any piece of technology is derived from the whole finished product—the way in which individual components are combined, Collier says. As Arctic developed, Alpha Technology built more components.

“Arctic is just one component of this, which is partly why we were so relaxed about making it open source,” he says.

Man Group makes wide use of open-source packages as the foundational building blocks of its tech stack, and has its army of engineers develop added capabilities on top or fill any necessary gaps in functionality.

The asset manager uses few outside vendors, and its ethos for the past decade has been to build rather than buy, Collier says, adding that he can count on one hand the number of enterprise software vendors Alpha Technology uses outside of its proprietary builds.

“Where there are gaps in open source, we fill the gaps. If it’s a significant technology gap, we fill it with something like Arctic. The rest of our time is focused on our financial IP—for example, building our alpha models and portfolio construction techniques,” he says.

For this reason, he adds, the firm has few concerns when it comes to over-reliance on one vendor, or vendor lock-ins.

Arctic Native also benefits from the fact that Man Group has built an internal cloud on OpenStack infrastructure and container orchestration system Kubernetes, rather than turning to the public cloud.

“If you go to a third-party provider, you’re not going to get the same type of high performance or flash storage, the same type of gigabit-per-second networking, or the smooth, seamless, interactive, fast-as-possible experience for our front-end investment professionals,” Collier says.

Bursting with savings

Marshall Wace’s Stacey told the fireside chat’s audience that while the cloud is helpful for big data use cases, it’s not essential. Marshall Wace turned to research clusters for internal big data projects, in which researchers run their own models on large datasets using clusters of shared hardware.

However, when a firm needs to deploy an application quickly and at scale, but doesn’t have the technical means in-house to deal with those intense processing periods, it can use an approach called “bursting”, he said. Bursting occurs when a user configures their private cloud so that during times of excessive traffic, that overflow is directed to a public cloud. The public cloud absorbs the load, and the user experiences no interruption of service while also paying only for those resources when it really needs them.
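The bursting logic Stacey describes can be sketched as a simple scheduling decision: fill private-cloud capacity first, then overflow the remainder to the public cloud. The function and job names below are hypothetical illustrations of the concept, not any firm's actual scheduler:

```python
def route_jobs(jobs, private_capacity):
    """Sketch of cloud bursting: pack jobs into the private cloud until
    capacity is reached, then overflow the remainder to the public cloud.

    jobs: list of (name, resource_cost) pairs, in submission order.
    """
    private, public = [], []
    used = 0
    for name, cost in jobs:
        if used + cost <= private_capacity:
            private.append(name)   # runs in-house at no marginal cost
            used += cost
        else:
            public.append(name)    # overflow absorbed by the public cloud,
                                   # paid for only while it runs
    return private, public

# During a spike, only the excess bursts out; everything else stays in-house.
route_jobs([("risk", 4), ("backtest", 5), ("report", 2)], private_capacity=8)
```

This first-fit approach keeps the steady-state workload on owned infrastructure and converts only the peak into a pay-per-use cost, which is the saving Stacey alludes to.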

“You might have the ability to scale out, but obviously that scale comes with the massive cost associated with it. So, in the cloud, what you need to do is put the tools and processes in place to manage your infrastructure and manage the workload you’re putting onto it,” Stacey said.

