Escaping the ‘data swamp’: GoldenSource aims to improve data lake, warehouse analytics

As part of its cloud migration, the EDM vendor is launching an end-to-end ‘lake house’ service to help firms better manage their data.

GoldenSource is launching a service that aims to help buy-side firms better manage data within their data lakes and warehouses.

The vendor’s Cloud Data Service has two components: a data pipeline and a data schema. The data pipeline brings information from various data sources into and out of an asset manager’s data lake or data warehouse. The data schema helps organize and structure the data. The key element is flexibility, says Tom Stock, head of product management at GoldenSource: the service sucks in the data in all formats; it then cleans, refines and structures the data; and then the data is pushed downstream into the buy-side firm’s internal analytics systems.

“We are trying to [help firms] get out of that data swamp problem,” Stock says.

For the service, GoldenSource created a common orchestration layer using Apache Airflow, an open-source workflow management platform for data engineering pipelines, as well as the Apache Hive Metastore to manage its metadata. “We used a lot of out-of-the-box, open-source components to construct our end-to-end data ‘lake house’ offering,” he says.


The data pipeline allows consumers to bring in all types of data—vendor data or shared data from a cloud marketplace, as well as their proprietary data in both structured and unstructured formats—and deposit it into their chosen cloud data warehouse or data lake. So if a buy-side firm uses, for example, Cloudera, Snowflake, Databricks, or even Google Cloud Platform, the tool can pull in data from those offerings, and then clean and distribute it to be analyzed.

“We try to keep it very open so that whatever type of analytics tools a customer wants to use, they should be able to deploy those tools on top of our data lake framework to be able to utilize those and have a common experience across their organization,” Stock says.

He uses a portmanteau of data lake and data warehouse, “lake house,” to describe what the service provides.

“So you’ve got a portion of the lake house, which is the left side—I call it more data lake-oriented, more in tune to rapid ingestion and large volumes of data for analytics—joined with the right side of the lake house, which is the data warehouse-type refined area, which is more structured, normalized across different sources of data,” he says. “If I’ve got position information coming in from multiple custodians, we normalize that and create a single model of that data. So you’ve got different types of data in there for different use cases. Plus, over the top, you have a consumption layer that will allow different ‘personas’ within the organization to access views of that data.”
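The custodian example Stock describes can be illustrated with a minimal Python sketch. It is not GoldenSource's implementation: the custodian names, field names, and the `Position` model are hypothetical, chosen only to show how differently shaped position feeds might be mapped into a single model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Position:
    """Hypothetical single model for a position, whatever the source."""
    custodian: str
    instrument_id: str
    quantity: float
    currency: str


# Assumed per-custodian field mappings (illustrative names only).
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "custodian_a": {"instrument_id": "isin", "quantity": "qty", "currency": "ccy"},
    "custodian_b": {"instrument_id": "SecurityISIN", "quantity": "Units", "currency": "Currency"},
}


def normalize(custodian: str, raw: dict) -> Position:
    """Map one raw custodian record onto the common Position model."""
    fields = FIELD_MAPS[custodian]
    return Position(
        custodian=custodian,
        instrument_id=str(raw[fields["instrument_id"]]).strip().upper(),
        quantity=float(raw[fields["quantity"]]),
        currency=str(raw[fields["currency"]]).strip().upper(),
    )


def normalize_all(feeds: Dict[str, List[dict]]) -> List[Position]:
    """Normalize every record from every custodian feed."""
    return [normalize(name, row) for name, rows in feeds.items() for row in rows]


def total_by_instrument(positions: List[Position]) -> Dict[str, float]:
    """Aggregate quantity per instrument across all custodians."""
    totals: Dict[str, float] = {}
    for pos in positions:
        totals[pos.instrument_id] = totals.get(pos.instrument_id, 0.0) + pos.quantity
    return totals
```

Once the records share one shape, a consumption layer can serve the same aggregated view to different teams without each team re-parsing custodian files.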

Although data warehouses help firms with processing data, GoldenSource contends that they are light on that traditional data lake capability. “Cloud data warehouses provide a lot of analytical horsepower, but they didn’t have the full capabilities around rapid ingestion, which is what data lakes have. So what we did is marry those together,” Stock says.

Another key is the concept of data concordance, which means having common identifiers. “If I’m bringing in data—even in my raw zone for analytics—I need to be able to cross-reference that data that’s sitting within my refined zone that I’m using for operational reporting,” he says. “So things like instrument IDs—am I using Isin, or Cusip? If I’m looking at organizational entity data, how do I know the entity data that I have sitting in my ESG raw scores can be related to the entity master data and the security position data that I have within my refined zone?”
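As a rough illustration of data concordance, the Python sketch below resolves records keyed by different identifier schemes to one internal ID, so that raw ESG scores can be related to refined position data. The cross-reference table, the identifier values, and all function names are made up for the example and do not describe GoldenSource's data model.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cross-reference rows; the identifier values are placeholders.
XREF_ROWS = [
    {"internal_id": "SEC-001", "isin": "XX0000000001", "cusip": "000000AA1"},
    {"internal_id": "SEC-002", "isin": "XX0000000002", "cusip": "000000BB2"},
]


def build_index(rows: List[dict]) -> Dict[Tuple[str, str], str]:
    """Index every (scheme, identifier) pair to its internal ID."""
    index: Dict[Tuple[str, str], str] = {}
    for row in rows:
        for scheme in ("isin", "cusip"):
            index[(scheme, row[scheme].upper())] = row["internal_id"]
    return index


def resolve(index: Dict[Tuple[str, str], str], scheme: str, value: str) -> Optional[str]:
    """Return the internal ID for an identifier, or None if it is unknown."""
    return index.get((scheme.lower(), value.strip().upper()))


def link_scores_to_positions(index, raw_scores: List[dict], positions: List[dict]) -> List[dict]:
    """Attach raw-zone ESG scores (keyed by CUSIP) to refined positions (keyed by ISIN)."""
    scores_by_id = {}
    for score in raw_scores:
        internal_id = resolve(index, "cusip", score["cusip"])
        if internal_id is not None:
            scores_by_id[internal_id] = score["esg_score"]

    linked = []
    for pos in positions:
        internal_id = resolve(index, "isin", pos["isin"])
        linked.append({**pos, "internal_id": internal_id,
                       "esg_score": scores_by_id.get(internal_id)})
    return linked
```

Positions whose identifier is missing from the cross-reference come back with `internal_id` set to `None`, which makes the concordance gaps visible instead of silently dropping records.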

New challenge

While cloud has transformed the capital markets, it has also introduced new headaches.

Jeremy Katzeff, head of buy-side solutions at GoldenSource, says one of the challenges asset managers and hedge funds face is the expanding “personas” consuming datasets within cloud-based data platforms and warehouses. What he means by personas is that more units—and, thus, individuals—need more data, whether for portfolio construction, risk management, sales, marketing, or product development. Quant teams alone want more and more data, but the risk is that there’s wasteful spending on data, the wrong people have access to it, or different business units don’t have the same view of the data.

“It’s moving beyond the traditional personas of middle-office tech and ops that need a security master or some cleansed information that they can then give to the portfolio managers to raise an order and execute a trade,” he says.

Eiichiro Yanagawa, a senior analyst at research and advisory firm Celent, says that if GoldenSource’s new Cloud Data Service delivers, it will solve a key challenge for data managers in the capital markets: turning data into actionable insights. “If it goes beyond providing partial functionality to improve data lakes to a platform (multi-cloud, multi-user, multi-institution) that turns data into insight, then [GoldenSource’s] strategy is truly the future.”

A cloud of its own

Like so many other companies in the capital markets, GoldenSource is in the midst of its own cloud migration. It is currently moving toward a serverless setup to take advantage of elastic compute and to make its system more modular and easier to deploy and maintain.

Stock says the migration to the cloud was accelerated by the vendor’s message-based, event-driven architecture. One step was to move away from being a strictly Oracle-based application and offer Postgres as an option.

“For cloud implementations, Postgres is a database that most of the cloud providers offer and it is widely used in public cloud implementations,” he adds. The other thing GoldenSource is doing is containerizing its application using Docker and having the infrastructure managed by Kubernetes.

Stock explains that GoldenSource has split its application into individual containers that can be easily scaled up for processing on the cloud. “Our business engines are in certain containers, [and] our data model itself is in certain containers,” he says.

The next step is using Kubernetes as the container manager to enable a serverless environment. “Over the last several years, we’ve made those steps to really start leveraging the power of the cloud. … Under the hood, we still have our application server that we use there. Our next step is to remove the application server components out of that and then take our engines that run underneath that application server and migrate those to microservices running on the cloud,” he says.

GoldenSource aims to finish most of that migration by early 2023.

“The traditional deployment that most people still have is you buy some virtual CPUs and memory,” adds Katzeff. “We want to move away from that model, so it’s more elastic and more of a pay-as-you-go model, because it’s more scalable. Whereas if you buy something fixed now, and in two years you’d have to do another exercise because someone hit the top of their capacity, it becomes an in-depth exercise to figure out what the next stages of growth are.”
