SocGen to move datacenter footprint in Americas to AWS, Azure

SG Americas plans to significantly reduce and even close datacenters in the US as a result of moving to the cloud and defining controls around cloud and data governance.

In the capital markets, the shift of compute power and data storage from on-premises equipment or third-party datacenters to the cloud continues full steam ahead. However, establishing good data governance in cloud environments is emerging as a challenge to making the most of cloud’s performance-enhancing and cost-reducing potential.

Examples of recent cloud migrations among large banks and asset managers include: Fidelity’s asset management arm, which has migrated 98.8% of its applications exclusively to Amazon Web Services (AWS) and plans to stop using physical datacenters in the near future; UBS, which has moved 33% of its workloads to the cloud; RBC, which is deploying a private-public hybrid cloud model and—as of earlier this year—had already migrated 600 applications, with 80% of them using a private architecture; and JP Morgan, which expects to move between 30% and 50% of the bank’s applications and data to the cloud.

Now, Societe Generale is joining that list. Simon Letort, chief digital officer and head of innovation for the Americas at SocGen, says the French bank aims to close its datacenters in the US “over the next few years” as it continues to migrate applications and its internal data lake to the cloud.

“Having most of the US-based applications on cloud will allow us to close down or significantly reduce our US datacenter footprint and no longer have to manage our own datacenters,” Letort says.

SocGen Americas started its cloud journey in 2015 with a gradual process centered on examining how long data stayed in the cloud. The bank started with capabilities like the ability to “burst” to the cloud, which the firm used to perform overnight batch processing for risk and profit-and-loss (P&L) calculations. Cloud bursting allows applications that typically run on a private cloud or datacenter to “burst” onto a public cloud when their computing capacity requirements spike. Once volumes stabilize, the data processing moves back to the private cloud.
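
To picture the mechanics, the sketch below is a minimal illustration of that scheduling decision, assuming a hypothetical nightly compute budget and submit_on_prem/submit_to_cloud helpers rather than anything SocGen actually runs: jobs that fit within the private grid stay there, and only the overflow is pushed to elastic public-cloud capacity.

```python
# Illustrative sketch of how an overnight risk/P&L batch might "burst" to public
# cloud. Capacity figures and the submit_* helpers are hypothetical, not SocGen's setup.

ON_PREM_CAPACITY_CORE_HOURS = 10_000  # assumed nightly headroom on the private grid


def submit_on_prem(jobs):
    print(f"running {len(jobs)} jobs on the private grid")


def submit_to_cloud(jobs):
    print(f"bursting {len(jobs)} jobs to public cloud capacity")


def dispatch_batch(jobs):
    """Keep the batch on the private grid when it fits; burst the overflow when demand spikes."""
    demand = sum(job["core_hours"] for job in jobs)
    if demand <= ON_PREM_CAPACITY_CORE_HOURS:
        submit_on_prem(jobs)  # a normal night: everything stays on the private cloud
        return
    local, burst, used = [], [], 0
    for job in sorted(jobs, key=lambda j: j["core_hours"]):
        if used + job["core_hours"] <= ON_PREM_CAPACITY_CORE_HOURS:
            local.append(job)
            used += job["core_hours"]
        else:
            burst.append(job)  # overflow runs in the public cloud until volumes stabilize
    submit_on_prem(local)
    submit_to_cloud(burst)


if __name__ == "__main__":
    dispatch_batch([
        {"name": "var_run", "core_hours": 7_000},
        {"name": "pnl_explain", "core_hours": 6_500},
    ])
```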

The second step was moving pricing engines, such as those used for credit valuation, which allowed traders to launch computations on demand in the cloud.

“The third step is when you start to leave data on public cloud for a long period of time,” Letort says. “We started to do that when we migrated our client portal for research and analytics.”

The types of data stored on the portal include market parameters, market data, non-client data and non-trading-related data; in other words, no highly confidential data.

The fourth step involved moving into trading-related use cases, where trading applications are always on call and must deliver the high reliability that trading production demands.

“It convinced us that this was a very beneficial investment, but it also showed us that this couldn’t scale,” Letort says. “We can’t wait months to approve an application that takes weeks to code.”

To address these inefficiencies, the bank looked for ways to improve the approval process and set up a dedicated team to oversee “cloud control.”

“We want to get to a point where approving an app running on cloud or approving an app on-premise would be similar, and follow the same software lifecycle rules,” Letort says.

SocGen Americas uses both AWS and Microsoft Azure, but it has “hardened” those services for its own use, building additional features, such as encryption tools, on top of them in-house to ensure a higher degree of security. After about three years of development, the cloud control team has made 50 such services available for faster approval of applications moving to the cloud.
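
As a rough picture of what “hardening” a service can mean, the sketch below wraps object-storage writes with client-side encryption so that plaintext never reaches the provider. It is a generic illustration rather than SocGen’s implementation: the wrapper class, bucket name and key handling are assumptions, and in practice keys would come from an internal key-management service.

```python
# Illustrative sketch of "hardening" a cloud storage service in-house: client-side
# encryption wrapped around S3 calls so plaintext never reaches the provider.
# The class, bucket and key handling are assumptions, not SocGen's implementation.
import boto3
from cryptography.fernet import Fernet


class HardenedObjectStore:
    """Thin in-house wrapper around S3 that encrypts before upload and decrypts after download."""

    def __init__(self, bucket: str, key: bytes):
        self._s3 = boto3.client("s3")
        self._bucket = bucket
        self._fernet = Fernet(key)  # in practice the key would come from an internal KMS/HSM

    def put(self, name: str, payload: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=name, Body=self._fernet.encrypt(payload))

    def get(self, name: str) -> bytes:
        obj = self._s3.get_object(Bucket=self._bucket, Key=name)
        return self._fernet.decrypt(obj["Body"].read())


# Usage against a hypothetical bucket:
# store = HardenedObjectStore("sg-hardened-bucket", Fernet.generate_key())
# store.put("risk/eod_report.csv", b"...")
```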

Lessons learned

One component of SG Americas’ overall migration was moving its data lake to the cloud, prompted by frustration with processing bottlenecks and the limitations created by quotas. Letort described the key challenges of migrating the data lake to the cloud at WatersTechnology’s North American Financial Information Summit in May.

“Data scientists were running tick-by-tick analysis that would take, like, eight hours,” he said at the event. Adding more nodes could cut the run time, but that would have meant buying more servers, making operations more costly. Instead, Societe Generale spun up a Google Cloud sandbox and supplied it with dummy data. Letort’s team used Google’s BigQuery data warehouse to run the test. The service supports analysis of petabytes of data and mirrors what the big data team at SocGen Americas was used to doing on-premises.
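
For readers unfamiliar with the service, the snippet below shows the general shape of such a test using Google’s BigQuery Python client. The project, dataset, table and column names are placeholders standing in for the dummy tick data described above, not SocGen’s actual environment.

```python
# A generic example of the kind of test described above, using Google's BigQuery
# Python client. The project, dataset, table and column names are placeholders
# for the dummy tick data, not SocGen's actual environment.
from google.cloud import bigquery

client = bigquery.Client(project="sandbox-project")  # hypothetical sandbox project

sql = """
    SELECT symbol,
           TIMESTAMP_TRUNC(ts, MINUTE) AS minute,
           AVG(px)  AS avg_px,
           COUNT(*) AS ticks
    FROM `sandbox-project.dummy_marketdata.ticks`
    GROUP BY symbol, minute
    ORDER BY symbol, minute
"""

# BigQuery provisions the compute itself, so there is no cluster to size or wait for.
for row in client.query(sql).result():
    print(row.symbol, row.minute, row.avg_px, row.ticks)
```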

“In a few minutes, your cluster is spun up,” Letort said. The case for the cloud was clear, and the bank decided to shut down its on-prem data lake and migrate its wholesale data lake to Azure. The data lake had previously been housed in a datacenter in Paris.

“If we had decided to run the data lake on premise at SocGen, we would have had to duplicate the infrastructure and staff required to run big data both in the US and in Europe, which is expensive,” he said at the event in May.

Today, he says, any further cloud projects will be governed by the lessons learned during the bank’s migration effort so far.

“We decided as a firm to take a careful approach in reviewing all the datasets we will be transferring from on-prem to cloud, and this showed that we kind of lacked standards,” he says, adding that establishing data governance rules is the “last mile” to a successful migration. “We’re still in an environment where it does take a certain amount of manual review, human review, legal, compliance before being able to send out the data. Ideally, to fully benefit from the power of cloud, you will have that streamlined and kind of automated, based on some clear rules.”

An industry problem needing industry solutions

To help financial firms along their journey to cloud adoption, data management trade association the EDM Council last year launched its Cloud Data Management Capabilities (CDMC), an assessment and certification framework that aims to develop and implement standards and best practices for handling sensitive data within cloud environments. The framework includes six core components—governance and accountability; cataloging and classification; accessibility and usage; protection and privacy; data lifecycle; and data and technical architecture—along with 37 capabilities and 14 key controls for managing sensitive data in the cloud. Societe Generale was among the banks that contributed to the development of the framework.

However, Letort says that while the framework is helpful, applying it at scale to petabytes of data, whether structured or unstructured, can be difficult when the datasets aren’t cataloged or classified to begin with. “We lack a decent catalog of datasets, and associated classification,” he says. “I would be surprised to see many firms having a very good detailed, granular inventory of all their tables, columns, and rows of data.”

One reason banks may not have a cohesive classification and inventory of their data is that data within a bank can often be siloed, reflecting the way an organization has grown over time, says Virginie O’Shea, founder of Firebrand Research. “You’ve got banks that have been merged from lots of different entities, and you’ve got different desks that operate completely differently from each other. Fixed income and equities may not be using the same terminology, and there may be different taxonomies underlying their data because they were separate businesses,” she says.

Connecting these separate business areas and their datasets into a cohesive whole could be done through manual development, though Letort says this would be expensive and would not scale over the long term. Instead, the industry needs to develop tooling for data discovery and data classification, he says, citing Goldman Sachs’ open-sourced data management platform Legend as a tool that can work alongside the CDMC framework.

“I think this is the kind of approach that is needed in complement to the CDMC—and when you combine the two, you have something powerful,” he says.
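
As a rough illustration of the gap Letort describes, the sketch below shows what a minimal, machine-readable catalog entry with column-level sensitivity labels might look like. The field names, label set and example dataset are assumptions made for illustration; they are not drawn from the CDMC framework or from Legend’s actual model.

```python
# Rough illustration of a column-level catalog entry with sensitivity labels.
# Field names, the label set and the example dataset are assumptions, not the
# CDMC's or Legend's actual schemas.
from dataclasses import dataclass, field


@dataclass
class ColumnRecord:
    name: str
    dtype: str
    sensitivity: str  # e.g. "public", "internal", "confidential", "client"


@dataclass
class DatasetRecord:
    name: str
    owner: str
    columns: list = field(default_factory=list)

    def highest_sensitivity(self) -> str:
        """Roll the column labels up to a dataset-level classification."""
        order = ["public", "internal", "confidential", "client"]
        return max((c.sensitivity for c in self.columns), key=order.index)


research_portal = DatasetRecord(
    name="research_analytics.market_parameters",
    owner="sg-americas-digital",
    columns=[
        ColumnRecord("curve_id", "STRING", "internal"),
        ColumnRecord("tenor", "STRING", "public"),
        ColumnRecord("quote", "FLOAT64", "internal"),
    ],
)

# A rule engine could gate cloud transfers on the rolled-up classification,
# automating some of the manual legal and compliance review Letort describes.
print(research_portal.highest_sensitivity())  # -> "internal"
```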
