Waters Wrap: Goldman Sachs and the facts and fiction of being data driven

Neema Raphael, CDO and head of data engineering for Goldman Sachs, explains what he believes it means for a firm to be data driven.

Credit: Paul Cézanne

By the fall of 2008, the bank dominoes were falling in rapid succession. Bear Stearns was dealt to JP Morgan for pennies. Fannie and Freddie were bailed out. Merrill Lynch was paired with Bank of America. Lehman went belly up. AIG was teetering on the brink. By the middle of September, no one knew what would happen next, and Morgan Stanley and Goldman Sachs were under considerable distress from withdrawals.

The historians and pundits have written—and will continue to write—about how institutions, regulators and individuals let the public down. But technologists and data professionals were not sitting at those boardroom tables when decisions were made as to which companies would be able to stay afloat and which would be left to sink. The best tech and data folks could do was help their institutions weather the storm.

Neema Raphael is a software engineer at heart. He graduated in 2003 from the University of California, Berkeley, with a degree in computer science. Rather than take his talent an hour down the road to Silicon Valley, he decided to join Goldman Sachs. Just five years later, he’d find himself in the thick of a global financial meltdown. He wasn’t involved in the conversations happening between then Goldman CEO Lloyd Blankfein and then US Treasury Secretary Hank Paulson or Warren Buffett. The job of Raphael and his engineering team was to get a handle on the firm’s derivatives exposure.

At that time, rumors were swirling that Goldman Sachs could be “the next one down, and everyone is trying to figure out, ‘OK, what does this mean for my bank; what does it mean for my risk exposure?’” Raphael recalled, while speaking on stage with me at last week’s North American Financial Information Summit, aka, Nafis.

There were horror stories of people across Wall Street going through their filing cabinets trying to find their Isda agreements with companies that could go under at, seemingly, any moment. The engineers at Goldman went about reworking the firm’s SecDB risk management system to get a handle on the entirety of its derivatives data so as to figure out Goldman’s exposure to the other surviving institutions across the globe.

“We spent a week doing that—getting all our data in one place, all our exposure, all of our derivatives contracts,” he said. “And at the end, people recognized this database as, literally, one of the things that saved the firm. At that point it was like, ‘Oh, wait, I get it—if you can be data driven, you might be able to save the firm, help a client, help generate revenue.’”

Yet, while the concept of these all-encompassing data and analytics platforms is nice, if the infrastructure isn’t properly put into place, the projects become money-suck boondoggles that solve for very little.

The problem

In today’s world, every single trading firm likes to say—or at least believe—that they are data driven. Maybe this is flippant to write, but I do believe it to be true: I think that most firms want to be data driven, but they don’t put in the dirty work to actually achieve that soaring (at times rhetorical) ethos. I’m sure there were executives at Credit Suisse and Silicon Valley Bank that believed their institutions were data driven—and now they’re deciding what the next stage of their careers will look like.

I like to think, though, that people who work in engineering and data management are, in fact, driven to be data driven. But as you rise up the ranks, the term becomes more of an amorphous ideal that’s not easily defined and, thus, not properly funded and staffed. While much attention has been given to cloud, AI, APIs and other buzzwords, those tools will not deliver the promised results without a proper data foundation in place.

Take data lineage, for example. One data lead at a large asset manager recently told me that their organization has “still not cracked this nut completely.”

Another senior data official at a large bank said that they believe they’ve done “complete lineage on a majority of the estate,” but that they also “spent a ridiculous amount of money” on the project, and those efforts “grow stale because no one thinks about sustainability.”

Another data executive at a large bank told me that, “I guarantee we’re all struggling with this, but we’re just pretending that the foundation was there. [When talking with other data professionals], I feel like I’m smoking crack—I want someone to tell me who’s done this well!”

The bank source recalled a recent conversation with a regulator where the bank’s execs were told they are “not doing end-to-end lineage well, and our board wanted to say, ‘No one is!’ But I’m sure there are—we just have to talk about this.”

A different mindset

Indeed, this is a topic that people need to talk about. So while speaking at Nafis, I asked Raphael about his views on getting the data lineage and quality done right.

“I’m a software engineer. When I got put into this role, I came to data from a software lens. The first things I did was think about, how would an engineer solve these very hard problems: lineage is a hard problem; data quality is a hard problem; data organization, cleaning data, data management—all these things are pretty hard problems,” he said.

He noted that financial institutions were early adopters of something we take for granted today: computers. Back then, hardware was the focal point for banks; people weren’t really thinking of data as an independent thing. As a result, numerous data siloes and “spaghetti code” are interwoven throughout a bank alongside cutting-edge data analytics, risk and trading systems.

“The lineage part is super hard because all the logic is trapped in 20-year-old code that no one knows, and 20 years ago no one was writing tests because they didn’t think tests were important,” Raphael said. “So, you have no way of figuring out, ‘Okay, what does this data field coming to this data field actually mean?’ It could’ve gone through 25 different programming languages and 25 different hops.”

What Raphael learned is that you’re not going to be able to untangle all that at once—it can be a decade-long effort.

“So the big thing is, go into those systems without having to do heart surgery; rather, do higher-level skin surgery … That’s the biggest lesson we’ve learned.”

That lesson led to a platform now known as Legend. Raphael said that Goldman’s whole philosophy and data strategy is built on this open-source data management and governance platform. The point of the platform was to think of data as “a first-class concept, instead of an exhaust.” Or, back to the spaghetti problem and minor skin surgery, “[Users] can graft on Legend data models at each hop so that now you can at least describe it in a singular data model and in a singular way. Then you can start unpicking behind that each little piece, piece by piece.”
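To make the “graft a model at each hop” idea concrete, here is a minimal Python sketch. It is not Legend’s API or Goldman’s implementation; the `Hop` class, the `trace` function, and every system and field name below are hypothetical, invented purely for illustration. The assumption is that each legacy system declares how its local field names map onto one shared model, so a field can be followed across hops without opening up the legacy code itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One system a record passes through (hypothetical, for illustration)."""
    system: str
    # Maps this system's local field name -> field name in the shared model.
    to_shared: dict[str, str]


def trace(shared_field: str, hops: list[Hop]) -> list[tuple[str, str]]:
    """Return (system, local field name) for each hop that carries shared_field."""
    path = []
    for hop in hops:
        for local_name, shared_name in hop.to_shared.items():
            if shared_name == shared_field:
                path.append((hop.system, local_name))
    return path


# Invented example hops; real estates may have dozens of these.
HOPS = [
    Hop("booking", {"ntl_amt": "notional", "cpty": "counterparty"}),
    Hop("risk_feed", {"NOTIONAL_USD": "notional", "CP_ID": "counterparty"}),
    Hop("reporting", {"exposure_notional": "notional"}),
]

# List every place the shared "notional" field lives, in hop order.
print(trace("notional", HOPS))
```

The design choice mirrors the “skin surgery” point: the mappings sit on top of each system, so the shared description can be built up one hop at a time while the underlying code is unpicked later.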

Legend also represents a sea change in the capital markets—the acceptance of open-source tools to solve industry-wide challenges (like data lineage and quality). Post-2008, as banks’ budgets and IT departments were slashed, the idea of looking to the open-source community started to take hold. The open-source ethos is one of give-and-take: If you’re just taking from the community and not giving back, you’re viewed as a vulture. This led to confusion and downright nasty legal cases.

But times are changing, as is evidenced by the work that financial institutions are doing with the Fintech Open Source Foundation (Finos). Raphael described himself as “a little bit of a hippy” and is a big believer in the give-and-take open-source ethos. It would appear that the industry is agreeing with him.

“When we built Legend it took us a while—it took us almost eight years to get to the point where we were like, ‘Okay, we’re going to open source this.’ There were a few things that led to the tipping point.”

He said that every client of Goldman was having the same issues of breaking down data siloes, something that Legend was built to accomplish internally. “They have the same data challenges we have, the same data siloes, the same data management problems, can’t understand the lineage, don’t know the provenance, they have 50 versions of what a trade means,” he said.

While Goldman isn’t going to open-source its proprietary data models, Raphael said Legend is a platform that’s more of an industry utility, of sorts. And the bank wasn’t interested in being a SaaS provider, so open source was the way forward.

“It was really a play to help our clients. And then as our clients and other banks adopt it, hopefully it helps the industry on a data interop level,” he said. “I think that helps us from an operational level: the more we can standardize [as an industry] on data descriptions, data languages, how data is linked—well, a rising tide helps all boats is my philosophy.”

Of people and philosophy

During our conversation, Raphael kept hitting on two things that can come off as platitudes if not taken seriously: “We think of data as a first-class asset,” and, “We always think data first—be data driven.”

As head of data engineering, Raphael leads the team that builds Goldman Sachs’ data platforms globally for all business lines. As chief data officer, he runs the firm’s data governance and quality operations.

“When you think of data as an asset, the world opens up,” he said. “The world has moved from code being the big thing to data being the thing. … Bringing an engineering focus to data and data management has been key for Goldman.”

When asked what being data driven actually means, he said it’s key “to attach yourself to business outcomes and business deliverables. … The ethos with us is we partner with our lines of business to actually build technology or data systems that run their business, and to have some tangible outcome that they care about.” Again, this sounds obvious, but the idea and the execution of that idea are where firms tend to stumble, especially if there’s consistent staff turnover and a lack of senior-level buy-in.

And here’s another way that companies go astray: as data engineers become increasingly valuable at banks, there’s often a disconnect between senior management (i.e., non-technologists) and those writing the code and building the platforms. Once more, this comes down to connecting the engineering team with the firm’s business objectives.

“Make engineering and engineers part of your business, instead of some off-to-the-side thing. That helps everybody. Engineers are human, too—I guess,” he said with a laugh. “They want to feel connected.”

The reason Raphael went to Goldman rather than a hardcore tech company was that he wanted to learn a new specialty—in this case, finance.

“I knew nothing about finance, so let me use my skills so that I can then also learn something cool about the world,” he said. “So, the first piece of advice I’d give is make sure [the engineers] are attached to business problems. Make sure they understand the context—why are you doing something and for whom? Give them that opportunity to be part of the business. I hate this dichotomy of business/engineering—we don’t even talk about that at Goldman. Engineers are part of the business.”

The other thing to consider on the people front is getting the right talent in the right jobs. As an example, Goldman hired “a bunch of really smart data scientist people—quants, math and physics PhDs.” The problem was that they were spending most of their day “wrangling” the data so that they could finally get to the point of drilling into it.

“I’m like, this doesn’t make sense. These people are super, super smart—they should be focused on helping our clients make money. So, we sort of segregated data science and data engineering as an actual function, where the data engineers were incentivized to organize the data, clean it, make sure it’s great. And the data science team was then allowed to focus on, what are the things we need to do to help our clients make money?”

What have we learned?

Unlike Raphael, I’m not a software engineer at heart, but over the years, I’ve spoken with plenty of technologists and data professionals who are still struggling (organizationally, if not from a technical perspective) to break down their data siloes, implement efficient and sustainable data lineage structures, and, ultimately, improve the quality of the data that traders, portfolio managers, risk professionals and ops folks are using.

At face value, much of what Raphael talked about would sound obvious, and maybe it is, but it’s not easy. The fact that we have a conference like Nafis, which is centered entirely around data, is proof of the challenges at hand. And as technology continues to rapidly evolve, it’s easy to fall further and further behind, especially if you can’t retain talent and philosophies change.

People and philosophy. As my psychiatrist has asked me in the past, “Am I being honest with myself about the problems that I’m experiencing? Am I doing everything in my power to make constructive changes and stick to those changes?” Well, when it comes to being “a data-driven organization,” are you being honest with yourself when trying to achieve those lofty goals?

The image accompanying this column is “Still Life with Apples and Pears” by Paul Cézanne, courtesy of The Met’s open-access program.
