Advancements in NLP bring focus to document insight

Vendors are looking to provide AI models to help financial professionals get more value out of unstructured data sources.

Natural language processing is a subset of artificial intelligence for parsing human speech and text. Large companies in the capital markets are investing in NLP-based tools and interfaces, hoping to market them to clients that must find meaning in large, unstructured datasets and documentation.

S&P Global, for one, invested in and then bought Kensho Technologies, a startup focused on AI for financial services, for $550 million in 2018. Kensho now bills itself as the innovation hub of S&P, pairing machine learning with the massive datasets of its parent company to provide automation and insights to customers.

And in June this year, Citi Sprint, an investment arm of the bank, bought an undisclosed share in Claira, a startup looking to apply AI tools to read financial documentation. Citi Sprint said at the time that it would collaborate with Claira to develop data analysis solutions to support its business, starting with municipal prospectuses and collateralized loan obligations (CLOs). Claira is also working with Citi-backed CLO platform Octaura to provide market participants with information that can be incorporated into pricing and risk management.

Blessing and Curse

S&P Global offers data on public and private markets, ESG scores, and news, which clients can access through its platforms. S&P Global’s most comprehensive platform is the recently rebranded Capital IQ Pro, which is underpinned by a massive database—about 200 terabytes, 135 billion data points, and hundreds of millions of documents covering some 40 million companies.

Warren Breakstone, head of desktop and channels for S&P Global Market Intelligence, says that for the platform’s users, big data is both a blessing and a curse. “The blessing is that there is so much data available to help answer your various questions and improve your models. The challenge is that there’s just so much out there, how do you find the right data for your particular needs? It’s overwhelming,” he says.

In response to this challenge and to help customers perform more efficient and revealing research, S&P last year launched a document viewer. This viewer uses Kensho’s proprietary Nerd (named entity recognition and disambiguation) technology, a machine-learning system that links textual data to sources of structured data. In this case, Capital IQ users can link unstructured information from regulatory filings, earnings call transcripts, press releases and the like with the structured information in the massive S&P database. For analysts in capital markets, finding unstructured data, whether in text or audio, is just the beginning of the research process. The idea behind the document viewer is that analysts can enrich the data they find by linking it to past analyst coverage, pricing data, and news about companies or individuals associated with those companies.

A document viewer user could, for example, use the tool to populate a company’s 10k filing with tags of the companies mentioned in the document, with those tags linking back to the Capital IQ database and to Wikimedia. The user could uncover associations with companies or individuals that might not be obvious from just the filing itself.

“It enriches the document with entities, with table identification, with topics that we generate, in order to allow you to quickly flow between various datasets and various instances of the data to make it easier to use,” says Abhaya Menon, head of desktop for S&P Capital IQ Pro.

The tool also has a layer of sentiment analysis that uses NLP to score, for example, earnings call transcripts on whether the company and analysts seemed positive or negative about its quarterly results. The sentiment score can be compared with those of previous calls to analyze how sentiment changed over time, and how results were affected by macro events like the Covid-19 pandemic or a recession. Kensho says Nerd can understand context, and so distinguish between, for example, two companies that have very similar names, inferring which company is the correct one to supply for annotation.
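To make the idea of comparing sentiment across calls concrete, here is a minimal sketch in Python. It is not Kensho's method: the word lists, the scoring formula, and the function names `sentiment_score` and `sentiment_change` are all assumptions chosen for illustration, and a production system would rely on a trained model rather than a fixed lexicon.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"growth", "strong", "record", "improved", "confident"}
NEGATIVE = {"decline", "weak", "loss", "headwinds", "uncertain"}


def sentiment_score(text: str) -> float:
    """Return (positive - negative) / matched words, in [-1, 1].

    Returns 0.0 when no lexicon word appears in the text.
    """
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def sentiment_change(calls: list[tuple[str, str]]) -> list[tuple[str, float]]:
    """Given (quarter, transcript) pairs in chronological order,
    return each quarter's score change relative to the previous call."""
    scores = [(quarter, sentiment_score(text)) for quarter, text in calls]
    return [(q2, s2 - s1) for (_, s1), (q2, s2) in zip(scores, scores[1:])]
```

Feeding the function a chronological list of transcripts yields one change value per call after the first, which is the kind of series an analyst might set against macro events.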

The tool allows users to combine Nerd with other proprietary Kensho tools, Link and Scribe. Kensho Link can algorithmically return links to entities in the S&P database even when data inputs are incomplete or contain errors. Kensho Scribe transcribes audio such as earnings calls or depositions.

Navigating the Ts and Cs

Claira was co-founded by Eric Chang, a former trader at Goldman Sachs, BlackRock and AQR, who went on to a role in AI product development at Morgan Stanley; Joseph Squeri, who was a CIO at Citadel and Barclays; and Alex Schumacher, an expert in NLP and a subset of NLP called natural language understanding (NLU). The three worked together at Exos Financial, a digital bank founded by former Credit Suisse CEO Brady Dougan in 2018.

At Exos, the three realized they needed a tool that provided insights into financial documents such as muni bond and CLO contracts. Such tools existed, but they needed pre-training and were all reliant on the same outdated approaches and open-source libraries. The three thought that applying computational linguistics—the science of getting machines to understand natural language—could be more effective. And so Claira launched in 2019 as a spin-off from Exos’s technology and venture arm.

Chang says most financial firms spend months going through the Ts and Cs as they negotiate deals and contracts. “Ironically, when securities are being traded or deals are being evaluated, because it’s so hard to read a 500-page legal agreement, those terms and conditions are essentially forgotten about or ignored until the very end,” he says.

Claira aims to save financial professionals all that time and effort by providing a pre-trained tool that gives them a better understanding of documents. Claira’s speciality is the kinds of documents that are heavily negotiated and bespoke, like CLOs, though it can also be used to parse 10k filings and the like.

Claira works by leveraging NLP and recent developments in NLU. The tool converts financial contracts into business logic—customized algorithms that define how a business operates. Legal documents have a specific legal structure, and that can be used to describe business logic in this context. Thus, Claira converts financial contracts into code, essentially, and makes the data available to downstream systems or human analysts to implement in their investment strategies.
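One way to picture a contract clause expressed as business logic is a small rule object that downstream systems can evaluate. The sketch below is a simplified illustration, not Claira's output format: it models a generic coverage test in which a ratio of collateral to tranche principal must stay at or above a trigger level. The class name, field names, and figures are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageTest:
    """A hypothetical clause extracted from a deal document."""

    name: str
    trigger: float  # minimum acceptable ratio, e.g. 1.20

    def passes(self, collateral_par: float, tranche_principal: float) -> bool:
        """Return True if collateral / principal meets the trigger."""
        if tranche_principal <= 0:
            raise ValueError("tranche_principal must be positive")
        return collateral_par / tranche_principal >= self.trigger


# Example: a clause requiring at least 1.20x coverage.
rule = CoverageTest(name="Class A coverage", trigger=1.20)
print(rule.passes(collateral_par=125.0, tranche_principal=100.0))
```

Once a term is captured in this form, a pricing or risk model can test it against current portfolio figures without anyone rereading the underlying agreement.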

Claira is therefore dependent on the “legalese”—the innate language and structure of these documents, Squeri says. “As long as the legal language doesn’t change, which it hasn’t for a few hundred years,” he says.

Users of the platform can upload documents or connect to an internal document system before Claira processes the selected documents and offers summary analytics. Information can then be downloaded as a CSV file or fed into an API to connect to pricing and risk models.
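For readers curious what a CSV hand-off to a downstream model might look like, the following is a small sketch using Python's standard `csv` module. The column names and the `terms_to_csv` function are hypothetical; the article does not describe the actual layout of Claira's export.

```python
import csv
import io


def terms_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize extracted-term records to CSV text with a fixed column order."""
    fieldnames = ["document", "term", "value"]  # hypothetical columns
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Example with made-up data.
print(terms_to_csv([{"document": "deal_a.pdf", "term": "coverage_trigger", "value": "1.20"}]))
```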

It was important to Claira that the tools it provided were pre-trained, to save clients the hassle and time of training and labeling. “We don’t want our client to spend six months going through menial data tagging,” Chang says. “We can understand what the client wants, configure our models, pull that out, and give them the insights they want.”
