JP Morgan AM to Add Chinese, Japanese to Proprietary NLP Tool

The asset manager could one day make its Textual Analytics tool, which reads millions of documents to derive insights, available to the market at large.


JP Morgan Asset Management (JPMAM) is looking to build out its Textual Analytics tool to read Chinese and Japanese documents. Further down the line, it could be a product that the firm commercializes, but it’s not at that stage yet, Kristian West, global head of equity trading and equity data science at JPMAM, tells WatersTechnology.

For now, Textual Analytics, which is used by the firm’s portfolio managers, has been trained only in the English language. It reads millions of documents, ranging from company filings and corporate event transcripts to employee reviews on sites such as Glassdoor. 

Will Coulby, global equity data science lead at JPMAM, says the team is debating whether it makes more sense to add language variance to the model, or if it’s better to translate other languages into English before passing them through the model.

“At the moment, we’re tending towards the translation, just because the scale of the documents we’ve got seems to work a bit better through this model. The core of this type of approach is that the model needs to have read a lot of information to gain value from it. So at the moment we’re still experimenting quite heavily, but we’ve got a Chinese language variant of the model—in Mandarin—that’s going to work especially well for us, and we’ve been looking at Japanese as well,” he says. 
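
In practice, a translate-then-score pipeline can be sketched in a few lines with open-source components. The models below are public stand-ins used purely for illustration, not the components JPMAM actually uses:

```python
# A minimal sketch of the translate-then-score approach described above.
# The translation model and the English-only scorer are open-source
# stand-ins, not JPMAM's stack.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")
scorer = pipeline("sentiment-analysis")  # stands in for the English-trained model

def score_document(text: str, language: str) -> dict:
    """Translate non-English documents into English before scoring them."""
    if language != "en":
        text = translator(text, max_length=512)[0]["translation_text"]
    return scorer(text)[0]

# Example: a Chinese-language snippet is routed through translation first.
print(score_document("管理层对明年的增长前景持谨慎乐观态度。", language="zh"))
```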

As well as experimenting with these new languages, JPMAM continues to look for new datasets to train the model with. The team is paying particular attention to the platform’s prediction component. “[It’s] not just in terms of the text itself. We’re as interested in being able to predict something as we are having the text to predict it from,” Coulby says. 

While Textual Analytics was developed within the equity business, where its output has proved effective, West and his team want to make it readily available to the wider organization to improve processes across trading desks.

“So, for example, our fixed income department [is using it] in terms of evaluating the quality of companies and their creditworthiness [and] this is something which is being used by those investment teams as well,” West says. “That is, the other big part of our asset management franchise is using it.”

The company could one day explore making the tool commercially available, perhaps for existing clients to use as part of a service offering. “We’re not at that point yet, but that’s definitely the opportunity for sure,” West says.

Quant vs Fundamental

The tool is used by the asset manager’s quantitative and fundamental investment teams. For quant funds, Textual Analytics generates a raw signal, which has demonstrated strong investment returns, Coulby says. The investment teams can then interact with the front end of the tool to better understand what is driving that score. 

“The work we have done here on model explainability is critical to its success as it allows investors to become confident they understand its decision-making,” he says. 

For fundamental analysts, the tool creates a dashboard they can use as a screening tool, and allows them to do a deep dive into a particular company’s holdings in a given portfolio. “While one element of this is obviously simple screening, we aim to build topics that help us track our long-term thesis. For example, if we had a belief that the introduction of automated cars might lead to lower car ownership, we might develop topics looking at sentiment to car ownership, sentiment to autonomous vehicles, etc.,” Coulby says. 
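
One generic way to illustrate this kind of thesis-driven topic tracking is zero-shot classification from an open-source library. It stands in for the idea of topic screening rather than describing how JPMAM actually builds its topics:

```python
# An illustrative sketch of scoring text against thesis-driven topics using
# zero-shot classification. This is a generic stand-in for topic screening,
# not JPMAM's method.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

topics = ["sentiment to car ownership", "sentiment to autonomous vehicles"]
snippet = ("Management noted that younger customers increasingly prefer "
           "ride-sharing services to owning a vehicle.")

result = classifier(snippet, candidate_labels=topics)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```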

One thing crucial to the development of Textual Analytics is its feedback mechanism. Over time, Coulby and his team will add more data, topics (including a planned build-out of ESG-related features), and predictions to the model. “With every new piece of feedback, the model is able to better understand this data. Having an ongoing team [providing] specialist, targeted feedback gives us a real competitive advantage in model training. More importantly, though, it builds upon our approach of the model trying to read content like our analysts do,” he says. 

Beginnings

JP Morgan Asset Management’s Textual Analytics tool took about a year of “heavyweight research” to build before it went live in November 2019. The solution relies heavily on Google’s transformer-based model BERT—or Bidirectional Encoder Representations from Transformers—which allows the model to have context when understanding text in documents.  

“So if we looked at the classical setting, what we’re always finding is that [the model] would be reading the document, but it was always looking for things within the document, rather than trying to understand what the document said,” Coulby says.
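
Contextual embeddings are what distinguish BERT-style models from keyword lookups of that "classical setting." The sketch below uses the publicly available bert-base-uncased model to show the general technique; it is not JPMAM's proprietary implementation:

```python
# A minimal sketch of contextual embeddings with the open-source BERT model,
# via Hugging Face transformers. Illustrative only.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    """Return one document-level vector: the mean of BERT's token embeddings."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

# The same word receives different vectors in different contexts, which is
# what lets a model try to understand what a document says rather than just
# look for things within it.
v_finance = embed("The bank raised its deposit rates.")
v_river = embed("Erosion along the river bank worsened.")
```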

However, the work BERT has done in the natural language space was not primed for finance, so JP Morgan tailored the model to make it viable for financial use. “For example, if you’re building a quantitative back test, one of the things you’ll care about is: are my embeddings [point-in-time]? Does the language model I’m using have an understanding from the future?” Coulby says.
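
That point-in-time concern can be illustrated with a simple safeguard: only documents published on or before a back test’s as-of date may feed that date’s signal. The field names below are hypothetical:

```python
# One simple point-in-time safeguard for a back test: exclude anything the
# model could not have read on the as-of date, so it has no "understanding
# from the future". Field names are illustrative.
from datetime import date

documents = [
    {"id": "annual_report_2020", "published": date(2021, 2, 15), "text": "..."},
    {"id": "q3_2020_earnings_call", "published": date(2020, 10, 28), "text": "..."},
]

def point_in_time_universe(docs: list[dict], as_of: date) -> list[dict]:
    """Keep only documents available on or before the as-of date."""
    return [d for d in docs if d["published"] <= as_of]

print([d["id"] for d in point_in_time_universe(documents, date(2020, 12, 31))])
# ['q3_2020_earnings_call'] -- the annual report filed in 2021 is excluded
```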

The Textual Analytics model relies heavily on BERT for the embedding element, and it has a reinforcement layer, which Coulby says makes the model “not necessarily a black box”. JP Morgan leverages its people to help mold the model. The organization is “full of people who spend their entire lives reading financial documents, writing research, and so on. So we looked at how we can leverage those people and the things that they’ve been building, as well as we possibly can,” Coulby says. “We’ve got 30 years of proprietary internal research, which is mapped out, so we obviously wanted to make the most use of that as possible. But we also have these people who specialized in that.”

He continues: “We tried to say, ‘How can we build a model that reads and understands financial text?’ Let’s start with that as our backbone. And from there, how can we use that to produce a model that predicts earnings revisions in the short term, or price returns in the long term? But it’s also much more nuanced than that. How can we identify certain types of behavior? How can we identify if a company has a positive attitude to remote work?”

He says JP Morgan has moved away from building a straight quant model that predicts one thing and extracts returns from it. Instead, it is using all its data science and research capabilities either to help make investment decisions on the quant side, or to enrich the analytics and internal analysis in areas such as ESG.

Feedback Helps

A compelling characteristic of the model is that when users view the results of the analytics in the front end, they can tell the model if they don’t agree with it. While the model will not immediately adapt to what the user says, it uses that input as part of its testing.

“The feedback users are giving isn’t necessarily that, ‘You gave us a score of x, and I think we should be y,’—it’s much more than that. Instead, it could be either, ‘I think that was a really good decision on its part,’ or, ‘I’m not sure it necessarily understood the context’,” Coulby says. 
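
That kind of qualitative feedback might be captured in a structure along the following lines. The schema and categories here are hypothetical, sketched from the description above rather than taken from JPMAM’s system:

```python
# A rough sketch of logging qualitative user feedback on model output for
# later review, rather than applying it to the model immediately.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class Verdict(Enum):
    GOOD_DECISION = "good decision"
    MISSED_CONTEXT = "missed context"

@dataclass
class Feedback:
    document_id: str
    model_score: float
    verdict: Verdict
    comment: str
    submitted_at: datetime

feedback_log: list[Feedback] = []

def record_feedback(document_id: str, model_score: float,
                    verdict: Verdict, comment: str = "") -> None:
    """Log feedback for offline evaluation and future retraining."""
    feedback_log.append(
        Feedback(document_id, model_score, verdict, comment, datetime.utcnow())
    )

record_feedback("filing_123", 0.72, Verdict.MISSED_CONTEXT,
                "I'm not sure it understood the context of this passage.")
```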

Things can start to get complicated when the model encounters more convoluted language, such as that often found in company filings. Coulby says it struggles to understand exactly what’s going on if there are double or triple negatives in a sentence.

He points to an example of a signal the firm has been trying to track: discussions around workplace safety. “We’ve always been interested in it, and it’s interesting on the fund management side, and also on the ESG side. And when we first started thinking about it, what you find is the model would highlight people that talked about job security. But job security and workplace security are fairly different things. So the ability to say, ‘Actually I think it’s got it wrong this time; this isn’t that kind of discussion,’ is very, very helpful for it because it points it in the right direction,” he says.

“And the key here is scale. Once you’ve got that kind of thing right, once you’ve got people consuming this information and giving their feedback, it doesn’t mean the model will always just follow that feedback; it doesn’t mean that people giving bad feedback will necessarily completely bias the model. But that’s why we’ve got adversarial layers in place, to make it a bit more robust to these types of things.” 
