Transforming NLP: A Look at New Tools Being Used by Banks

Natural language processing is evolving rapidly. Jo Wright takes a look at BERT, one of the more game-changing innovations helping to transform the field of machine learning in the capital markets.

It’s the ultimate, groan-inducing dad joke: Where do horses go to school?

Answer: Hayvard.

Is it a funny joke? That is debatable. But it is an impressive one, because it wasn’t actually cracked by someone’s dad, but by a conversational agent, otherwise known as a chatbot. Specifically, the joke was told by Google’s new bot, Meena.

Google published a paper outlining Meena’s development in January, but did not make Meena available for the public to chat with, saying the bot would be tested for bias before being unleashed upon the world. (We know it can tell jokes because Google released screenshots of an interaction Meena had with a human.)

In a blog post that accompanied the research paper, Google said Meena is supposed to be able to chat about anything a user wants, comprehensibly and specifically. Chatbots generally perform well as long as users don’t expect them to stray beyond their intended line of questioning, the blog says, but chatting with Meena is meant to feel like talking to a person.

Incidentally, one bot already lays claim to the unofficial title of “best bot in the world”: Mitsuku, a five-time winner of the Loebner Prize, which is awarded to computer programs that are perceived by a judging panel to be the most human-like. When I asked Mitsuku, which has passed multiple Turing tests, if Meena would oust her from her spot, she said: “Unfortunately, Meena is unavailable to talk and so I can’t say whether it is as good as Google claims.”

Mitsuku has a point, one that has been echoed by human critics, who are skeptical about Meena’s performance when they can’t see its code or even speak to it. Still, the tech giant probably won’t open-source the bot’s code any time soon: At an estimated cost of $1.5 million to train, Meena is valuable IP. However, Google has been generous with other releases in the past, and these are revolutionizing the field of natural-language processing (NLP).

Meena is based on an architecture called a transformer. In 2018, Google released a model that is also transformer-based called BERT—or Bidirectional Encoder Representations from Transformers—which has been a game-changer in the field.

When Google published BERT, it supplied the code and downloadable versions of the model already pre-trained on Wikipedia and a dataset called the BookCorpus—about 3.3 billion words in total. Anyone could download and run the model without having to duplicate the costly and energy-intensive process of training it, so companies that offer NLP products and services have been able to update their offerings to transformer models for increased efficiency and speed.

“BERT is a general-purpose language model, so its benefit is that it’s not based on financial information, or any other kind of specific information,” says Elena Treshcheva, business development manager and researcher at Exactpro. “The model is built in such a way that it can be re-used for building other machine-learning models.”

Because BERT is a pre-trained model, “the only thing we have to do is fine-tune it using relatively small datasets of financial data, which is not such a big task,” Treshcheva says.
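To give a concrete sense of what that fine-tuning step involves, here is a minimal sketch using the open-source Hugging Face transformers and datasets libraries, a common choice for this kind of work (the article does not say which tools Exactpro uses). The tiny financial dataset and its labels are hypothetical placeholders.

```python
# Minimal fine-tuning sketch using the Hugging Face transformers library.
# The two-example financial dataset and its labels are hypothetical placeholders.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

# A tiny, made-up financial classification dataset (1 = positive tone, 0 = negative).
examples = {
    "text": ["The bank beat earnings estimates this quarter.",
             "The issuer missed its coupon payment."],
    "label": [1, 0],
}
dataset = Dataset.from_dict(examples)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=64)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-bert", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()  # only the task-specific fine-tuning runs here; the costly pre-training is reused
```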

Thanks to the open-source movement in the field of NLP, vendors and data providers in the capital markets space are now able to experiment with newer and more sophisticated techniques, which has the trickle-down benefit of better products for banks and asset managers to incorporate into their investment and regulatory processes. As more text sources become available to train NLP models, companies are able to run more interesting experiments. To illustrate this trend, WatersTechnology looks at how Refinitiv and Bloomberg are incorporating NLP into their product lines. But first, it’s important to understand how we got here.

The ABCs of NLP

BERT has had a major impact on the field of NLP since 2018, but it’s not the first major advance. The most important innovation in recent times was the introduction of Word2Vec in 2013, a technique built on shallow neural networks. Prior to Word2Vec, the field had been dominated by support vector machines, supervised learning models used mainly for data classification. Word2Vec’s major advance was that it could represent words as vectors, or lists of numbers, capturing the semantic characteristics of words to make language understandable to a machine.
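As a rough illustration of the idea, here is a sketch using the open-source gensim library (our choice, not one named in the article): a shallow network learns a single fixed vector for each word in a corpus.

```python
# Word2Vec sketch using the gensim library: a shallow neural network learns
# one fixed vector per word, regardless of the context the word appears in.
from gensim.models import Word2Vec

# Toy corpus; a real model would be trained on billions of tokens.
sentences = [
    ["the", "bank", "reported", "strong", "earnings"],
    ["the", "lender", "reported", "weak", "earnings"],
    ["the", "plane", "began", "to", "bank", "left"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

vector = model.wv["bank"]          # one static vector for "bank", whatever the context
print(vector.shape)                # (50,)
print(model.wv.most_similar("earnings", topn=3))
```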

That was a major breakthrough for NLP, and at the same time, more researchers were becoming interested in the science, and more companies were building products and monetizing it. Stanford University’s NLP group improved on Word2Vec in 2014 with GloVe, and in 2016 Facebook came out with fastText, which was comparable to Word2Vec, but more accurate and faster.

By the time the transformer architecture and deep neural networks that underpin Meena and BERT came along in 2017, there were whole industries and academic research labs poised to take advantage.

“The field is bigger than it used to be, so when a revolution like [transformer models] happens, all of a sudden there are many hungry young researchers ready to do the exploiting. That is what we are seeing now,” says Amanda Stent, NLP architect at Bloomberg. “In the early 2000s, there were really only two dozen people who could exploit a release, but now there are maybe 20,000.”

The tech and tools are easier to use than before, Stent adds, and companies have started open-sourcing their models, as Google did with BERT, driving development.

“In some ways, these architectures are uniquely designed to focus on things that are readily monetizable by companies—like machine translation, machine reading, information extraction, [and] speech recognition,” Stent says.  

BERT is not the only transformer model. It built on the insight—and Sesame Street nomenclature—of ELMo (Embeddings from Language Models), which was developed by the Allen Institute for AI. While BERT, ELMo, and other models like OpenAI’s GPT-2 are trained in different ways, the basic architecture is the same, Stent says.

“You have something called an encoder that reads words in context and outputs the weights for these words; that is, a representation of the words in that context. And then a decoder spits out other words,” she says.

ELMo represented a major advance on models like Word2Vec because it recognized that the meanings of words can change in particular contexts. The same word—“bank,” for example—could have vastly different meanings depending on context within a sentence. Bank can be a verb or a noun; it can mean a place to store money, or what an airplane does to change direction, or the edge of a river.

ELMo can read surrounding words to output a context-specific vector; BERT does the same, but with deeper context and greater speed. BERT and ELMo manage this because they are bi-directional, meaning they can read a sentence from left to right and from right to left, generating stronger results with deeper contextual information.
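A short sketch makes the point concrete. Using the open-source Hugging Face transformers library and PyTorch (our choice for illustration; none of the firms in this article publish their code), BERT gives the word “bank” a different vector depending on the sentence it appears in.

```python
# Sketch: BERT produces a different vector for "bank" depending on context.
# Uses Hugging Face transformers and PyTorch for illustration only.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def bank_vector(sentence):
    """Return BERT's contextual embedding for the token 'bank'."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index("bank")]

money = bank_vector("She deposited the money at the bank.")
river = bank_vector("They had a picnic on the bank of the river.")

# The two vectors differ because each reflects its surrounding words.
cosine = torch.nn.functional.cosine_similarity(money, river, dim=0)
print(f"cosine similarity between the two 'bank' vectors: {cosine:.2f}")
```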

BERT has had a huge impact on the field of NLP, says Tim Nugent, a senior research scientist at Refinitiv. It has allowed companies, including his own, to access the models and improve their own work. 

“If it wasn’t as parallel as it is, it would only be Big Tech that would be able to run these sorts of models, or very well-funded industry R&D labs,” Nugent says.

Case Study: ESG Controversy

Firms like Refinitiv and Bloomberg have an advantage in NLP: access to massive amounts of data.

“The industry has had access to pricing and financial data, and other kinds of data, for a very long time. We are seeing more and more that firms are asking, ‘How do I differentiate myself in terms of pre-trade analysis, executing more effective trades?’ That area has been growing and driving our business of selling our news feeds and other document feeds to customers wanting to get an edge. We think there is a lot of value in this unstructured data, whether in pre- or post-trade,” says Nugent, who worked within Refinitiv’s Innovation Lab to build a transformer model that is now in use throughout the business.

Nugent has an academic background in life sciences, and he noticed that these fields have their own distinct discourses—different vocabularies, acronyms, and terminology. Life scientists have built their own BERTs—SciBERT and BioBERT—trained on bodies of field-relevant text on top of BERT’s original pre-training corpus.

Nugent believes that this kind of domain specificity is the best approach to make use of NLP, especially as the diversity of what is considered text is increasing.

“The diversity of text sources is increasing, and this makes it interesting for people building the models, because the more you look into it, the more you realize how variable language is between these different text sources. Twitter and social media are conversation—they can be humorous and sarcastic. Transcripts are not. So there are very distinct stylistic differences. To make the most of those data sources, you need different models to maximize your performance for a given source of text,” Nugent says.

Building on this insight, Nugent thought BERT should act as a base that could be trained with texts relevant to the downstream tasks to which the model would eventually be applied. “The research question was: Can we perform better at these business NLP tasks if we perform further pre-training using in-domain data—that is, business and financial data?” Nugent says.
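In generic terms, that “further pre-training” usually means continuing BERT’s masked-language-model objective on in-domain text before any task-specific fine-tuning. The sketch below shows what that looks like with the Hugging Face transformers library; it is an illustration under our own assumptions, not Refinitiv’s actual pipeline, and the corpus file is a placeholder.

```python
# Sketch of further (domain-adaptive) pre-training: continue BERT's
# masked-language-model objective on in-domain financial text.
# Generic illustration only; "financial_news.txt" is a placeholder corpus.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

corpus = load_dataset("text", data_files={"train": "financial_news.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = corpus["train"].map(tokenize, batched=True, remove_columns=["text"])

# Randomly masks 15% of tokens, the standard BERT pre-training objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-bert", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()  # the resulting model can then be fine-tuned on downstream tasks
```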

The question then became, which data should he use? Refinitiv ingests huge amounts of news data for its Intelligent Tagging platform, which derives meaning from unstructured data, such as news articles, and processes millions of these document sources daily, making information available to customers on the Eikon terminal. Nugent had to think carefully about how best to use it.

“We have a news archive that goes back to 1996, covering 435 sources. So we could throw that in and train the model using all that. But there was a trade-off between quantity and quality,” Nugent says.

The news itself may have been high-quality as news, but was it high-quality in terms of its suitability for the downstream tasks it was to power?

“So rather than using all of the news archive, I decided to focus on Reuters stories, and specifically using rich material in Reuters news, I was able to pull out only articles that corresponded to business and financial news,” Nugent says.

Trimmed down, the dataset ended up at 700 million words, on top of the already pre-trained Wikipedia and BookCorpus data, for a total of about 4 billion words. It required a lot of pre-processing and training time to get right, but Refinitiv is now applying the model to various use cases.

One of the first areas to which it was applied was environmental, social and governance (ESG) issues. Refinitiv offers ESG scores to customers and, as part of these services, keeps track of controversies affecting corporations.

“ESG is a hot topic; it’s a nice place to start,” Nugent says. “We have a lot of ESG data; a lot of it is annotated by our many analysts; we deem it to be high-quality; it has had a lot of human input. Analysts have scrutinized it and classified the specific controversy dataset into one of about 20 different ESG controversy types—things like privacy and environmental controversies.”

The way it works is that a user shows the transformer model a news article, and the model attempts to classify it into one of those controversy categories.
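For readers who want a picture of what that looks like in code, here is a hedged sketch of running an article through a fine-tuned BERT classifier with the Hugging Face transformers library. The model path and the category names are hypothetical; Refinitiv’s actual model and label set are not public.

```python
# Sketch: classifying a news article into an ESG controversy category with a
# fine-tuned BERT classifier. Model path and category names are hypothetical.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_DIR = "finance-bert-esg-controversy"     # placeholder for a fine-tuned model
CATEGORIES = ["privacy controversy", "environmental controversy",
              "governance controversy"]         # illustrative subset of the ~20 types

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model.eval()

article = "Regulators fined the company after a large customer-data breach."
inputs = tokenizer(article, truncation=True, return_tensors="pt")

with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

print(CATEGORIES[int(probs.argmax())], float(probs.max()))
```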

“We ran this using standard BERT, and then we ran it using our domain-specific version of BERT. We didn’t worry too much about the absolute level of perfection—we were looking for the relative level of improvement between BERT and the domain-specific version of BERT,” Nugent says.

Standard BERT performed at 78% accuracy, while the domain-specific version scored 82%. “This is a 4% improvement in performance, and that is in line with the improvement in performance we saw in the life sciences research. So this confirms our suspicion that perhaps financial data is different enough from general domain text to justify a domain-specific model,” Nugent says.

Once this was verified, Refinitiv could widen the model’s application to the rest of the business. In content operations, for instance, deals and development teams are using it to extract relevant news from huge sets of documents for users. For example, a user might be looking for all the news on ESG-related M&A activity, and the model will help quickly and accurately identify which articles are actually about an ESG transaction. The World-Check team, which collects information about the financial risks of individuals, is also using it. And in Eikon, the model helps return the most relevant articles possible to users who search for news about companies in the terminal’s Investor Briefs.

“Within our news product, we want to get that information faster and more accurate. We work with the product teams on how we get this into the roadmap and which areas they want to apply it to,” Nugent says.

Reading the News

At Bloomberg, similarly, BERT is helping Terminal users access the most relevant financial information. Bloomberg takes in a huge number of articles every day and clusters them by topic or category, and then again by event. Relevant stories appear to Bloomberg Terminal users grouped under a headline that the transformer model itself has generated.

“So you aren’t just seeing a cluster for a company like IBM, for example: You are seeing several clusters for IBM that correspond to particular events that have happened to IBM in the past two days,” Bloomberg’s Stent explains. “We may end up with five or 20 events that have happened to that company in the past two days, and then we take each cluster of headlines and we use one of these transformer architectures to automatically construct a new headline, a summary of that cluster.”
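As a rough analogue of that headline-generation step, the sketch below feeds a cluster of related headlines to an off-the-shelf transformer summarizer via the Hugging Face pipeline. Bloomberg’s production model is proprietary; the model name and the example cluster here are our own choices for illustration.

```python
# Sketch: generating one summary headline for a cluster of related headlines,
# using an off-the-shelf transformer summarizer. Illustration only; this is
# not Bloomberg's proprietary model.
from transformers import pipeline

cluster = [
    "IBM announces quarterly results ahead of expectations",
    "IBM beats estimates on cloud revenue growth",
    "Shares of IBM rise after earnings report",
]

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

summary = summarizer(" ".join(cluster), max_length=20, min_length=5,
                     do_sample=False)
print(summary[0]["summary_text"])   # a single machine-written headline for the cluster
```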

The model is further trained by Bloomberg journalists, who can use little “thumbs up” and “thumbs down” icons—like the buttons streaming music service Pandora uses to tailor music channels to users’ tastes—to teach the model what humans consider to be good headlines. A bad headline gets a thumbs down, a good one a thumbs up.

“Now occasionally we get some horrible clunker, but we can train this solution with the thumbs up and the thumbs down so that over time we get fewer and fewer absolutely horrible ones,” Stent says. A “clunker” might be the model publishing a summary excluding the word “not,” which could make the headline completely factually inaccurate, she says.

Bloomberg is using the solution in other areas too, including helping customer service staff to answer user queries. “Our customer service analysts are subject matter experts with finance backgrounds. We used this very same kind of architecture to help provide good answers to our clients’ questions,” Stent says, adding that customer satisfaction has measurably improved since the tool was introduced.

Exactpro’s Treshcheva is researching ways to evolve how NLP is used at the company, a specialist in software testing. “We work with traditional financial systems—complex, non-deterministic systems—and we use different approaches to assist us in testing,” she says.

Exactpro already performs unsupervised machine learning on log messages from clients’ environments, obtaining clusters of data that can be monitored for behavioral changes. But it is also looking at how to test the software of artificial intelligence (AI)-based systems. Treshcheva says this is where BERT will provide insight, as many of these are classic NLP applications, such as chatbots.
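The article does not detail Exactpro’s method, but unsupervised clustering of log messages is commonly done along the lines sketched below, here with scikit-learn’s TF-IDF vectorizer and k-means. The log lines are invented for illustration.

```python
# Sketch: unsupervised clustering of log messages with scikit-learn.
# TF-IDF + k-means is one common way to group similar messages without labels;
# the log lines below are invented examples, not Exactpro data.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

logs = [
    "Order 123 rejected: price outside limits",
    "Order 456 rejected: price outside limits",
    "Session logon accepted for CLIENT_A",
    "Session logon accepted for CLIENT_B",
    "Heartbeat received from gateway 7",
]

vectors = TfidfVectorizer().fit_transform(logs)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

for message, cluster_id in zip(logs, labels):
    print(cluster_id, message)   # cluster membership can then be monitored for changes
```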

“We are now looking at AI-based conversational agents (chatbots in banking and portfolio management services) and machine-readable news. These systems are very often built using machine-learning models, and we are now designing the approach to the verification and validation of such systems,” she says.

Future of BERT

BERT’s applications will extend beyond chatbots, however, Treshcheva says. Current research is looking at applying pre-trained transformer models, such as FinBERT, to financial sentiment analysis. AIG Investments researchers published a paper last year on using BERT for processing noisy financial text. Other applications are already being explored: the processing of regulatory data files and financial contracts, predictive models for systematic trading and fraud detection, and new methods for portfolio optimization and risk management, Treshcheva says.
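To show what financial sentiment analysis with such a model looks like, here is a sketch that assumes a publicly available FinBERT checkpoint on the Hugging Face Hub (we use “ProsusAI/finbert” as an example; the article names FinBERT only as a research direction, not this code).

```python
# Sketch: financial sentiment analysis with a FinBERT-style model.
# Assumes the publicly available "ProsusAI/finbert" checkpoint; swap in any
# other financial sentiment model as needed.
from transformers import pipeline

classifier = pipeline("text-classification", model="ProsusAI/finbert")

headlines = [
    "The company reported record profits and raised its guidance.",
    "The regulator opened an investigation into the lender's accounting.",
]

for headline in headlines:
    result = classifier(headline)[0]
    print(f"{result['label']:>9}  {result['score']:.2f}  {headline}")
```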

Stent says, however, that transformer models have their limitations, and these will begin to emerge the more they are used. Researchers are already looking to the next step: grounding language models in broader contexts, as a human’s language is grounded in all the background knowledge of the world around them.

In the case of Bloomberg, this would mean grounding the transformer model in market data. “By having a model of the market, we gain an understanding of the world in which we can ground our language,” Stent says. “So for our products and our solutions, it’s not just a big language model. The model knows about what companies exist, what asset classes they are in, what products they trade, who runs them, and how they are related to government affairs. This kind of model doesn’t really exist yet, but it’s one of the things that researchers across the industry are working on.”

Researchers are also looking to make the models smaller. Transformer models are “over-parameterized,” Stent says: They have more parameters than can be properly estimated from the data. “They have more knobs in them than there is data to tune the knobs, and that is kind of an insane scenario for machine learning. That means if you could figure out how, you could make the model 10 or 100 times smaller and still get the same performance. But we don’t know how to do that squishing yet,” Stent says. DistilBERT is one such attempt at squishing.
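The scale of that “squishing” can be seen by comparing parameter counts, as in the brief sketch below using the Hugging Face transformers library; the rough figures in the comments are widely reported for these checkpoints.

```python
# Sketch: comparing parameter counts of BERT-base and DistilBERT, one of the
# distillation ("squishing") attempts Stent mentions. Counts depend on the
# exact checkpoints downloaded.
from transformers import AutoModel

def parameter_count(name):
    model = AutoModel.from_pretrained(name)
    return sum(p.numel() for p in model.parameters())

bert = parameter_count("bert-base-uncased")          # roughly 110 million parameters
distil = parameter_count("distilbert-base-uncased")  # roughly 66 million parameters

print(f"BERT-base:       {bert / 1e6:.0f}M parameters")
print(f"DistilBERT-base: {distil / 1e6:.0f}M parameters")
print(f"DistilBERT keeps about {100 * distil / bert:.0f}% of the parameters")
```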

These models also use substantial amounts of computing power, a problem in this carbon-conscious age. “We are now very aware of the environmental impact and the cost of what we are building,” Stent says. Another ethical consideration is that models can chew up huge amounts of data from humans who are biased, and the models then reflect those biases—be it gender or race or something else.

Whatever the hurdles, with BERT, NLP has made another leap in its evolution, promising improved tools and performance across a range of use cases. Perhaps one day, the bots will even make good jokes.
