Ping An Asset Management zooms in on NLP models for sentiment analysis

The asset management arm of Ping An Insurance (Group) Company of China is enhancing its NLP models to tackle complex modeling challenges such as overfitting in non-linear models.


Ping An of China Asset Management (PAAMC) in Hong Kong, the asset management arm and a wholly-owned subsidiary of China’s largest insurer, Ping An Insurance (Group) Company of China, is looking to upgrade its natural language processing (NLP) models, particularly to improve sentiment analysis of Chinese-language text.

Chi Kit Chai, head of capital markets and chief investment officer at PAAMC, tells WatersTechnology that vectorization of different words is the key to the success of any NLP algorithm. Vectorization is a methodology in NLP that maps words or phrases to a corresponding vector of numbers to find word predictions, similarities, and semantics. 
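To make the idea of vectorization concrete, here is a minimal sketch, assuming a simple count-based ("bag of words") scheme and three made-up headlines. It is not PAAMC's method; production NLP models typically use learned embeddings rather than raw counts, and the function names below are illustrative only.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Collect every distinct whitespace-separated token, in sorted order."""
    return sorted({token for doc in docs for token in doc.split()})


def vectorize(text, vocab):
    """Map a phrase to a vector of token counts, one slot per vocabulary word."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical headlines, invented for illustration.
docs = [
    "shares rally on strong earnings",
    "shares slump on weak earnings",
    "central bank holds rates",
]
vocab = build_vocab(docs)
vectors = [vectorize(doc, vocab) for doc in docs]

similar = cosine(vectors[0], vectors[1])    # headlines sharing three of five words
unrelated = cosine(vectors[0], vectors[2])  # headlines sharing no words
print(similar, unrelated)
```

Once text is represented as numbers, similarity becomes a geometric question: the two earnings headlines point in a similar direction, while the central bank headline is orthogonal to both. Note that counts alone cannot tell "rally" from "slump", which is why sentiment models need richer representations.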

He says that it can be harder to break down Chinese words in a meaningful way, but PAAMC aims to do so with contextual information and sentiment analysis.
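Chinese text has no spaces between words, so a model must first decide where one word ends and the next begins. One classic baseline is dictionary-based forward maximum matching, sketched below with a tiny hypothetical dictionary. This illustrates the segmentation problem in general and is not a description of PAAMC's approach, which the article says relies on contextual information.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: at each position, take the longest
    dictionary word that matches; fall back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop cannot stall.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, chosen only for this example.
toy_dictionary = {"平安", "资产", "管理", "公司", "资产管理"}
segments = forward_max_match("平安资产管理公司", toy_dictionary)
print(segments)
```

The greedy rule prefers the four-character word "资产管理" (asset management) over splitting it into "资产" and "管理". A purely greedy matcher can choose badly when a string is genuinely ambiguous, which is one reason context matters for segmentation.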

Breaking down Chinese characters is something Boston-headquartered PanAgora Asset Management has also dealt with. The asset manager developed its own machine-learning models to track chat and blog conversations in Chinese to determine market sentiment. 

In 2019, Mike Chen, director of equity and head of sustainable investments at PanAgora Asset Management, told WatersTechnology that its solution relies on an entire corpus of language to train the NLP model that tracks conversations. The challenge is less in the use of different languages and more in the use of slang or other words in markets-related conversations.

PanAgora deals with slang words in the Chinese internet community by waiting for them to gain prominence before it updates its NLP library. 

“The library is the natural-language processing model. It just keeps on updating. When a new cyber slang word gains prominence, if the algorithm sees it sufficiently enough times, it will pick up on it. It’s fully automated and self-updating,” he said. 

The work Chai and his team at PAAMC are doing to improve Chinese sentiment analysis will go into the firm’s overall machine-learning framework. The framework combines deep-learning neural networks, gradient boosting machines, and advanced regression models.

Chai says in artificial intelligence terms, these combinations are called “ensemble methods.” The framework contains non-linear models, which help PAAMC capture factor interactions and non-linear patterns hidden in alpha signals. On top of that, it provides low correlations among multiple models that can further increase Sharpe and information ratios—measurements that help investors determine the risk-adjusted returns of a security or portfolio. 
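Why low correlation among models can raise the Sharpe ratio is easiest to see in a stylized case. The sketch below assumes, purely for illustration, that every model has the same stand-alone Sharpe ratio and volatility, that they are equally weighted, and that every pair shares the same correlation. Under those assumptions the combined Sharpe ratio is the single-model Sharpe multiplied by sqrt(N / (1 + (N - 1) * rho)). The numbers are hypothetical and are not PAAMC figures.

```python
import math


def combined_sharpe(single_sharpe, n_models, pairwise_corr):
    """Sharpe ratio of an equal-weighted blend of n_models return streams,
    assuming identical stand-alone Sharpe ratios and volatilities and one
    common pairwise correlation (a simplifying assumption)."""
    if n_models < 1:
        raise ValueError("n_models must be at least 1")
    variance_factor = 1 + (n_models - 1) * pairwise_corr
    if variance_factor <= 0:
        raise ValueError("correlation too negative for this many models")
    return single_sharpe * math.sqrt(n_models / variance_factor)


# Hypothetical inputs: five models, each with a stand-alone Sharpe of 0.5.
for rho in (0.9, 0.5, 0.1):
    print(f"correlation {rho}: combined Sharpe {combined_sharpe(0.5, 5, rho):.3f}")
```

With perfectly correlated models the blend is no better than any single model. As the correlation falls, the blend's volatility shrinks faster than its expected return, so the ratio rises.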

“Our framework merges the generation of alphas and alpha-weighting algorithms using machine learning techniques. For factor interaction and non-linearity, an example is the leverage of a company. Linear relations can only assume a company’s performance is proportional to its debt ratios. As a matter of fact, debt ratios bear non-linear patterns with a company’s performance,” he says. 
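A small synthetic example shows how a linear measure can miss a non-linear pattern entirely. The data below is invented: it assumes a hump-shaped relation in which performance peaks at a moderate debt ratio. The article does not say what shape the real relation takes, so treat the shape and the numbers as hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, a measure of linear association."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic data: debt ratio in percent, performance peaking at 50 percent.
debt_ratio = list(range(0, 101, 10))
performance = [-(d - 50) ** 2 for d in debt_ratio]

# A linear view: correlation between performance and the raw debt ratio.
linear_view = pearson(debt_ratio, performance)

# A non-linear feature: squared distance from the (assumed) optimal ratio.
distance_from_peak = [(d - 50) ** 2 for d in debt_ratio]
nonlinear_view = pearson(distance_from_peak, performance)

print(linear_view, nonlinear_view)
```

The raw debt ratio shows no linear association with performance even though performance is fully determined by it. Here the right transformation was supplied by hand; non-linear models such as gradient boosting machines or neural networks are designed to discover such shapes from data.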

PAAMC uses historical structured and unstructured data (including news, price movement information, macroeconomic inputs, and company-specific accounting information) to train its AI algorithms. It monitors more than 300 factors and selects between 20 and 50 factors to construct its portfolios every month.

According to Chris Vera, associate director at asset and wealth management consulting firm Shoreline, using non-linear models means there is a greater ability to incorporate multiple variables to draw conclusions. A linear model would incorporate perhaps two or three inputs to get an output. “Something like, based on these two things, the stock will go up, risk will go down, for example,” he says. 

In contrast, non-linear models can be used to describe text. “When you and I talk—the sentences we send to each other—we need to put long non-linear formulas to describe [the conversation] because we can use lots of different words and we can construct sentences in different ways. It’s more complicated than sending each other numbers because words are quite difficult to describe. There’s context, there’s language, there’s tone, there’s volume, there’s dialect,” he says.

Building on knowledge

The asset manager, which manages over $440 billion of assets, is able to leverage technologies and, perhaps more importantly, ideas from all the other units that sit under its parent company. Ping An Group has three main business segments—insurance, banking, and investment—all of which are supported by its technology arm.

While the asset management business benefits from applying technology that has already been developed in other areas of the group, Chai says different problems require varied solutions from multiple application domains. 

For example, Omni-Sinitic, the machine-learning framework that the group developed, is useful for NLP problem-solving. It has in the past bested companies like Microsoft, Google, Alibaba, Huawei, and Facebook in the General Language Understanding Evaluation (GLUE) benchmark, which is used to evaluate natural-language understanding systems.

For PAAMC, the focus is different. “Here we focus on solving problems in finance and investment. We deal with NLP in our machine-learning framework from a different perspective as we face different challenges. We also put a lot of emphasis on dealing with overfitting when we process very noisy data to extract high-confidence alpha signals,” he says. 

Overfitting occurs when a model learns the detail and noise in the training data, to the point that it affects the model’s performance on new data. 
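The definition above can be illustrated with a toy experiment. The sketch below uses synthetic data (a straight line plus alternating noise) and compares a model that memorizes every training point with a simple fitted line. Both models and the data are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """Nearest-neighbour lookup: reproduces the training data, noise included."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic training data: true signal y = 2x, plus alternating +1 / -1 noise.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]

# New data: nearby points carrying the true signal without the noise.
test_x = [x + 0.25 for x in train_x]
test_y = [2 * x for x in test_x]

line = fit_line(train_x, train_y)
memo = fit_memorizer(train_x, train_y)

print("memorizer train/test:", mse(memo, train_x, train_y), mse(memo, test_x, test_y))
print("line      train/test:", mse(line, train_x, train_y), mse(line, test_x, test_y))
```

The memorizer scores perfectly on the data it has seen and much worse on new data, because it learned the noise along with the signal. The line does the opposite: it accepts some training error and generalizes better.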

As Shoreline’s Vera puts it, overfitting is a stumbling block that happens in data science “when you go from walking to running, and then you trip, scrape your knee, figure out what you did wrong, and then you start to run again.” 

Overfitting tends to happen when data scientists throw too much compute power at a model, he adds. “This is where you need to take a step back because you can’t just train a lot of data, come up with something that is like a complicated jigsaw puzzle piece and assume that jigsaw puzzle piece can be used for other hypotheses. That’s overfitting,” Vera says.

According to Vera, PAAMC seems to have achieved “machine-learning sophistication” ahead of other asset management firms. “They’re well beyond the use-cases of forecasting liquidity, forecasting changes in risk, formulating portfolio construction—that’s all linear. When you’re dealing with overfitting, you’ve moved on to non-linear, and non-linear problems are a lot more data-hungry; they’re a lot harder to explain. If you’ve gotten to the point of overfitting non-linear, then you’ve been on non-linear for a good amount of time,” Vera says. 

This could be due to how PAAMC leverages its parent company’s technology expertise.  

PAAMC’s NLP models use news data and corresponding sentiment scores to rank different stocks. As for the overfitting challenge, Chai says machine learning has different solutions to handle overfitting, including cross validation and regularization. “We also use cross-market validations,” he says.
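Cross validation and regularization are often used together: regularization limits how closely a model can fit the training data, and cross validation chooses how strong that limit should be by scoring the model on held-out data. The sketch below shows the generic pattern with a one-feature ridge regression and k-fold splits on synthetic data. The data, the candidate penalties, and the function names are all assumptions; the article does not describe PAAMC's actual procedure, and its "cross-market validations" are not modeled here.

```python
def ridge_slope(xs, ys, penalty):
    """One-feature ridge regression without intercept.
    A larger penalty shrinks the slope toward zero (regularization)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k disjoint folds by round-robin assignment."""
    return [[i for i in range(n) if i % k == fold] for fold in range(k)]


def cv_error(xs, ys, penalty, k=3):
    """Average squared error on held-out folds: fit on k-1 folds, score on the rest."""
    n = len(xs)
    total = 0.0
    for held_out in k_fold_indices(n, k):
        held = set(held_out)
        fit_x = [xs[i] for i in range(n) if i not in held]
        fit_y = [ys[i] for i in range(n) if i not in held]
        slope = ridge_slope(fit_x, fit_y, penalty)
        total += sum((ys[i] - slope * xs[i]) ** 2 for i in held_out)
    return total / n


def pick_penalty(xs, ys, candidates, k=3):
    """Choose the penalty with the lowest cross-validated error."""
    return min(candidates, key=lambda p: cv_error(xs, ys, p, k))


# Synthetic data: y = 2x plus alternating noise (illustrative only).
xs = [float(i) for i in range(1, 13)]
ys = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(xs)]

candidates = [0.0, 1.0, 10.0, 100.0]
best = pick_penalty(xs, ys, candidates)
print("chosen penalty:", best, "slope:", ridge_slope(xs, ys, best))
```

The key design point is that the penalty is never judged on the data used to fit the slope, so a setting that merely memorizes the training folds is not rewarded.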

In terms of alternative datasets, PAAMC uses Chinese texts from different media, which Chai says provide signals that are robust and that have low correlations to other alpha streams it has. These streams include fundamental, macro, and price data. 

Chai says the robustness of the NLP signals depends heavily on the robustness and sophistication of the models. “It is something we spend a lot of time on to differentiate ourselves from others,” he adds.
