Datactics Experiments with Knowledge Graphs

The company aims to show that pairing good-quality data with knowledge graphs can reveal links that would otherwise have been missed.


Belfast-based Datactics is experimenting with knowledge graphs, which are becoming increasingly popular among financial services firms and could improve a firm’s downstream data analysis, such as fraud detection.

It is working on an internal project using datasets from the UK’s Companies House—which registers company information and makes it available to the public—to illustrate how firms can improve downstream processes once their data is aligned, particularly when the quality of the underlying information is good from the start.

Fiona Browne, head of artificial intelligence (AI) at Datactics, says clients would typically use these datasets for downstream analysis like fraud detection. 

“What we aim to show is that you can take in datasets and perform a proper matching, de-duplication, and so on, and you could potentially find things within these knowledge graphs that you may have missed without the proper [data] cleansing process at the start,” she says.

For the project, Datactics is applying its data quality engine to the construction of these entity groups.

“So it’s taking in data sources, which these graphs are then built from, ensuring the quality of those data sources—so applying the concepts of your profiling data quality, cleansing, matching, de-duplication, and so on—to feed into graph databases. For example, we often hear about graph databases like Neo4j and the visualization of the graphs, but a lot of the hard work is actually getting to that stage,” she says.
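The pipeline Browne describes—standardize, cleanse, de-duplicate, then load into a graph database—can be sketched in a few lines. The records, field names, and normalization rules below are invented for illustration; they are not Datactics’ actual engine.

```python
import re

# Hypothetical raw Companies House-style records (names and numbers are invented).
raw_records = [
    {"company_number": "01234567", "name": "ACME  Widgets LTD."},
    {"company_number": "01234567", "name": "Acme Widgets Limited"},
    {"company_number": "07654321", "name": " Beta Holdings plc"},
]

def standardize(name: str) -> str:
    """Collapse whitespace, uppercase, and normalize a common legal suffix."""
    name = re.sub(r"\s+", " ", name).strip().upper()
    return re.sub(r"\bLTD\.?$|\bLIMITED$", "LIMITED", name)

def cleanse_and_dedupe(records):
    """Standardize names, then de-duplicate on the registered company number."""
    seen = {}
    for rec in records:
        cleaned = {**rec, "name": standardize(rec["name"])}
        seen.setdefault(rec["company_number"], cleaned)  # keep first per company
    return list(seen.values())

clean = cleanse_and_dedupe(raw_records)
# Two distinct companies remain, ready to be loaded as graph nodes.
```

Only after a step like this do the records become reliable nodes for a graph database such as Neo4j.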

For example, Browne says the UK’s Companies House datasets are “incredibly messy,” even though they are government datasets. 

“First, by tidying up and then standardizing the datasets, you’re able to then match it to another dataset—for example, persons of significant control. So, the persons of significant control dataset contains information on, maybe, directors who own shares within the company. And whenever you model that together in a knowledge graph, what this can show is maybe a director has a link to another company, which can be inferred through links to other companies and to other people,” she says. 
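The kind of inference Browne describes—linking companies through shared directors or persons of significant control—amounts to a simple graph traversal. The sketch below uses invented names and a plain Python structure in place of a real graph database, purely to illustrate the idea.

```python
# Hypothetical (person, company) edges from a cleaned officers/PSC dataset.
edges = [
    ("Jane Doe", "ACME WIDGETS LIMITED"),
    ("Jane Doe", "BETA HOLDINGS PLC"),
    ("John Roe", "BETA HOLDINGS PLC"),
    ("John Roe", "GAMMA TRADING LIMITED"),
]

def infer_company_links(edges):
    """Infer company-to-company links through shared directors or PSCs."""
    by_person = {}
    for person, company in edges:
        by_person.setdefault(person, set()).add(company)
    links = set()
    for companies in by_person.values():
        for a in companies:
            for b in companies:
                if a < b:  # one ordered pair per inferred link
                    links.add((a, b))
    return links

links = infer_company_links(edges)
```

In a graph database the same question would be a one-line query over person-to-company relationships; the point is that the links only emerge once the underlying records have been matched and de-duplicated.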

It is this inferred knowledge that can be extracted using knowledge graphs. Browne says the process would not be possible with messy data.

Alex Brown, CTO at Datactics, explains that the serious problems with those datasets mostly stem from human input errors, such as spelling mistakes.

All the information in a database like the one provided by UK Companies House comes directly from a form filled out by hand or typed into the website, whether when a company is formed or when it declares a change in its share capital structure, for example.
 
“All of that information has to be entered manually. And consequently, as a result, there are lots of errors in it. For example, if I’m the owner of a company, I’m a director and my name is Alex Brown. There’s nothing preventing me, in another update, from putting my name as A. Brown,” he says. “[The process] inherently leads to a multitude of data quality problems. We see things like A. Brown being the same person as Alex Brown or the same person as Alexandra Brown, and they might be associated with three different companies, but they [also] might be the same person.” 
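A simple heuristic for flagging the name variants Brown describes as *candidate* matches—not confirmed identities—might look like the following sketch. The rule (same surname, one first token a prefix of the other) is an illustrative assumption, not Datactics’ matching logic.

```python
def candidate_match(a: str, b: str) -> bool:
    """Flag two officer names as possibly the same person.

    Heuristic sketch: match if surnames agree and each first token is
    identical to, or an initial/prefix of, the other. A hit is a
    candidate for review, not a confirmed identity.
    """
    ta = a.replace(".", "").upper().split()
    tb = b.replace(".", "").upper().split()
    if not ta or not tb or ta[-1] != tb[-1]:  # surnames must agree
        return False
    fa, fb = ta[0], tb[0]
    return fa.startswith(fb) or fb.startswith(fa)
```

Under this rule, “A. Brown” matches “Alex Brown”, and “Alex Brown” matches “Alexandra Brown”—exactly the ambiguity Brown highlights, since those could be one person or three.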

This is where using knowledge graphs can be useful—to find links that were previously unknown because of discrepancies in the data. Brown says these links can then be used for risk analysis during know-your-customer (KYC) onboarding, or for detecting phoenixing, a practice in which insolvent companies set up new companies under slightly different names, usually to evade creditors.
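A phoenixing check of the kind Brown mentions could combine two signals from a cleaned dataset: a shared director and a suspiciously similar name between a dissolved company and an active one. The sketch below is illustrative only—the records are invented and the similarity threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Hypothetical cleaned records: (company_name, status, director).
companies = [
    ("ACME WIDGETS LIMITED", "dissolved", "Jane Doe"),
    ("ACME WIDGET SOLUTIONS LIMITED", "active", "Jane Doe"),
    ("BETA HOLDINGS PLC", "active", "John Roe"),
]

def phoenix_candidates(records, threshold=0.7):
    """Flag active companies whose name closely resembles that of a
    dissolved company sharing a director -- a possible phoenix."""
    dissolved = [r for r in records if r[1] == "dissolved"]
    active = [r for r in records if r[1] == "active"]
    flagged = []
    for old_name, _, old_dir in dissolved:
        for new_name, _, new_dir in active:
            similarity = SequenceMatcher(None, old_name, new_name).ratio()
            if old_dir == new_dir and similarity >= threshold:
                flagged.append((old_name, new_name))
    return flagged

flagged = phoenix_candidates(companies)
```

As with the name-matching heuristic, a hit here is a lead for due diligence, not proof of wrongdoing.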

“I remember it was a year ago, maybe a bit longer than that, a big focus of the Financial Conduct Authority (FCA) was trying to take on these phoenixes. And a lot of it is hindered by the fact that people are able to manipulate the data and do what they like when they register the company without the ability to [do that] centrally and the banks have to then try and find it when they’re actually carrying out their own due diligence on providing facilities to those people. And that can be made much more complex if there isn’t a reliable central repository of data,” he adds. 

The project Datactics is working on is still in the research-and-development stage, and the company has yet to finalize how the work will be productized and monetized. But Brown says a key feature to come out of it is an off-the-shelf configuration that will allow customers to build knowledge graphs themselves using Datactics’ technology platform.

“Other areas we might productize is providing the knowledge graphs themselves, moving into publishing the data rather than the actual platform and technology to the model. We haven’t decided yet whether or not we’ll do that,” he says.
