Datactics Experiments with Knowledge Graphs
The company aims to show that pairing good quality data with knowledge graphs can lead to links that previously would have been missed.
Belfast-based Datactics is experimenting with knowledge graphs, which are becoming increasingly popular among financial services firms, and could improve a firm’s downstream data analysis, such as for fraud detection.
It is working on an internal project using datasets from the UK Companies House—which registers company information and makes it available to the public—to illustrate how firms can improve steps after aligning their data, particularly when the quality of information to work with is good from the start.
Fiona Browne, head of artificial intelligence (AI) at Datactics, says clients would typically use these datasets for downstream analysis like fraud detection.
“What we aim to show is that you can take in datasets and perform a proper matching, de-duplication, and so on, and you could potentially find things within these knowledge graphs that you may have missed without the proper [data] cleansing process at the start,” she says.
For the project, Datactics is applying its data quality engine to the construction of these entity groups.
“So it’s taking in data sources, which these graphs are then built from, ensuring the quality of those data sources—so applying the concepts of your profiling data quality, cleansing, matching, de-duplication, and so on—to feed into graph databases. For example, we often hear about graph databases like Neo4J and the visualization of the graphs, but a lot of the hard work is actually getting to that stage,” she says.
For example, Browne says the UK’s Companies House datasets are “incredibly messy,” even though they are government datasets.
“First, by tidying up and then standardizing the datasets, you’re able to then match it to another dataset—for example, persons of significant control. So, the persons of significant control dataset contains information on, maybe, directors who own shares within the company. And whenever you model that together in a knowledge graph, what this can show is maybe a director has a link to another company, which can be inferred through links to other companies and to other people,” she says.
It is this inferring knowledge that can be extracted using knowledge graphs. Browne says this process won’t be possible with messy data.
Alex Brown, CTO at Datactics, explains that the serious problems with those datasets are mostly due to human input errors, such as spelling.
All the information on a database like the one provided by the UK Companies House, comes directly from either a form filled out by hand, or typed in on the website whenever they’ve formed a company, or when they declare a change in share capital structure, for example.
“All of that information has to be entered manually. And consequently, as a result, there are lots of errors in it. For example, if I’m the owner of a company, I’m a director and my name is Alex Brown. There’s nothing preventing me, in another update, from putting my name as A. Brown,” he says. “[The process] inherently leads to a multitude of data quality problems. We see things like A. Brown being the same person as Alex Brown or the same person as Alexandra Brown, and they might be associated with three different companies, but they [also] might be the same person.”
This is where using knowledge graphs can be useful—to find links that were previously unknown due to discrepancies in the data. Brown says these links can then be used for risk analysis for the know-your-customer (KYC) onboarding process, or for phoenixing, a practice of insolvent companies setting up new companies under slightly different names, usually to evade creditors.
“I remember it was a year ago, maybe a bit longer than that, a big focus of the Financial Conduct Authority (FCA) was trying to take on these phoenixes. And a lot of it is hindered by the fact that people are able to manipulate the data and do what they like when they register the company without the ability to [do that] centrally and the banks have to then try and find it when they’re actually carrying out their own due diligence on providing facilities to those people. And that can be made much more complex if there isn’t a reliable central repository of data,” he adds.
The project Datactics is working on is still in the research-and-development stage. It also has yet to finalize how it will productize and monetize the product. But Brown says a key feature that will come out of it is that Datactics will have an off-the-shelf configuration that will allow customers to build knowledge graphs themselves using Datactics’ technology platform.
“Other areas we might productize is providing the knowledge graphs themselves, moving into publishing the data rather than the actual platform and technology to the model. We haven’t decided yet whether or not we’ll do that ” he says.
Further reading
Only users who have a paid subscription or are part of a corporate subscription are able to print or copy content.
To access these options, along with all other subscription benefits, please contact info@waterstechnology.com or view our subscription options here: http://subscriptions.waterstechnology.com/subscribe
You are currently unable to print this content. Please contact info@waterstechnology.com to find out more.
You are currently unable to copy this content. Please contact info@waterstechnology.com to find out more.
Copyright Infopro Digital Limited. All rights reserved.
You may share this content using our article tools. Printing this content is for the sole use of the Authorised User (named subscriber), as outlined in our terms and conditions - https://www.infopro-insight.com/terms-conditions/insight-subscriptions/
If you would like to purchase additional rights please email info@waterstechnology.com
Copyright Infopro Digital Limited. All rights reserved.
You may share this content using our article tools. Copying this content is for the sole use of the Authorised User (named subscriber), as outlined in our terms and conditions - https://www.infopro-insight.com/terms-conditions/insight-subscriptions/
If you would like to purchase additional rights please email info@waterstechnology.com
More on Data Management
The IMD Wrap: Talkin’ ’bout my generation
As a Gen-Xer, Max tells GenAI to get off his lawn—after it's mowed it, watered it and trimmed the shrubs so he can sit back and enjoy it.
Waters Wavelength Podcast: The issue with corporate actions
Yogita Mehta from SIX joins to discuss the biggest challenges firms face when dealing with corporate actions.
Data catalog competition heats up as spending cools
Data catalogs represent a big step toward a shopping experience in the style of Amazon.com or iTunes for market data management and procurement. Here, we take a look at the key players in this space, old and new.
This Week: JP Morgan, Broadridge, Lloyds, JSE, Schroders, and more
A summary of the latest financial technology news.
What firms should know ahead of the DSB’s UPI launch
Six jurisdictions have set deadlines for firms to implement the derivatives identifier, with more expected to follow.
Has cloud cracked the multicast ‘holy grail’ for exchanges?
An examination of how exchanges—already migrating to the cloud—are working to solve the problem of multicasting in a new environment.
Waters Wrap: Market data spend and nice-to-have vs. need-to-have decisions
Cost is not the top factor driving the decision to switch data providers. Anthony looks at what’s behind the evolution of spending priorities.
The consolidated tapes are taking shape—but what shape exactly?
With political appetite established on both sides of the Channel, attention is turning to the technical details.