Understanding "Know Your Customer" (KYC)

Discover how banks navigate "Know Your Customer" (KYC) regulations and tackle the complexities of defining customers and relationships.

Gary Class
May 7, 2025 3 min read

"Know Your Customer" (KYC) is a catch-all term for the process financial institutions use to verify the identity of their customers. The process is central to complying with regulatory requirements that vary by region. In the U.S., the Customer Due Diligence (CDD) rule, issued by the Financial Crimes Enforcement Network (FinCEN) under the Currency and Foreign Transactions Reporting Act of 1970 (commonly known as the Bank Secrecy Act), aims to enhance financial transparency and prevent money laundering. The CDD rule requires banks to:

  1. Verify the identity of the beneficial owners of accounts
  2. Understand the nature and purpose of the customer relationship
  3. Keep ownership records up to date

One major challenge for banks is creating a flexible yet robust definition of both "customer" and "relationship." A customer can be a natural person (e.g., Joe Smith) or a legal entity (e.g., Joe’s Bar & Grill, LLP), as specified in the account holder agreement. The beneficial ownership of accounts can be complex and is sometimes intentionally hidden, especially in areas like commercial real estate, where each building might be owned by a separate limited partnership with many partners. Banks can use sophisticated entity resolution algorithms to enrich account ownership data and clarify the relationships between natural and legal persons.[1]
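To make the layered-ownership problem concrete, here is a minimal sketch of tracing ownership from a legal entity down to natural persons. Everything in it is illustrative: the `Party` type, the sample entities, and the ownership shares are made up, and the 25% threshold is an assumed parameter (it mirrors the ownership prong of the CDD rule, but a bank's own policy may set a different value).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    name: str
    is_natural_person: bool


# Hypothetical sample data: owned entity -> list of (owner, equity share).
joe = Party("Joe Smith", True)
ann = Party("Ann Lee", True)
bob = Party("Bob Ray", True)
holdco = Party("Main Street Holdings, LP", False)
building = Party("123 Main Street, LLC", False)

OWNERS = {
    building: [(holdco, 0.70), (ann, 0.20), (bob, 0.10)],
    holdco: [(joe, 0.50), (ann, 0.50)],
}


def beneficial_owners(entity, threshold=0.25):
    """Return natural persons whose combined direct and indirect share
    of `entity` meets the (assumed) threshold."""
    totals = {}

    def walk(node, share, path):
        for owner, fraction in OWNERS.get(node, []):
            if owner in path:  # guard against circular ownership
                continue
            effective = share * fraction
            if owner.is_natural_person:
                totals[owner.name] = totals.get(owner.name, 0.0) + effective
            else:
                # Legal entity: keep tracing through its own owners.
                walk(owner, effective, path | {owner})

    walk(entity, 1.0, frozenset({entity}))
    return {name: s for name, s in totals.items() if s >= threshold}


owners_of_building = beneficial_owners(building)
```

In this toy data, Ann's direct stake and her indirect stake through the holding partnership are summed, while Bob's 10% stake falls below the assumed threshold. The sketch presumes the names have already been resolved to a single `Party` each, which is exactly the step entity resolution has to supply.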

What is entity resolution?

Entity resolution is the process of determining whether two different descriptions refer to the same real-world entity. For example, it helps determine whether "John Q. Public" and "Jonathan Q. Public" are the same person. The similarity between two sequences of characters, such as the names of two business entities, can be evaluated using the “string similarity” algorithm supported in ClearScape Analytics®. A related approach is a “string-based edit-distance” algorithm, such as Levenshtein distance, where the distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other. In practice, the known correct name (the positive reference) is compared with alternative names, which may contain missing words, misspellings, or abbreviations, and the algorithm scores how closely each alternative agrees with the reference.
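The edit-distance idea can be sketched in a few lines of plain Python. This is a generic textbook implementation of Levenshtein distance, not the ClearScape Analytics® function; the `similarity` helper and its normalization by the longer string's length are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(reference: str, candidate: str) -> float:
    """Scale the distance to the range 0..1 (1.0 = identical, ignoring case)."""
    ref, cand = reference.casefold(), candidate.casefold()
    longest = max(len(ref), len(cand))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(ref, cand) / longest
```

For the example above, `levenshtein("John Q. Public", "Jonathan Q. Public")` is 4 (the four inserted letters), which gives a similarity of about 0.78. Where to set the cutoff between "same" and "different" is a policy decision that depends on the data.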

Using language models for entity resolution

Researchers are now applying language models (LMs) to entity resolution tasks, where large language models (LLMs) can help resolve ambiguities in textual data. LLM-based entity resolution is preferred when gathering and maintaining a large training set is costly, because LLMs perform well even without similar training examples. When task-specific training data and domain-specific matching rules are available, running an open-source language model on local hardware can be an efficient alternative.[2]

Vector embeddings are numerical representations of data points (such as words) that capture their meanings and relationships. Words that are related in real life sit close together in vector space, so embeddings convert unstructured data into a structured format that preserves context. The Teradata Enterprise Vector Store organizes these embeddings for quick retrieval through intelligent search.
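"Close together in vector space" is commonly measured with cosine similarity. The sketch below uses made-up three-dimensional vectors (real embeddings come from a language model and have hundreds of dimensions), and its brute-force `nearest` scan is only a stand-in for the indexed search a vector store would provide.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, hand-written for illustration only.
EMBEDDINGS = {
    "bank": [0.9, 0.1, 0.0],
    "lender": [0.8, 0.2, 0.1],
    "restaurant": [0.1, 0.9, 0.2],
    "grill": [0.0, 0.8, 0.3],
}


def nearest(word, k=1):
    """Return the k stored words most similar to `word` (brute-force scan)."""
    query = EMBEDDINGS[word]
    scored = [
        (cosine_similarity(query, vector), other)
        for other, vector in EMBEDDINGS.items()
        if other != word
    ]
    return [other for _, other in sorted(scored, reverse=True)[:k]]
```

With these toy vectors, `nearest("bank")` returns `["lender"]` and `nearest("grill")` returns `["restaurant"]`, because the related words point in nearly the same direction.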

Banks can take an ensemble modeling approach to entity resolution: use a language model to create vector embeddings for the entities, then use those embeddings as features in a binary classification model, such as XGBoost, to estimate the probability of a match. XGBoost is available via ClearScape Analytics®.
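The shape of that pipeline (embed both names, derive features from the embeddings, fit a binary classifier that outputs a match probability) can be sketched with the standard library alone. Two substitutions keep the example self-contained and should not be mistaken for the approach itself: character-trigram counts stand in for language-model embeddings, and a small logistic regression stands in for XGBoost. The single cosine-similarity feature and the labeled pairs are likewise hypothetical.

```python
import math
from collections import Counter


def embed(name):
    """Stand-in embedding: character-trigram counts of the normalized name."""
    kept = "".join(ch for ch in name.casefold() if ch.isalnum() or ch == " ")
    text = "  " + kept + " "  # pad so leading and trailing characters count
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(u, v):
    dot = sum(count * v[gram] for gram, count in u.items())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def features(name_a, name_b):
    """Pair features derived from the two embeddings (here just one)."""
    return [cosine(embed(name_a), embed(name_b))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs, labels, epochs=2000, learning_rate=0.5):
    """Fit logistic regression by stochastic gradient descent."""
    rows = [features(a, b) for a, b in pairs]
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = y - sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


def match_probability(model, name_a, name_b):
    weights, bias = model
    x = features(name_a, name_b)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical labeled pairs: 1 = same entity, 0 = different entities.
PAIRS = [
    ("Joe's Bar & Grill, LLP", "Joes Bar and Grill LLP"),
    ("Main Street Holdings LP", "Main St. Holdings, L.P."),
    ("Acme Realty Partners", "Acme Realty Partners Inc"),
    ("Joe's Bar & Grill, LLP", "Acme Realty Partners"),
    ("Main Street Holdings LP", "Joe Smith"),
    ("Acme Realty Partners", "Harbor View Capital"),
]
LABELS = [1, 1, 1, 0, 0, 0]

model = train(PAIRS, LABELS)
```

Calling `match_probability(model, "Harbor View Capital LLC", "Harbor View Capital, L.L.C.")` should then score well above `match_probability(model, "Harbor View Capital LLC", "Joe Smith")`. In a production setting the embedding and classifier would be swapped for the real components, and the training set would need far more than six pairs.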

Conclusion

KYC regulations require banks to manage and monitor the integrity of the data that they collect during the account opening process. To identify the ultimate beneficial owners of these accounts, banks increasingly leverage sophisticated entity resolution algorithms. Teradata supports KYC compliance with tools for master data management and entity resolution.


  1. “Information on Complying with the Customer Due Diligence (CDD) Final Rule,” U.S. Treasury Department Financial Crimes Enforcement Network, https://www.fincen.gov/resources/statutes-and-regulations/cdd-final-rule.
  2. “Entity Matching Using Large Language Models,” Ralph Peeters, Aaron Steiner, and Christian Bizer, 2024.

About Gary Class

Gary is an accomplished industry strategist with extensive experience in financial services, where he has made significant contributions to advanced analytics and AI. Gary spent over three decades at Wells Fargo Bank as the Director of Advanced Analytics at the forefront of innovation during the transformational era of “anytime, anywhere” banking. His visionary leadership has shaped the landscape of financial services through innovation, data-driven insights, and strategic thinking.
