Implementing a master data management solution framework.
by Mark Shainman
Many organizations understand the value of corporate data and the importance of managing it as an asset. Companies have sought to implement
solutions that enable them to leverage this asset more efficiently and effectively for analytical and operational corporate processes. Good
corporate data quality, meaning data that is consistent and accurate, depends directly on the proper management of master data: a company's
reference data that is shared across operational and analytic systems and used to classify and define transactional data. This includes
customer and supplier lists, charts of accounts, bills of material, and organization, product and customer hierarchies.
Companies face a difficult challenge in keeping master data consistent, complete and controlled across the enterprise—especially when disparate,
decentralized operational and analytical systems define and handle master data in different ways. Inconsistent and inaccurate master data can
cause numerous problems linked to analytical and operational processes, such as:
Long or poorly executed new product introduction cycles. Product master data and workflows that are inadequately modeled or synchronized
across the enterprise can result in reactive, inconsistent and unpredictable new product introduction programs.
Difficulty determining customer profitability. Multiple unsynchronized, poor-quality copies of customer master data spread across channels,
locations or the enterprise can cause redundant customer communications and missed opportunities.
Vague understanding of supplier value. Inconsistent and unsynchronized vendor master data, product master data and bills of material across
an organization's departments and divisions inhibit the company's ability to get a holistic view of strategic suppliers. This can lead to
inefficient consolidation of spend and reduced leverage with suppliers on pricing and terms.
Inhibited compliance with regulatory requirements. Clean and consistent codes and hierarchies are essential to meeting regulatory compliance
and audit requirements. But this consistency is not guaranteed if master data is held in multiple operational systems and/or used for
multiple consolidation activities. For better compliance results, data should be centrally located and managed.
Limited ability to create a true integrated enterprise data warehouse (EDW). Consolidating data marts leads to co-location, not integration,
of data. Master data inconsistencies affect the overall analytical data quality.
Figure 1: The master data management (MDM) solution framework provides a map of each stage of the MDM initiative. Each stage requires a
focus on different disciplines of data management to reach a managed enterprise data asset business capability.
For years, traditional data warehousing environments have attempted to rectify disparities in master data by leveraging extract, transform and
load (ETL) tools, along with custom code and third-party data-quality tools. These solutions, however, are fragmented, inflexible and unable to
holistically manage master data. ETL-centric or homegrown attempts also push responsibility for maintaining and updating the data onto IT,
when these tasks belong with the domain experts on the business side.
New technologies, such as Teradata Master Data Management (MDM), can help solve these master data problems; however, the process is often more
important than the underlying technology. Without the correct process in place, the greatest technology is simply a large capital expenditure
with little to no benefit.
When implementing an MDM solution, organizations must consider a solution framework that follows, in many regards, a methodology similar to
creating an EDW. (See figure 1, above.) An MDM solution can and should run parallel with an EDW implementation. In fact, to achieve truly
integrated data within the warehouse, the MDM solution must be part of an EDW effort and implementation.
Selecting the proper MDM solution requires the combined effort of various members in your organization. It is crucial to put together a team
of data stewards and set up the cross-domain data governance rules that are leveraged in MDM. This data governance process must completely
overlay the MDM solution framework (see figure 2, below) so that the correct process and business rules are deliberately defined for each phase. This
is an important part of establishing the groundwork for a successful MDM initiative, because even with a great underlying technology,
governance and process are key.
The governance team must follow four phases to best implement an MDM solution framework:
Figure 2: Each level in the data governance pyramid has different and important functions. These functions foster better performance in the
organization when complying with master data management requirements.
Phase 1: Define and profile
A critical component of an MDM initiative is assigning management responsibility for the master data. During this first phase, IT must work
with the business to explicitly assign data stewardship responsibility to individuals and departments in the organization.
As mentioned earlier, it is crucial for a successful MDM process that this role is shared between IT and the data domain experts on the
business side. Since these experts understand the data, they should be tasked with the stewardship of that data.
Once the stewardship roles are established, the master data must be identified, categorized and prioritized. In most cases, the highest
priorities are the broad subject areas that have the greatest value and are most widely leveraged by multiple systems and users. While it may
be tempting to prioritize the easiest categories first, the easiest does not always carry the most value in the company, so continued
sponsorship and funding can be difficult when little value is seen. Instead, defining priorities should be based on value as weighed against
risk. An MDM initiative can start small with a single high-value domain, such as customer or product, but the eventual goal and end-state
should be a multi-domain, holistic MDM initiative.
To determine what data to manage through MDM, first identify the relevant objects and data elements. Because not all master data is equally
relevant, you must consider the following criteria:
Is the master data valuable to the data's consumers?
Is the data currently shared throughout the organization?
Is it possible the data could be shared?
Data that fits within any of these criteria should be considered relevant and, therefore, qualified for data management.
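The three questions work as an any-of test: a single "yes" is enough to qualify the data. A minimal sketch in Python, assuming a hypothetical MasterDataCandidate record whose three answers are supplied by the data stewards; the class, field names and sample entries are illustrative only and are not part of any MDM product.

```python
from dataclasses import dataclass


@dataclass
class MasterDataCandidate:
    """Hypothetical record describing one candidate master data object."""
    name: str
    valuable_to_consumers: bool  # Is it valuable to the data's consumers?
    currently_shared: bool       # Is it shared throughout the organization today?
    could_be_shared: bool        # Is it possible the data could be shared?


def is_relevant(candidate: MasterDataCandidate) -> bool:
    """A candidate qualifies if it meets any one of the three criteria."""
    return (
        candidate.valuable_to_consumers
        or candidate.currently_shared
        or candidate.could_be_shared
    )


# Made-up candidates, used only to exercise the check.
candidates = [
    MasterDataCandidate("customer", True, True, True),
    MasterDataCandidate("product", True, False, True),
    MasterDataCandidate("desk_phone_extension", False, False, False),
]

relevant = [c.name for c in candidates if is_relevant(c)]
print(relevant)
```

Passing this test only makes data eligible for management; deciding which eligible data to address first is still a matter of value weighed against risk.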
Criteria for evaluating data elements within a subject area:
How many data elements exist?
What is the overall lifetime capacity of the data elements?
Can the data element be reused across shared boundaries within the organization?
What is its value to the company?
How complex is each data element?
How volatile is the data element; how often does it change?
Which entities should be shared?
Can the entities be categorized in terms of behavior and attributes within the context of the business needs?
—M.S.
Next, the data subject areas must be defined. While the subject areas should not be limited to specific domains, they often can be based on
categories such as customer or product. Best practice would entail defining an overall enterprise model and process for the master data subject
areas, even if all the data from all the domains is not initially addressed. The master data sources must also be understood and mapped.
During this initial phase, the governance team specifies the policies and business rules regarding how the master data is created and
maintained. The master data can be created within one or more existing operational systems, each called a system of record (the place where
the master data is created). In an MDM environment, the master data can also be created directly in the MDM application, but most
environments will combine both approaches. Maintaining master data in the MDM application lets the MDM solution also act as a system of
record for some of the master data.
This is the phase in which to describe any hierarchies, taxonomies or other relationships that are important to organizing and classifying the
master data objects that are to be managed and maintained.
Phase 2: Acquire and enhance
After the data has been classified and the data rules established, the rules are applied. The data integration process occurs in the staging
area through workflow processes and administration of the business rules defined earlier. This is the stage in which the governance team
examines how the data is extracted from the source systems and staged. Then data-quality functions are performed to clean, rationalize and
cross-reference the data.
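To make the clean, rationalize and cross-reference steps concrete, here is a minimal Python sketch. It assumes three hypothetical staged records from two made-up source systems and matches records on an exact, cleaned name. That is a deliberate simplification; real data-quality tools typically apply much richer matching rules.

```python
import re

# Hypothetical staged extracts: (source system, source key, customer name).
staged = [
    ("crm", "C-001", "  Acme Corp. "),
    ("billing", "9912", "ACME CORP"),
    ("crm", "C-002", "Globex  Inc"),
]


def clean(name: str) -> str:
    """Clean one value: lowercase, drop punctuation, collapse whitespace."""
    name = re.sub(r"[^\w\s]", "", name.casefold())
    return " ".join(name.split())


master_ids: dict[str, int] = {}                    # cleaned name -> master ID
cross_reference: list[tuple[str, str, int]] = []   # (source, source key, master ID)

for source, source_key, raw_name in staged:
    key = clean(raw_name)
    # Rationalize: records with the same cleaned name share one master ID.
    master_id = master_ids.setdefault(key, len(master_ids) + 1)
    # Cross-reference: remember which source record maps to which master ID.
    cross_reference.append((source, source_key, master_id))

print(cross_reference)
```

The cross-reference list is what lets downstream systems translate their local keys into the shared master identifier.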
The extraction process can be completed through multiple methods, but bulk data movement methods (e.g., ETL, flat file) are usually best for
the initial staging phase. Depending on the organization's business needs, other methods, such as real-time extraction through an enterprise
message bus or replication, can be leveraged to support the continued update and maintenance required in Phase 3. When considering how to
achieve the initial data acquisition process, the following questions should be addressed:
Can the support staff use the tools they are familiar with to provide extracts?
What times are the most favorable for the extract procedure?
How will the data be transported from the source to the target repository?
What interface options does the source system provide?
Is an infrastructure in place that can be easily leveraged to extract the data from the source systems?
Mapping information about the master data during the acquisition and staging process:
Source system. What to collect
Subject area name. Facet in the logical data model (LDM)
MDM table name. Table in the industry-specific LDM
MDM column name. Column in the industry-specific LDM
Source. The source system name from which the column will be mapped
Table/file. The name of the table or file containing the operational data
Column. The column name of the source system
Rule. Based on the transformation need
Comment. Any comment required to clarify the rule or uncertainty
Owner. The person who verified the rule
Action. Any action required for verification of the rule
—M.S.
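The mapping fields listed above can be captured as one record per source column. The following Python sketch assumes a hypothetical ColumnMapping class; the field names paraphrase the list, and every sample value (systems, tables, columns, rules, people) is made up for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class ColumnMapping:
    """One row of a source-to-MDM mapping worksheet (names are illustrative)."""
    subject_area: str    # facet in the logical data model (LDM)
    mdm_table: str       # table in the industry-specific LDM
    mdm_column: str      # column in the industry-specific LDM
    source: str          # source system the column is mapped from
    table_or_file: str   # table or file holding the operational data
    column: str          # column name in the source system
    rule: str            # transformation rule, if any
    comment: str = ""    # clarification of the rule or of any uncertainty
    owner: str = ""      # person who verified the rule (empty = not yet verified)
    action: str = ""     # follow-up needed to verify the rule


def unverified(mappings: list[ColumnMapping]) -> list[ColumnMapping]:
    """Return the mappings that no owner has signed off on yet."""
    return [m for m in mappings if not m.owner]


# Made-up example rows; every value here is hypothetical.
mappings = [
    ColumnMapping("Party", "CUSTOMER", "CUSTOMER_NAME", "crm", "accounts", "acct_nm",
                  rule="trim and title-case", owner="J. Doe"),
    ColumnMapping("Party", "CUSTOMER", "CUSTOMER_NAME", "billing", "cust.csv", "name",
                  rule="trim and title-case", action="confirm encoding with billing team"),
]

pending = unverified(mappings)
print([asdict(m)["source"] for m in pending])
```

Keeping the owner and action fields next to each rule makes it easy to list which mappings still need sign-off from a data steward.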
The core component of an MDM application is its ability to set up and manage the business rules and workflows that cross-reference, manage and
enhance your organization's master data. During this initial data acquisition phase you can use the data-quality components of the MDM
application to create the data baseline. You can also leverage the data profiling tools to gain some insight into the baseline data. This
information can later be used to help design the data maintenance process.
Phase 3: Manage
To create the system of reference where the master repository resides, an initial one-time bulk extraction from the Phase 2 staging area to
the new repository might be used, but in most cases the master repository is simply an evolution of that staging area.
During Phase 3, a single master repository for master data reference or system of reference is established. System of reference refers to
where the "golden copy" of the master data is maintained for reference or synchronization purposes. The repository contains the master
reference data (such as customer name, product name and bill of material) and master relationship data (for example, customer and product
hierarchies and product-to-supplier relationships).
The organization defines and implements its continual master data maintenance workflows during this third phase. Accordingly, the user
interfaces (UIs) used by the data stewards are built as part of implementing those workflows. The ongoing maintenance by the data stewards
and improvement in data quality should always occur in this phase. The update frequency required for the ongoing maintenance of the master
data, though considered in Phase 2, is determined here. As data demands adjust based on changing business needs, the update frequency can
change as well. In this environment you can track and maintain changes in master data over time, which enables you to monitor and analyze
data for trends.
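One common way to keep master data changes over time is to effective-date each version of a value rather than overwrite it. The Python sketch below illustrates that general technique under stated assumptions: the AttributeHistory class, the customer segment attribute and the dates are hypothetical, and nothing here describes how any particular MDM product stores history.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Version:
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


@dataclass
class AttributeHistory:
    """Effective-dated history of one master data attribute."""
    versions: list[Version] = field(default_factory=list)

    def update(self, value: str, effective: date) -> None:
        # Assumes updates arrive in chronological order.
        if self.versions:
            self.versions[-1].valid_to = effective  # close the previous version
        self.versions.append(Version(value, effective))

    def as_of(self, day: date) -> Optional[str]:
        """Return the value that was in effect on the given day, if any."""
        for v in self.versions:
            if v.valid_from <= day and (v.valid_to is None or day < v.valid_to):
                return v.value
        return None


# Hypothetical attribute: the segment assigned to one customer.
segment = AttributeHistory()
segment.update("small business", date(2007, 1, 1))
segment.update("enterprise", date(2008, 3, 1))

print(segment.as_of(date(2007, 6, 30)))
print(segment.as_of(date(2008, 3, 1)))
```

Because earlier versions are closed rather than deleted, the same structure can answer both "what is the value now?" and "what was it on a given date?", which is the basis for trend analysis.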
When establishing your data maintenance workflows, consider these questions:
Does your organization have a best-practice life-cycle workflow for master data assets?
How do changes get reflected in the master data repository?
How do you detect and fix master data errors?
Where is a master data element created or deleted?
Do you have business processes defined for data entry and updates?
Are these processes automated in operations?
Do you have an enterprise business process workflow for each line of business involved in the framework?
You must also ask some critical infrastructure questions in the overall support and maintenance of your MDM solution, such as:
What type of system availability does the solution require?
What type of service level agreements must the solution support?
Is there a need for data to be published?
The answers to these questions can help determine the actual infrastructure needed to support your organization's solution.
Phase 4: Use
The clean and accurate master data is used in this fourth phase. You must examine how the data is going to be used and how frequently it will
be accessed. In the case of an EDW implementation, consider how the EDW will consume the master data within the analytical environment. Since
the MDM repository and the EDW exist on the same platform, the clean and accurate master data, cross-reference tables and hierarchies can be
easily published to the data warehouse using numerous methods, ranging from ETL tools to the INSERT SELECT SQL statement.
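As an illustration of publishing with an INSERT SELECT statement, the sketch below uses Python's built-in sqlite3 module as a stand-in for the shared platform. The table names mdm_customer and edw_customer_dim, their columns and the sample rows are all hypothetical, and Teradata-specific SQL syntax may differ.

```python
import sqlite3

# In-memory SQLite database standing in for the shared MDM/EDW platform.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mdm_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE edw_customer_dim (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    INSERT INTO mdm_customer VALUES (1, 'Acme Corp'), (2, 'Globex Inc');
""")

# Publish master rows that the warehouse table does not hold yet.
conn.execute("""
    INSERT INTO edw_customer_dim (customer_id, customer_name)
    SELECT m.customer_id, m.customer_name
    FROM mdm_customer AS m
    WHERE m.customer_id NOT IN (SELECT customer_id FROM edw_customer_dim)
""")
conn.commit()

published = conn.execute(
    "SELECT customer_id, customer_name FROM edw_customer_dim ORDER BY customer_id"
).fetchall()
print(published)
```

The NOT IN filter keeps the statement safe to rerun: rows already published are skipped instead of causing a primary key conflict.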
Figure 3: Teradata allows flexibility in the physical model and how the data is used. Initially, master data management tables can be
dedicated tables in their own database, but they can also be views into tables in the larger enterprise data warehouse.
Duplication of the master data within the master data tables to the warehouse is not always necessary. Based on how the master data record is
updated and the update frequency, you can simply create a semantic view on top of the MDM table to access for analytical purposes without
duplicating the data. (See figure 3.) This is just one benefit of the MDM repository co-existing on the same platform as the EDW.
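The view-based approach can be sketched the same way. This example again uses Python's sqlite3 module as a stand-in platform; the mdm_product table, the is_active column and the edw_product_v view are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the shared MDM/EDW platform.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mdm_product (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        is_active INTEGER
    );
    INSERT INTO mdm_product VALUES (10, 'Widget', 1), (11, 'Gadget', 0);

    -- Semantic view: analysts query this instead of a copied table.
    CREATE VIEW edw_product_v AS
        SELECT product_id, product_name
        FROM mdm_product
        WHERE is_active = 1;
""")

query = "SELECT product_name FROM edw_product_v ORDER BY product_id"
before = conn.execute(query).fetchall()

# A steward updates the master record; no separate publish step is needed.
conn.execute("UPDATE mdm_product SET is_active = 1 WHERE product_id = 11")
after = conn.execute(query).fetchall()

print(before, after)
```

Because the view reads the master table directly, a change made by a data steward is visible to analytical queries immediately, without a second copy to keep in sync.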
Just as you must examine how the EDW will use the master data, for a broader MDM initiative you must also understand how it will be leveraged
by other systems. If any operational system's local data repositories need to subscribe to the enterprise MDM solution, you must determine how
to update those master data repositories. You will need to verify that the existing infrastructure, such as an enterprise message bus or
replication technology, can support the update. Also note the types of adaptors or environmental requirements that exist for updating those
applications.
With Teradata MDM, the MDM processes can be exposed as Web services, enabling the application to be used not only through a
publish-and-subscribe method but also for direct access to the master data. This allows new applications coming online to, in some cases,
forgo a local master data repository and simply leverage a Teradata MDM service for their master data needs.
MDM benefits
By leveraging a powerful MDM process in conjunction with a robust MDM technology, organizations can solve their problem of inconsistent and
inaccurate master data. They can improve their EDW environment, increasing the overall quality of the data in the warehouse and the accuracy
of master data throughout the enterprise.
MDM is the right solution for companies that wish to lower costs, increase their technological and informational scale, improve business
agility and strengthen enterprise decision making.
Mark Shainman is the global program manager of the Teradata MDM solution. He also manages the Teradata Oracle Migration Program and works on
the company's strategy and market analysis teams.
Teradata Magazine-March 2008