Quest for quality

Are the pieces of your data puzzle creating the right view?

For most businesses, data quality is like the weather. Everyone talks about it, but no one believes they can do anything about it. Unfortunately, doing nothing is no longer a viable option for companies that want to remain competitive.

Poor-quality data is costing corporations a significant portion of their revenue. "As much as 15% to 20% of their operating revenue (and, if they're a not-for-profit organization, their operating budget) is spent doing things over, recovering from process failure and compensating customers. And these are only the direct costs," says Larry English, a noted consultant and the author of Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits.

According to English, the goals of any data quality management system should be to design for quality and to error-proof the processes so that defects do not recur. Data quality management is more than simple data cleansing. In fact, some automated data-cleansing techniques can introduce errors into information that was previously correct.

"One of my clients, who was a customer of his own company, decided to look at his own customer record," English explains. "He discovered that through the use of a householding routine, he became 'married' to his mother, who happened to live in the same house. He was single, and his mother lived with him. The householding routine deduced they were married."

English also gives the example of a woman he used to work with, whose first name is George. "She really didn't like it when she was addressed as Mr. or her gender status was changed to male," he relates. Through misguided attempts to improve the quality of their data, companies risked losing her as a customer.
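English's two anecdotes share a failure mode: a cleansing rule overwrites values that were already recorded correctly. The sketch below contrasts a naive rule with a guarded one. Everything in it is hypothetical, including the `Customer` fields, the tiny first-name lookup and the "two people at one address are married" heuristic. It is an illustration under those assumptions, not how any particular product implements householding.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional

# Hypothetical first-name lookup; real name data is far messier than this.
FIRST_NAME_GENDER: Dict[str, str] = {"george": "M", "mary": "F"}


@dataclass(frozen=True)
class Customer:
    first_name: str
    address: str
    gender: Optional[str] = None          # None means "not recorded"
    marital_status: Optional[str] = None  # None means "not recorded"


def naive_cleanse(customers: List[Customer]) -> List[Customer]:
    """Overwrites recorded values with inferred ones (the failure mode above)."""
    per_address: Dict[str, int] = {}
    for c in customers:
        per_address[c.address] = per_address.get(c.address, 0) + 1
    cleansed = []
    for c in customers:
        # The name lookup wins over whatever gender was recorded.
        gender = FIRST_NAME_GENDER.get(c.first_name.lower(), c.gender)
        # Hypothetical householding heuristic: two people at one address are "married".
        status = "married" if per_address[c.address] == 2 else c.marital_status
        cleansed.append(replace(c, gender=gender, marital_status=status))
    return cleansed


def guarded_cleanse(customers: List[Customer]) -> List[Customer]:
    """Fills only missing gender values and never infers marital status."""
    cleansed = []
    for c in customers:
        gender = c.gender
        if gender is None:
            gender = FIRST_NAME_GENDER.get(c.first_name.lower())
        cleansed.append(replace(c, gender=gender))
    return cleansed


customers = [
    Customer("Tom", "12 Elm St", gender="M", marital_status="single"),
    Customer("Ruth", "12 Elm St", gender="F", marital_status="widowed"),  # Tom's mother
    Customer("George", "9 Oak Ave", gender="F", marital_status="single"),
]
naive = naive_cleanse(customers)
print(naive[0].marital_status, naive[2].gender)  # Tom is now "married"; George is now "M"
print(guarded_cleanse(customers) == customers)   # True: recorded values are untouched
```

The guarded version is safer but not safe. If George's gender had never been recorded, the lookup would still guess wrong. A real process might therefore route inferred values for review rather than writing them silently.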

Take a look at the big picture
As poor-quality information circulates throughout a corporation, it affects more and more business decisions. "People use the information to make decisions that are sub-optimized. They end up incurring costs of scrap and rework as a result, and so on down the food chain," says English. "That's the way it works. We don't operate in isolation." He suggests that companies identify and measure their own data quality issues to better understand the nature of the problems.

Successful data quality management may require you to reevaluate your processes in order to determine the root cause of a data problem. "Otherwise, you may be attacking the symptoms and only rearranging how the problem shows up," explains English.

He cites an insurance company that discovered 80% of its claims were being paid for the diagnosis of a broken leg. Upon investigation, the company discovered that claims processors were using the system's default diagnosis code. The insurance company's initial response might be to remove the system default and force the claims processors to enter a valid diagnosis code, but that would be treating the symptom rather than the cause.
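A simple frequency check is one way a symptom like this can surface in the data. The sketch flags any value that accounts for more than a chosen share of the non-empty entries. The function name, the 50% threshold and the diagnosis codes are hypothetical. The check finds the symptom only; explaining it still takes the kind of questioning English describes.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def dominant_values(values: Iterable[str], threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (value, share) for values whose share of non-empty entries exceeds threshold."""
    counts = Counter(v for v in values if v)  # ignore empty/missing codes
    total = sum(counts.values())
    if total == 0:
        return []
    return [(v, n / total) for v, n in counts.most_common() if n / total > threshold]


# Hypothetical diagnosis codes: eight of ten claims carry the system default.
codes = ["DEFAULT_BROKEN_LEG"] * 8 + ["FLU", "SPRAIN"]
print(dominant_values(codes))  # [('DEFAULT_BROKEN_LEG', 0.8)]
```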

"The Japanese use a technique that I call 'Why Analysis': they tend to ask 'why' about five times," says English. He demonstrates how this technique could be used to ferret out the cause of the faulty insurance claims. "Why do you allow a system default? It's faster than entering a code. Why must you do it faster than entering a code? We don't have the time. Why don't you have the time? We get paid for how many claims we process in a day."

There's the root cause: the claims processors were being paid according to the number of claims they could process in a day. Whatever productivity the claims processors gain is lost several times over when downstream processes, such as risk analysis, fail because of the bad codes. If you define your performance measures to reward speed instead of quality, you'll end up with a faster process but poorer-quality data. That's a mistake at a time when, increasingly, company value is directly linked to data quality.

One of English's clients is an oil production company with three oil fields and two major customers.

"They have concluded that they are, in fact, not an oil company," English explains. "They said, 'Our assets are not the barrels of oil in reserve that we have, but the data that tells us how much oil we have.'"

He recalls how the company arrived at this conclusion. Employees were digitizing the company's paper data when they came across a well that showed zero barrels of oil in reserve. They determined there was a data quality problem in the original paper charts.

"When they corrected it, they discovered that, at the then-current rate of about $25 a barrel, the well had over $50 million worth of oil," says English. The oil well hadn't changed, nor had the land around it. Only the data had changed, and the company was richer by $50 million.

Develop a strategy
So how does a company initiate a data quality management system? The first step is to understand the problem.

"Many customers don't have a handle on how big the data quality issues are until they get the data into a central place and can profile it," explains Jonathan Klaus, Teradata's vice president of global solutions development.

He notes that data quality is a huge issue for any IT system, but companies with enterprise data warehouses have a significant advantage. "A centralized data warehouse focuses the data into one place," Klaus says. "You can create a single rules-based data quality process that looks at the enterprise data as a whole, rather than having to do it on 50 different operational systems."
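Klaus's point is that one set of rules can run against the consolidated data instead of being reimplemented for each source system. Below is a minimal sketch of that idea. It assumes records from several hypothetical sources have already been loaded into one collection. The rule names and fields are illustrative and are not Teradata's rule format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[Record], bool]


def is_two_letter_code(value: object) -> bool:
    return isinstance(value, str) and len(value) == 2 and value.isalpha()


# One rule set, applied to every record regardless of which system it came from.
RULES = [
    Rule("customer_id present", lambda r: bool(r.get("customer_id"))),
    Rule("state is a two-letter code", lambda r: is_two_letter_code(r.get("state"))),
]


def pass_rates(records: List[Record], rules: List[Rule]) -> Dict[str, float]:
    """Fraction of records that pass each rule (1.0 for an empty input)."""
    if not records:
        return {rule.name: 1.0 for rule in rules}
    return {
        rule.name: sum(1 for rec in records if rule.check(rec)) / len(records)
        for rule in rules
    }


warehouse = [
    {"customer_id": "C1", "state": "CA", "source": "billing"},
    {"customer_id": "", "state": "California", "source": "web_orders"},
]
print(pass_rates(warehouse, RULES))
# {'customer_id present': 0.5, 'state is a two-letter code': 0.5}
```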

Responding to a growing awareness of data quality among data warehouse users, Teradata recently launched a data quality program that combines Teradata's data quality consulting experience with software-based evaluation tools. Consulting includes a comprehensive data quality assessment, and the tools include the newly released Teradata Profiler.

Frank Capobianco is a Teradata data quality assessment analyst. "The first part of the data quality assessment is just getting to know the data at a higher level," he explains. "As you get to know the data, you may find things that look like anomalies. You would then use Teradata Profiler as a follow-up to the broad-based assessment to drill in on those specific anomalies. Teradata lets you get as specific as you want in terms of looking at the data."

The main advantage of Teradata Profiler is that it works within the Teradata Database. "It presents the user with a graphic window into the Teradata system, allowing the user to pick which databases, which tables and which columns to assess," says Capobianco. "Because it works inside the Teradata Database, you can easily keep up with the data quality score and track it over time. In other architectures, it's either more difficult in terms of unloading the data or you have to work on a sample of the data, and the results might not be as accurate."

Teradata Profiler isn't something a Teradata consultant runs once before walking out the door. "A key part of my consulting is to establish a baseline in terms of broad-based results and specific business rules," explains Capobianco. "Then I set in place the infrastructure to allow the profiling results to be replicated and extended. Usually on the last day or two, depending on the scope of the assessment, I provide training for the clients on how to use Teradata Profiler and give them all the metadata I created as part of the training, so they can carry it forward."
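The baseline-then-track workflow Capobianco describes can be illustrated with a single simple metric. In this sketch, completeness (the share of non-empty values per column) stands in for a data quality score, and a later run is compared with the baseline. The metric, the function names and the 2% tolerance are all assumptions. Teradata Profiler computes its results inside the database; this sketch runs on in-memory rows purely for illustration.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def completeness(rows: List[Row], columns: List[str]) -> Dict[str, float]:
    """Share of rows with a non-empty value, per column."""
    if not rows:
        return {c: 0.0 for c in columns}
    return {
        c: sum(1 for r in rows if r.get(c) not in (None, "")) / len(rows)
        for c in columns
    }


def regressions(baseline: Dict[str, float], current: Dict[str, float],
                tolerance: float = 0.02) -> Dict[str, float]:
    """Columns whose score fell by more than `tolerance` since the baseline, with the drop."""
    drops = {c: baseline[c] - current.get(c, 0.0) for c in baseline}
    return {c: d for c, d in drops.items() if d > tolerance}


baseline = completeness(
    [{"email": "a@x.com", "phone": "555-0100"}, {"email": "b@x.com", "phone": None}],
    ["email", "phone"],
)  # email 1.0, phone 0.5
latest = completeness(
    [{"email": "c@x.com", "phone": None}, {"email": "", "phone": None}],
    ["email", "phone"],
)  # email 0.5, phone 0.0
print(regressions(baseline, latest))  # {'email': 0.5, 'phone': 0.5}
```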

Teradata Profiler is only one of the tools that are available to the Teradata consultants. "We leverage third-party products to complement Teradata Profiler, which is designed to profile large data volumes quickly and efficiently within the database," explains Arlene Zaima, marketing manager for Teradata's data quality program. "Once we have identified problem areas, third-party products cleanse, correct and supplement missing data."

Solve the puzzle
When a company first implements a data warehouse, quality controls are built in. Shouldn't they be sufficient for maintaining quality data? Not necessarily. "As data warehouses evolve to support new business requirements and applications, so do the data quality requirements," says Zaima, who notes that some types of analysis do not demand the same accuracy as other types.

Typically, as a data warehouse matures, additional data is loaded from various sources and analyzed by additional applications. Different applications may require the data to be in totally different formats.

"Our consultants can help the users understand their business goals and identify optimal data quality levels for those goals," Zaima explains. "You don't necessarily need 100% data quality, but you do want to ensure that the quality of the data meets your particular purpose."
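Zaima's "fit for purpose" idea can be expressed as per-application thresholds checked against the same measurement. The application names, the metric and the threshold values below are hypothetical and chosen only to show the shape of the check.

```python
from typing import Dict, List

# Hypothetical minimum quality levels; real targets would come from each business goal.
REQUIREMENTS: Dict[str, Dict[str, float]] = {
    "direct_mail_campaign": {"address_valid": 0.98},
    "regional_sales_trend": {"address_valid": 0.90},
}


def shortfalls(application: str, measured: Dict[str, float]) -> List[str]:
    """Metrics below the application's minimum; an empty list means fit for purpose."""
    return [
        metric
        for metric, minimum in REQUIREMENTS[application].items()
        if measured.get(metric, 0.0) < minimum
    ]


measured = {"address_valid": 0.93}
print(shortfalls("direct_mail_campaign", measured))  # ['address_valid']
print(shortfalls("regional_sales_trend", measured))  # []
```

The same data can be good enough for one use and not for another. That is why, in this sketch, the thresholds belong to the application rather than to the table.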

Business goals and data quality have to be considered together, because ultimately each depends on the other. This interdependence is a core principle of Teradata's data quality program. It's also a key component of the consulting process.

"We define data quality in terms of what is important to your business," says Klaus. "We draw up a plan for the data quality investigation and determine how to measure it against your data. We measure the stuff we've done to see if it has yielded some benefit. Then we set in motion ongoing governance for the data process."

Reexamining data quality through your company's current business rules and policies can provide new perspective. "It allows you to step back from the firefight of everyday business (the operational systems) and really think about what you can do with the power of this data," says Klaus.



An attractive data quality profile
Data quality is about more than just ensuring that information is accurate. It's about increasing confidence in analysis, understanding data relationships and making better decisions. Teradata Profiler can help businesses reach these goals.

The product includes descriptive statistics and visualization techniques. In addition, Teradata Profiler provides built-in intelligence to automate some data-profiling tasks: it determines the best analysis to run based on a column's data type and the results of a preliminary analysis. The results are presented in a series of easy-to-interpret graphs and charts.

Other functions include:
> Values analysis: assesses data validity and completeness, and identifies obvious exceptions and suspect values.
> Frequency analysis: identifies duplicate values, performs structural analysis and determines referential integrity of keys between tables.
> Overlap analysis: counts overlapping key fields among pairs of tables and identifies referential integrity issues.
> Statistical analysis: identifies column distribution, assesses column quality and looks for anomalies.
> Histogram analysis: counts the occurrence of values in a series of numeric ranges (bins), which users can define in various ways.
> Scatter plots: identify relationships across two- or three-variable combinations and detect consistency violations.
-C.M.
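Several of the analyses listed in the sidebar reduce to simple counting. The sketch below shows generic versions of frequency, overlap and histogram analysis. It is not Teradata Profiler's implementation, and the function names and sample keys are hypothetical. Bins here are defined by explicit edges, which is only one of the ways users might define them.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def duplicate_values(values: Iterable[object]) -> Dict[object, int]:
    """Frequency analysis: values that occur more than once, with their counts."""
    return {v: n for v, n in Counter(values).items() if n > 1}


def overlap(parent_keys: Iterable[object],
            child_keys: Iterable[object]) -> Tuple[int, Set[object]]:
    """Overlap analysis: number of shared keys, plus child keys with no parent (orphans)."""
    parents, children = set(parent_keys), set(child_keys)
    return len(parents & children), children - parents


def histogram(values: Iterable[float], edges: Sequence[float]) -> List[int]:
    """Counts per bin [edges[i], edges[i+1]); the last bin also includes its upper edge.
    Values outside [edges[0], edges[-1]] are not counted."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            in_bin = edges[i] <= v < edges[i + 1]
            on_last_edge = i == len(edges) - 2 and v == edges[-1]
            if in_bin or on_last_edge:
                counts[i] += 1
                break
    return counts


print(duplicate_values(["C1", "C2", "C1"]))                 # {'C1': 2}
print(overlap(["C1", "C2", "C3"], ["C1", "C3", "C9"]))      # (2, {'C9'})
print(histogram([1, 5, 10, 15, 20, 25], [0, 10, 20]))       # [2, 3]
```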


Quality in a complete package
Teradata has always placed an emphasis on the quality of its services. Behind every solution is a broad range of business and technical services to help users make the most of their investment. The Teradata data quality program is no exception. It is supported by a robust set of data quality services, including:

> Data quality workshops that answer questions such as 'Why is data quality important to my business?', 'How does data quality impact business?' and 'How can we get started with our data quality implementation?'
> Data quality planning and assessments that identify the scope, objectives, deliverables, measurements and metrics of your data quality plan. Teradata consultants use Teradata Profiler to examine the data directly in the Teradata Database.
> Data quality engagements that address data quality problems once they have been identified. Actions might include changing certain business processes, improving database design, changing data load strategies or implementing in-database data conversions.
> Data quality monitoring and audits, where Teradata consultants can help you establish a data monitoring process to ensure the data complies with service-level agreements.
> Data quality governance that offers support for your internal data governance and stewardship process and helps ensure that data quality is implemented and maintained.
-C.M.

© Teradata Magazine-September 2004

