Quest for quality
Are the pieces of your data puzzle
creating the right view?
by David English
For most businesses, data quality is like the weather. Everyone talks
about it, but no one believes they can do anything about it. Unfortunately,
doing nothing is no longer a viable option for companies that want to
remain competitive.
Poor-quality data is costing corporations a significant portion of their
revenue. "As much as 15% to 20% of their operating revenue (and if they're
a not-for-profit organization, their operating budget) is spent doing things
over, recovering from process failure and compensating customers. And
these are only the direct costs," says Larry English, noted consultant
and author of Improving Data Warehouse and Business Information Quality:
Methods for Reducing Costs and Increasing Profits.
According to English, the goals of any data quality management system
should be to design for quality and to error-proof the processes to prevent
recurrence of defects. Data quality management is more than simple data
cleansing. In fact, some automated data-cleansing techniques can introduce
errors to information that was previously correct.
"One of my clients, who was a customer of his own company, decided to
look at his own customer record," English explains. "He discovered that
through the use of a householding routine, he became 'married' to his
mother, who happened to live in the same house. He was single, and his
mother lived with him. The householding routine deduced they were married."
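To illustrate how such a routine can go wrong, here is a minimal sketch
in Python. The record fields and both functions are hypothetical and are
not taken from any real householding product. The naive rule treats two
people who share an address and a surname as spouses; the more cautious
rule only groups people by address and leaves the relationship unstated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Hypothetical customer record used only for this illustration.
    name: str
    surname: str
    address: str


def naive_household(customers):
    """Flawed rule: two people sharing an address and surname are 'spouses'."""
    by_key = defaultdict(list)
    for c in customers:
        by_key[(c.address, c.surname)].append(c)
    links = []
    for members in by_key.values():
        if len(members) == 2:
            links.append((members[0].name, members[1].name, "spouse"))
    return links


def cautious_household(customers):
    """Safer rule: group by address only and do not infer a relationship."""
    by_address = defaultdict(list)
    for c in customers:
        by_address[c.address].append(c.name)
    return {addr: sorted(names) for addr, names in by_address.items()}


customers = [
    Customer("Alex", "Smith", "12 Oak St"),
    Customer("Mary", "Smith", "12 Oak St"),  # Alex's mother, not his wife
]
naive_links = naive_household(customers)
households = cautious_household(customers)
print(naive_links)
print(households)
```

The naive rule introduces an error into data that was previously correct,
which is the risk English describes.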
English also gives the example of a woman he used to work with, whose
first name is George. "She really didn't like it when she was addressed
as Mr. or her gender status was changed to male," he relates. Through
misguided attempts to improve the quality of their data, companies risked
losing her as a customer.
Take a look at the big picture
As poor-quality information circulates throughout a corporation, it affects
more and more business decisions. "People use the information to make
decisions that are sub-optimized. They end up incurring costs of scrap
and rework as a result, and so on down the food chain," says English.
"That's the way it works. We don't operate in isolation." He suggests that
companies need to identify and measure their own data quality issues to
better understand the nature of the problems.
Successful data quality management may require you to reevaluate your
processes in order to determine the root cause of a data problem. "Otherwise,
you may be attacking the symptoms and only rearranging how the problem
shows up," explains English.
He cites an insurance company that discovered 80% of its claims were being
paid for the diagnosis of a broken leg. Upon investigation, the company
discovered that claims processors were using the system's default diagnosis
code. The insurance company's initial response might be to remove the
system default and force the claims processors to enter a valid diagnosis
code, but that would be treating the symptom rather than the cause.
"The Japanese use a technique that I call 'Why Analysis': they tend to
ask 'why' about five times," says English. He demonstrates how this technique
could be used to ferret out the cause of the faulty insurance claims.
"Why do you allow a system default? It's faster than entering a code.
Why must you do it faster than entering a code? We don't have the time. Why
don't you have the time? We get paid for how many claims we process in
a day."
There's the root cause: the claims processors were being paid according
to the number of claims they could process in a day. Any productivity the
claims processors gain is lost many times over when downstream processes,
such as risk analysis, fail as a result. If you define your performance
measures to measure speed instead of quality, you'll end up with a faster
process but poorer quality data. That's a mistake at a time when, increasingly,
company value is directly linked to data quality.
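The broken-leg anomaly in the insurance example is the kind of symptom a
simple frequency check can surface. The following is a minimal sketch in
Python; the function name, the 50% threshold and the sample codes are
assumptions made for illustration. It only flags the symptom. Finding the
root cause still takes the "why" questions described above.

```python
from collections import Counter


def dominant_value(values, threshold=0.5):
    """Return (value, share) if one value's share exceeds threshold, else None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    share = count / len(values)
    return (value, share) if share > threshold else None


# Hypothetical diagnosis codes: 8 of 10 claims carry the same code.
claims = ["BROKEN_LEG"] * 8 + ["FLU", "SPRAIN"]
flag = dominant_value(claims)
print(flag)
```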
One of English's clients is an oil production company with three oil fields
and two major customers.
"They have concluded that they are, in fact, not an oil company," English
explains. "They said, 'Our assets are not the barrels of oil in reserve
that we have, but the data that tells us how much oil we have.'"
He recalls how the company arrived at this conclusion. Employees were
digitizing the company's paper data when they came across a well that
showed zero barrels of oil in reserve. They determined there was a data
quality problem in the original paper charts.
"When they corrected it, they discovered that, at the then-current rate
of about $25 a barrel, the well had over $50 million worth of oil," says
English. The oil well hadn't changed, nor had the land around it. Only
the data had changed, and the company was richer by $50 million.
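As a rough check on the figures quoted above, the implied size of the
reserve can be worked out directly. This is a back-of-the-envelope sketch
that treats "about $25" and "over $50 million" as exact values, so the
result is a lower-bound estimate rather than a reported figure.

```python
price_per_barrel = 25        # dollars; "about $25 a barrel" in the article
reserve_value = 50_000_000   # dollars; "over $50 million" in the article

# Value divided by price gives the number of barrels the data had hidden.
implied_barrels = reserve_value / price_per_barrel
print(f"{implied_barrels:,.0f} barrels")
```

In other words, the corrected record accounted for roughly two million
barrels that the paper charts had listed as zero.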
Develop a strategy
So how does a company initiate a data quality management system? The first
step is to understand the problem.
"Many customers don't have a handle on how big the data quality issues
are until they get the data into a central place and can profile it,"
explains Jonathan Klaus, Teradata's vice president of global solutions
development.
He notes that data quality is a huge issue for any IT system, but companies
with enterprise data warehouses have a significant advantage. "A centralized
data warehouse focuses the data into one place," Klaus says. "You can
create a single rules-based data quality process that looks at the enterprise
data as a whole, rather than having to do it on 50 different operational
systems."
Responding to a growing awareness of data quality among data warehouse
users, Teradata recently launched a data quality program that combines
Teradata's data quality consulting experience with software-based evaluation
tools. Consulting includes a comprehensive data quality assessment, and
the tools include the newly released Teradata Profiler.
Frank Capobianco is a Teradata data quality assessment analyst. "The first
part of the data quality assessment is just getting to know the data at
a higher level," he explains. "As you get to know the data, you may find
things that look like anomalies. You would then use Teradata Profiler
as a follow-up to the broad-based assessment to drill in on those specific
anomalies. Teradata lets you get as specific as you want in terms of looking
at the data."
The main advantage of Teradata Profiler is that it works within the Teradata
Database. "It presents the user with a graphic window into the Teradata
system, allowing the user to pick which databases, which tables and which
columns to assess," says Capobianco. "Because it works inside the Teradata
Database, you can easily keep up with the data quality score and track
it over time. In other architectures, it's either more difficult in terms
of unloading the data or you have to work on a sample of the data, and
the results might not be as accurate."
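The idea of profiling a column where the data lives, instead of unloading
or sampling it, can be sketched with a single SQL aggregate query. The
example below uses Python's built-in SQLite module purely as a stand-in
database. It does not represent Teradata Profiler or its interface, and
the table, the column and the "completeness" measure are assumptions made
for illustration.

```python
import sqlite3

# In-memory stand-in database with a small hypothetical customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "F"), (2, "M"), (3, None), (4, "F")],
)


def profile_column(conn, table, column):
    """Compute row, null and distinct counts for one column inside the database."""
    # Table and column names are interpolated into the SQL text, so they
    # must come from a trusted list, never from user input.
    sql = (
        f"SELECT COUNT(*), "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END), "
        f"COUNT(DISTINCT {column}) FROM {table}"
    )
    rows, nulls, distinct = conn.execute(sql).fetchone()
    completeness = (rows - nulls) / rows if rows else 0.0
    return {"rows": rows, "nulls": nulls, "distinct": distinct,
            "completeness": completeness}


profile = profile_column(conn, "customer", "gender")
print(profile)
```

Storing a result like this with a date each time it is run is one simple
way to track a quality measure over time.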
Teradata Profiler isn't something a Teradata consultant runs once before
walking out the door. "A key part of my consulting is to establish a baseline
in terms of broad-based results and specific business rules," explains
Capobianco. "Then I set in place the infrastructure to allow the profiling
results to be replicated and extended. Usually on the last day or two,
depending on the scope of the assessment, I provide training for the clients
on how to use Teradata Profiler and give them all the metadata I created
as part of the training, so they can carry it forward."
Teradata Profiler is only one of the tools that are available to the Teradata
consultants. "We leverage third-party products to complement Teradata
Profiler, which is designed to profile large data volumes quickly and
efficiently within the database," explains Arlene Zaima, marketing manager
for Teradata's data quality program. "Once we have identified problem
areas, third-party products cleanse, correct and supplement missing data."
Solve the puzzle
When a company first implements a data warehouse, quality controls are
built in. Shouldn't they be sufficient for maintaining quality data? Not
necessarily. "As data warehouses evolve to support new business requirements
and applications, so do the data quality requirements," says Zaima, who
notes that some types of analysis do not demand the same accuracy as other
types.
Typically, as a data warehouse matures, additional data is loaded from
various sources and analyzed by additional applications. Different applications
may require the data to be in totally different formats.
"Our consultants can help the users understand their business goals and
identify optimal data quality levels for those goals," Zaima explains.
"You don't necessarily need 100% data quality, but you do want to ensure
that the quality of the data meets your particular purpose."
Business goals and data quality have to be considered together, because
ultimately each depends on the other. This interdependence is a core principle
of Teradata's data quality program. It's also a key component of the consulting
process.
"We define data quality in terms of what is important to your business,"
says Klaus. "We draw up a plan for the data quality investigation and
determine how to measure it against your data. We measure the stuff we've
done to see if it has yielded some benefit. Then we set in motion ongoing
governance for the data process."
Reexamining data quality through your company's current business rules
and policies can provide new perspective. "It allows you to step back
from the firefight of everyday business (the operational systems) and really
think about what you can do with the power of this data," says Klaus.
David English, a technology and business writer based
in Greensboro, N.C., has written for PC World and Fortune.