Failure to provide guidance to users of business intelligence results in under-utilization of the enterprise data warehouse.
by Dan Riehle
"MythBusters" is a popular television show that attempts to scientifically test commonly held beliefs. IT
has its own "myth" the show should address: "Build a business intelligence (BI) system and the organization will optimize its use." The
fact is, no matter how great a BI component is, if the end users are not familiar with its capabilities, the system is basically useless.
When a BI component is rolled out without marketing or usage guidance, the result is under-utilization of the
BI asset and impeded growth of the enterprise data warehouse (EDW). Rollout plans for a BI system must go beyond training on
canned reports; high value comes from BI users who guide themselves to solve strategic and tactical information-based challenges.
Education on BI systems should be seen as marketing to the users (guiding them) on how to find data or create reports to help
solve a problem or make innovative use of the BI systems. These concerns can be addressed by providing BI users with access
to metadata through what is commonly called a repository.
Components of a repository
Data is essential to an organization, and access to that data is critical to business success. BI users may have unfettered
access to the broad expanse of data, but if the data is hidden, ambiguous or unknown to the user then it can be confusing, useless
or even dangerous. To be useful, data must be identified, categorized and defined.
Terminology (name or term usage) in organizations has always been ambiguous. A concept is known by many names, and a
given name can refer to multiple concepts. Breaking through the ambiguity is one way to optimize data usefulness to BI users.
Resolving these conflicts and offering BI guidance is a function of a metadata repository.
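The two kinds of ambiguity described above can be made concrete with a small index. The following is a minimal sketch, not a repository implementation: the sample names, the concept descriptions and the `build_indexes` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical (name, concept) usages gathered from several source systems.
usages = [
    ("customer", "party that buys goods"),
    ("client", "party that buys goods"),
    ("account", "party that buys goods"),
    ("account", "ledger bucket in the general ledger"),
]


def build_indexes(pairs):
    """Index the usages both ways: concept -> names and name -> concepts."""
    names_by_concept = defaultdict(set)
    concepts_by_name = defaultdict(set)
    for name, concept in pairs:
        names_by_concept[concept].add(name)
        concepts_by_name[name].add(concept)
    return names_by_concept, concepts_by_name


names_by_concept, concepts_by_name = build_indexes(usages)

# One concept known by many names.
synonyms = {c: sorted(n) for c, n in names_by_concept.items() if len(n) > 1}
# One name that refers to multiple concepts.
homonyms = {n: sorted(c) for n, c in concepts_by_name.items() if len(c) > 1}
```

Here `synonyms` lists the concepts a glossary must cross-reference, and `homonyms` lists the names a BI user is most likely to misread without guidance.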
Building a metadata repository requires a meta-discovery, followed by building a meta-model, delivering acquisition and load
processes into a single version of the meta-truth and finally providing easy Web access to the resultant repository. A robust metadata
management solution must also make extensions available to address organizational guidance and governance, because no two
organizations will have the same needs. These extensions provide the three fundamentals to guidance in a BI environment:
Glossary. Is there an index for usages of this name?
Source. What is the system of origin for data with this name?
Target. What reports or views carry data associated with this name?
Metadata management solutions that answer these questions may help fulfill an organization's governance duties as well,
such as the regulatory requirements of the Sarbanes-Oxley Act (SOX) and Basel II. More significantly, these solutions address
guidance needs, which are the less-technical set of rules and definitions designed for an organization's business users.
Figure 1: A multiple-glossary repository, drawing from the inheritance feature of Teradata Meta Data Services, identifies words and terms to fulfill an organization's governance and guidance needs.
A glossary can resolve disparate terminology from source systems, whether homegrown, vendor-purchased or
merger-acquired. Often, BI purists try changing or standardizing terms; however, this tactic is prohibitively expensive
and invasive. A better and more practical solution is to provide a glossary. A glossary strategy should include a single enterprise
list of terms possibly augmented with several localized glossaries.
Developing glossaries
Any repository is first a dictionary, so glossary design should be executed during the meta-modeling effort. This is when it is
determined whether multiple glossaries are needed, or whether a single glossary will support various area-based dialects.
The following definitions are concepts within an object-oriented repository, with the corresponding relational concept in
parentheses: A repository (instance) is made up of models (schemas/databases). Models consist of classes (tables), which have properties
(columns) and are populated by objects (rows). Relationships (cross-reference tables) hold collections of objects from one class
that have associations with objects in another class. A powerful concept in object-oriented systems is inheritance, where one class
(subclass) might inherit properties and relationships from another class (superclass). When represented in diagrams, subclasses are
wholly contained within the superclass.
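The mapping between repository and relational concepts can be sketched in a few lines of code. This is an illustrative model only, not the structure of any particular product: the `MetaClass` and `Relationship` types, and the sample classes `NamedThing`, `Column` and `Report`, are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetaClass:
    """A repository class (relational analogue: a table)."""
    name: str
    properties: list                              # analogue: columns
    superclass: Optional["MetaClass"] = None
    objects: list = field(default_factory=list)   # analogue: rows

    def all_properties(self):
        """Own properties plus those inherited from the superclass chain."""
        inherited = self.superclass.all_properties() if self.superclass else []
        return inherited + [p for p in self.properties if p not in inherited]

    def add_object(self, **values):
        """Populate the class with one object, rejecting unknown properties."""
        unknown = set(values) - set(self.all_properties())
        if unknown:
            raise ValueError(f"unknown properties: {sorted(unknown)}")
        self.objects.append(values)
        return values


@dataclass
class Relationship:
    """Associations between objects of two classes (analogue: a cross-reference table)."""
    name: str
    left: MetaClass
    right: MetaClass
    links: list = field(default_factory=list)

    def link(self, left_obj, right_obj):
        self.links.append((left_obj, right_obj))


# A model (analogue: a schema) is represented here as a named group of classes.
named = MetaClass("NamedThing", ["name"])
column = MetaClass("Column", ["data_type"], superclass=named)
report = MetaClass("Report", ["owner"], superclass=named)
model = {"name": "bi_guidance", "classes": [named, column, report]}

cust_id = column.add_object(name="cust_id", data_type="INTEGER")
sales = report.add_object(name="Monthly Sales", owner="Finance")
appears_on = Relationship("appears_on", column, report)
appears_on.link(cust_id, sales)
```

Because `Column` and `Report` both inherit `name` from the superclass, anything that works on the superclass's `name` property works on every subclass, which is the property the glossary design relies on.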
Figure 2: Lineage and specification lineage, or spinage, provide audit trails that identify the flow of data within an organization from its source to its target. Left: Essence of BI development with spinage. ACLUP is a generic term that means acquire, cleanse, load, utilize and provide data to users. Right: Metamodel for spinage.
Figure 1 is an example of a multiple-glossary repository that draws from the inheritance feature. A glossary class is
populated using the name property of all objects from all subclasses of a superclass called sweeper. For any glossary/sweeper
combination there is a relationship that provides drilldown, via inheritance, to the objects with a given name. This vocabulary
is transparent in the Web presentation layer where a user is presented with a specific glossary based on the user profile and can
see and drill to all terms regardless of use.
If a repository has more than one glossary, then one of the sweeper classes is designated as the core sweeper. All other sweepers are
subclasses of the core sweeper. The glossary related to the core sweeper is the enterprise glossary, which can be the intersection or the
union of all the other glossaries.
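One way to picture the sweeper arrangement is with ordinary class inheritance. The sketch below is an assumption-laden illustration, not the Teradata MDS implementation: the class names (`CoreSweeper`, `FinanceSweeper`, `MarketingSweeper` and their subclasses), the in-memory `registry` and the `build_glossary` helper are all hypothetical.

```python
from collections import defaultdict


class CoreSweeper:
    """Core sweeper: every named repository object descends from it."""
    registry = []  # hypothetical in-memory stand-in for the repository

    def __init__(self, name):
        self.name = name
        CoreSweeper.registry.append(self)


# Localized sweepers are subclasses of the core sweeper.
class FinanceSweeper(CoreSweeper): pass
class MarketingSweeper(CoreSweeper): pass

# Ordinary repository classes hang under one of the sweepers.
class LedgerColumn(FinanceSweeper): pass
class FinanceReport(FinanceSweeper): pass
class CampaignView(MarketingSweeper): pass


def build_glossary(sweeper):
    """Collect the name of every object under a sweeper; values allow drilldown."""
    glossary = defaultdict(list)
    for obj in CoreSweeper.registry:
        if isinstance(obj, sweeper):
            glossary[obj.name.lower()].append(obj)
    return dict(glossary)


LedgerColumn("Account")
FinanceReport("Revenue")
CampaignView("Account")
CampaignView("Lead")

finance = build_glossary(FinanceSweeper)
marketing = build_glossary(MarketingSweeper)
enterprise_union = build_glossary(CoreSweeper)           # all terms
enterprise_intersection = set(finance) & set(marketing)  # shared terms only
```

With this layout the union form of the enterprise glossary falls out of inheritance for free, while the intersection form needs an explicit comparison of the localized glossaries.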
Lineage
But guiding BI users involves more than providing a glossary. Modern repositories should have source and target portions to
address other guidance (and governance) needs. The fundamental way to provide these capabilities is through the implementation
of lineage. Lineage is an audit trail of how data flows through the various systems, from how data comes into an organization
(source) to how the data is used (target).
The reasons to build lineage capabilities into a metadata management solution are many. Because it details each step from
source to target, lineage will help address an organization's guidance issue (to enable self-education on systems and
impact analysis) and governance issue (which, by indicating where data came from and how it is transformed, will fulfill
regulatory requirements).
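Lineage can be thought of as a directed graph whose edges are the individual steps from source to target. The following minimal sketch walks such a graph in both directions; the step names, the rules and the helper functions are hypothetical and stand in for whatever a real load utility or view definition would record.

```python
from collections import defaultdict, deque

# Hypothetical lineage steps: (from, to, transformation rule).
steps = [
    ("crm.customer.id", "stage.customer.cust_id", "copy"),
    ("stage.customer.cust_id", "edw.party.party_id", "surrogate key lookup"),
    ("edw.party.party_id", "view.sales_by_party", "join"),
    ("edw.party.party_id", "report.churn", "aggregate"),
]


def reachable(start, edges):
    """Breadth-first walk returning every node reachable from start."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    found, visited, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in visited:
                visited.add(nxt)
                found.append(nxt)
                queue.append(nxt)
    return found


forward = [(src, dst) for src, dst, _rule in steps]
backward = [(dst, src) for src, dst, _rule in steps]


def targets_of(source):
    """Impact analysis: everything downstream of a source."""
    return reachable(source, forward)


def sources_of(target):
    """Audit trail: everything upstream of a target."""
    return reachable(target, backward)
```

Calling `targets_of("crm.customer.id")` answers the guidance question of what a change would affect, and `sources_of("report.churn")` answers the governance question of where a reported figure came from.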
Lineage capabilities can be provided relative to the load utilities and view definitions, but customization is necessary to have
robust and detailed source-to-target information. This requires broader lineage details plus summarizations to make it
intelligible enough to be used for guidance.
Discover Teradata Meta Data Services
Teradata Meta Data Services (MDS) is an object-oriented and comprehensive solution for managing technical and business
metadata in a Teradata environment. MDS allows users to identify, consolidate, understand, manage and navigate technical,
business and lineage metadata, and it provides facilities to integrate related metadata from other sources. MDS supports
impact analysis and change management, helps identify data redundancy and data relationships, and standardizes definitions. Tightly integrated
with the Teradata Database, MDS can be easily extended to meet additional metadata management needs.
Out of the box, once the data transform specifications are established and loaded into the MDS repository, they are automatically
available in a Web presentation layer, known as MetaSurf. While MDS automates the acquisition of Teradata's metadata,
extensions must be made to provide guidance or governance because no two organizations will have the same needs.
Teradata provides some lineage capabilities relative to the load utilities and view definitions, but customization is necessary
to have robust and detailed source-to-target information. MDS also supports spinage, which typically does not require
any special summarization steps to be understood. By accessing spinage through the MDS presentation layer, BI communities
can be systematically guided to where data originates and terminates in the system, and technical communities have a Web-based impact analysis tool.
MDS is one of the most extensible and customizable metadata solutions. Teradata Professional Services can assist in tailoring
the solution to support an organization's diverse glossary needs, localized population schemes, implementation and maintenance
process, as well as the organization's unique guidance and governance requirements.
—Lisa Slutter
Building lineage is typically a three- to six-month process, and maintaining it is expensive. Some tools save their lineage
metadata in easily accessible formats; other tools have a proprietary meta-store that requires coding or expensive export routines
to acquire their lineage information. Also, the processes for lineage loading are characteristically complex, because lineage terms
often contain run-time variables to make the cleansing and transform rules re-usable.
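To illustrate why run-time variables complicate lineage loading, the sketch below resolves placeholders in a rule before it could be stored as a concrete source-to-target step. The `${env}` and `${table}` placeholder syntax, the rule text and the `resolve` helper are assumptions for illustration; real tools use their own variable conventions.

```python
from string import Template


def resolve(rule_text, run_params):
    """Substitute run-time parameters; raises KeyError if one is missing."""
    return Template(rule_text).substitute(run_params)


# A hypothetical reusable rule, as a tool might store it.
rule = "${env}_stage.${table} -> ${env}_edw.${table}"

# The same rule yields different lineage depending on the run.
prod_step = resolve(rule, {"env": "prod", "table": "customer"})
test_step = resolve(rule, {"env": "test", "table": "customer"})
```

A loader that stores only the unresolved rule cannot say which physical objects were touched, so each run's parameters have to be captured alongside the rule.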
Specification lineage
Specification lineage, or "spinage," a term I have coined, is an alternative to lineage. The specification of the extract, transform and
load (ETL) rules to load the EDW and the specifications of rules for BI usages of the EDW are requisite steps in building a BI
solution. Spinage is merely the loading of these specifications into the metadata repository and using them as an alternative
way to provide source and target guidance, including impact analysis. (See figure 2 above.) A benefit of using spinage for guidance
from the repository is that the ETL and BI specification tasks could be strengthened.
The process to capture spinage is centered in the modeling phase of the BI effort. This is when the modelers identify how the data
is to be sourced, what rules to apply and how the data will be used in BI tools by various users. The specifications, typically
developed in spreadsheets, provide the foundation for spinage and are needed by the ETL and BI teams to initialize their tasks.
Once the transform specifications are established and loaded into the metadata repository, any object-oriented metadata
solution should be able to make them automatically available in a Web or other presentation layer. Typically, spinage does
not require any special summarization steps to be understood. By accessing spinage through a presentation layer, BI
communities can be systematically guided to where data originates and terminates in the system, and technical communities
have a Web-based impact analysis tool.
However, while spinage provides forward and backward impact analysis relative to the logical data model, there is one important
caveat: Unlike the standard lineage, which provides actual details on sources and targets, spinage shows only the intent of the
sourcing and use of data, and will not satisfy SOX, Basel II or other emerging requirements. Lineage is thus a governance cost,
while spinage is a much lower guidance cost.
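As a rough sketch of spinage, the example below loads two spreadsheet-style specifications and answers backward and forward impact questions for a column of the logical data model. The column headings, the sample rows and the `load_spec` and `impact` helpers are hypothetical; the CSV text stands in for spreadsheets exported by the modeling team. Note that the result reflects only what the specifications say was intended, not what the load jobs actually did.

```python
import csv
import io

# Hypothetical ETL specification: how each EDW column is to be sourced.
ETL_SPEC = """source_field,rule,edw_column
crm.customer.id,copy,party.party_id
billing.acct.cust_no,trim and cast,party.party_id
billing.acct.balance,convert to USD,account.balance_usd
"""

# Hypothetical BI specification: where each EDW column is to be used.
BI_SPEC = """edw_column,bi_usage
party.party_id,Churn report
account.balance_usd,Churn report
account.balance_usd,Exposure dashboard
"""


def load_spec(text):
    """Parse one exported specification sheet into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def impact(edw_column, etl_rows, bi_rows):
    """Backward (sources) and forward (usages) impact for one EDW column."""
    return {
        "sources": sorted(r["source_field"] for r in etl_rows
                          if r["edw_column"] == edw_column),
        "usages": sorted(r["bi_usage"] for r in bi_rows
                         if r["edw_column"] == edw_column),
    }


etl_rows = load_spec(ETL_SPEC)
bi_rows = load_spec(BI_SPEC)
balance_impact = impact("account.balance_usd", etl_rows, bi_rows)
```

Because the specifications already exist as a by-product of modeling, the only new work in this sketch is the load step, which is consistent with spinage being the cheaper of the two approaches.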
Knowledge promotes trust and growth of BI components
Supporting data lineage or spinage through a metadata management solution delivers business value in the form of
greater BI user productivity and lower IT support costs. While project managers applaud the tightening of data
management specification processes, users develop a greater trust in BI tools. This confidence will facilitate the growth of the EDW.
To promote the use of BI tools for the success of the organization, IT workers must formally recognize the need to include
marketing of BI solutions in projects and use modern repositories as the central BI guidance tool. The entire organization
benefits when the "build it, they will come" myth is dispelled.
Dan Riehle, an independent consultant on BI, EDW and metadata, has worked for various DBMS vendors and was a member of the
ANSI committee that first established the standards for extensible repositories.
Teradata Magazine-December 2007