Fresh Perspectives

Garbage in, garbage out

If you want better analytics, start with better data integration

by Rob Armstrong, Director of data warehouse support at Teradata

Businesses these days are under immense pressure to be as nimble as possible. People need to detect situations more quickly, correlate data points more frequently and have access to deeper, as well as wider, analytics in order to make the best decision and thus take the best action in a short time frame.

This is not a new phenomenon. Often I am asked how to get better analytics out of the data warehouse and information environment. One place to look for this answer is in the foundation of data, process and system integration. That is where the whole analytical process starts, and that is the first place to ensure it is providing the best competitive basis for complete and accurate analytics.

As I was thinking about this idea of integration, I kept coming back to two of the most recognized sayings on the topic. Even though they were referring to social integration, they are relevant here, so let's explore how they can be applied to our viewpoint of integration in the data warehouse arena.

Separate but not equal
The most pertinent aspect of data integration is consistency and agreement of data definition at the most basic level. Data needs to mean the same thing regardless of the physical location in your architecture, whether that is the source system, operational data store, data mart or the core repository.

Relationships between the data elements must be maintained as they are moved and used by various applications. Security, auditing and results measurement must be integrated into total processes. In this case, much of the physical separation argued for will actually be detrimental and the phrase "separate but not equal" rings true.

This is a large topic, so let's look at a few of the particulars in detail.

Model integration
I have written about this often, as it is the basis of all the integration efforts. If you do not do the work to integrate the data at the model level, then subsequent process and application efforts will suffer. This does not necessarily mean that every data element and every domain and range constraint needs to be defined before anything else happens, but you should have a good idea of the big subject areas and the linkage elements that hold the data model together. Some linkage elements are data such as customer number, part number, account number and region code. These are the data elements that must be consistent for the entire model to work properly.

Another area of the model that needs to be integrated is particular reference data such as begin- and end-time framing. For example, one table may have date windowing and use "12312999" as the default end date. If other tables use different default end dates, then applications must determine which table is being accessed to know which end date to use. This inconsistency will cause no end of problems and confusion when you try to drive better analytics across your company.
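The end-date problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table names, the row layout and the helper functions are hypothetical, and "12312999" is read here as December 31, 2999. The idea is to define the default end date once and to flag any table that uses a different far-future default.

```python
from datetime import date

# Assumption: "12312999" means month 12, day 31, year 2999.
# One shared sentinel for "no end date yet", used by every table.
OPEN_END_DATE = date(2999, 12, 31)


def is_current(row, as_of):
    """Return True if the row's begin/end window covers the as_of date."""
    return row["begin_date"] <= as_of <= row["end_date"]


def nonstandard_end_dates(tables, horizon=date(2100, 1, 1)):
    """Find far-future end dates that differ from the shared sentinel.

    Any end date on or after `horizon` is treated as a default value
    rather than a real business date (the horizon is an assumption).
    """
    found = set()
    for name, rows in tables.items():
        for row in rows:
            end = row["end_date"]
            if end >= horizon and end != OPEN_END_DATE:
                found.add((name, end))
    return sorted(found)


# Hypothetical sample data: the second table uses its own default.
tables = {
    "customer_address": [
        {"begin_date": date(2007, 1, 1), "end_date": OPEN_END_DATE},
    ],
    "account_status": [
        {"begin_date": date(2008, 3, 1), "end_date": date(9999, 12, 31)},
    ],
}

problems = nonstandard_end_dates(tables)
```

With one sentinel, an application can test whether a row is open-ended without first asking which table it came from; the check above is the kind of audit that would surface the tables that still disagree.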

Lastly, referential integrity (RI) must also be instituted in the model. This can be enforced as "hard RI," in which the database checks data integrity before adding data, or "soft RI," meaning the integrity is verified through the extract, transform and load (ETL) processes and the database is simply aware of the relationships. This is important because while many companies have an integrated model, the model can fall apart if the data loaded is not clean and consistent.
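The difference between the two approaches can be shown with a small sketch. It uses Python's built-in sqlite3 module as a stand-in database, which is an assumption made for convenience; the table and function names are hypothetical. SQLite has no declared-but-unenforced relationship of the kind described above, so the "soft" half only illustrates the load process, rather than the database, performing the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE customer (customer_number INTEGER PRIMARY KEY)")

# Hard RI: the database itself rejects an order for an unknown customer.
conn.execute(
    "CREATE TABLE orders_hard ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_number INTEGER NOT NULL"
    "   REFERENCES customer(customer_number))"
)

# Soft RI stand-in: no constraint; the load process must do the check.
conn.execute(
    "CREATE TABLE orders_soft ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_number INTEGER NOT NULL)"
)

conn.execute("INSERT INTO customer VALUES (100)")
conn.commit()


def insert_hard(order_id, customer_number):
    """Try to insert; return False if the database rejects the row."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders_hard VALUES (?, ?)",
                (order_id, customer_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def load_soft(rows):
    """ETL-style check: load rows with a known customer, return the rest."""
    known = {r[0] for r in conn.execute("SELECT customer_number FROM customer")}
    good = [r for r in rows if r[1] in known]
    rejects = [r for r in rows if r[1] not in known]
    with conn:
        conn.executemany("INSERT INTO orders_soft VALUES (?, ?)", good)
    return rejects


hard_ok = insert_hard(1, 100)       # customer 100 exists
hard_bad = insert_hard(2, 999)      # customer 999 does not
soft_rejects = load_soft([(1, 100), (2, 999)])
```

Either way the orphaned order never reaches the table. The design choice is where the cost of checking falls: on every insert in the database, or once in the load process.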

Master data management
One way to ensure that the reference data is integrated is through master data management (MDM). In the past few years MDM has gotten much more play as companies begin to understand that "separate is unequal." Spreading the master reference data across the numerous platforms exacerbates the challenge of maintaining consistent and integrated data feeds.

Master data needs to be just that, the master record of the data that represents the truth. Master data is the data of record and needs to be accessible to all applications in real time so front-line applications and ETL processes have up-to-the-minute reference data.

Because of the timeliness and low-latency need of the master data, it is critical that the data elements are stored once and reused; reference data changes should not be propagated out to other platforms. If other platforms required constant and simultaneous updating, certain problems such as time lag could arise. This would lead to uncertainty as to whether the same update has been applied to multiple places. Furthermore, possible conflicts in data transformation effort could result, depending on when the application was created and its system of reference. In short, when it comes to data integration the clear advantage is in the centralization of data.
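A toy sketch makes the time-lag problem concrete. Everything in it is hypothetical (the MasterData class, the region codes, the names); it simply contrasts reading reference data from one store with keeping a private copy on another platform.

```python
class MasterData:
    """Hypothetical single store of reference data (region codes here)."""

    def __init__(self):
        self._regions = {}

    def set_region(self, code, name):
        self._regions[code] = name

    def region_name(self, code):
        return self._regions[code]

    def snapshot(self):
        """Return a copy, as a platform holding its own version would."""
        return dict(self._regions)


master = MasterData()
master.set_region("NE", "Northeast")

# The pattern warned against: another platform keeps its own copy.
local_copy = master.snapshot()

# The master record changes; nothing has pushed the change to the copy yet.
master.set_region("NE", "New England")

from_master = master.region_name("NE")  # reads the data of record
from_copy = local_copy["NE"]            # still holds the old value
```

Until some propagation job runs, the two reads disagree, and no application can tell from its own side which value is current. Reading from the single store removes that question.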

Separate but equal
Some data can be housed in separate data repositories but still have the same, consistent value among all of those repositories. The important thing is that companies provide a level playing field and maintain information consistency across the environment.

Of course, the Teradata opinion is that to achieve a level playing field, the physical integration of the environment requires leveraging the enterprise data warehouse and data model. However, in reality, not all processes and applications will be—nor should they be—driven from the data warehouse environment. Some applications will be in separate data marts. Ideally, the data will eventually be integrated into the data warehouse for historical and analytical content. In these cases, the idea of "separate but equal" holds true.

Application integration
Different applications run in many different areas of the company—and even span industries. There are front-end kiosks, Web applications, internal business applications and even cross-industry supply chain applications, such as the sharing of seat availability and flight schedules by airlines. Clearly all of these applications will have their own requirements, data needs and latencies. Despite the fact that they are separate entities, they still need consistent data definitions and key metric calculations, as well as access to consistent key reference information.

How do companies achieve equality across the applications and competing analytics? They bring the theme of integration into the application development and promote data reusability instead of data transport. Companies that require the development groups to leverage the integration efforts at the data layer and provide users direct access to that data can ensure that the same data is presented to the end consumer—regardless of the application used.

Metric integration
Integrating security, data auditing and service level expectations in the company's development process is another way to ensure application equality. The benefits are best illustrated with an example of when the opposite procedure is applied.

At one company, the users and developers were not leveraging the data warehouse much; instead, developers used other data platforms for their application needs and users simply extracted from operational systems into spreadsheets. When asked about this, they had a shocking answer: To use the data warehouse, data had to pass a quality benchmark; other platforms did not require the same level of data quality and thus it was "easier and faster" to not use the data warehouse.

It's no surprise that the developers and users preferred the non-integrated alternative platforms when the integrated data warehouse required more effort and cost.

Companies will commonly have disparate applications—this isn't the problem. In fact, disparate applications are part of a well-architected solution. The problem arises when users get inconsistent answers from separate applications. This occurs when the metrics and thresholds for quality, security and auditing are different among the disparate applications. That's when separate becomes unequal.
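As a rough illustration of how differing thresholds produce differing answers, consider the sketch below. The completeness rule, the 0.95 and 0.5 thresholds, and the field names are all assumptions made for the example, not figures from any real system.

```python
# Hypothetical shared rule: a record is usable only if at least 95% of
# its required fields are populated.
SHARED_MIN_COMPLETENESS = 0.95

REQUIRED_FIELDS = ["customer_number", "region_code", "account_number"]


def completeness(record, required_fields):
    """Fraction of required fields that are present and non-empty."""
    present = sum(
        1 for field in required_fields if record.get(field) not in (None, "")
    )
    return present / len(required_fields)


def passes_quality(record, required_fields, threshold=SHARED_MIN_COMPLETENESS):
    """Apply the quality rule; callers should not override the threshold."""
    return completeness(record, required_fields) >= threshold


# A record with an empty region code: two of three fields are populated.
record = {"customer_number": "C100", "region_code": "", "account_number": "A7"}

# The warehouse uses the shared threshold.
warehouse_accepts = passes_quality(record, REQUIRED_FIELDS)

# A separate platform quietly applies its own, looser threshold.
mart_accepts = passes_quality(record, REQUIRED_FIELDS, threshold=0.5)
```

The same record is rejected in one place and accepted in the other, so reports built on the two platforms can disagree. Holding every application to the one shared threshold keeps the applications separate but the answers equal.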

Process integration
While maintaining separate procedures across an enterprise is inevitable, the goal is to have equality and consistency across the processes. One way to integrate the environment into a cohesive whole is by leveraging such processes as the logon, audit tracking and reporting tools.

Aside from these technical aspects of processes, the integration of outcome and messaging carries additional importance. Companies need to ensure their customers are recognized and treated consistently over the spectrum of processes. As a customer, whether I interact with a company via call center, Web kiosk or at the store, I am the same customer and, as such, should have the same perceived value to the company. For example, if I am a big spender in my local retail stores, when I go to that company's Web site and identify myself, I should be treated as a valued total customer, not a first-time visitor or a Web-only shopper.

Outbound messaging relates to the correspondence a company maintains with its customers. These messages must be not only consistent but timely and appropriate as well. Nothing is worse than sending multiple—and sometimes conflicting—offers to the same customer, or to a customer that recently had a bad experience. The customer's information must be accessible so that processes can integrate the messages and the intended outcomes at a corporate, rather than functional, level.

Of course, all of this integration methodology must be driven by executive commitment and consistent strategy. A company's upper management must support the theory and practice of the integration process. For in the end, the following statement will always be true:

Integration cannot be forced; it must be embraced
This gets to the crux of the matter: If organizations try to force integration, they will end up with more resistance than ever.

The better approach is to help individual users understand the value integration brings to their analytics. This knowledge will motivate them to embrace the change and work with the efforts to achieve that goal. For business users, it means more accuracy and agreement across reports and analytics, and more cooperation and understanding between the business applications and functions.

For the IT community, integration decreases data redundancy and data movement, promotes higher reusability of application code and toolsets, and enables faster time-to-development when new user requests come into the IT queue.

Finally, for executives, integration means stronger accountability and trust in the information that drives strategy. It also promotes greater confidence in the numbers and results presented to the regulators and Wall Street. And with integrated data, these executives gain increased opportunities to focus on more profitable and relevant actions.

Is your company resisting data integration? If so, what is the foundation of that resistance? Which individuals or groups in your organization are not seeing the value of the integration efforts?

Chances are, they see only the hard work that is required and the probable increase in cost. By showing the benefits of integration, you will surely persuade them to not only change but also become champions of that change.

Teradata Magazine-June 2008

Copyright © 2008 Teradata Corporation. All rights reserved.