Teradata Magazine Online
ENTERPRISE VIEW

EII, ETL or EAI: Making sense of the alphabet soup

Here is your common-sense guide to modern-day integration.

Ever since the first two programs were written to capture electronic data, we have tried to glue the data back together again. Integration has been a long-standing goal for many companies, and fortunately we now have the technology to help us achieve it. Yet companies are still unclear where the features of one technology end and those of another begin. What is each technology designed to do? When is it better to use one over another? Do the technologies complement or compete with each other?

The key to successful integration is an understanding of each integration technique and its capabilities. Then you will be able to determine when to use one or the other and why. Let's start out by defining the three integration techniques—data consolidation, data federation and data propagation—and the technologies closely aligned with these techniques.

Data consolidation is the set of processes that capture, cleanse, integrate, transform and load data into a target data store. Typically, data is consolidated from operational data sources and loaded into BI (business intelligence) data stores, such as the data warehouse, operational data stores (ODSs) and data marts, by using ETL (extract, transform and load) technologies. The targets for these technologies are databases.

Data federation is the set of processes that provide a real-time integrated view of disparate data types from multiple sources, providing a universal data access layer. Typically, data federation applies EII (enterprise information integration) technologies to create on-demand or virtual data stores and views of data. The targets for these technologies are the end users in the business community.

Data propagation is the set of processes by which an organization centralizes and optimizes application integration by moving data changes from one application to others. Typically, data propagation applies EAI (enterprise application integration) technologies to connect applications in real time to support business-process automation. The targets for these technologies are the applications themselves.

Figure 1
Example of the three integration technologies—enterprise information integration (EII), enterprise application integration (EAI), and extract, transform and load (ETL)—working together.

All three techniques are used in various parts of a corporation's IT infrastructure—data consolidation for building permanent databases used for analytics or reports, data federation for creating virtual dashboards or reports, and data propagation for the transfer of data between applications.

Figure 1 gives you an example of how the three work together.

Comparing techniques

Now let's turn our attention to which techniques and technologies work best in specific situations. First, let's look at data federation and data consolidation. These two techniques seem to be causing confusion today, due to a perceived overlap in their functionalities. Figure 2 places these two techniques along a data integration time spectrum, from batch to real time. The more that data integration needs to be real-time, the more appropriate the data federation techniques become. Alternatively, batch and near real-time data integration needs fit the data consolidation techniques better.

Figure 2
Moving along the data integration spectrum, from batch to real time, requires different integration functionalities from data consolidation and federation.

Data consolidation: This technique is very useful when you need to produce well-documented and reliable data for near real-time and historical analyses—in other words, when creating a data warehouse or series of data marts for time series, multidimensional and other analytics, when creating integrated key master data (master data management), and when building compact historical data stores.

Data consolidation serves a unique long-term purpose, in which the data is used over and over as part of analytics that require accurate, technical metadata to support the overall integrity of the environment. Figure 3 is an example of data consolidation—in this case, the popular usage for creating a data warehouse.

Figure 3
With the data consolidation technique, data is physically extracted, integrated, transformed and loaded into a data warehouse.

Data consolidation does have its challenges, though. It requires a thorough understanding of the data requirements for both strategic and tactical decision making. You must ensure that the appropriate data is extracted, transformed and loaded, making it ready for use by analysts. This process can take a long time to understand, design and implement, and it requires a commitment from the business community to create a sustainable data management strategy.

Another consideration is the establishment of audit trails from the start to ensure the consistency and reliability of the data loaded into the data warehouse or ODS. It is also important to constantly monitor the performance and efficiency of the consolidation design. Both are influenced by your archive duration, data size and granularity, and load performance.
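To make the consolidation steps concrete, here is a minimal sketch in Python, assuming an in-memory SQLite database stands in for the data warehouse. The table names (sales, load_audit), the field names and the cleansing rules are hypothetical, chosen only for illustration. Each load extracts the source rows, rejects incomplete ones, transforms the rest and writes one audit entry.

```python
import sqlite3

def consolidate(source_rows, warehouse):
    """Extract, cleanse, transform and load rows, then record an audit entry."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount_cents INTEGER)"
    )
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS load_audit "
        "(rows_read INTEGER, rows_loaded INTEGER, rows_rejected INTEGER)"
    )
    loaded = rejected = 0
    for row in source_rows:                        # extract
        name = (row.get("customer") or "").strip()
        amount = row.get("amount")
        if not name or amount is None:             # cleanse: reject incomplete rows
            rejected += 1
            continue
        cents = round(float(amount) * 100)         # transform: currency units to integer cents
        warehouse.execute(
            "INSERT INTO sales VALUES (?, ?)", (name.upper(), cents)
        )                                          # load
        loaded += 1
    # Audit trail: one row per load run, so counts can be reconciled later.
    warehouse.execute(
        "INSERT INTO load_audit VALUES (?, ?, ?)",
        (loaded + rejected, loaded, rejected),
    )
    warehouse.commit()
    return loaded, rejected

# Hypothetical operational source data.
warehouse = sqlite3.connect(":memory:")
source = [
    {"customer": " acme ", "amount": "19.99"},
    {"customer": "", "amount": "5.00"},
    {"customer": "Globex", "amount": None},
    {"customer": "Initech", "amount": 7},
]
result = consolidate(source, warehouse)
```

A real ETL tool does far more (scheduling, metadata, restartability), but the audit row shows the idea: every load leaves a record that lets you reconcile what was read against what reached the warehouse.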

Data federation: This technique is useful when you need to create a common gateway with one access point and one access language to disparate data. It provides flexible and ad hoc access by end users or applications using an incremental design philosophy that allows access to only the data required to meet the business needs. It is particularly useful in supplementing data warehouse data with additional or real-time details in a virtual fashion. Figure 4 shows an example of data federation in which a virtual ODS is created.

Figure 4
With the data federation technique, data is virtually extracted, transformed and integrated, and a virtual ODS is created.

Another popular form of this technique is to view the various pieces of current operational data, like contact information, combined with recent campaign or sales information.

Data federation also has its challenges. Like data consolidation, it too requires a thorough understanding of the data requirements for both strategic and tactical decision making. And it requires a commitment from the business to create a data management strategy. For this technique, it is particularly important to constantly monitor performance and efficiency, both of which are greatly influenced by network speed, source performance, and data shape and size.

Data federation also requires thorough query planning and performance monitoring of the overall environment. Remember, you are accessing the live operational databases. To overcome some of the burden on operational systems, some EII technologies cache frequently requested data if the set is not too large.
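The federation idea, including the caching just described, can be sketched as follows. This is an illustration under stated assumptions, not how any particular EII product works: two in-memory SQLite databases stand in for separate operational systems, and the FederatedView class, its table names and its max_cache limit are all hypothetical.

```python
import sqlite3

# Two independent "operational" sources (in-memory stand-ins).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE contacts (customer_id INTEGER, email TEXT)")
crm.execute("INSERT INTO contacts VALUES (1, 'ann@example.com')")

orders = sqlite3.connect(":memory:")
orders.execute("CREATE TABLE sales (customer_id INTEGER, total REAL)")
orders.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 40.0), (1, 60.0)])

class FederatedView:
    """One access point over both sources; nothing is copied into a warehouse."""

    def __init__(self, crm, orders, max_cache=100):
        self.crm, self.orders = crm, orders
        self.max_cache = max_cache   # only cache while the set stays small
        self.cache = {}
        self.source_hits = 0         # how often the live systems were queried

    def customer(self, customer_id):
        if customer_id in self.cache:
            return self.cache[customer_id]
        self.source_hits += 1
        contact = self.crm.execute(
            "SELECT email FROM contacts WHERE customer_id = ?", (customer_id,)
        ).fetchone()
        total = self.orders.execute(
            "SELECT COALESCE(SUM(total), 0) FROM sales WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()[0]
        view = {
            "customer_id": customer_id,
            "email": contact[0] if contact else None,
            "recent_sales": total,
        }
        if len(self.cache) < self.max_cache:
            self.cache[customer_id] = view
        return view

view = FederatedView(crm, orders)
first = view.customer(1)    # queries both live sources
second = view.customer(1)   # served from the cache
```

The trade-off is visible even in this small example: the cache spares the operational systems a second query, but a cached answer may lag behind the live data until it is refreshed.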

Within the data integration spectrum, data consolidation and federation complement each other well. The key is to understand each technique's strengths and weaknesses.

Data propagation: Now let's bring data propagation into the picture. Figure 5 shows a new integration spectrum going from data to process. The closer to process integration we get, the more a data propagation approach makes sense. The most appropriate technology for data propagation is EAI, although ETL may also be used for some bulk data movement.

Figure 5
The data propagation technique uses technologies ranging from EAI to ETL to move data and connect applications in real time, for example for business process automation.

Data propagation is useful when you need to connect applications in real time—for example, for business process automation. Typically it occurs when you make a change (usually a small set of records) in one application and need to reflect that change elsewhere in other applications. The technology is set up to ensure that the change is captured and delivered reliably.

A second, increasingly popular use for data propagation is the information feedback loop for BI. In this case, the results from certain analytics (e.g., customer segmentation and lifetime value results) are propagated back into the operational systems or ODS. In this way, customer-facing personnel can immediately view and act upon important BI results without having to run the actual analysis. Figure 6 is an example of this form of propagation.

Data propagation has its challenges as well. The biggest one is making sure that the applications receive all the necessary sets of data correctly and in a timely fashion. What happens if the transfer fails? What happens if an incomplete set is transferred?

Secondly, the integration capabilities of the technologies may not be as sophisticated as those for ETL processing. In that case, you may be limited in the amount of integration and transformation that can be performed on the data during its transfer.
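One common way to handle a failed or repeated transfer is to keep each change queued until the target acknowledges it, and to make the target ignore duplicates. The sketch below assumes exactly that. TargetApp, propagate, the message fields and the retry limit are hypothetical names for illustration and do not describe any specific EAI product.

```python
from collections import deque

class TargetApp:
    """Hypothetical receiving application that applies each change once."""

    def __init__(self, fail_first=0):
        self.records = {}
        self.seen = set()
        self.fail_first = fail_first   # simulate this many failed deliveries

    def receive(self, message):
        if self.fail_first > 0:
            self.fail_first -= 1
            raise ConnectionError("simulated transfer failure")
        if message["id"] not in self.seen:      # idempotent: ignore duplicates
            self.seen.add(message["id"])
            self.records[message["key"]] = message["value"]

def propagate(outbox, target, max_attempts=3):
    """Deliver queued changes in order; undelivered ones stay in the outbox."""
    delivered = 0
    while outbox:
        message = outbox[0]
        for _ in range(max_attempts):
            try:
                target.receive(message)
                break
            except ConnectionError:
                continue
        else:
            return delivered        # give up for now; the message stays queued
        outbox.popleft()            # acknowledged, so it is safe to remove
        delivered += 1
    return delivered

outbox = deque([
    {"id": 1, "key": "cust-42", "value": "new address"},
    {"id": 2, "key": "cust-42", "value": "newer address"},
])
target = TargetApp(fail_first=2)
delivered = propagate(outbox, target)
```

Because a message leaves the outbox only after a successful delivery, a failed transfer is retried rather than lost, and because the target tracks message ids, a retry that arrives twice does no harm.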

Figure 6
This data propagation example shows how results from analytics are propagated back into operational systems.

When not to use each technique

Obviously each technique and technology has its purpose. There are times when one is preferable to another. Let's finish up with a discussion of when not to use each technique. Note that there are exceptions to each of these, of course.

Data consolidation may not be the best technique for real-time data integration or for the movement of every data element from operations. In addition, it may have difficulty with highly dynamic reports or accessing and publishing XML data. It was designed to work in a batch mode and works best in that fashion.

Data federation should not be used to create a data warehouse—ever. A virtual data warehouse simply does not make sense. It should also not be used for volume data cleansing. The technique is not suitable for time series or data mining analytics. These require stable sets of data from a warehouse. It was designed to deliver requested data on demand and does not keep long-term copies of that data.

Data propagation should not be used for reporting and analytics or when working with large data sets or aggregated data. The reason is that the single-record, push-based architecture is not designed for pulling result sets from queries but for sending messages.
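The guidance in this article can be condensed into a rough rule of thumb. The function below is only a sketch of that reasoning: its name, its parameters and the three target labels are hypothetical, and it ignores the exceptions noted above.

```python
def suggest_technique(target, needs_history=False, needs_real_time=False):
    """Rule-of-thumb chooser; target is 'application', 'database' or 'end_user'."""
    if target not in ("application", "database", "end_user"):
        raise ValueError(f"unknown target: {target!r}")
    if target == "application":
        # Applications exchanging changes: message-style propagation.
        return "data propagation (EAI)"
    if needs_history or not needs_real_time:
        # Stable, historical or batch-oriented data belongs in a physical store.
        return "data consolidation (ETL)"
    # On-demand, real-time views without long-term copies.
    return "data federation (EII)"

examples = {
    "sync an address change to billing": suggest_technique(
        "application", needs_real_time=True
    ),
    "five-year sales trend analysis": suggest_technique(
        "database", needs_history=True
    ),
    "dashboard of current order status": suggest_technique(
        "end_user", needs_real_time=True
    ),
}
```

Treat the result as a starting point for discussion with the business, not as a substitute for reviewing the actual requirements.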

Summary

These three techniques complement each other, and each has a proper place in your IT infrastructure.

Data propagation serves applications and process automation that require data; data consolidation serves databases in support of decision making; and data federation serves business users and their decision-making process as well. It is your responsibility to review the business requirements thoroughly to determine which tools are appropriate for each integration problem that arises.

© Teradata Magazine-March 2006

Copyright by Teradata Corporation 2001-2007.