EII, ETL or EAI: Making sense of the alphabet soup
Here is your common-sense guide to modern-day integration.
by Claudia Imhoff
Ever since the first two programs were written to capture electronic data, we have tried to glue the data back together again. Integration has been a long-standing goal for many companies, and fortunately we now have the technology to help us achieve it. Yet companies are still unclear where the features of one technology end and those of another begin. What is each technology designed to do? When is it better to use one over another? Do the technologies complement or compete with each other?
The key to successful integration is an understanding of each integration technique and its capabilities. Then you will be able to determine when to use one or the other and why. Let's start out by defining the three integration techniques (data consolidation, data federation and data propagation) and the technologies closely aligned with these techniques.
Data consolidation is the set of processes that capture, cleanse, integrate, transform and load data into a target data store. Typically, data is consolidated from operational data sources and loaded into BI (business intelligence) data stores such as the data warehouse, operational data stores (ODS) and data marts by using ETL (extract, transform and load) technologies. The targets for these technologies are databases.
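To make the definition concrete, here is a minimal sketch of a consolidation step using Python's built-in sqlite3 module. The table names (orders, fact_orders), the columns and the cleansing rules are assumptions made for illustration; a real ETL tool would add scheduling, metadata and error handling.

```python
import sqlite3


def consolidate(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Extract orders from a source, cleanse and transform them, load them into a target."""
    # Extract: read raw rows from the (hypothetical) operational table.
    rows = source.execute("SELECT order_id, customer, amount FROM orders").fetchall()

    # Cleanse and transform: drop rows with no amount, normalize customer names,
    # and convert the amount from text to a rounded number.
    cleaned = [
        (order_id, customer.strip().upper(), round(float(amount), 2))
        for order_id, customer, amount in rows
        if amount is not None
    ]

    # Load: write the result into a physical target table.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", cleaned)
    warehouse.commit()
    return len(cleaned)


# Illustrative operational source with one unusable row.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " acme ", "19.991"), (2, "Globex", None), (3, "initech", "5")],
)

warehouse = sqlite3.connect(":memory:")
loaded = consolidate(source, warehouse)
```

The point to notice is that the target holds its own physical copy of the data, which is what separates consolidation from the federation technique described next.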
Data federation is the set of processes that provide a real-time integrated view of disparate data types from multiple sources, providing a universal data access layer. Typically, data federation applies EII (enterprise information integration) technologies to create on-demand or virtual data stores and views of data. The targets for these technologies are the end users in the business community.
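A federated view can be sketched in a few lines. The example below assumes two hypothetical live sources, CRM and SALES, represented here as plain dictionaries; in practice each lookup would be a query against an operational system. The class and field names are invented for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical live sources standing in for operational systems.
CRM = {42: {"name": "Ada Lopez", "phone": "555-0100"}}
SALES = {42: {"last_order": "2006-02-14", "last_campaign": "spring-mailer"}}

Source = Callable[[int], Optional[dict]]


class FederatedView:
    """Single access point that assembles one record from several sources on demand."""

    def __init__(self, sources: Dict[str, Source]) -> None:
        self.sources = sources

    def customer(self, customer_id: int) -> dict:
        view = {"customer_id": customer_id}
        for fetch in self.sources.values():
            record = fetch(customer_id)  # reads the source at request time
            if record:
                view.update(record)
        return view  # nothing is persisted; the result exists only for this request


view = FederatedView({"crm": CRM.get, "sales": SALES.get})
profile = view.customer(42)
```

Because nothing is copied, a change in either source shows up in the very next request. That is the appeal of the technique, and also the reason it depends so heavily on the speed of the underlying sources.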
Data propagation is the set of processes by which an organization centralizes and optimizes application integration through bulk data movement. Typically, data propagation applies EAI (enterprise application integration) technologies to connect applications in real time to support business-process automation. The targets for these technologies are the applications themselves.
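As a rough illustration of propagation, the sketch below uses a tiny in-process publish/subscribe bus as a stand-in for an EAI message broker. ChangeBus, the topic name and the two receiving "applications" (billing and shipping) are all hypothetical; a real broker would add queuing, persistence and delivery guarantees.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class ChangeBus:
    """Tiny in-process stand-in for an EAI message broker."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, change: dict) -> int:
        # Push the change to each subscribed application as soon as it happens.
        for handler in self._subscribers[topic]:
            handler(change)
        return len(self._subscribers[topic])


# Two hypothetical applications that each keep their own copy of the address.
billing: dict = {}
shipping: dict = {}

bus = ChangeBus()
bus.subscribe("customer.address", lambda c: billing.update({c["id"]: c["address"]}))
bus.subscribe("customer.address", lambda c: shipping.update({c["id"]: c["address"]}))

delivered = bus.publish("customer.address", {"id": 42, "address": "1 Main St"})
```

Here the source application pushes a single change to the applications that need it, rather than waiting for them to ask.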
Figure 1. Example of the three integration technologies working together: enterprise information integration (EII), enterprise application integration (EAI), and extract, transform and load (ETL).
All three techniques are used in various parts of a corporation's IT infrastructure: data consolidation for building permanent databases used for analytics or reports, data federation for creating virtual dashboards or reports, and data propagation for the transfer of data between applications.
Figure 1 gives you an example of how the three work together.
Comparing techniques
Now let's turn our attention to which techniques and technologies work best in specific situations. First, let's look at data federation and data consolidation. These two techniques seem to be causing confusion today, due to a perceived overlap in their functionalities. Figure 2 places these two techniques along a data integration time spectrum, from batch to real time. The more that data integration needs to be real-time, the more appropriate the data federation techniques become. Alternatively, batch and near real-time data integration needs fit the data consolidation techniques better.
Figure 2. Moving along the data integration spectrum, from batch to real time, requires different integration functionalities from data consolidation and federation.
Data consolidation: This technique is very useful when you need to produce well-documented and reliable data for near real-time and historical analyses—in other words, when creating a data warehouse or series of data marts for time series, multidimensional and other analytics, when creating integrated key master data (master data management), and when building compact historical data stores.
Data consolidation serves a unique long-term purpose, in which the data is used over and over as part of analytics that require accurate, technical metadata to support the overall integrity of the environment. Figure 3 is an example of data consolidation—in this case, the popular usage for creating a data warehouse.
Figure 3. With the data consolidation technique, data is physically extracted, integrated, transformed and loaded into a data warehouse.
Data consolidation does have its challenges, though. It requires a thorough understanding of the data requirements for both strategic and tactical decision making. You must ensure that the appropriate data is extracted, transformed and loaded, making it ready for use by analysts. This process can take a long time to understand, design and implement, and it requires a commitment from the business community to create a sustainable data management strategy.
Another consideration is the establishment of audit trails from the start to ensure the consistency and reliability of the data loaded into the data warehouse or ODS. It is important to constantly monitor the performance and efficiency of the consolidation design as well. Both are influenced by your archive duration, data size and granularity, and load performance.
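One simple form an audit trail can take is a reconciliation record written for every load. The sketch below is an assumption about what such a record might contain; the LoadAudit fields and the batch identifiers are invented for illustration and are not taken from any particular ETL product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class LoadAudit:
    batch_id: str
    rows_extracted: int
    rows_rejected: int
    rows_loaded: int
    recorded_at: str

    @property
    def reconciles(self) -> bool:
        # Every extracted row must be accounted for as either loaded or rejected.
        return self.rows_extracted == self.rows_loaded + self.rows_rejected


audit_log: List[LoadAudit] = []


def record_load(batch_id: str, extracted: int, rejected: int, loaded: int) -> LoadAudit:
    """Append one audit entry per load so that later reviews can trace each batch."""
    entry = LoadAudit(
        batch_id, extracted, rejected, loaded,
        datetime.now(timezone.utc).isoformat(),
    )
    audit_log.append(entry)
    return entry


ok = record_load("batch-001", 1000, 12, 988)
bad = record_load("batch-002", 1000, 12, 980)  # eight rows unaccounted for
```

A load whose counts do not reconcile, like the second one here, is a signal to investigate before analysts start relying on the data.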
Data federation: This technique is useful when you need to create a common gateway with one access point and one access language to disparate data. It provides flexible and ad hoc access by end users or applications using an incremental design philosophy that allows access to only the data required to meet the business needs. It is particularly useful in supplementing data warehouse data with additional or real-time details in a virtual fashion. Figure 4 shows an example of data federation in which a virtual ODS is created.
Figure 4. With the data federation technique, data is virtually extracted, transformed and integrated, and a virtual ODS is created.
Another popular form of this technique is to view the various pieces of current operational data, like contact information, combined with recent campaign or sales information.
Data federation also has its challenges. Like data consolidation, it too requires a thorough understanding of the data requirements for both strategic and tactical decision making. And it requires a commitment from the business to create a data management strategy. For this technique, it is particularly important to constantly monitor performance and efficiency, both of which are greatly influenced by network speed, source performance, and data shape and size.
Data federation also requires thorough query planning and performance monitoring of the overall environment. Remember, you are accessing the live operational databases. To overcome some of the burden on operational systems, some EII technologies cache frequently requested data if the set is not too large.
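The caching idea can be sketched as a thin wrapper around whatever function actually runs the query. Everything here is hypothetical: CachingFederator, the max_cached_rows threshold and the fake source are illustrative only, and the sketch deliberately omits cache expiry, which any real tool would need so that stale operational data is not served.

```python
from typing import Callable, Dict, List


class CachingFederator:
    """Caches small result sets so repeated requests do not hit the live source."""

    def __init__(self, run_query: Callable[[str], List[tuple]], max_cached_rows: int = 100) -> None:
        self._run_query = run_query
        self._max_cached_rows = max_cached_rows
        self._cache: Dict[str, List[tuple]] = {}
        self.source_hits = 0  # how many times the operational source was queried

    def query(self, sql: str) -> List[tuple]:
        if sql in self._cache:
            return self._cache[sql]
        self.source_hits += 1
        rows = self._run_query(sql)
        # Only keep the result if the set is not too large.
        if len(rows) <= self._max_cached_rows:
            self._cache[sql] = rows
        return rows


def fake_source(sql: str) -> List[tuple]:
    # Hypothetical source: 500 rows for the "big" table, 3 rows otherwise.
    return [(i,) for i in range(500 if "big" in sql else 3)]


fed = CachingFederator(fake_source, max_cached_rows=100)
fed.query("SELECT * FROM small_table")
fed.query("SELECT * FROM small_table")  # served from the cache
fed.query("SELECT * FROM big_table")
fed.query("SELECT * FROM big_table")    # too large to cache, hits the source again
```

In this run the source is queried three times rather than four: the small result is reused, while the large one is fetched on every request.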
Within the data integration spectrum, data consolidation and federation complement each other well. The key is to understand each technique's strengths and weaknesses.
Data propagation: Now let's bring data propagation into the picture. Figure 5 shows a new integration spectrum going from data to process. The closer to process integration we get, the more a data propagation approach makes sense. The most appropriate technology for data propagation is EAI, although ETL may also be used for some bulk data movement.
Figure 5. The data propagation technique uses EAI, and in some cases ETL, for data movement and connects applications in real time, for example to support business process automation.
Data propagation is useful when you need to connect applications in real time, for example for business process automation. Typically it occurs when you make a change (usually a small set of records) in one application and need to reflect that change elsewhere in other applications. The technology is set up to ensure that the change is captured and delivered reliably.
A second, increasingly popular use for data propagation is the information feedback loop for BI. In this case, the results from certain analytics (e.g., customer segmentation and lifetime value results) are propagated back into the operational systems or ODS. In this way, customer-facing personnel can immediately view and act upon important BI results without having to run the actual analysis. Figure 6 is an example of this form of propagation.
Data propagation has its challenges as well. The biggest one is making sure that the applications received all the necessary sets of data correctly and in a timely fashion. What happens if the transfer fails? What happens if an incomplete set is transferred?
Secondly, the integration capabilities of the technologies may not be as sophisticated as those for ETL processing. In that case, you may be limited in the amount of integration and transformation that can be performed on the data during its transfer.
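The first of these challenges, failed or incomplete transfers, is commonly handled with acknowledgements and retries. The sketch below is one way that might look, under two stated assumptions: the receiver reports how many records it accepted, and it can safely receive the same set twice. The names propagate, DeliveryError and flaky_send are hypothetical.

```python
from typing import Callable, List


class DeliveryError(Exception):
    """Raised when a change set could not be confirmed by the receiver."""


def propagate(records: List[dict], send: Callable[[List[dict]], int], max_attempts: int = 3) -> int:
    """Send a change set and confirm the receiver acknowledged every record.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            acknowledged = send(records)
        except ConnectionError:
            continue  # the transfer failed outright; try again
        if acknowledged == len(records):
            return attempt  # the complete set was confirmed
        # Otherwise the set was incomplete; loop around and resend it whole.
    raise DeliveryError(f"change set not confirmed after {max_attempts} attempts")


# A simulated receiver: fails once, accepts a partial set once, then succeeds.
outcomes = iter(["fail", "partial", "ok"])


def flaky_send(batch: List[dict]) -> int:
    outcome = next(outcomes)
    if outcome == "fail":
        raise ConnectionError("link down")
    return len(batch) - 1 if outcome == "partial" else len(batch)


attempts_used = propagate([{"id": 1}, {"id": 2}], flaky_send)
```

Resending the whole set is the simplest policy, but it only works if applying the same change twice is harmless on the receiving side.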
Figure 6. This data propagation example shows how results from analytics are propagated back into operational systems.
When not to use each technique
Obviously, each technique and technology has its purpose, and there are times when one is preferable to another. Let's finish up with a discussion of when not to use each technique. Note that there are exceptions to each of these, of course.
Data consolidation may not be the best technique for real-time data integration or for the movement of every data element from operations. In addition, it may have difficulty with highly dynamic reports or accessing and publishing XML data. It was designed to work in a batch mode and works best in that fashion.
Data federation should not be used to create a data warehouse—ever. A virtual data warehouse simply does not make sense. It should also not be used for volume data cleansing. The technique is not suitable for time series or data mining analytics. These require stable sets of data from a warehouse. It was designed to deliver requested data on demand and does not keep long-term copies of that data.
Data propagation should not be used for reporting and analytics, or when working with large data sets or aggregated data. The reason is that its single-record, push-based architecture is designed for sending messages, not for pulling result sets from queries.
Summary
These three techniques complement one another, and each has a proper place in your IT infrastructure. Data propagation serves applications and process automation that require data; data consolidation serves databases in support of decision making; and data federation serves business users and their decision-making process as well. It is your responsibility to review the business requirements thoroughly to determine which tools are appropriate for the integration problems that arise.
Claudia Imhoff, president of Intelligent Solutions, is a popular speaker as well as an internationally recognized expert on customer relationship management, business intelligence and the infrastructure to support these initiatives, the corporate information factory. She has co-written five books on these subjects. She writes monthly columns, has an expert channel and maintains a blog, which can be found at www.B-EYE-Network.com. She also contributes to many other publications.
© Teradata Magazine-March 2006