Teradata Magazine Cover Teradata Magazine Online  
Register Help Password
Password:
Quick Links
Current Issue
Archives
Teradata.com
Teradata Magazine Rss Feed
ARCHIVES Search Teradata Magazine Online:  
ENTERPRISE VIEW
A bowl of alphabet soup
Enterprise application integration (EAI)-the ability to link transactions across multiple systems through existing applications like SAP, PeopleSoft, I2 and Oracle Financials. EAI is typically a push system that is not utilized for batch transformation.

Extract-transform-load (ETL)-generally used to move large sets of data, transform it mid-stream and load it in the target system(s). ETL is usually a pull system; however, some vendors are heading toward push/pull ETL.

Extract-load-transform (ELT)-transformation occurs in the database after the loading cycle has been accomplished. ELT is generally a pull system with massive data sets.

Extract-transform-load-transform (ETLT)-a combination of in-stream and in-database transformations. ETLT is generally a pull system utilized mostly in a batch-oriented or small-batch (timed) fashion.

Extract-load in real-time (ELR)-the ability to connect to any system, extract data and load it into another system in real-time, with some transformation occurring within the RDBMS. ELR could be a push or a pull system for sending data through the pipes.

Extract-load in real-time with dynamic restructuring capabilities (ELRD)-automatically detects changes to structures and reacts to those changes automatically. ELRD will require a bit of engineering "magic" to develop.

 border=

Printable versionPrintable version Send to a colleagueSend to a colleague

Tomorrow's real-time ETL imperative

Vendors-and businesses-must adopt real-time strategies
in order to survive.

ETL (extract-transform-load) has become a commodity in the marketplace. Nearly every data warehousing vendor offers it, and most DBAs understand what it takes to prepare data for loading into a data warehouse.

ELT (extract-load-transform) is not as well-known and requires a bit more skill to architect and utilize. However, it too is becoming mainstream. In fact, there are some tools that offer GUI-driven ELT capabilities with metadata management, making this method very accessible.

But change is in the air. ETL must shift to survive, and while ELT is part of the solution, it's not enough. I believe that advances in real-time will change ETL and ELT into something I call ELR (extract-load in real-time). ETL and ELT vendors that don't adapt might not make the cut.

What's happening?
Dealing with data in transformational layers in real-time instead of in batches changes the architectural data flow. It requires that we change the way we think about data. For example, transformation elements such as data mining and data cleansing will be introduced downstream instead of upstream.

Performance throughput will also change. ELR, with its highly focused transactional sessions and heavy transformation logic, will slow down throughput significantly, but because these tools will be dealing with much smaller volumes of data with greater frequency, latency may become a non-issue.

Industry consultants are seeing the following trends developing in this space:

> Data volumes have more than doubled, and business needs are driving that growth. With the advent of RFID technology, volume will only continue to increase, presenting new challenges to all vendors.
> Data quality has turned into the idea of "knowledge management through data mining." In-database data mining is the wave of the future; it's becoming nearly impossible to export large data sets, reformat them and push them through algorithms outside the database.
> Thanks to active data warehousing, data is bypassing ETL completely and being deposited by enterprise application integration (EAI) and other mechanisms directly into the enterprise data warehouse, requiring that transformation be embedded in the DBMS systems. (This may not be true for data warehouses that still operate in batch mode.)
> Batch modes of the future will be small and fast, except in the cases where data can sit until a batch window can be opened and the data moved en masse.
> Government and international regulations suggest that auditors can request to view data as it appeared in every step of the ETL process. This places a tremendous load on the data warehouse, and it puts the onus on the business itself to maintain scrupulous data records.

As a result, many businesses are changing the way they view, record, collect, manage, interpret and transform data. Therefore, data warehouses need to become better at tracking "before" and "after" pictures of these data modifications. Transformations that take place within the database are a perfect way to track the process-to produce an audit trail, if you will.

Data warehouses are fast becoming "active" data warehouses, and they are beginning to look, act and respond as independent operational systems.

Additionally, businesses are required to make faster, better decisions than ever before. As a result, the ETL industry is being pushed towards moving less data more frequently and, of course, at a faster rate. These changes also give rise to on-demand requests for information that previously was unavailable.

What does that mean to the transformation process? We no longer have the time to transform the data in-stream. Extract and load routines have to bring the data in as is, and then, depending on the business's needs, do the transformation either on the way into the data mart or warehouse (for OLAP and strategic analysis) or on a transaction-by-transaction basis inside of a data-mining algorithm.

So what's the bottom line?
Volume, latency and functionality have hit a convergence point. Businesses no longer have the time to perform massive sets of transformation on massive data volumes before making tactical decisions. Traditional ETL methods create bottlenecks because in some cases they offer a single-point solution for either batch or near-real time, but they don't offer a complete view of the data.

In order to survive going forward, ETL engines will need to shift their focus to either ETLT (extract-transform-load-transform) or ELRD (extract-load in real-time with dynamic restructuring capabilities). ETL tools have evolved to handle metadata, scheduling and volume without batting an eye.
Offbeat harmony
Click to enlarge

The feature shift
Click to enlarge

While transformation mid-stream will never go away entirely, there is a growing need to tie transformation to in-database data mining, in-database data quality, in-database ranking and scoring, in-database integration, in-database parallelism, and in-database scalability, fault tolerance and fail-over. Moreover, ETL vendors must offer in-database management of the structures, combined with business rules and transformation logic. Impact analysis should be a single stream of findings, and changes to table structure should be automatically adapted by transformation logic, data mining, profiling, quality and cleansing routines.

Please don't discount the ETL engines just yet-they are mature and exist for very good reasons. ETL vendors still have a few tricks up their sleeves, as do all companies that have good direction. But they will have to change or they will not survive. ETL and ELT will have a place in the new market, but only if they incorporate real-time capabilities, not to mention the ability to handle metadata and business rules and to adapt to and integrate best-of-breed solutions in each of these areas.

Let's look at the issue another way: ETL wins when a business has huge data volumes to transform quickly and consistently from a metadata standpoint. It wins when the database machine does not have enough power or is widely distributed, geographically speaking. It also works if multiple targets exist on disparate platforms, but only if there is a high-speed gateway between the RDBMS engines.

On the other hand, ELT wins when the business has a high-powered, highly scalable database engine like Teradata that is capable of running inserts, updates and deletes in parallel-not to mention data mining, profiling, cleansing and segmentation-all from a SQL command line.

ELT might struggle if the transformation logic must be built and then distributed across physically separated data warehouses. In this case, in-stream transformation would work better.

However, if the data warehouse is based on a single integrated version of the data, then having the transformation occur within the database might prove more effective because it can take advantage of all the parallelism and power within an RDBMS engine such as Teradata.

ETLT gives the database architect and designers a choice between ELT and ETL, allowing them to use the best solution for the current need. It is my opinion that the ETLT engine of the future will be ELR and will include visual leverage points from both ETL and ELT.

In other words, we will have the tools necessary to build source- and target-related SQL and transformations, and we will be able to include them in the database as throwaway or persistent SQL elements.

Some transformations will include data mining, cleansing and/or massively parallel transformation ability. But all will include real-time capacities, even attaching triggers to existing database systems to capture data automatically.

Looking ahead
ETL won't go away, but it will morph into something else in the future. Today's vendors must focus on "connecting" to everything and providing more near real-time integration features and database-specific generation specialties to survive. They must leverage their utilization and expertise in metadata and begin applying business rules and not just technical rules to the information systems.

I believe that the glory days of being "just ETL" or "just ELT" are over. The days of ETLT and ELRD are just beginning, particularly in terms of visual business rule interfaces.

The new ETLT tool of the future will contain a Web-based business workflow diagramming solution, much like EAI, that end-users can design on three levels: business rules, encapsulation routines and low-level code. (See the "Future shock" sidebar for more features.) The end result will be high-speed, dynamic data, along with structure triggering and manipulation.

Current ETL vendors that I think have an opportunity to change the future of this space are (in alphabetical order): Ab Initio, Ascential, Hummingbird and Informatica. And the RDBMS vendor that is best suited to handle this change is Teradata.

Without a doubt, they face a tremendous challenge. But the future is in their hands, and that future looks bright indeed. T


Future shock
Tomorrow's ETL (ELRD) engines must include all the features mentioned in the article "Symphony of two" (which appeared in the 2Q 2002 issue of Teradata Magazine), along with these additions:

> Dynamic structure configuration/detection of target and source structure changes

> Business rule integration with business rules at the database level

> Stronger dynamic metadata exchanges and delta change detections

> Real-time messaging stream focus, while batch still resides as a data movement operation

> In-database transformation to take advantage of SQL query optimizers, parallelism, fault-tolerance, fail-over, redundancy and the new data-mining algorithms including data quality, imputed values, segmentation, scoring and transformation at the SQL level

> New scheduling capabilities close to EAI-delayed data queuing and data aging

> Advancements on "connect anywhere rapidly without writing code"

> Secure, encrypted transmissions

> Better focus on data delivery and integration of vendor-to-vendor products with connectivity to everywhere

> Web Business Workflow components

> The ability to connect anywhere to anything at any time, read/write, including mainframe, XML, XQuery, flat files, "text files"/unstructured data, searching-locating, loading, etc.

> The ability to build mappings from templates

> The ability to handle ELT or ETL depending on switches and business rules

> The ability to offer real-time and architectural change suggestions to existing data flow mappings

> The ability to gauge performance based on machine specifications, providing some level of self-monitored metrics in a learning mode, resulting in internal tuning and mapping suggestion wizards

© Teradata Magazine-September 2004

RELATED LINKS:

Symphony of two
Dan Linstedt's online column on the ETL/ELT debate Part I
Dan Linstedt's online column on the ETL/ELT debate Part II


back to top



Copyright by Teradata Corporation 2001-2007.