Getting more from SQL
The traditional extract, transform and load (ETL) software market has grown during the past 10 years. Many have successfully used ETL products to help load their data warehouse. But ETL tools of the past may not be the best approach to solving today's data integration needs.
In the 1990s, the value proposition brought to the market by ETL software vendors was compelling. At that time, the generic SQL provided by RDBMS vendors did not have the power necessary to perform the data transformation and aggregation processes required to prepare and maintain a successful data warehouse. As a result of SQL's limitations, many data warehouse teams turned to ETL tools—and their proprietary transformation languages—to perform the required extraction, transformation and loading tasks. Those that did not rely upon ETL tools to perform such tasks turned to manual coding and the use of third generation languages. The latter approach often proved to be cumbersome, producing code that was difficult to manage and maintain.
Last year Sunopsis announced the development of Sunopsis Open Connector for the Teradata warehouse. The Sunopsis Open Connector for Teradata helps organizations streamline the task of accessing and reconciling data from diverse production applications and speed the loading of data into their Teradata warehouse. It includes open extensions that enable Sunopsis to leverage native SQL, optimized for the Teradata platform.
"Teradata provides an extremely powerful database engine to process massive amounts of data and perform very complex calculations," says Stephen Brobst, chief technology officer of Teradata. "Sunopsis' architecture leverages this power to transform data. Sunopsis implements what is called an ELT (extract, load and transform) approach, instead of the more traditional ETL approach, and this is what makes the difference."
John Rauscher, COO of Sunopsis, explains some options available to businesses today. Sunopsis' next-generation data integration ELT software allows organizations to efficiently manage, share and consolidate data from disparate data sources with high performance and without the complexity and cost of traditional ETL solutions. With a global presence and more than 400 customers worldwide, Sunopsis has operations in the United States, Europe and Asia.
Q: What has changed during the last 10 years?
A: Today the power of SQL is radically different than the power of SQL 10 years ago. In fact, the capacity of SQL has multiplied by a factor of 20 in the last 10 years, making SQL an ideal platform for performing even the most complex data integration tasks.
With the improved performance of SQL, developers can now create data integration scripts with SQL and use existing RDBMS engines (running on a target system) to actually perform required transformations. Using this approach, data is extracted, then loaded and transformed on the Teradata server or wherever the data resides, thus creating an ELT architecture. Unlike an ETL approach, where data is extracted from the sources, transformed by the proprietary engine installed on the hub server and then loaded into the target data warehouse, an ELT architecture does not require customers to purchase a hub server to sit between the source and target systems.
Q: What are the benefits of the ELT approach?
A: Using an ELT approach provides three main benefits for customers: significant performance gains, reduced capital investment costs and optimization of existing technology and human resources.
From a performance perspective, the ELT approach is roughly 10 times faster than the ETL approach. The performance gain is mainly achieved thanks to the bulk processing of data, using the set processing features of the relational model. With traditional ETL tools, all data transformations executed inside the ETL engine are done on a row-by-row basis. With the ELT approach, the engine used for transformation is the Teradata RDBMS. All data processing is done with SQL in bulk, making the ELT approach much faster than the ETL (hub server) approach. Additionally, when using an ELT model, data does not need to be moved or re-indexed on the ETL engine. By eliminating these steps, considerable performance gains are realized.
The second major benefit of the ELT model is a reduction in capital investment costs. Since ELT customers do not need to buy an ETL engine, the ELT approach typically costs 50% less than the ETL approach.
Finally, the ELT approach allows end user organizations to optimize existing technologies and human resources. There are no proprietary languages to learn, and no additional hardware to support and administer.
Q: Don't IT teams already use an ELT approach when doing manual coding?
A: Yes. When developing data integration scripts manually, an ELT approach is typically used—SQL code is used to create the data integration scripts, and the Teradata RDBMS is used to execute the data transformations. But why hand-code and manually maintain scripts when you can utilize an automatic data integration code generator that gives you all of the benefits of manual coding (performance, flexibility, no new language to learn) plus metadata management capabilities, development and maintenance productivity gains and automatic data quality control?
Sunopsis offers a native SQL code generator that can be installed on any workstation in only a few minutes. With a few clicks, Sunopsis provides metadata management and the ability to design complex data transformation processes such as incremental updates or slowing changing dimensions. Sunopsis also provides automatic data quality features and generates efficient, native SQL code specifically for Teradata. Additionally, the graphical interface increases productivity and decreases development time.
Using Sunopsis, developers connect directly to the source (such as RDBMS, ERP/CRM, flat files or XML), reverse-engineer the metadata automatically, graphically define the mapping between source and target, design the transformations, reverse-engineer the constraints defined in the Teradata Database for data quality purposes, and include them in the generated SQL code. When the data integration scripts are completed, the SQL code is propagated to the data warehouse server and a scheduler ensures that the powerful Teradata RDBMS executes the data integration scripts.
Q: What do you say to Teradata customers already using a traditional ETL tool?
A: If performance is important for you, then the next time you need to perform data transformation or aggregation processes, you definitely should consider using an ELT approach. You'll be able to mix both approaches (ETL and ELT) and keep on using ETL scripts already developed by your traditional ETL tool. Scripts requiring a high level of performance can be developed by Sunopsis, thereby allowing you to fully leverage the power of your existing systems and resources. T
For more information on Sunopsis for Teradata, visit www.sunopsis.com/corporate/us/products/sunopsis/value_dw_teradata.htm and for more information on Sunopsis visit www.sunopsis.com/corporate/index.htm.