Teradata Magazine Cover Teradata Magazine Online  
Register Help Password
Password:
Quick Links
Current Issue
Archives
Teradata.com
Teradata Magazine Rss Feed
ARCHIVES Search Teradata Magazine Online:  




 

Tech2Tech:
Applied Solutions:
Striking a balance

Strategic queries aren't the only ones that can run happily on a Teradata Warehouse.

Applied Solutions:
Too much space?

What can you do with all that extra disk capacity? think historical data.

Insider's Warehouse:
Good performance

New tools make it possible to define a business strategy and monitor performance.

Just the FAQs
We simplify the complicated. Read FAQs posted online or ask the experts.










Teradata Warehouse 7.1 creates a "trusted, integrated environment" for running all decision-making work in a single place on a single copy of the data.

























As the active data warehouse continues to become mission- critical, it's necessary to ensure its continued service to the applications-even in the face of disaster.

 

TERADATA WAREHOUSE 7.1

Depth, breadth and strength

by Todd Walter

ACTIVE DATA WAREHOUSES are implemented in more organizations every day. While the concept is now well-established, the requirements on the underlying technologies continue to increase. With the Teradata Warehouse 7.1 suite of products, Teradata responds to those ever-increasing requirements in every dimension.

What aspects of active data warehousing does Teradata Warehouse 7.1 address?
Functionality and performance enhancements support strategic decision-making, while query response time and triggers support tactical decision-making. Teradata SQL becomes extensible with the addition of User Defined Functions. New functionality allows new ways of thinking about storing all the data for the enterprise and keeping it fresh. Work in many other areas of Teradata Warehouse 7.1 creates a "trusted, integrated environment" for running all decision-making work in a single place on a single copy of the data for the entire enterprise.

What is new for strategic decision-making?
Traditionally known as decision support, reporting, OLAP or data mining, strategic decision-making is the foundation of active data warehousing. Expanded active data warehousing requires new depth and breadth of analytics, including those developed by customers and partners. Functionality and performance are addressed together to allow strategic analysis to be performed more efficiently and closer to the data.

What's new in Teradata Warehouse 7.1?
A very high proportion of strategic queries perform aggregation. You can specify aggregation by defining one or more columns as a "grouping" so that rows with the same value of the grouping will be brought together to allow computation of an aggregate or summary result, e.g., the sum of sales by store. Sometimes it is desirable to aggregate more than one grouping of the data for analysis-the sum of sales by country, by region and by store, for example. Extended grouping functionality is now included in the SQL language to allow many groupings to be calculated in a single SQL statement.

Different options allow combinations of grouping. GROUPING SETS allows exact specification of the desired groups. ROLLUP creates a result similar to a report that has several subtotaling breaks, and CUBE computes all combinations of grouping of a set of columns. Compared to running many queries to get the same answers, the performance of the combined grouping will be optimized by using the results of previous groups to compute the next higher grouping in the set.

Hash join has also been added to the performance choices available to the query optimizer. This function eliminates large, costly sort steps that could be required to prepare for other join methods. The optimizer will choose hash join when large sorts would otherwise be required and will continue to choose methods like merge join when the data is already sorted or the sort is inexpensive.

What strategic query enhancements can we expect?
Query complexity continues to rise, driven by the complexity of representing the entire enterprise in a single data model and by tools and applications that generate SQL requests. The Teradata query optimizer is continually being enhanced to identify opportunities for generating ever-more efficient plans for very complex queries. BI tools generate SQL with many OUTER JOINs, which can now be optimized to eliminate entire joins from the plan. Queries using the INNER JOIN syntax will now be fully optimized the same way as queries using the comma syntax. The hash join and grouping features deliver performance as well.

What if the analysis is too complex for SQL expressions?
Complex data models and very large data sets are best analyzed by pushing the computations close to the data. Teradata magnifies this advantage with the parallel processing power of the platform and database engine. There are computations that are too complex to express using the SQL language, leading to analyses that require the extraction of large volumes of data and resulting in lost performance and scalability.

Enter User Defined Functions (UDFs). UDFs allow an algorithm of arbitrary complexity to be written in the C programming language. The algorithm is then installed in the database engine where it can be referenced just as a standard SQL function would be. Teradata includes a copy of the function on every unit of parallelism, moving the function to the data and fully parallelizing function execution. For example, once you develop a customer segmentation model, you can install it into the database as a UDF. The segmentation UDF can then be executed in a large complex SQL statement taking advantage of the full parallelism of Teradata and eliminating the need to export a very large amount of data, compute the segmentation and push the results back into the database.

UDFs can be either scalar or aggregate functions. Computations that can compute their result from a single set of arguments are scalar functions and work like the built-in functions SQRT or ABS. Computations that must bring together data from many rows are aggregate functions like SUM or COUNT. The full parallel power of the Teradata aggregation engine is available to aggregate UDFs. The function creator simply needs to provide the algorithms to do the accumulation of data from a new row into the accumulators for the group; the aggregation engine does the rest-grouping, cache management, parallel distribution and result collection.

What's new for tactical queries and event detection?
Tactical decisions are different from strategic decisions in several ways. They have a smaller scope (one product instead of many, one customer instead of all or one transaction instead of millions), they typically have much more stringent response-time requirements, and there are often many more of them. Teradata Warehouse 7.1 focuses on response time for high-volume tactical queries and trigger functionality for event detection.

What enhancements apply to tactical queries?
Join index (JI) allows the DBA to create a physical database structure that pre-computes much of the result for a set of high-volume queries. The JI enables the queries to respond very quickly since much of the join or aggregation work is already completed. New optimization technology allows a single JI to be used by queries that refer to a subset of the tables joined in the JI, in addition to queries that use the same or more tables. Since one JI can deliver performance to a much greater range of queries, fewer JIs are required, lowering the overall data maintenance and storage costs.

What's new with triggers?
The definition of triggers has been standardized recently. This release brings the Teradata implementation into compliance with the standard definition. A full implementation of row triggers was added to the statement triggers already in place. Recursion has been added allowing triggers to change the same table that caused the trigger to be fired.

What about loading and keeping the data up to date?
A large part of capturing the data into the warehouse is the transformation, checking and cleansing of the incoming data streams. UDFs will play a prominent role in allowing the complex manipulations to be encoded into SQL statements and executed in parallel in Teradata. Complex string transformations and comparisons, encoding and decoding, time and date manipulation, and complex arithmetic can all be performed using UDFs.

What will help capture all the data from the enterprise?
A single enterprise-wide repository is the foundation of an active data warehouse. Teradata Warehouse 7.1 adds large data object storage, enhances Partitioned Primary Indexes (PPIs), completes the identity column implementation and adds direct Unicode support to Java Database Connectivity (JDBC).

Large data objects (LOBs) up to 2GB may now be stored directly in Teradata via new data types Character Large Object (CLOB) and Binary Large Object (BLOB) for character and binary data respectively. CLOBs may contain character or text data, such as documents or XML. BLOBs may contain binary data such as pictures, audio or biometric data. The column containing the LOB is just another column in a table-a picture of the employee in the employee table for instance. The file system has been extended to store the large objects independently from the rest of the row to allow for good scan and join performance on the rest of the columns. Access to the large objects is provided with new client interface calls and new functionality in the query tools. UDFs have been designed to handle the various LOB types so that complex analysis can be performed, for instance, to analyze XML, match biometric data or find a specific object in a set of pictures.

PPI partition elimination has been extended to include joins between PPI and non-PPI tables. Dynamic Partition Elimination collects partition information from the non-PPI table and uses it to choose the partitions in the PPI table. Complex conditions on the non-PPI table can now be the basis for high- performance access to a small number of partitions in the PPI table.

Teradata Warehouse 7.0 delivered Identity Column to allow users to generate unique values in a table.

This new release, Teradata Warehouse 7.1, completes the implementation with the addition of bulk load utility support for identity columns. FastLoad, MultiLoad and Teradata Warehouse Builder can all now operate on identity columns in addition to the SQL update operations and the TPump that were supported initially.

What makes up the "trusted integrated environment"?
All decision-making work must be done simultaneously on a single copy of all enterprise data. This implies that the system handling the work is trusted in many ways. Data must always be available, reliable, secure and private. The system must be manageable by a minimum of resources, regardless of data volume, model complexity and workload diversity. And it must be trusted to store all the data in a minimum amount of space. Teradata Warehouse 7.1 focuses on manageability and availability.

What is new in system management?
The Database Query Log has been extended with resource utilization information. Total CPU and I/O, as well as high and low AMP CPU and I/O, are added to both the query log record and the step records to allow a detailed understanding of query execution.

The data dictionary now contains use counts and last use dates. As a result, the usage trends of all database objects can be tracked and understood to allow cleanup and the maximum utilization of system capacity.

Users with multiple roles can enable all of them simultaneously without the creation of higher level roles. Unwanted long-running transaction rollbacks can be stopped to eliminate their consumption of system resources.

What's new for high availability?
The NCR MPP platform and the Teradata clique concept have been extended with new options for single-system high availability. The maximum size of a clique increased to eight nodes from four. When an eight-node clique experiences a node failure, more nodes are available to redistribute the workload, resulting in a much lower system impact. For business-critical systems that cannot afford any performance degradation, one of the eight nodes can be designated as a hot standby. The spare is activated when a node failure occurs and takes over all of the failure nodes' workload, resulting in zero performance degradation. When the failed node is repaired, it becomes the new hot standby node, requiring no further outage to return it to the configuration.

As the active data warehouse continues to become mission-critical, it's necessary to ensure its continued service to the applications-even in the face of a disaster. Teradata Warehouse 7.1 introduces the Query Director component to route and balance workload between two, probably geographically separated, Teradata data warehouse systems and to fail over applications to one of the systems should a disaster occur.

Summary
New functionality and performance encourages greater use of active data warehousing.

New data types and function extensibility allow greater depth of analysis. Management and availability extensions make it easier to operate a critical enterprise component.

Overall, Teradata Warehouse 7.1 enables more analysis using a wider range of data on a stronger foundation than ever before. T

Todd Walter, chief technology officer, Teradata Development Division, oversees all research and development efforts for Teradata Database software and systems. In addition, he is responsible for the future vision and development of the active data warehouse. E-mail him at todd.walter@teradata-ncr.com.

PHOTO BY ALEX HAYDEN




Copyright by Teradata Corporation 2001-2007.