Teradata Magazine Online

TERADATA WAREHOUSE 7.0 — BETTER AT BEING THE BEST

by Todd Walter

With active data warehousing front and center, Teradata is releasing the next installment of the suite of products making up Teradata Warehouse 7.0. Anchored by Teradata Database V2R5.0 and containing major upgrades in Teradata Tools and Utilities 7.0, this is the most comprehensive update ever produced by Teradata Engineering. Every aspect of active data warehousing is made simpler, faster, more economical and easier to implement.

What aspects of active data warehousing does Teradata Warehouse 7.0 address?
Functionality and performance enhancements support strategic decision-making, while scalability and short-query performance are emphasized for tactical decision-making. Data freshness benefits from new levels of update performance and scalability and from tighter integration with ETL and EAI partner tools. Work in many other areas of Teradata Warehouse 7.0 creates a “trusted integrated environment” for running all decision-making work in a single place, on a single copy of the data, for the entire enterprise.

What is new for strategic decision-making?
Traditionally known as decision support, reporting, OLAP or data mining, strategic decision-making is the foundation of Teradata’s market differentiation. New functionality extends Teradata’s analytic engine. For instance, counting both the number of unique customers and the number of unique households can now be done in a single query. A moving average (or any other aggregate) can be calculated over an arbitrary period of time before and after an event. And statisticians will appreciate the detailed control they gain over stratified sampling. Of course, each new function scales linearly and is fully integrated into Teradata’s open standard interfaces, so any user or application can use it on any volume of data.
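To make these three analytic capabilities concrete, here is a minimal Python sketch of their semantics only: counting distinct customers and households in one pass, a moving average over a window that reaches both before and after each point, and stratified sampling with a chosen sample size per stratum. The row layout and all function names are hypothetical; this is not Teradata's SQL syntax or its parallel implementation.

```python
import random
from collections import defaultdict

# Hypothetical row layout: (customer_id, household_id, day, amount)


def distinct_counts(rows):
    """Count unique customers and unique households in a single pass."""
    customers, households = set(), set()
    for customer_id, household_id, _day, _amount in rows:
        customers.add(customer_id)
        households.add(household_id)
    return len(customers), len(households)


def moving_average(values, preceding, following):
    """Average each point with up to `preceding` earlier and `following` later points."""
    result = []
    for i in range(len(values)):
        lo = max(0, i - preceding)
        hi = min(len(values), i + following + 1)
        window = values[lo:hi]
        result.append(sum(window) / len(window))
    return result


def stratified_sample(rows, stratum_of, sizes, seed=0):
    """Draw up to sizes[stratum] rows at random from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[stratum_of(row)].append(row)
    sample = []
    for stratum, members in strata.items():
        n = min(sizes.get(stratum, 0), len(members))
        sample.extend(rng.sample(members, n))
    return sample
```

For example, `moving_average([1, 2, 3, 4, 5], 1, 1)` averages each day with its neighbours and yields `[1.5, 2.0, 3.0, 4.0, 4.5]`; the windows at the edges are simply shorter.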

How does performance improve?
Query complexity continues to rise, driven by the complexity of representing the entire enterprise in a single data model and by tools and applications that generate SQL requests. The Teradata query optimizer is continually enhanced to find opportunities for generating ever more efficient plans for very complex queries. Much new intelligence globally reorganizes complex queries during optimization so that costly operations are performed at the cheapest point in the plan. Aggregate operations are broken up and moved to where they can be performed at the least expense. Derived tables are decomposed and integrated more tightly into the query plan. Entire joins and table accesses are eliminated when the optimizer can see that they do not contribute to the query result; this is especially important for generated SQL. Workload management features automatically lower the priority of a query as its execution time grows, limit the number of concurrent long-running, low-priority queries, and queue low-priority queries when resources are unavailable.
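The two workload management behaviors can be modeled in a few lines. This is a minimal sketch under stated assumptions: the demotion schedule, the priority levels and the `LowPriorityThrottle` class are hypothetical illustrations, not Teradata's Priority Scheduler settings or interfaces.

```python
from collections import deque

# Hypothetical demotion schedule: (elapsed seconds, priority level).
# A higher level means a larger share of system resources.
DEMOTION_STEPS = [(0, 3), (60, 2), (600, 1)]


def priority_for(elapsed_seconds):
    """Return the priority level a query should run at after running this long."""
    level = DEMOTION_STEPS[0][1]
    for threshold, step_level in DEMOTION_STEPS:
        if elapsed_seconds >= threshold:
            level = step_level
    return level


class LowPriorityThrottle:
    """Admit at most `limit` concurrent low-priority queries and queue the rest."""

    def __init__(self, limit):
        self.limit = limit
        self.running = set()
        self.waiting = deque()

    def submit(self, query_id):
        if len(self.running) < self.limit:
            self.running.add(query_id)
            return "running"
        self.waiting.append(query_id)
        return "queued"

    def finish(self, query_id):
        """Mark a query done and start the oldest queued query, if any."""
        self.running.discard(query_id)
        if self.waiting and len(self.running) < self.limit:
            next_id = self.waiting.popleft()
            self.running.add(next_id)
            return next_id
        return None
```

Demotion is driven purely by elapsed time here, and the queue is first-in, first-out; a real scheduler could weigh other factors.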

Is query complexity limited?
Every software system has limits, but Teradata continues to push its limits beyond the complexity that even the largest enterprises require. A table can now have up to 2,000 columns, and an index can now have up to 64 columns. The maximum size of the internal representation of a query plan grows more than 16-fold, and a single SQL statement, view or macro can now be up to 1MB, also a 16-fold increase.

Are there tradeoffs between different types of strategic decisions?
Strategic decisions come in many different shapes and sizes. Some ask questions across long periods of time, all divisions of the company or all categories of products. Others are narrower in scope—asking about a single week instead of five years, for example. The tension between the performance of these two types of queries is typically resolved either by accepting sub-optimal performance for one of the workloads or by building very complex data structures that require much more management and make queries more complex to create and maintain.

The Partitioned Primary Index (PPI) feature eliminates this tradeoff without introducing management overhead. As part of the table definition, a partitioning function is specified on the attributes that define the scope of the narrower queries. Teradata uses the table’s primary index to distribute data evenly across all units of parallelism, then uses the partitioning function to group rows together on disk. Because the data is grouped by the attributes that scope the query, the query can be completed with considerably less work. Without partitioning, a query requesting one month of data from a table holding five years of historical transactions must read all five years. With a partitioning function that groups the data by month, the query reads only the month it needs: one of 60 monthly groups, or 60 times less work.

The performance delivered by PPI can be achieved in more traditional ways, such as breaking a large table into a number of smaller ones or manually defining partitions. However, these tactics require a great deal of effort from both the system management staff and the people writing the queries. PPI delivers the performance without the hassle. The function is specified once, when the table is defined. The Teradata Database automatically handles data layout and space management, as well as the physical grouping when new data is added. The query optimizer exploits the grouping automatically, with no help from the query developer (or tool). Insert performance improves because new rows are written only to the data blocks of their group, and deleting an old group, such as the oldest month, is instantaneous.
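The following Python sketch models the two-level layout described above: rows are spread across units of parallelism by hashing the primary index, then grouped within each unit by a partitioning function. The class, its methods and the `(year, month)` partition key are hypothetical; it illustrates partition elimination and whole-partition deletion, not Teradata's storage engine.

```python
from collections import defaultdict


class PartitionedTable:
    """Rows are spread across units by a primary-index hash, then grouped by partition."""

    def __init__(self, n_units, primary_index, partition_of):
        self.n_units = n_units
        self.primary_index = primary_index  # hypothetical: row -> primary index value
        self.partition_of = partition_of    # hypothetical: row -> partition key
        # units[u][partition] holds the rows of that partition stored on unit u.
        self.units = [defaultdict(list) for _ in range(n_units)]

    def insert(self, row):
        # Distribution across units still uses the primary index, so data stays even.
        unit = hash(self.primary_index(row)) % self.n_units
        # Within the unit, the row is appended only to its own partition's group.
        self.units[unit][self.partition_of(row)].append(row)

    def scan(self, partitions):
        """Read only the requested partitions (partition elimination)."""
        rows = []
        for unit in self.units:
            for p in partitions:
                rows.extend(unit.get(p, []))
        return rows

    def drop_partition(self, partition):
        """Delete a whole group at once instead of row by row."""
        for unit in self.units:
            unit.pop(partition, None)

    def row_count(self):
        return sum(len(group) for unit in self.units for group in unit.values())
```

Loading five years of monthly data and then calling `scan([(2001, 3)])` touches one of 60 monthly groups, while dropping the oldest month removes one dictionary entry per unit rather than deleting rows one at a time.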

What is new for tactical decision-making?
Tactical decisions are different from strategic decisions in several ways. They have a smaller scope (one product instead of many, one customer instead of all or one transaction instead of millions), they typically have much more stringent response-time requirements, and there are often many more of them. Teradata Warehouse 7.0 addresses each of these dimensions and enhances the ability of this workload to run concurrently with strategic queries and the updates required for keeping the data fresh.

How is the smaller scope of these queries addressed?
Expanded indexing choices allow the DBA and application designer to create structures that optimize the performance of small-scope requests. Global Index physically places all index rows for one value of a non-unique index together on one unit of parallelism. In a parallel database with only local indexing, every unit of parallelism must search for the accounts belonging to a single customer, whereas Global Index allows one unit of parallelism to answer the same question, improving both performance and throughput. Sparse Index allows an index to be defined on only a subset of the underlying table’s rows, which reduces the space and update cost of maintaining the index. As a result, more indexes can be defined to help other parts of the workload. Join Index and Aggregate Join Index enhancements improve performance of the tactical workload by pre-computing frequently used joins and aggregates. Teradata’s query optimizer makes all of these new indexing choices invisible to the query developer (or query tool) by automatically choosing the most efficient structure for each query.
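The difference between local and global indexing, and the effect of a sparse index, can be sketched in Python. Everything here is a hypothetical model: `N_UNITS`, the account and customer fields, and the index builders are illustrative, not Teradata structures. Rows are stored on the unit chosen by hashing `account_id`; the local index lives beside those rows, while the global index is placed by hashing `customer_id`.

```python
from collections import defaultdict

N_UNITS = 8  # hypothetical number of units of parallelism


def unit_for(value):
    return hash(value) % N_UNITS


def build_local_index(rows, row_key, index_key):
    """Each unit indexes only the rows it stores (rows are placed by row_key)."""
    index = [defaultdict(list) for _ in range(N_UNITS)]
    for row in rows:
        index[unit_for(row_key(row))][index_key(row)].append(row_key(row))
    return index


def build_global_index(rows, row_key, index_key):
    """All index entries for one index value go to the unit that value hashes to."""
    index = [defaultdict(list) for _ in range(N_UNITS)]
    for row in rows:
        index[unit_for(index_key(row))][index_key(row)].append(row_key(row))
    return index


def lookup_local(index, value):
    """Every unit must be probed; returns (matching row keys, units probed)."""
    hits = []
    for unit in index:
        hits.extend(unit.get(value, []))
    return sorted(hits), len(index)


def lookup_global(index, value):
    """Only the one unit that owns this value is probed."""
    return sorted(index[unit_for(value)].get(value, [])), 1


def build_sparse_index(rows, row_key, index_key, include):
    """Index only rows that satisfy `include`, keeping the structure small."""
    index = defaultdict(list)
    for row in rows:
        if include(row):
            index[index_key(row)].append(row_key(row))
    return index
```

Both lookups return the same accounts for a customer, but the local version probes all eight units while the global version probes one, which is the throughput gain the text describes.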

How is response time improved?
Most important is the ability to control response time, because tactical, time-sensitive queries share system resources with a wide variety of other work. A new “expedited” workload type will allow system resources to be explicitly reserved for time-sensitive requests. This lets the system administrator deliver service levels to business functions that can’t tolerate long response times during peak workloads. Response time also benefits from the new indexing choices and from architectural and coding changes inside the database engine.
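One way to picture reserving resources for time-sensitive work is a fixed pool of worker slots in which ordinary requests may never take the last few slots. The `WorkerPool` class and its `reserved` parameter below are a hypothetical model of that idea, not the actual mechanism behind Teradata's expedited workload type.

```python
class WorkerPool:
    """A fixed number of worker slots, some held back for expedited requests."""

    def __init__(self, total, reserved):
        if not 0 <= reserved <= total:
            raise ValueError("reserved must be between 0 and total")
        self.total = total
        self.reserved = reserved  # hypothetical: slots ordinary work may never take
        self.in_use = 0

    def free(self):
        return self.total - self.in_use

    def acquire(self, expedited=False):
        # Ordinary requests must leave `reserved` slots free;
        # expedited requests may use any free slot.
        must_leave = 0 if expedited else self.reserved
        if self.free() > must_leave:
            self.in_use += 1
            return True
        return False

    def release(self):
        if self.in_use > 0:
            self.in_use -= 1
```

With `WorkerPool(10, 2)`, eight ordinary requests fill the pool for ordinary work, yet two expedited requests can still start immediately, even at peak load.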

How does Teradata Warehouse 7.0 handle high-volume requests?
Throughput is the key to handling volume, and scalability is the key to delivering throughput for a workload. Scalability for tactical queries is different from scalability for long-running strategic queries. For small-scope tactical requests on a large system, involving every unit of parallelism in a single query often means the overhead of executing the query outweighs the actual work of answering the question.

The Teradata optimizer and execution engine have learned a new way of thinking about parallelism and short requests, using minimum parallelism to get the job done rather than always using all available parallelism. If only a few transactions or rows are qualified by a query, then the operations that follow—aggregation, sort, merge, query completion—are performed only on those parallel units that have qualified rows, leaving the rest of the parallel units to process other requests. In concert with Global Index, this new way of planning and executing queries dramatically improves the throughput and scalability of tactical workloads.
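The idea of minimum parallelism can be sketched as follows: after rows are qualified, only the units that actually hold qualifying rows take part in the aggregation and merge steps. The function, its arguments and the dictionary-of-units layout are hypothetical. In this sketch the qualification step still visits every unit; paired with a Global Index, as the text notes, that first step could be confined to a single unit as well.

```python
def run_short_query(units, qualifies, amount):
    """Sum `amount` over qualifying rows, involving only units that hold such rows.

    `units` maps a unit id to the rows stored there. Returns the total and the
    sorted ids of the units that took part in the follow-up steps.
    """
    qualified = {}
    for unit_id, rows in units.items():
        hits = [amount(r) for r in rows if qualifies(r)]
        if hits:
            qualified[unit_id] = hits
    # Follow-up work (a local aggregate, then the final merge) runs only on
    # the participating units; all other units stay free for other requests.
    partials = {unit_id: sum(vals) for unit_id, vals in qualified.items()}
    return sum(partials.values()), sorted(partials)
```

If only two of eight units hold qualifying rows, the returned list names just those two, and the other six never join the aggregation or merge.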

Are there tradeoffs between tactical and strategic queries?
Teradata makes every effort to minimize the inevitable tradeoffs. Cylinder Read eliminates the tradeoff between the large data block sizes desirable for strategic queries and the small block sizes desirable for tactical requests. When a scan or large join is performed, a large section of the disk is read in a single I/O, picking up many data blocks at the same time. Tactical requests just read the small data blocks that they require. Of course, the Teradata optimizer and file system automatically determine which method to use without requiring knowledge or direction from the user or system management staff.
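A small Python sketch shows the two read strategies side by side. Block and cylinder numbering, the `plan_reads` function and the `choose_large_scan` heuristic are hypothetical; the real choice is made internally by the optimizer and file system, as described above.

```python
def choose_large_scan(n_blocks_needed, blocks_per_cylinder):
    """Hypothetical heuristic: use cylinder reads once a request needs a cylinder's worth of blocks."""
    return n_blocks_needed >= blocks_per_cylinder


def plan_reads(blocks_needed, blocks_per_cylinder, large_scan):
    """Return I/O requests as (first_block, block_count) pairs.

    A large scan issues one big read per cylinder that contains needed blocks;
    a tactical request reads just the individual blocks it needs.
    """
    needed = sorted(set(blocks_needed))
    if large_scan:
        cylinders = sorted({b // blocks_per_cylinder for b in needed})
        return [(c * blocks_per_cylinder, blocks_per_cylinder) for c in cylinders]
    return [(b, 1) for b in needed]
```

Scanning 48 blocks on cylinders of 24 blocks takes two large reads instead of 48 small ones, while a tactical request for two scattered blocks still costs only two small reads.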

How can data be kept up-to-date to answer the tactical queries (operational questions)?
Data freshness is key in active data warehousing. The focus is on moving the data warehouse update process from batch to continuous. Performance and throughput are improved through optimization of the database engine. UPSERT, a common function in data warehouse update streams, now performs well on tables with complex surrounding structures such as triggers, referential integrity and complex indexes. TPump, the continuous-update load utility, gains new reporting, exception-handling and control functions. Teradata Warehouse Builder is released on all source platforms to provide scalable acquisition of data from external sources: files, streams, ETL tools or directly from the organization’s EAI infrastructure. Combined, this set of technology allows even the highest-volume data flows to keep the active data warehouse current with the organization, such as a railroad within an hour of its tracks, a telco within an hour of its switches and a retailer within minutes of its checkout lanes. Business decisions can then be made on the current state of the entire enterprise.
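The semantics of UPSERT (update the row if it exists, otherwise insert it) and the idea of applying a change stream in small batches can be sketched in Python. The dictionary-as-table model, `upsert` and `apply_stream` are hypothetical illustrations; they do not reproduce TPump's protocol or Teradata's UPSERT syntax.

```python
def upsert(table, key, values):
    """Update the row for `key` if it exists, otherwise insert it."""
    if key in table:
        table[key].update(values)
        return "updated"
    table[key] = dict(values)
    return "inserted"


def apply_stream(table, changes, batch_size):
    """Apply (key, values) changes in small batches, as a continuous loader might.

    Returns counts of inserted rows, updated rows and batches applied.
    """
    counts = {"inserted": 0, "updated": 0, "batches": 0}
    for start in range(0, len(changes), batch_size):
        batch = changes[start:start + batch_size]
        for key, values in batch:
            counts[upsert(table, key, values)] += 1
        counts["batches"] += 1  # each batch stands in for one unit of work
    return counts
```

Small batches keep each unit of work short, which is what lets continuous updates run alongside queries instead of waiting for a nightly batch window.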

Batch updates continue to be important because not all data is required to be up-to-the-minute in all warehouses. Changes to index-update algorithms and PPI improve the performance of batch updates. Teradata Warehouse Builder integrates with the batch ETL tools and allows the data acquisition portion of the load process to scale.

What makes up the “trusted integrated environment”?
All decision-making work needs to be done simultaneously upon a single copy of all data about the enterprise. This implies that the system handling the work must be trusted in many ways. Data must always be available, reliable, secure and private. The system must be manageable by a minimum of resources regardless of the number of users, the volume of data, the complexity of the model and the diversity within the workload. And it must be trusted to store all the data in the minimum space possible.

With Teradata Warehouse 7.0, availability is increased through special attention to hardware expansions (such as a 40% increase in performance of the expansion tool) and attention to isolating faults and recovering from them faster and easier when they happen. For security and privacy in an environment with an exploding number of users, Teradata adds roles, profiles and user-level password security. Teradata Manager, Teradata Meta Data Services and Teradata Warehouse Builder all extend their management capabilities, including the addition of wizards for managing statistics and indexes. A query log is added to capture the flow of work. And multi-value compression vastly reduces the space required to store the data while at the same time improving performance.
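The following Python sketch illustrates why multi-value compression saves space: a short list of frequent values is replaced by compact codes, and only the remaining values are stored in full. The encoding and the byte estimate are a hypothetical model for illustration, not Teradata's on-disk format.

```python
import math


def encode_column(values, compress_list):
    """Store listed values as small codes and all other values as-is."""
    codes = {v: i + 1 for i, v in enumerate(compress_list)}  # 0 means "not compressed"
    return [(codes[v], None) if v in codes else (0, v) for v in values]


def decode_column(encoded, compress_list):
    return [compress_list[code - 1] if code else raw for code, raw in encoded]


def estimated_bytes(values, width, compress_list):
    """Rough storage estimate: compressed values cost only their code bits."""
    compressed = set(compress_list)
    bits_per_row = math.ceil(math.log2(len(compressed) + 1)) if compressed else 0
    raw_bytes = width * sum(1 for v in values if v not in compressed)
    return raw_bytes + math.ceil(bits_per_row * len(values) / 8)
```

For 100 rows of a 10-byte column in which two values cover 90% of the rows, the estimate drops from 1,000 bytes to 125. Fewer bytes on disk also means fewer bytes to read, which is why compression can improve performance as well as save space.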

Summary
Teradata Warehouse 7.0 is the largest and most wide-ranging release ever built by the Teradata organization. It addresses every aspect of active data warehousing, making impossible workloads possible and possible workloads economical. In this economy, when it is critical to make data-driven decisions based on the current status of the entire enterprise, you no longer need the resources of a large enterprise to get it together in an active data warehouse.

E-MAIL ME
Looking for answers to life’s mysteries? Or would you just like to know more about the Teradata Warehouse and related applications? Ask the Expert! E-mail questions and comments to Todd at: todd.walter@teradata-ncr.com

Photo by Alex Hayden




Copyright by Teradata Corporation 2001-2007.