|

TERADATA WAREHOUSE 7.1
Depth, breadth and strength
by Todd Walter
ACTIVE DATA WAREHOUSES
are implemented in more organizations every day.
While the concept is now well-established, the requirements on
the underlying technologies continue to increase. With the Teradata
Warehouse 7.1 suite of products, Teradata responds to those ever-increasing
requirements in every dimension.
What aspects of active data warehousing
does Teradata Warehouse 7.1 address?
Functionality and performance enhancements support strategic decision-making,
while query response time and triggers support tactical decision-making.
Teradata SQL becomes extensible with the addition of User Defined
Functions. New functionality allows new ways of thinking about
storing all the data for the enterprise and keeping it fresh.
Work in many other areas of Teradata Warehouse 7.1 creates a "trusted,
integrated environment" for running all decision-making work in
a single place on a single copy of the data for the entire enterprise.
What is new for strategic decision-making?
Traditionally known as decision support, reporting, OLAP or data
mining, strategic decision-making is the foundation of active
data warehousing. Expanded active data warehousing requires new
depth and breadth of analytics, including those developed by customers
and partners. Functionality and performance are addressed together
to allow strategic analysis to be performed more efficiently and
closer to the data.
What's new in Teradata Warehouse 7.1?
A very high proportion of strategic queries perform aggregation.
You can specify aggregation by defining one or more columns as
a "grouping" so that rows with the same value of the grouping
will be brought together to allow computation of an aggregate
or summary result, e.g., the sum of sales by store. Sometimes
it is desirable to aggregate more than one grouping of the data
for analysis-the sum of sales by country, by region and by store,
for example. Extended grouping functionality is now included in
the SQL language to allow many groupings to be calculated in a
single SQL statement.
Different options allow combinations of grouping.
GROUPING SETS allows exact specification of the desired groups.
ROLLUP creates a result similar to a report that has several subtotaling
breaks, and CUBE computes all combinations of grouping of a set
of columns. Compared to running many queries to get the same answers,
the performance of the combined grouping will be optimized by
using the results of previous groups to compute the next higher
grouping in the set.
Hash join has also been added to the performance
choices available to the query optimizer. This function eliminates
large, costly sort steps that could be required to prepare for
other join methods. The optimizer will choose hash join when large
sorts would otherwise be required and will continue to choose
methods like merge join when the data is already sorted or the
sort is inexpensive.
What strategic query enhancements can
we expect?
Query complexity continues to rise, driven by the complexity of
representing the entire enterprise in a single data model and
by tools and applications that generate SQL requests. The Teradata
query optimizer is continually being enhanced to identify opportunities
for generating ever-more efficient plans for very complex queries.
BI tools generate SQL with many OUTER JOINs, which can now be
optimized to eliminate entire joins from the plan. Queries using
the INNER JOIN syntax will now be fully optimized the same way
as queries using the comma syntax. The hash join and grouping
features deliver performance as well.
What if the analysis is too complex
for SQL expressions?
Complex data models and very large data sets are best analyzed
by pushing the computations close to the data. Teradata magnifies
this advantage with the parallel processing power of the platform
and database engine. There are computations that are too complex
to express using the SQL language, leading to analyses that require
the extraction of large volumes of data and resulting in lost
performance and scalability.
Enter User Defined Functions (UDFs). UDFs allow
an algorithm of arbitrary complexity to be written in the C programming
language. The algorithm is then installed in the database engine
where it can be referenced just as a standard SQL function would
be. Teradata includes a copy of the function on every unit of
parallelism, moving the function to the data and fully parallelizing
function execution. For example, once you develop a customer segmentation
model, you can install it into the database as a UDF. The segmentation
UDF can then be executed in a large complex SQL statement taking
advantage of the full parallelism of Teradata and eliminating
the need to export a very large amount of data, compute the segmentation
and push the results back into the database.
UDFs can be either scalar or aggregate functions.
Computations that can compute their result from a single set of
arguments are scalar functions and work like the built-in functions
SQRT or ABS. Computations that must bring together data from many
rows are aggregate functions like SUM or COUNT. The full parallel
power of the Teradata aggregation engine is available to aggregate
UDFs. The function creator simply needs to provide the algorithms
to do the accumulation of data from a new row into the accumulators
for the group; the aggregation engine does the rest-grouping,
cache management, parallel distribution and result collection.
What's new for tactical queries and
event detection?
Tactical decisions are different from strategic decisions in several
ways. They have a smaller scope (one product instead of many,
one customer instead of all or one transaction instead of millions),
they typically have much more stringent response-time requirements,
and there are often many more of them. Teradata Warehouse 7.1
focuses on response time for high-volume tactical queries and
trigger functionality for event detection.
What enhancements apply to tactical
queries?
Join index (JI) allows the DBA to create a physical database structure
that pre-computes much of the result for a set of high-volume
queries. The JI enables the queries to respond very quickly since
much of the join or aggregation work is already completed. New
optimization technology allows a single JI to be used by queries
that refer to a subset of the tables joined in the JI, in addition
to queries that use the same or more tables. Since one JI can
deliver performance to a much greater range of queries, fewer
JIs are required, lowering the overall data maintenance and storage
costs.
What's new with triggers?
The definition of triggers has been standardized recently. This
release brings the Teradata implementation into compliance with
the standard definition. A full implementation of row triggers
was added to the statement triggers already in place. Recursion
has been added allowing triggers to change the same table that
caused the trigger to be fired.
What about loading and keeping the data
up to date?
A large part of capturing the data into the warehouse is the transformation,
checking and cleansing of the incoming data streams. UDFs will
play a prominent role in allowing the complex manipulations to
be encoded into SQL statements and executed in parallel in Teradata.
Complex string transformations and comparisons, encoding and decoding,
time and date manipulation, and complex arithmetic can all be
performed using UDFs.
What will help capture all the data
from the enterprise?
A single enterprise-wide repository is the foundation of an active
data warehouse. Teradata Warehouse 7.1 adds large data object
storage, enhances Partitioned Primary Indexes (PPIs), completes
the identity column implementation and adds direct Unicode support
to Java Database Connectivity (JDBC).
Large data objects (LOBs) up to 2GB may now
be stored directly in Teradata via new data types Character Large
Object (CLOB) and Binary Large Object (BLOB) for character and
binary data respectively. CLOBs may contain character or text
data, such as documents or XML. BLOBs may contain binary data
such as pictures, audio or biometric data. The column containing
the LOB is just another column in a table-a picture of the employee
in the employee table for instance. The file system has been extended
to store the large objects independently from the rest of the
row to allow for good scan and join performance on the rest of
the columns. Access to the large objects is provided with new
client interface calls and new functionality in the query tools.
UDFs have been designed to handle the various LOB types so that
complex analysis can be performed, for instance, to analyze XML,
match biometric data or find a specific object in a set of pictures.
PPI partition elimination has been extended
to include joins between PPI and non-PPI tables. Dynamic Partition
Elimination collects partition information from the non-PPI table
and uses it to choose the partitions in the PPI table. Complex
conditions on the non-PPI table can now be the basis for high-
performance access to a small number of partitions in the PPI
table.
Teradata Warehouse 7.0 delivered Identity Column
to allow users to generate unique values in a table.
This new release, Teradata Warehouse 7.1, completes
the implementation with the addition of bulk load utility support
for identity columns. FastLoad, MultiLoad and Teradata Warehouse
Builder can all now operate on identity columns in addition to
the SQL update operations and the TPump that were supported initially.
What makes up the "trusted integrated
environment"?
All decision-making work must be done simultaneously on a single
copy of all enterprise data. This implies that the system handling
the work is trusted in many ways. Data must always be available,
reliable, secure and private. The system must be manageable by
a minimum of resources, regardless of data volume, model complexity
and workload diversity. And it must be trusted to store all the
data in a minimum amount of space. Teradata Warehouse 7.1 focuses
on manageability and availability.
What is new in system management?
The Database Query Log has been extended with resource utilization
information. Total CPU and I/O, as well as high and low AMP CPU
and I/O, are added to both the query log record and the step records
to allow a detailed understanding of query execution.
The data dictionary now contains use counts
and last use dates. As a result, the usage trends of all database
objects can be tracked and understood to allow cleanup and the
maximum utilization of system capacity.
Users with multiple roles can enable all of
them simultaneously without the creation of higher level roles.
Unwanted long-running transaction rollbacks can be stopped to
eliminate their consumption of system resources.
What's new for high availability?
The NCR MPP platform and the Teradata clique concept have been
extended with new options for single-system high availability.
The maximum size of a clique increased to eight nodes from four.
When an eight-node clique experiences a node failure, more nodes
are available to redistribute the workload, resulting in a much
lower system impact. For business-critical systems that cannot
afford any performance degradation, one of the eight nodes can
be designated as a hot standby. The spare is activated when a
node failure occurs and takes over all of the failure nodes' workload,
resulting in zero performance degradation. When the failed node
is repaired, it becomes the new hot standby node, requiring no
further outage to return it to the configuration.
As the active data warehouse continues to become
mission-critical, it's necessary to ensure its continued service
to the applications-even in the face of a disaster. Teradata Warehouse
7.1 introduces the Query Director component to route and balance
workload between two, probably geographically separated, Teradata
data warehouse systems and to fail over applications to one of
the systems should a disaster occur.
Summary
New functionality and performance encourages greater use of active
data warehousing.
New
data types and function extensibility allow greater depth of analysis.
Management and availability extensions make it easier to operate
a critical enterprise component.
Overall, Teradata Warehouse 7.1 enables more
analysis using a wider range of data on a stronger foundation
than ever before. T
Todd Walter, chief technology officer,
Teradata Development Division, oversees all research and development
efforts for Teradata Database software and systems. In addition,
he is responsible for the future vision and development of the
active data warehouse. E-mail him at todd.walter@teradata-ncr.com.
PHOTO BY ALEX HAYDEN
|