With active data warehousing front and center,
Teradata is releasing the next installment of the suite of products
making up Teradata Warehouse 7.0. Anchored by Teradata Database
V2R5.0 and containing major upgrades in Teradata Tools and Utilities
7.0, this is the most comprehensive update ever produced by
Teradata Engineering. Every aspect of active data warehousing
is made simpler, faster, more economical and easier to implement.
What aspects
of active data warehousing does Teradata Warehouse 7.0 address?
Functionality and
performance enhancements support strategic decision-making,
while scalability and short-query performance are emphasized
for tactical decision-making. Data freshness reaches new
levels of performance and scalability, with tighter integration
of ETL and EAI partner tools. Work in many other areas of
Teradata Warehouse 7.0 creates a trusted integrated
environment for running all decision-making work in
a single place, on a single copy of the data, for the entire
enterprise.
What is new for
strategic decision-making?
Traditionally known
as decision support, reporting, OLAP or data mining, strategic
decision-making is the foundation of Teradata's market
differentiation. New functionality extends Teradata's
analytic engine. For instance, counting the number of unique
customers and the number of unique households becomes a single
query. A moving average (or any other aggregate operation)
can be calculated over an arbitrary period of time, before
and after an event. And statisticians will appreciate
the detailed control they gain for performing stratified sampling
operations. Of course, each new function scales linearly and
is fully integrated into Teradata's open standard interfaces
so any user or application can utilize it on any volume of
data.
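To make the two analytic examples concrete, here is a minimal Python sketch (not Teradata SQL) of what each query computes: unique customers and unique households counted in a single pass, and a moving average taken over a window that reaches a fixed number of periods before and after each point. The row layout, the function names and the window sizes are hypothetical; in the database these would be expressed as SQL aggregates and ordered analytic functions.

```python
from collections import namedtuple

# Hypothetical row layout: one sales transaction per row.
Sale = namedtuple("Sale", ["customer_id", "household_id", "amount"])


def distinct_counts(rows):
    """Count unique customers and unique households in one pass over the rows."""
    customers, households = set(), set()
    for row in rows:
        customers.add(row.customer_id)
        households.add(row.household_id)
    return len(customers), len(households)


def moving_average(values, before, after):
    """Average each point with up to `before` preceding and `after` following points.

    Near the edges the window is truncated, so fewer points are averaged.
    """
    result = []
    for i in range(len(values)):
        lo = max(0, i - before)
        hi = min(len(values), i + after + 1)
        window = values[lo:hi]
        result.append(sum(window) / len(window))
    return result


sales = [
    Sale(1, 10, 5.0),
    Sale(2, 10, 7.5),
    Sale(3, 11, 2.0),
    Sale(1, 10, 4.0),
]
print(distinct_counts(sales))  # (3, 2)

weekly_revenue = [10, 20, 30, 40, 50]
# Two weeks before and one week after each week, e.g. around a promotion.
print(moving_average(weekly_revenue, before=2, after=1))  # [15.0, 20.0, 25.0, 35.0, 40.0]
```

Truncating the window at the edges is one design choice among several; a real query might instead return nulls where a full window is unavailable.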
How does
performance improve?
Query complexity
continues to rise, driven by the complexity of representing
the entire enterprise in a single data model and by tools
and applications that generate SQL requests. The Teradata
query optimizer is continually being enhanced to identify
opportunities for generating ever more efficient plans for
very complex queries. Considerable intelligence has been added to globally
reorganize complex queries during optimization so that
costly operations are performed at the most favorable point in the plan. Aggregate
operations are broken up and moved where they can be performed
at the least expense. Derived tables are decomposed and integrated
more tightly into the query plan. Entire joins and table accesses
are eliminated when the optimizer can see that they do not
contribute to the query result; this is especially important
for generated SQL. Workload management features automatically
degrade the priority of a query as its execution time extends;
control the number of concurrent, long-running, low-priority
queries; and queue low-priority queries for which resources
are currently unavailable.
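The three workload-management behaviors described above (priority that decays with elapsed time, a cap on concurrent low-priority queries, and a queue for the overflow) can be modeled in a few lines. This is a hypothetical Python sketch, not the interface of Teradata's workload-management tools: the priority names, the elapsed-time thresholds and the concurrency limit are assumptions chosen for illustration.

```python
from collections import deque

# Hypothetical demotion schedule: (elapsed seconds, priority from then on).
DEMOTION_STEPS = [(0, "high"), (60, "medium"), (600, "low")]


def priority_for(elapsed_seconds):
    """Return the priority a query runs at after running for elapsed_seconds."""
    current = DEMOTION_STEPS[0][1]
    for threshold, level in DEMOTION_STEPS:
        if elapsed_seconds >= threshold:
            current = level
    return current


class LowPriorityThrottle:
    """Cap concurrent low-priority queries; queue the rest in arrival order."""

    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent
        self.running = set()
        self.waiting = deque()

    def submit(self, query_id):
        """Start the query if a slot is free, otherwise queue it."""
        if len(self.running) < self.max_concurrent:
            self.running.add(query_id)
            return "running"
        self.waiting.append(query_id)
        return "queued"

    def finish(self, query_id):
        """Release a slot and start the oldest queued query, if any."""
        self.running.discard(query_id)
        if self.waiting and len(self.running) < self.max_concurrent:
            self.running.add(self.waiting.popleft())


throttle = LowPriorityThrottle(max_concurrent=2)
print([throttle.submit(q) for q in ["q1", "q2", "q3"]])  # ['running', 'running', 'queued']
throttle.finish("q1")
print(sorted(throttle.running))  # ['q2', 'q3']
print(priority_for(30), priority_for(300), priority_for(3600))  # high medium low
```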
Is query
complexity limited?
Every software system has limits, but Teradata continues to
move its limits beyond the complexity that even the largest
enterprises require. A table can now have up to 2,000 columns,
and an index can now have up to 64 columns. The internal
representation of the query plan can grow more than 16-fold.
And a single SQL statement, view or macro can now be up to
1MB, also a 16-fold increase.
Are there
tradeoffs between different types of strategic decisions?
Strategic decisions
come in many different shapes and sizes. Some ask questions
across long periods of time, all divisions of the company
or all categories of products. Others are narrower in scope, asking
about a single week instead of five years, for example. The
tension between the performance of these two types of queries
is typically resolved either by accepting sub-optimal performance
for one of the workloads or by building very complex data
structures that require much more management and make queries
more complex to create and maintain.
The Partitioned Primary Index (PPI) feature
eliminates this tradeoff without introducing management overhead.
As part of the table definition, a function is specified that
identifies the scoping attributes of the narrower queries.
Teradata utilizes the primary index of the table to distribute
data evenly across all units of parallelism, then uses the
new partitioning function to group data together on the disk.
Grouping the data by the attributes that define the scope
of the query allows the query to be completed with considerably
less work. In an unpartitioned table with five years of historical
transactions, a query requesting one month of data must read all
five years. When a partitioning function groups the data
by month, the query can access only the month it needs: 60
times less work.
The performance delivered by the PPI feature
can be produced in a number of more traditional ways, such
as breaking up a large table into a number of smaller ones
or manually defining partitions. However, these traditional
tactics require a great deal of effort on the part of the
system management staff and the person asking the queries.
PPI delivers the performance without the hassle. The function
is specified once when the table is defined. Data layout and
space management are handled automatically by the Teradata
Database, as is the physical grouping of newly added data.
The query optimizer leverages the grouping automatically to
deliver performance without the help of the query developer
(or tool). Insert performance improves automatically because only
the data blocks for the affected group are written, and deleting an
old group (the oldest month, for instance) is instantaneous.
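The two-level layout PPI uses can be sketched in Python, assuming a hypothetical transactions table keyed by account: rows are first hashed on the primary index to one of several units of parallelism, then grouped within each unit by the partitioning function (here, the calendar month). A one-month query touches only that month's partition on each unit, and dropping the oldest month removes whole partitions without visiting individual rows. The number of units, the use of Python's built-in `hash` and the row layout are assumptions; Teradata's actual hashing and block structure differ.

```python
from collections import defaultdict
from datetime import date

NUM_UNITS = 4  # hypothetical number of units of parallelism


def unit_for(primary_index_value):
    """Spread rows across units by hashing the primary index (stand-in hash)."""
    return hash(primary_index_value) % NUM_UNITS


def month_partition(txn_date):
    """Partitioning function: group rows by calendar month."""
    return (txn_date.year, txn_date.month)


class PartitionedTable:
    def __init__(self):
        # unit -> partition key -> list of rows
        self.units = [defaultdict(list) for _ in range(NUM_UNITS)]

    def insert(self, account_id, txn_date, amount):
        unit = self.units[unit_for(account_id)]
        unit[month_partition(txn_date)].append((account_id, txn_date, amount))

    def scan_month(self, year, month):
        """Read only the requested partition on every unit; return (rows, rows_read)."""
        rows = []
        for unit in self.units:
            rows.extend(unit.get((year, month), []))
        return rows, len(rows)

    def drop_partition(self, year, month):
        """Remove a whole month at once instead of deleting row by row."""
        for unit in self.units:
            unit.pop((year, month), None)


table = PartitionedTable()
# Five years (60 months) of data, 10 transactions per month.
for year in range(2000, 2005):
    for month in range(1, 13):
        for account in range(10):
            table.insert(account, date(year, month, 1), 1.0)

rows, rows_read = table.scan_month(2004, 6)
print(rows_read)  # 10: one-sixtieth of the 600 rows in the table
table.drop_partition(2000, 1)
print(table.scan_month(2000, 1)[1])  # 0
```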
What is new
for tactical decision-making?
Tactical decisions
are different from strategic decisions in several ways. They
have a smaller scope (one product instead of many, one customer
instead of all or one transaction instead of millions), they
typically have much more stringent response-time requirements,
and there are often many more of them. Teradata Warehouse
7.0 addresses each of these dimensions and enhances the ability
of this workload to run concurrently with strategic queries
and the updates required for keeping the data fresh.
How is the
smaller scope of these queries addressed?
Expanded indexing
choices allow the DBA and application designer to create structures
that optimize the performance of small-scope requests. Global
Index physically places all index rows for a given value of a
non-unique index together on one unit of parallelism. Parallel
databases with local indexing require all units of parallelism
to look for the accounts belonging to a single customer, whereas
the new Global Index allows one unit of parallelism to answer
that same question, improving both performance and throughput.
Sparse Index allows an index to be defined on only some of
the values in the underlying table, resulting in less space
and lower update cost to maintain the index structure. As
a result, more indexes can be defined to help other parts
of the workload. Join Index and Aggregate Join Index enhancements
improve performance of the tactical workload by pre-computing
frequently used joins and aggregates. Teradata's query
optimizer makes all of these new indexing choices invisible
to the query developer (or query tool) by automatically choosing
the right structure to answer each query in the most efficient
manner.
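The difference between local and global indexing shows up in how many units of parallelism must be consulted to find one customer's accounts. In this hypothetical Python model, a local index lives on each unit and covers only that unit's rows, so every unit must be probed; a global index is placed by hashing the indexed value itself, so all entries for one customer sit on a single unit. A sparse index is modeled as a global index built only over rows that pass a filter. The unit count, the hashing and the filter are assumptions for illustration.

```python
from collections import defaultdict

NUM_UNITS = 8  # hypothetical number of units of parallelism

# Hypothetical accounts table: (account_id, customer_id, status).
# Rows are distributed by account_id, the table's primary index.
accounts = [(acct, acct % 5, "open" if acct % 3 else "closed") for acct in range(40)]


def unit_for(value):
    return hash(value) % NUM_UNITS


def build_local_indexes(rows):
    """One index per unit, covering only the rows stored on that unit."""
    local = [defaultdict(list) for _ in range(NUM_UNITS)]
    for acct, cust, _status in rows:
        local[unit_for(acct)][cust].append(acct)
    return local


def build_global_index(rows, include=lambda row: True):
    """Place index entries by hashing customer_id; a filter makes it sparse."""
    global_index = [defaultdict(list) for _ in range(NUM_UNITS)]
    for row in rows:
        if include(row):
            acct, cust, _status = row
            global_index[unit_for(cust)][cust].append(acct)
    return global_index


def lookup_local(local, cust):
    """Every unit must be probed; returns (accounts, units_probed)."""
    found = []
    for unit in local:
        found.extend(unit.get(cust, []))
    return sorted(found), len(local)


def lookup_global(global_index, cust):
    """Only the unit that owns this customer_id is probed."""
    return sorted(global_index[unit_for(cust)].get(cust, [])), 1


local = build_local_indexes(accounts)
global_index = build_global_index(accounts)
print(lookup_local(local, 2))          # ([2, 7, 12, 17, 22, 27, 32, 37], 8)
print(lookup_global(global_index, 2))  # ([2, 7, 12, 17, 22, 27, 32, 37], 1)

# Sparse index: only open accounts are indexed, so it holds fewer entries.
sparse = build_global_index(accounts, include=lambda row: row[2] == "open")
print(sum(len(v) for unit in sparse for v in unit.values()))  # 26
```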
How is response
time improved?
Most important is the ability to control response time, because
tactical, time-sensitive queries share system resources
with a wide variety of other work. A new expedited
workload type allows system resources to be explicitly
reserved for time-sensitive requests. This lets the system
administrator deliver service levels to business functions
that can't tolerate long response times during peak workloads.
The new indexing choices, together with architectural and coding
changes inside the database engine, also improve response time.
How does
Teradata Warehouse 7.0 handle high-volume requests?
Throughput is the
key to handling volume, and scalability is the key to delivering
throughput for a workload. Scalability for tactical queries
is different from scalability for long-running strategic queries.
For small-scope tactical requests, utilizing all units of
parallelism for a single query in a large system often causes
the overhead of executing the query to overwhelm the actual
work of answering the question.
The Teradata optimizer and execution engine
have learned a new way of thinking about parallelism and short
requests, using minimum parallelism to get the job done rather
than always using all available parallelism. If only a few
transactions or rows qualify for a query, then the operations
that follow (aggregation, sort, merge, query completion) are
performed only on those parallel units that have qualifying
rows, leaving the rest of the parallel units to process other
requests. In concert with Global Index, this new way of planning
and executing queries dramatically improves the throughput
and scalability of tactical workloads.
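A hypothetical Python sketch of the minimum-parallelism idea: after the selection step, the follow-on steps run only on units that produced qualifying rows. The model counts how many units take part in a follow-on aggregation when every unit participates versus when only the units holding qualifying rows do. The unit count and the data are assumptions.

```python
NUM_UNITS = 100  # hypothetical large system


def follow_on_units(qualifying_rows_per_unit, minimal=True):
    """Return the units that participate in steps after row selection.

    qualifying_rows_per_unit maps unit number -> rows that satisfied the query.
    """
    if not minimal:
        return set(range(NUM_UNITS))  # every unit runs every step
    return {unit for unit, rows in qualifying_rows_per_unit.items() if rows}


def aggregate(qualifying_rows_per_unit, minimal=True):
    """Sum the qualifying amounts, reporting how many units did the work."""
    units = follow_on_units(qualifying_rows_per_unit, minimal)
    total = sum(sum(qualifying_rows_per_unit.get(u, [])) for u in units)
    return total, len(units)


# A tactical query: three transactions qualify, stored on two units.
hits = {17: [25.0, 10.0], 64: [5.0]}
print(aggregate(hits, minimal=False))  # (40.0, 100)
print(aggregate(hits, minimal=True))   # (40.0, 2)
```

Both plans return the same answer; the difference is that 98 units are left free for other requests.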
Are there
tradeoffs between tactical and strategic queries?
Teradata makes
every effort to minimize the inevitable tradeoffs. Cylinder
Read eliminates the tradeoff between the large data block
sizes desirable for strategic queries and the small block
sizes desirable for tactical requests. When a scan or large
join is performed, a large section of the disk is read in
a single I/O, picking up many data blocks at the same time.
Tactical requests just read the small data blocks that they
require. Of course, the Teradata optimizer and file system
automatically determine which method to use without requiring
knowledge or direction from the user or system management
staff.
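The I/O arithmetic behind Cylinder Read can be shown with a small model. The number of blocks per cylinder and the rule for choosing the read size are hypothetical, not Teradata's actual file-system parameters; the point is only that a full scan issues far fewer I/Os when it reads a cylinder at a time, while a single-row lookup still reads one small block.

```python
import math

# Hypothetical value: data blocks picked up by one cylinder-sized read.
BLOCKS_PER_CYLINDER = 64


def ios_needed(blocks_to_read, access):
    """Return the number of I/Os, choosing the read size from the access type.

    access is "scan" for full-table scans and large joins; anything else is
    treated as a tactical lookup that reads individual blocks.
    """
    if access == "scan":
        # One I/O brings in a whole cylinder's worth of data blocks.
        return math.ceil(blocks_to_read / BLOCKS_PER_CYLINDER)
    # Tactical requests read only the small blocks they need, one per I/O.
    return blocks_to_read


print(ios_needed(10_000, "scan"))    # 157
print(ios_needed(10_000, "lookup"))  # 10000, what a block-at-a-time scan would cost
print(ios_needed(1, "lookup"))       # 1
```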
How can data
be kept up-to-date to answer the tactical queries (operational
questions)?
Data freshness
is key in active data warehousing. The focus is on moving
the data warehouse update process from batch to continuous.
Performance and throughput are improved through optimization
of the database engine. UPSERT, a common function in data
warehouse update streams, is upgraded to perform well on tables
with complex surrounding structures, such as triggers, referential
integrity and complex indexes. TPump, the continuous-update
load utility, is enhanced with new reporting, exception handling
and control functions. Teradata Warehouse Builder is released
on all source platforms to provide scalable acquisition of
the data from external sources: files, streams, ETL tools
or the organization's EAI infrastructure directly.
Combined, this set of technology allows even the highest-volume
data flows to keep the active data warehouse current with
the business: a railroad within an hour of the tracks,
a telco within an hour of the switches and a retailer within
minutes of the checkout lanes, so that business decisions
can be made on the current state of the entire enterprise.
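UPSERT applies each incoming change as an update when the target row already exists and as an insert when it does not. A minimal Python sketch of those semantics follows, over a hypothetical in-memory table keyed by (store, sku); it deliberately leaves out the triggers, referential integrity and index maintenance whose cost this release addresses. In Teradata SQL the same operation is written as an atomic UPDATE ... ELSE INSERT statement.

```python
def upsert(table, key, changes, defaults):
    """Update the row for `key` if present, otherwise insert a new row.

    table: dict mapping primary key -> row (a dict of column values).
    changes: column values to apply in either case.
    defaults: column values used only when a new row is inserted.
    Returns "updated" or "inserted".
    """
    if key in table:
        table[key].update(changes)
        return "updated"
    row = dict(defaults)
    row.update(changes)
    table[key] = row
    return "inserted"


# Hypothetical inventory feed keyed by (store, sku).
inventory = {("S1", "A100"): {"on_hand": 12, "price": 3.99}}
print(upsert(inventory, ("S1", "A100"), {"on_hand": 11}, {"price": 0.0}))  # updated
print(upsert(inventory, ("S2", "B200"), {"on_hand": 5}, {"price": 7.49}))  # inserted
print(inventory[("S2", "B200")])  # {'price': 7.49, 'on_hand': 5}
```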
Batch updates continue to be important
because not all data is required to be up-to-the-minute in
all warehouses. Changes to index-update algorithms and PPI
improve the performance of batch updates. Teradata Warehouse
Builder integrates with the batch ETL tools and allows the
data acquisition portion of the load process to scale.
What makes
up the trusted integrated environment?
All decision-making
work needs to be done simultaneously upon a single copy of
all data about the enterprise. This implies that the system
handling the work must be trusted in many ways. Data must
always be available, reliable, secure and private. The system
must be manageable with minimal resources, regardless of
the number of users, the volume of data, the complexity of
the model and the diversity within the workload. And it must
be trusted to store all the data in the minimum space possible.
With Teradata Warehouse 7.0, availability
is increased through special attention to hardware expansions
(such as a 40% performance improvement in the expansion tool)
and to isolating faults and recovering from them
faster and more easily when they happen. For security and privacy
in an environment with an exploding number of users, Teradata
adds roles, profiles and user-level password security. Teradata
Manager, Teradata Meta Data Services and Teradata Warehouse
Builder all extend their management capabilities, including
the addition of wizards for managing statistics and indexes.
A query log is added to capture the flow of work. And multi-value
compression vastly reduces the space required to store the
data while at the same time improving performance.
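Multi-value compression lets a column declare a short list of frequently occurring values; those values are kept once, outside the rows, and a row holding one of them carries only a few bits to say which. Values not in the list are stored in full. The Python sketch below estimates the space saved for a single fixed-width column under that scheme. The column width, the size of the value list, the most-frequent-values selection rule and the bit accounting are all assumptions for illustration, not Teradata's on-disk format.

```python
import math
from collections import Counter


def mvc_plan(values, max_listed=255):
    """Pick the most frequent values of a column to compress (hypothetical rule)."""
    return [value for value, _count in Counter(values).most_common(max_listed)]


def column_bytes(values, value_width, listed):
    """Estimate bytes for a fixed-width column without and with compression.

    Listed values cost only presence bits in the row; others are stored in full.
    """
    listed_set = set(listed)
    # Enough bits to say "not compressed" or which listed value the row holds.
    bits_per_row = math.ceil(math.log2(len(listed) + 1)) if listed else 0
    uncompressed = len(values) * value_width
    stored_in_full = sum(value_width for v in values if v not in listed_set)
    presence_bits = len(values) * bits_per_row
    header = len(listed) * value_width  # the value list is stored once per column
    compressed = stored_in_full + math.ceil(presence_bits / 8) + header
    return uncompressed, compressed


# Hypothetical 20-byte CITY column dominated by a few values.
cities = ["Dayton"] * 700 + ["Chicago"] * 200 + ["Boise"] * 50 + ["Tulsa"] * 50
listed = mvc_plan(cities, max_listed=3)
print(listed)  # ['Dayton', 'Chicago', 'Boise']
print(column_bytes(cities, value_width=20, listed=listed))  # (20000, 1310)
```

Fewer bytes per row also means fewer blocks to read, which is how compression can improve performance as well as save space.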
Summary
Teradata Warehouse 7.0 is the largest and most
wide-ranging release ever built by the Teradata organization.
It addresses every aspect of active data warehousing, making
impossible workloads possible and possible workloads economical.
In this economy, when it is critical to make data-driven decisions
based on the current status of the entire enterprise, you
no longer need the resources of a large enterprise to get
it together in an active data warehouse.
E-MAIL ME
Looking for answers to life's mysteries? Or
would you just like to know more about the Teradata Warehouse
and related applications? Ask the Expert! E-mail questions
and comments to Todd at: todd.walter@teradata-ncr.com
Photo by Alex Hayden