Meeting the performance challenge
Teradata Warehouse 8.0 delivers faster, fresher results
for more effective decision-making.
by Todd Walter
We are bombarded daily with performance enhancement messages. Today's enterprises
must reduce costs, increase efficiency, improve customer service and deliver
profitability. On top of that, ethics and legislative compliance insist
that we watch and audit the business in detail. To complicate matters, all
of this must be done faster than ever before.
Every day, more operational data is being captured and analyzed to measure,
manage, audit and improve every aspect of the business. Decisions must be
made more quickly and efficiently by every person or system within the enterprise.
An active data warehouse, where data is integrated to provide a full and
agile view of the business, is required to ensure proper enterprise data
management and satisfy the performance demands of today's business.
Teradata Warehouse 8.0 (complete with Teradata Database V2R6, Teradata
Tools and Utilities 8.0, and a number of other components) has new functionality
and enhanced performance to enable efficiencies in decision-making throughout
the enterprise.
How does Teradata Warehouse 8.0 address active data warehousing?
Functionality and performance enhancements support strategic decision-making
while short-query performance is emphasized for tactical decision-making.
Data freshness achieves new levels of throughput. Major functionality additions
advance the ability to create leading-edge, event-driven applications. Work
in many other areas of Teradata Warehouse 8.0 minimizes costs while simultaneously
providing the ability to deliver quick time-to-market for new applications,
maximizing enterprise data store ROI.
What's new for strategic decision-making?
Many business structures are hierarchical, organized by departments,
products, parts, etc. Recursive query functionality makes it possible to
analyze such hierarchical structures with straightforward queries. A single
SQL statement replaces a complex application, and the query optimizer has
the opportunity to create a high-performance execution plan. Top N Rows is
a new SQL option that returns a small portion of a large result set,
resulting in low-cost execution for browsing large-table contents.
Internal limits that constrained very large queries have been eliminated
to allow any enterprise query.
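
To make the recursive-query idea concrete, here is a minimal sketch of a single statement walking a parts hierarchy. It is not Teradata code: it uses SQLite's WITH RECURSIVE through Python's sqlite3 module as a stand-in engine, and the parts table, its columns and the components_of helper are all invented for illustration. Teradata's own syntax and optimizer behavior may differ.

```python
import sqlite3

# Hypothetical parts hierarchy: each row names a part and its parent assembly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part TEXT PRIMARY KEY, parent TEXT)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?)",
    [
        ("bicycle", None),
        ("frame", "bicycle"),
        ("wheel", "bicycle"),
        ("rim", "wheel"),
        ("spoke", "wheel"),
        ("saddle", "bicycle"),
    ],
)


def components_of(root):
    """Return (part, depth) for every part below `root`, nearest level first."""
    rows = conn.execute(
        """
        WITH RECURSIVE bom (part, depth) AS (
            SELECT part, 1 FROM parts WHERE parent = ?   -- seed: direct children
            UNION ALL
            SELECT p.part, b.depth + 1                   -- step: children of parts already found
            FROM parts AS p JOIN bom AS b ON p.parent = b.part
        )
        SELECT part, depth FROM bom ORDER BY depth, part
        """,
        (root,),
    )
    return rows.fetchall()


wheel_parts = components_of("wheel")
all_parts = components_of("bicycle")
```

The point of the sketch is that the traversal lives entirely in one SQL statement; without recursion, the application would have to issue one query per level and stitch the results together itself.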
Complex queries, particularly those from BI tools, frequently contain IN
lists to describe the desired results. These IN lists have become more complex,
so the improved Teradata Query Optimizer creates specialized plans to take
advantage of available indexes and maximize performance.
Accurate data demographic information is key to optimization of complex
queries but frequently is not collected. In order to gain a better understanding
of data demographic information, an enhanced sampling mechanism reviews
more data on more units of parallelism. More accurate samples will reduce
the effects of missing statistics on complex query plans.
What about tactical-query performance?
Smaller, faster queries are necessary for frontline applications. Performance
of such queries is significantly improved with a focus on CPU utilization.
In turn, more throughput is possible on the same configuration.
Tactical queries that access multiple rows often do so within a small range
of a large, detailed table. Partitioned primary index (PPI) gives narrow-range
queries efficient access within this larger data set. PPI has been
enhanced so that partitioning is also exploited during index access.
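
The benefit of partitioning for narrow-range access can be shown with a toy model in plain Python. This is only a sketch of the general partition-elimination idea, not of Teradata's storage layout: the monthly buckets, the insert and range_total functions and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical layout: rows are stored in one bucket per (year, month).
partitions = defaultdict(list)


def insert(sale_date, amount):
    """Place a row in the partition that matches its month."""
    partitions[(sale_date.year, sale_date.month)].append((sale_date, amount))


def months_between(start, end):
    """Yield each (year, month) key that overlaps the inclusive date range."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield (year, month)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def range_total(start, end):
    """Sum amounts in [start, end], reading only partitions that can qualify."""
    total, scanned = 0, 0
    for key in months_between(start, end):
        for sale_date, amount in partitions.get(key, []):
            scanned += 1
            if start <= sale_date <= end:
                total += amount
    return total, scanned


# Five years of history, one row on the 15th of each month (60 rows).
for y in range(2000, 2005):
    for m in range(1, 13):
        insert(date(y, m, 15), 100)

total, rows_scanned = range_total(date(2004, 8, 1), date(2004, 9, 30))
```

In this model, a two-month query reads two rows out of 60 because the other 58 partitions are never opened. The saving grows with the amount of history kept in the table.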
A non-unique secondary index (NUSI) contains the partition-id in each row-id
it references. NUSI access methods have been enhanced to utilize partitioning
conditions in the query to qualify the partition prior to using the NUSI
for access to the base table, resulting in more value for NUSIs on a partitioned
table. A NUSI may also be defined on the PI column(s) of a PPI table, where
it will offer single-AMP tactical access via the PI values rather than probing
all partitions.
What's new for event- or exception-driven applications?
Event detection, evaluation and propagation can be performed at any level
of timeliness required by the business process. Triggered stored procedures,
enhanced stored procedure functionality, queue tables and table functions
work together to allow a new class of active data warehouse applications
to be created.
How are near real-time events evaluated?
Extended triggers allow the execution of stored procedures as well as SQL
operations. The trigger detects data changes and invokes the stored procedure
to determine, via the stored procedure logic, if the event is interesting.
In Teradata Warehouse 8.0, the stored procedure may be written in C in addition
to the ANSI stored procedure language, allowing for more complex logic and
easier integration of external algorithms. Changes in the architecture allow
stored procedures to be executed with less overhead and improved performance.
How are interesting events propagated?
Event detection only has value if the events can be acted upon. C stored
procedures can directly invoke external functions such as sending e-mail,
pages or messages. Enterprise application integration (EAI) facilities provide
the backbone of message and event distribution for the enterprise; the external
functions provide a key new integration point with the EAI infrastructure.
What if the events require more intensive processing?
Executing a complex stored procedure as part of the data-changing operation
may interfere with the update application's service levels. A queue table
can separate the two. A queue table is defined like a SQL table with normal
column definitions. It combines the semantics of a queue and a table. An
INSERT statement places an entry in the queue. A SELECT AND CONSUME statement
through any standard SQL interface consumes the record at the front of the
queue. If the queue is empty, the SELECT waits until a record arrives. A
triggered or invoked stored procedure can insert into the queue and allow
the data-changing operation to complete. Independently, a stored procedure
or an external application can consume the queue and perform any level of
further analysis and processing before deciding whether to forward the event
for action.
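
The decoupling described above can be sketched with Python's standard queue and threading modules standing in for the queue table. None of this is Teradata syntax; the on_balance_change function, the negative-balance test and the account names are hypothetical. What the sketch preserves is the behavior in the text: the producer returns immediately after the insert, entries are consumed in arrival order, and the consumer waits when the queue is empty.

```python
import queue
import threading

events = queue.Queue()      # stands in for the queue table
forwarded = []              # events the consumer decided to act on
STOP = object()             # sentinel used only to shut the sketch down


def on_balance_change(account, new_balance):
    """Hypothetical trigger body: enqueue the event and return immediately."""
    events.put((account, new_balance))


def consumer():
    """Consume entries in arrival order, waiting whenever the queue is empty."""
    while True:
        item = events.get()             # blocks until an entry arrives
        if item is STOP:
            break
        account, balance = item
        if balance < 0:                 # the "interesting event" test, invented here
            forwarded.append(account)


worker = threading.Thread(target=consumer)
worker.start()                          # starts on an empty queue and waits

on_balance_change("A-1", 250)
on_balance_change("A-2", -40)
on_balance_change("A-3", -5)
events.put(STOP)
worker.join()
```

Because the evaluation runs in the consumer, the cost of the "is this interesting?" logic is paid outside the updating transaction, which is the service-level protection the queue table is meant to provide.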
What if data is required from a source outside the data warehouse?
A table function is a new form of user-defined function (UDF). It is used
in the FROM clause of a SQL statement, where its result becomes a table
that participates in the rest of the SQL execution. The function is written
in C and can get data from anywhere: a file, another database or even a
message bus. The function returns rows, which are written to a spool file
until the function signals that it's done. Then the rest of the statement
is executed using that spool. A table function may be used in any DML statement,
including INSERT SELECT, UPDATE WHERE or a normal SELECT statement.
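
The two-step flow just described (the function produces rows into a spool, then the statement runs against that spool) can be imitated in a few lines. This is an analogy only: a Python generator plays the table function rather than a C UDF, SQLite plays the database, and the feed contents, table names and columns are made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical external feed; in practice this could be a file or a message bus.
EXTERNAL_FEED = "sku,price\nA100,9.50\nB200,12.00\nC300,4.25\n"


def external_prices():
    """Plays the role of the table function: yields one row at a time."""
    for row in csv.DictReader(io.StringIO(EXTERNAL_FEED)):
        yield row["sku"], float(row["price"])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (sku TEXT, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("A100", 2), ("C300", 4)])

# Step 1: drain the function into a temporary "spool" table.
conn.execute("CREATE TEMP TABLE spool (sku TEXT, price REAL)")
conn.executemany("INSERT INTO spool VALUES (?, ?)", external_prices())

# Step 2: the rest of the statement runs against the spool like any other table.
order_values = conn.execute(
    "SELECT o.sku, o.qty * s.price FROM orders AS o "
    "JOIN spool AS s ON s.sku = o.sku ORDER BY o.sku"
).fetchall()
```

Once the external rows are in the spool, the join needs no special handling, which is why the function's output can take part in any kind of statement.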
How do all the updates get into the active data warehouse efficiently?
Update performance for single-row DML operators (INSERT, UPDATE, DELETE)
is significantly improved. Applications that operate on single rows, continuous
update processes and TPump all benefit from improved performance. The focus
in Teradata Warehouse 8.0 is on reducing CPU utilization and thereby increasing throughput.
A bulk array update interface is included and will be used in the next interface
and TPump release.
How is backup time reduced?
Backup processes consume time and resources, and they potentially interfere
with online data availability. A table with five years of history partitioned
by month will typically be changed or added to only in the most recent partition.
Teradata Warehouse 8.0 can back up selected partitions of a PPI-defined
table, taking less time and consuming fewer resources.
How is availability delivered for business-critical applications?
As the active data warehouse becomes a business-critical system, even higher
levels of availability are required. The applications cannot tolerate even
short offline times for planned system maintenance, nor is it acceptable
for them to be down in the event of a disaster. Extreme availability requires
multiple systems that are geographically separated. Teradata is creating
the technology to make it easy to create and manage these systems as a single
active data warehouse delivering continuous service to business-critical
applications.
Teradata Warehouse 8.0 introduces data replication as a mechanism to synchronize
data between systems. Query Director, the tool for routing workloads between
multiple systems and managing application failover, is upgraded with new
capabilities.
How is security managed for the large volume of users being added?
Security attacks, legislative imperatives and auditing requirements are
putting great pressure on the security infrastructure and security administrators.
At the same time, corporate IT systems are directly accessed by a larger
user set, and the environment changes daily as people come and go and the
organization evolves.
Single sign-on environments are being adopted to consolidate and centralize
the management of security authorization and authentication for the entire
enterprise. Teradata will participate in the single sign-on environment,
allowing users to be defined in and authenticated from the enterprise security
service. Enabled by lightweight directory access protocol (LDAP) and open
interfaces, this Teradata functionality can work with the organization's
chosen authentication toolset.
How can data from across the world be better integrated?
Companies around the world are moving to integrate their country-specific
information into a global view of their enterprise. Enhanced data access
interfaces (ODBC, JDBC, CLI) handle data in Unicode just as the database
engine does. Similarly, enhanced data load and export utilities move Unicode
data in and out of the database. These features, combined with a Unicode
user environment, will enable Teradata to handle any number of languages
simultaneously.
How can more stringent service levels be managed?
Currently, user resource allocations and resource access rules must be specified
and tuned. Teradata Warehouse 8.0 marks the birth of a new-generation workload
management philosophy: User and application response-time service levels
and relative business priorities will be specified.
For ease of analysis, Teradata Manager dashboard indicators will display
service levels against current performance. Flexible exception criteria
can be defined to automate changes in priority for queries that demonstrate
unexpected behavior patterns.
In the future, this will become a closed loop where Teradata will recommend
settings and manage the workloads to the specified service levels without
administrator intervention.
What's the final word?
Teradata Warehouse 8.0 furthers Teradata's mission to deliver the technology
underlying enterprise performance, bringing performance enhancements to
every aspect of enterprise data management. This new release, which includes
Teradata Database V2R6 and Teradata Tools and Utilities 8.0, requires less
time and effort, as well as fewer people and resources, to manage the growing
complexity of running a data warehouse.
Todd Walter, CTO, Teradata Development Division,
oversees R&D efforts for Teradata Database software and systems. He
also is responsible for the future vision and development of the active
data warehouse. You can send your data warehousing questions to the expert
at todd.walter@teradata-ncr.com.
© Teradata Magazine, September 2004