Teradata Magazine Cover Teradata Magazine Online  
Register Help Password
Password:
Quick Links
Current Issue
Archives
Teradata.com
Teradata Magazine Rss Feed
ARCHIVES Search Teradata Magazine Online:  
























Workload management supports consistent, short response times that are required for tactical queries.
































































Teradata Warehouse 7.0 adds functionality, scalability, performance and management to active data warehousing.



























Combining the privileged access lane with a high execution priority would cause a riot at your local grocery store.


STRATEGICALLY TACTICAL

by Todd Walter

Traditionally data warehousing has focused on strategic decisionmaking using historical data. A growing but still relatively small group of users perform reporting, OLAP, data mining and forecasting using a large amount of enterprise data. The active data warehouse adds a new dimension of tactical decision-making—small-scope questions delivering information to the people and systems on the front line of the organization. The industry has developed the operational data store (ODS), usually presented as running on a separate platform, to satisfy tactical decision-making requirements (For more information see "ODS Origami"). An active data warehouse with Teradata addresses the same workload but without a multiplicity of databases and systems. Teradata Warehouse 7.0—the latest installment in the Teradata release roadmap—adds functionality, scalability, performance and management specifically for this workload.

What is different about the tactical decision-making workload vs.
strategic decision-making?

Tactical decisions are generally based on small-scope queries (one product instead of many, one customer instead of all or one market basket instead of millions). Accessing a small to moderate number of rows from the database, these queries are expected to return to the user quickly. Sub-second to few-second response times are the norm. The number of users and the number of requests for information might be very high. These tactical queries often execute simultaneously on a Teradata Warehouse with a volume of complex strategic queries and continuous updates.

What is new for tactical queries in Teradata Warehouse 7.0?
Performance, scalability and workload management have been the center of attention for tactical queries. Among the many enhancements to performance are new indexing choices, specifically Global Index and Sparse Index. Scalability is extended by teaching the optimizer a new way to build plans for small-scope queries. Improving response time consistency for the tactical queries comes from a very focused set of enhancements to Priority Scheduler and Teradata Dynamic Query Manager.

What is a global index ?
Here is a scenario: Suppose Barry has three accounts with his broker. The Accounts table has a primary index of Account_ID, which distributes the accounts across the units of parallelism (AMPs)—let's assume a 64 AMP system—using the hash value of the Account_ID. The result is excellent data distribution, balanced complex queries and great joins on Account_ID. In this example, the call center application needs to retrieve just Barry's accounts, and it only has Barry's Customer_ID.

The traditional approach to accessing data when the selection value is other than the primary index is to build a non-unique secondary index (NUSI). However, if a NUSI were built on Customer_ID, each AMP would be involved in the indexed access, because with a NUSI, each AMP indexes its own data rows separately in local structures. The execution step that retrieves the rows via the index will have to be sent to all AMPs, since it is not possible to know beforehand which AMPs have the desired rows and which do not. Each AMP will look in the index, but only three will find anything. The other 61 AMPs will accept the step, look in the index but find nothing, report completion and go back
for more work.

A Global Index can index a non-unique set of data just like a NUSI, but it has an added advantage—each index row is distributed by hashing the index value (in this case, Customer_ID). This distribution causes all index entries for Barry's accounts to be on a single AMP. Using a Global Index, the execution step can be sent to a single AMP to get all the accounts from one place. If account detail is required, the rowIDs from the index can be used to go exclusively to the AMPs containing the Account rows. Four AMPs do the work, one for accessing the global index row that reflects Barry's Customer_ID value and three for the actual Account rows. The rest are available to do work for other requests. The optimizer automatically chooses whether to use the index based on the characteristics of the query. A broker can now quickly and efficiently find Barry's bonds.

How does a Sparse Index work?
A column being indexed often has uneven data distribution. In some cases, the unevenness is extreme, as might be the case when there are many instances of NULL or a default value appearing in an indexed column. The high-frequency values are not useful for index access and will be avoided by the optimizer, but updating the high-frequency values is expensive due to the number of entries in the index.

A Sparse Index allows an expression to be specified on the index definition to exclude or include desired rows. Eliminating the high-frequency values makes the index smaller and more efficient to update. This in turn allows more indexes to be defined to optimize the high-volume tactical queries. The optimizer automatically matches the query conditions to the index conditions to determine whether the index is applicable.

A Global Index on a column with very high-frequency values will result in uneven distribution of the index and possibly uneven distribution of work. Combining Sparse and Global Index to eliminate high-frequency values makes the Global Index less expensive and more valuable.

What's different about how tactical queries can be optimized and executed?
Teradata's optimizer has spent its entire life learning to maximize parallelism. Using all AMPs to execute a query is a great strategy for large-scope strategic queries. For tactical queries accessing a few rows, using all AMPs is usually less efficient. Often AMPs are put to work by a query only to find that there are no rows to process, as often happens when only a few rows are selected. The larger the system, the greater the potential for unnecessary AMP involvement with these kinds of queries.

An entirely new mechanism, GroupAMP, has been built into the execution engine to support the involvement of few AMPs to process a question. As a query executes, the AMPs that have produced results for a given step register as a "group." Further processing steps in the same query are sent only to this group of AMPs, eliminating the use of AMPs that have nothing to do. The optimizer has learned how to recognize tactical queries and when to generate plans for them using the new mechanism. Global Index, triggers and join index maintenance also take advantage of this feature.

Data retrieval, joins and result merge/return all benefit from the new plan. An indexed access step might result in only a few rows in a spool on only a few AMPs. Under those conditions, the subsequent join step is sent only to the AMPs with the spool, as is the sort, to prepare the result rows in order. In addition, the final result merge/return is sent exclusively to the group, instead of to all AMPs, minimizing the resources used for returning the few row result. Additional special optimizations address messages returning a small number of rows and aggregations that result in only a single aggregate group. The new plans can be observed in the Explain plan by looking for the term "GroupAMP."

When only a few AMPs participate, what locks are required?
The new GroupAMP mechanism makes possible greater flexibility and efficiency in how locks are applied. When GroupAMP plans are used, every effort is made to perform only row-hash level locking instead of applying table-level locks. This can be especially important on update operations. For example, the optimizer will recognize if an INSERT SELECT is only inserting a few rows, and it will launch the INSERT SELECT on only a few AMPs, locking only the newly inserted rows. Previously, an INSERT SELECT was always an all-AMP operation that applied a table-level lock. GroupAMP combined with lower locking granularity increases concurrency for all workloads on the system.

How can Teradata's scalability be improved?
People are often surprised by the modest resource requirements of today's tactical query workload. By query count, the tactical workload looks very large. But the actual resources required per query are very small compared to a strategic query. A data warehouse configured to handle many concurrent strategic queries can handle at least a moderate workload of tactical queries with a minimum impact on overall system resources. Looking to the future however, Teradata expects the volume of tactical queries to explode. As it does, inefficiencies such as the use of all AMPs will multiply and affect performance and throughput.

The change to using few AMPs automatically in query plans, via Global Index, has two scalability benefits. Since only a few AMPs will be used for one query, the remaining AMPs will be available to execute other concurrent queries, and throughput will improve for tactical workloads on existing configurations. When a configuration is grown today, a tactical workload that is dominated by unneeded all-AMP steps puts more AMPs to work doing nothing. With the Teradata Warehouse 7.0 changes, increasing the size of the configuration will increase the number of concurrent tactical requests, scaling the overall workload.

What's new in workload management?
Consistent, short response times are required for the tactical queries. New workload management features deliver greater consistency even in the face of very heavy workloads. Reserved AMP worker tasks, query milestone limits and concurrency control all contribute to enhancing Teradata's ability to deliver to very demanding response time requirements.

What is an AMP worker task and why would I want to reserve one?
The fast checkout lane at the grocery store is available only for people with a few items. Without this lane, all the checkers are likely to be busy with people shopping for the week, causing shoppers with a small number of items to wait a long time. AMP worker tasks are like checkout lanes: Each one can work on one execution step of one query on one AMP at a time, and there are a limited number of them. If a high volume of strategic queries are utilizing all the AMP worker tasks, then tactical queries do not have a place to execute and they must wait in line.

The ability to reserve AMP worker tasks for a small portion of the workload ensures that a lane is always available to process the tactical requests. Combining this privileged access lane with a high execution priority does something that would cause a riot at your local store. Within Teradata, queues of requests waiting to execute are ordered by priority. If the fast lane is full, the high-priority, short requests go to the front of the other lines, as they are eligible to use all of the AMP worker tasks if needed, not just the reserved ones.

How do query milestone limits help?
Long-running work should not consume large amounts of resources if there is high-priority work to do. Query milestone limits automatically control the resource consumption of large queries, especially when users submit an unpredictable variety of queries. At each defined milestone, the query moves to a new (usually lower) Priority Scheduler allocation group with a new priority and resource weighting.

If two milestone limits are defined, with the first at five seconds and the second at one minute, a query that consumes fewer than five seconds of CPU will run entirely at the highest priority. However, a longer-running query would be able consume up to five seconds of CPU at high priority, but then be demoted to a medium priority for the next minute of CPU consumption, with the rest of its execution time at a low priority. Once the long-running query completes, the session returns to the original high priority so that the next query starts with fresh milestones. Surprising creativity by ad-hoc users can be managed such that it does not impact high-priority tactical work.

What does query concurrency control do for tactical work?
Like the milestone limit, concurrency control is about keeping the strategic work from interfering with the tactical query response time. If there is a lot of low-priority strategic work, it could tie up an undeserved level of resources, thus interfering with other parts of the workload. Concurrency control allows you to specify a maximum number of queries to run per performance group, account or user. Combined with the delayed query feature of Teradata Dynamic Query Manager, it allows a large volume of low-priority work to be submitted but not actually executed. Additional queries beyond the concurrency control value will be queued and executed as prior work finishes. This provides a continuous but managed flow of low-priority work while ensuring that resources are available for the higher priority parts of the workload.

Conclusion
The tactical query workload is a key part of an active data warehouse. The results of the strategic analysis and the current state of the business must be available to the business's front line in order to meet today's customer, supplier and management expectations. Functionality, performance, scalability and management enhancements in Teradata Warehouse 7.0 deliver a strengthened, more thorough ability to perform the tactical workload on the same data as the strategic workload of the warehouse, eliminating the separate ODS, reducing the cost and increasing the manageability of the overall warehouse environment. T

E-MAIL ME
Looking for answers to life's mysteries? Or would you just like to know more about the Teradata Warehouse and related applications? Ask the Expert! E-mail questions and comments to Todd at: todd.walter@teradata-ncr.com

Photo by Alex Hayden




Copyright by Teradata Corporation 2001-2007.