Teradata Magazine Cover Teradata Magazine Online  
Register Help Password
Password:
Quick Links
Current Issue
Archives
Teradata.com
Teradata Magazine Rss Feed
ARCHIVES Search Teradata Magazine Online:  













Tech2Tech:
Ask the Expert: Teradata 7.1
Teradata 7.1 meets active data warehousing's ever-increasing requirements.

Applied Solutions:
Too much space?

What can you do with all that extra disk capacity? think historical data.

Insider's Warehouse:
Good performance

New tools make it possible to define a business strategy and monitor performance.

Just the FAQs
We simplify the complicated. Read FAQs posted online or ask the experts.

















Very few businesses have the financial luxury of developing a fully redundant, completely mirrored disaster recovery program, yet the need for such a system is growing.

STRIKING A BALANCE

You can mix tactical and strategic queries on the same database

by Carrie Ballinger

THE OLD ADAGE "THERE ARE TWO SIDES TO every story" can be applied to data warehousing, where under some circumstances a question can have two or more valid, yet opposite answers.

If the question is how to access data most efficiently from a Teradata data warehouse, the answer that first jumps to mind is parallelize the work across the entire system.

This is not surprising because, traditionally, data warehouse applications have focused on drawing strategic advantage from the data, leading to complex and sometimes long-running queries. The parallel architecture of Teradata supports these complex queries by spreading the work across all of the configuration's parallel units (referred to as AMPs).

But as increasing numbers of Teradata users are introducing quick-turnaround tactical queries alongside strategic queries, a second, opposite, answer to the above question emerges: Localize the work to touch as little of the system as possible using a single AMP for best performance. Queries that touch one or few AMPs allow more such queries to execute at the same time, without performance compromise (see figure 1).

Scalability for tactical queries
To better understand the differences between single-AMP and all-AMP queries, consider scalability-the degree to which a platform is capable of absorbing growth with a predictable, proportional impact on performance (see figure 2).

With single-AMP tactical queries there is a dimension of scalability that is slightly different from that observed with all-AMP strategic queries. With all-AMP queries, more nodes translate to proportionally faster queries, because the work can be spread across a larger number of AMPs.

But since they only touch one AMP on one node, single-AMP queries don't experience a performance benefit when nodes are added. However, in contrast to all-AMP queries, an increasing number of single-AMP queries can be executed at the same time before contention for resources causes a lengthening of individual response times.

Here is an example: Assume you have a table that is composed of one million rows. Assume further that the read capacity of each node in your configuration is 100 rows per second. If you are doing a complex strategic query that involves a full scan of this table, then the Teradata response time for the query will diminish in proportion to the increase in nodes. This is because each node has fewer rows as nodes are added, so each node can perform its portion of the scan faster.

Rows/node
Strategic query
response time
1 node
1,000,000
10,000 seconds
10 nodes
100,000
1,000 seconds
100 nodes
10,000
100 seconds
200 nodes
5,000
50 seconds

But if your application is performing single- or few-AMP tactical queries, adding nodes will not shorten the query response time. It will, however, increase the number of tactical queries that can be performed at any given time.

Response time
Strategic query
response time
1 node
0.01 sec.
100 requests/sec.
10 nodes
0.01 sec.
1,000 requests/sec.
100 nodes
0.01 sec.
10,000 requests/sec.
200 nodes
0.01 sec.
20,000 requests/sec.

Single-AMP queries scale well in Teradata because they engage the same small level of database resources even when the system doubles or triples in size. The advantage of single-AMP queries increases as the configuration grows. As figure 3 illustrates, a single-AMP query will be engaging resources from only 4% of the AMPs in a two-node configuration. If that platform grows to eight nodes, the same query will now be engaging resources from only about 1% of the configuration.

Determining single- or all-AMP
After a Teradata SQL statement is optimized, the dispatcher sends the individual operations that make up that query, one step at a time, to some number of AMPs in the system, determined by the query optimizer. Usually, this will involve all AMPs in the system.

In addition, when the query plan includes all-AMP steps, the optimizer will add steps to the beginning and the end of the plan. These all-AMP steps will set and then release table-level locks on each AMP to support the all-AMP database access steps.

Under some conditions, the optimizer might decide that this query work can be accomplished by involving only a single AMP in the configuration. This process is usually referred to as a single-AMP operation step.

Each table in Teradata is defined with one or more columns as a primary index (PI). A row's PI value goes through a hashing algorithm in Teradata before the row is inserted into the database. The output of that hashing process determines which AMP a row will belong to and what its individual rowID will be. This rowID is a logical address that determines where on the AMP's disks a given row will be located.

Teradata single-AMP operations require that a PI value be available in the query for that table. Single-AMP steps can be achieved by any of the following database operations:
1. Simple single-row inserts;
2. Simple selects, updates and deletes qualified with a PI value; or
3. Joins between two tables that share the same PI domain and whose primary indexes make up the join constraint, and for which a single PI value is available.

When a single-AMP step is executed, Teradata folds into the same execution step the database locking for the row(s) that are involved. This eliminates the additional all-AMP steps that would be required to set and release table-level locks.

Even when joins between tables occur, Teradata can deliver localized operations. The following are examples of joins that can be performed by engaging one or few AMPs.

In all of these few-AMP examples, the first table is accessed either using a unique primary index (UPI), a non-unique primary index (NUPI) or a unique secondary index (USI). In Teradata, USI access is a two-AMP operation: The first AMP accesses the index row and a second AMP accesses the base-table row indicated by the indexed row.

Keep in mind that, in a massively parallel processing (MPP) system, a join can only be performed if the associated rows are brought into the memory of a single node. Also, in Teradata, the rows to be joined must be located in the portion of a node's memory that is controlled by a single AMP. If associated rows of tables-to-be-joined already reside on the same AMP, then performing a join between them is efficient because no row relocation is required. If not, one or both tables' rows must be repositioned before the join can take place so that associated rows are under the control of the same AMP.

Join Example 1
This merge join can be accomplished on a single AMP because the join columns are the same as the PI columns for both tables. Since the PI columns of both tables are the same, the associated rows from each table were located to the same AMP at the same time the rows were inserted before the SQL statement joins them (see figure 4).

select *
from transaction, item
where T_txnID = I_txnID
and T_txnID = 365;

A PI-to-PI index join is the most efficient Teradata join.

Join Example 2
This nested join requires three AMPs: two for USI access of the Customer table and one for UPI access of Location. The selected Customer data will be relocated to the AMP where the Location row resides, and they will be joined there (see figure 4).

select *
from Customer, Location
where C_LocID = l_LociD
and C_Custname = 'Ann Davis';

Expanding single-AMP benefits
Teradata Database V2R5 introduces a new way to optimize and execute requests based on the theory that tactical queries are more efficient when they engage fewer resources. Short queries often require more than one AMP, but fewer than all AMPs. With the group-AMP feature of V2R5, the optimizer now looks for opportunities to transform what used to be all-AMP query plans into few-AMP plans. Queries that involve more than one but less than 50% of the total AMPs in the system are candidates for group-AMP plans.

Join Example 3
This group-AMP merge join requires one AMP for NUPI access of the Item table and several AMPs to join to the associated Part rows. For each of the several Items that carry the specific TxnID value, the Item data is relocated to the AMP that carries the Part table rows associated with its foreign key, l_Partkey (see figure 4).

select *
from Item, Part
where I_Partkey = P_Partkey
and T_TxnID = 365;

Localizing the database work
Several approaches to physical database design can make one-, few- or group-AMP query plans more likely. Most of these choices need to be made carefully, as they can influence how other work in the system is optimized.

1. Define tables with the same PI definitions: When frequent joins will take place between two associated tables, there will be a performance advantage if their PI definitions come from the same domain and if these PI columns are the same as the join columns. Physically designing tables with this approach can benefit join processing for both traditional complex decision support and for tactical queries (see figure 5).

2. Increase the likelihood of few-AMP nested joins: Consider placing USIs to facilitate accessing or joining, as shown in figure 5. In Teradata, access using a USI is usually a two-AMP operation.

3. Consider a join index: In Teradata, join indexes have many characteristics of base tables, such as supporting a PI. You may create a single-table join index with a different PI than the base table. This PI could correspond to value(s) frequently passed by the application. In figure 6, the Customer table has C_Custkey defined as its PI. A query that requires data about a specific customer but only has a value for customer name could use a join index for single-AMP access. The query could be satisfied by the join index alone.

4. Use a global index: A global index is a single-table join index, similar to the one illustrated in figure 5, except for one important difference. The global index row carries a pointer back to the base table and is used as an alternate means to get to the base table row. This pointer that is carried in the global index may be the unique PI or the USI value for a base table row, either of which can be used for direct access to the base table. It can also be the rowID of the base table row, as is the case in the example shown in figure 6.

A global index offers the advantages of a non-unique secondary index (NUSI, which supports duplicate rows per value) combined with the advantages of a USI (its index rows are hash-distributed on the indexed value) and is often able to offer group-AMP capabilities.

As with all Teradata join indexes, the use of a global index is transparent to the query, and its use will be determined by the optimizer. If the optimizer chooses to use a global index, then the query may benefit from group-AMP access. When the number of rowIDs for a given global index value are less than half the number of AMPs in the current configuration, the optimizer may choose to perform a group-AMP join between those rowIDs and the base table rows, instead of an all-AMP join.

Conclusion
Trying to resolve opposing requirements in data warehousing can sometimes lead to unsatisfying compromises, but not when it comes to Teradata's ability to mix distinctly different strategic and tactical query needs in a single database. With traditional all-AMP strengths on one hand and ever-improving few-AMP capabilities on the other, Teradata clearly satisfies the two sides of the performance question. T

Teradata senior technical consultant Carrie Ballinger works in the Active Data Warehouse Center of Expertise in El Segundo, California.

PHOTO BY CLARA HAYDEN

Tactical queries vs. OLTP

Tactical queries are similar to classical online transaction processing (OLTP) in that both are composed of short, often direct-access queries with defined response time requirements.

OLTP, however, is more focused on bookkeeping activities and recording operational events, while tactical queries are focused on decision-making. OLTP generally comes with a higher ratio of writes compared to reads, while tactical queries tend to be read intensive. OLTP usually has a departmental view and a narrow context while tactical queries may be broader, often enterprise-wide. With OLTP queries, latency is always critical. Tactical queries, on the other hand, are designed for response time requirements that can range from sub-second speeds up to 10 seconds or more.




Copyright by Teradata Corporation 2001-2007.