|

STRIKING A BALANCE
You can mix tactical and strategic
queries on the same database
by Carrie Ballinger
THE
OLD ADAGE "THERE ARE TWO SIDES TO every story"
can be applied to data warehousing, where under some circumstances
a question can have two or more valid, yet opposite answers.
If the question is how to access data most efficiently
from a Teradata data warehouse, the answer that first jumps to
mind is parallelize the work across the entire system.
This is not surprising because, traditionally,
data warehouse applications have focused on drawing strategic
advantage from the data, leading to complex and sometimes long-running
queries. The parallel architecture of Teradata supports these
complex queries by spreading the work across all of the configuration's
parallel units (referred to as AMPs).
But as increasing numbers of Teradata users
are introducing quick-turnaround tactical queries alongside strategic
queries, a second, opposite, answer to the above question emerges:
Localize the work to touch as little of the system as possible
using a single AMP for best performance. Queries that touch one
or few AMPs allow more such queries to execute at the same time,
without performance compromise (see figure 1).
Scalability
for tactical queries
To better understand the differences between single-AMP and all-AMP
queries, consider scalability-the degree to which a platform is
capable of absorbing growth with a predictable, proportional impact
on performance (see figure 2).
With single-AMP tactical queries there is a
dimension of scalability that is slightly different from that
observed with all-AMP strategic queries. With all-AMP queries,
more nodes translate to proportionally faster queries, because
the work can be spread across a larger number of AMPs.
But since they only touch one AMP on one node,
single-AMP queries don't experience a performance benefit when
nodes are added. However, in contrast to all-AMP queries, an increasing
number of single-AMP queries can be executed at the same time
before contention for resources causes a lengthening of individual
response times.
Here is an example: Assume you have a table
that is composed of one million rows. Assume further that the
read capacity of each node in your configuration is 100 rows per
second. If you are doing a complex strategic query that involves
a full scan of this table, then the Teradata response time for
the query will diminish in proportion to the increase in nodes.
This is because each node has fewer rows as nodes are added, so
each node can perform its portion of the scan faster.
| |
Rows/node |
Strategic
query
response time |
1 node |
1,000,000 |
10,000 seconds |
10 nodes |
100,000 |
1,000 seconds |
100 nodes |
10,000 |
100 seconds |
200 nodes |
5,000 |
50 seconds |
But if your application is performing single-
or few-AMP tactical queries, adding nodes will not shorten the
query response time. It will, however, increase the number of
tactical queries that can be performed at any given time.
| |
Response time |
Strategic
query
response time |
1 node |
0.01 sec. |
100 requests/sec. |
10 nodes |
0.01 sec. |
1,000 requests/sec. |
100 nodes |
0.01 sec. |
10,000 requests/sec. |
200 nodes |
0.01 sec. |
20,000 requests/sec. |
Single-AMP queries scale well in Teradata because
they engage the same small level of database resources even when
the system doubles or triples in size. The advantage of single-AMP
queries increases as the configuration grows. As figure 3 illustrates,
a single-AMP query will be engaging resources from only 4% of
the AMPs in a two-node configuration. If that platform grows to
eight nodes, the same query will now be engaging resources from
only about 1% of the configuration.
Determining single- or all-AMP
After a Teradata SQL statement is optimized, the dispatcher sends
the individual operations that make up that query, one step at
a time, to some number of AMPs in the system, determined by the
query optimizer. Usually, this will involve all AMPs in the system.
In addition, when the query plan includes all-AMP
steps, the optimizer will add steps to the beginning and the end
of the plan. These all-AMP steps will set and then release table-level
locks on each AMP to support the all-AMP database access steps.
Under some conditions, the optimizer might decide
that this query work can be accomplished by involving only a single
AMP in the configuration. This process is usually referred to
as a single-AMP operation step.
Each table in Teradata is defined with one
or more columns as a primary index (PI). A row's PI value goes
through a hashing algorithm in Teradata before the row is inserted
into the database. The output of that hashing process determines
which AMP a row will belong to and what its individual rowID will
be. This rowID is a logical address that determines where on the
AMP's disks a given row will be located.
Teradata single-AMP operations require that
a PI value be available in the query for that table. Single-AMP
steps can be achieved by any of the following database operations:
1. Simple single-row inserts;
2. Simple selects, updates and deletes qualified with a PI value;
or
3. Joins between two tables that share the same PI domain and
whose primary indexes make up the join constraint, and for which
a single PI value is available.
When a single-AMP step is executed, Teradata
folds into the same execution step the database locking for the
row(s) that are involved. This eliminates the additional all-AMP
steps that would be required to set and release table-level locks.
Even when joins between tables occur, Teradata
can deliver localized operations. The following are examples of
joins that can be performed by engaging one or few AMPs.
In all of these few-AMP examples, the first
table is accessed either using a unique primary index (UPI), a
non-unique primary index (NUPI) or a unique secondary index (USI).
In Teradata, USI access is a two-AMP operation: The first AMP
accesses the index row and a second AMP accesses the base-table
row indicated by the indexed row.
Keep in mind that, in a massively parallel processing
(MPP) system, a join can only be performed if the associated rows
are brought into the memory of a single node. Also, in Teradata,
the rows to be joined must be located in the portion of a node's
memory that is controlled by a single AMP. If associated rows
of tables-to-be-joined already reside on the same AMP, then performing
a join between them is efficient because no row relocation is
required. If not, one or both tables' rows must be repositioned
before the join can take place so that associated rows are under
the control of the same AMP.
Join Example 1
This merge join can be accomplished on a single AMP because the
join columns are the same as the PI columns for both tables. Since
the PI columns of both tables are the same, the associated rows
from each table were located to the same AMP at the same time
the rows were inserted before the SQL statement joins them (see
figure 4).
select *
from transaction, item
where T_txnID = I_txnID
and T_txnID = 365;
A PI-to-PI index join is the most efficient
Teradata join.
Join Example 2
This nested join requires three AMPs: two for USI access of the
Customer table and one for UPI access of Location. The selected
Customer data will be relocated to the AMP where the Location
row resides, and they will be joined there (see figure 4).
select *
from Customer, Location
where C_LocID = l_LociD
and C_Custname = 'Ann Davis';
Expanding single-AMP benefits
Teradata Database V2R5 introduces a new way to optimize and execute
requests based on the theory that tactical queries are more efficient
when they engage fewer resources. Short queries often require
more than one AMP, but fewer than all AMPs. With the group-AMP
feature of V2R5, the optimizer now looks for opportunities to
transform what used to be all-AMP query plans into few-AMP plans.
Queries that involve more than one but less than 50% of the total
AMPs in the system are candidates for group-AMP plans.
Join Example 3
This group-AMP merge join requires one AMP for NUPI access of
the Item table and several AMPs to join to the associated Part
rows. For each of the several Items that carry the specific TxnID
value, the Item data is relocated to the AMP that carries the
Part table rows associated with its foreign key, l_Partkey (see
figure 4).
select *
from Item, Part
where I_Partkey = P_Partkey
and T_TxnID = 365;
Localizing the database work
Several approaches to physical database design can make one-,
few- or group-AMP query plans more likely. Most of these choices
need to be made carefully, as they can influence how other work
in the system is optimized.
1. Define tables with the
same PI definitions: When frequent joins will take place
between two associated tables, there will be a performance advantage
if their PI definitions come from the same domain and if these
PI columns are the same as the join columns. Physically designing
tables with this approach can benefit join processing for both
traditional complex decision support and for tactical queries
(see figure 5).
2. Increase the likelihood
of few-AMP nested joins: Consider placing USIs to facilitate
accessing or joining, as shown in figure 5. In Teradata, access
using a USI is usually a two-AMP operation.
3. Consider a join index:
In Teradata, join indexes have many characteristics of base tables,
such as supporting a PI. You may create a single-table join index
with a different PI than the base table. This PI could correspond
to value(s) frequently passed by the application. In figure 6,
the Customer table has C_Custkey defined as its PI. A query that
requires data about a specific customer but only has a value for
customer name could use a join index for single-AMP access. The
query could be satisfied by the join index alone.
4. Use a global index:
A global index is a single-table join index, similar to the one
illustrated in figure 5, except for one important difference.
The global index row carries a pointer back to the base table
and is used as an alternate means to get to the base table row.
This pointer that is carried in the global index may be the unique
PI or the USI value for a base table row, either of which can
be used for direct access to the base table. It can also be the
rowID of the base table row, as is the case in the example shown
in figure 6.
A global index offers the advantages of a non-unique
secondary index (NUSI, which supports duplicate rows per value)
combined with the advantages of a USI (its index rows are hash-distributed
on the indexed value) and is often able to offer group-AMP capabilities.
As with all Teradata join indexes, the use of
a global index is transparent to the query, and its use will be
determined by the optimizer. If the optimizer chooses to use a
global index, then the query may benefit from group-AMP access.
When the number of rowIDs for a given global index value are less
than half the number of AMPs in the current configuration, the
optimizer may choose to perform a group-AMP join between those
rowIDs and the base table rows, instead of an all-AMP join.
Conclusion
Trying to resolve opposing requirements in data warehousing can
sometimes lead to unsatisfying compromises, but not when it comes
to Teradata's ability to mix distinctly different strategic and
tactical query needs in a single database. With traditional all-AMP
strengths on one hand and ever-improving few-AMP capabilities
on the other, Teradata clearly satisfies the two sides of the
performance question. T
Teradata senior technical consultant Carrie
Ballinger works in the Active Data Warehouse Center of Expertise
in El Segundo, California.
PHOTO BY CLARA HAYDEN
| Tactical
queries vs. OLTP |
Tactical
queries are similar to classical online transaction processing
(OLTP) in that both are composed of short, often direct-access
queries with defined response time requirements.
OLTP, however, is more focused on bookkeeping
activities and recording operational events, while tactical
queries are focused on decision-making. OLTP generally comes
with a higher ratio of writes compared to reads, while tactical
queries tend to be read intensive. OLTP usually has a departmental
view and a narrow context while tactical queries may be
broader, often enterprise-wide. With OLTP queries, latency
is always critical. Tactical queries, on the other hand,
are designed for response time requirements that can range
from sub-second speeds up to 10 seconds or more.
|
|