Register | Log in


Subscribe Now>>
Home News Tech2Tech Features Viewpoints Facts & Fun Teradata.com
Tech Support
Send to Colleague

Keeping harmony

A Teradata Certified Master answers readers' technical queries.

by Carrie Ballinger, Senior Technical Consultant, Teradata Certified Master V2R5

Ever since I majored in music in college, one of my favorite musical chords has been the diminished fifth. The diminished fifth introduces a dissonance that cries out for resolution and was considered so unsettling in medieval times that it was banned by religious leaders for hundreds of years.

Questions and answers, such as those contained in this column, are similar to the dissonance and consonance of a musical chord progression. An unanswered question is like a diminished fifth—it's in transit, unstable, leaving you wanting more. But an answer without a question is like a one-chord tune—flat, lifeless and hard to care about.

To make a technical point that sticks, or to write a bit of music someone might want to hear again, it's beneficial to recognize and reinforce the dependency between dissonance and consonance.

Hash joins eliminate sorting
Dear Carrie: In your Tech Support column in the December 2006 issue of Teradata Magazine Online, your answer about monitoring memory included this comment: "Hash joins eliminate costly row redistributions by doing more work in memory." I disagree.

The key benefit of the hash join over the merge join is that it eliminates the need to sort either input. If the join does not involve the primary index (PI) of the large table, then redistribution of that table will be required prior to the join.
—Getting it straight

Figure 1: Hash join on the fly
enlarge
Hash join on the fly eliminates sorting and row distribution of the large table.

Dear Straight: Yes, I agree. I should have emphasized that eliminating sorts is the significant advantage of hash joins—I stand corrected.

Although this is not a dodge of your criticism, there is a type of hash join that does eliminate row redistributions as well as sorts, and I can only guess that was in my mind when I wrote my response last December.

The standard hash join does require the same spooling and the same relocation of rows before the join, whether a table is being duplicated or redistributed. In this regard, it is similar to the merge join. However, if the small table is small enough to fit into one hash partition and if it is duplicated, the redistribution of the large table can be eliminated by doing a hash join on the fly, a variant of the standard hash join. In such a case, the large table is read directly, without spooling or redistributing, and the hash join is performed between the small table spool and the large table rows. (See figure 1, above.)

Both types of hash joins eliminate sorting, and the on the fly version eliminates redistribution as well. Here are examples of the standard hash join and the hash join on the fly, so you can see what I am talking about:

TABLE DEFINITIONS

CREATE SET TABLE CAB.FewCustomers   
     (
      C_CUSTKEY INTEGER NOT NULL,
      C_NAME VARCHAR(25),     
      C_ADDRESS VARCHAR(40),
      C_NATIONKEY INTEGER NOT NULL,
      C_PHONE CHAR(15) CHARACTER,
      C_ACCTBAL DECIMAL(15,2))
UNIQUE PRIMARY INDEX ( C_NAME );

CREATE MULTISET TABLE CAB.ManyOrders  
     (
      O_ORDERKEY INTEGER NOT NULL,
      O_CUSTKEY INTEGER NOT NULL,
      O_ORDERSTATUS CHAR(1),
      O_TOTALPRICE DECIMAL(15,2),
      O_ORDERDATE DATE FORMAT 'yyyy-mm-dd')
UNIQUE PRIMARY INDEX ( O_ORDERKEY );


SELECT COUNT(*)
FROM ManyOrders;  (the big table)

Count(*)
75000000

SELECT COUNT(*)
FROM FewCustomers;   (the small table)

Count(*)
300000

REGULAR HASH JOIN WITH REDISTRIBUTION OF BOTH TABLES 

EXPLAIN 
SELECT * 
FROM FewCustomers  
     , ManyOrders 
WHERE c_custkey = o_custkey;

Explanation
  1) First, we lock a distinct CAB."pseudo table" for read on a
     RowHash to prevent global deadlock for CAB.FewCustomers  . 
  2) Next, we lock a distinct CAB."pseudo table" for read on a
     RowHash to prevent global deadlock for CAB.ManyOrders . 
  3) We lock CAB.FewCustomers   for read, and we lock CAB.ManyOrders 
     for read. 
  4) We do an all-AMPs RETRIEVE step from CAB.FewCustomers   by way of
     an all-rows scan with no residual conditions into Spool 2
     (all_amps) fanned out into 35 hash join partitions, which is
     redistributed by hash code to all AMPs.  The size of Spool 2 is
     estimated with low confidence to be 301,520 rows.  The estimated
     time for this step is 1.21 seconds. 
  5) We do an all-AMPs RETRIEVE step from CAB.ManyOrders  by way of an
     all-rows scan with no residual conditions into Spool 3 (all_amps)
     fanned out into 35 hash join partitions, which is redistributed by
     hash code to all AMPs.  The input table will not be cached in
     memory, but it is eligible for synchronized scanning.  The result
     spool file will not be cached in memory.  The size of Spool 3 is
     estimated with high confidence to be 74,958,940 rows.  The
     estimated time for this step is 3 minutes and 59 seconds. 
  6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
     all-rows scan, which is joined to Spool 3 (Last Use) by way of an
     all-rows scan.  Spool 2 and Spool 3 are joined using a hash join
     of 35 partitions, with a join condition of ("C_CUSTKEY = O_CUSTKEY"). 
     The result goes into Spool 1 (group_amps), which is built locally
     on the AMPs.  The result spool file will not be cached in memory. 
     The size of Spool 1 is estimated with no confidence to be
     4,520,425 rows.  The estimated time for this step is 55.18 seconds. 
  7) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 4 minutes and 56 seconds. 


HASH JOIN ON THE FLY WITH NO REDISTRIBUTION OF LARGE TABLE  

SELECT COUNT(*)
FROM ManyOrders;  (the big table)

Count(*)
75000000


SELECT COUNT(*)
FROM FewCustomers; (the small table)

Count(*)
1001


EXPLAIN 
SELECT * 
FROM FewCustomers  
     , ManyOrders 
WHERE c_custkey = o_custkey;

Explanation
  1) First, we lock a distinct CAB."pseudo table" for read on a
     RowHash to prevent global deadlock for CAB.FewCustomers  . 
  2) Next, we lock a distinct CAB."pseudo table" for read on a
     RowHash to prevent global deadlock for CAB.ManyOrders . 
  3) We lock CAB.FewCustomers   for read, and we lock CAB.ManyOrders 
     for read. 
  4) We do an all-AMPs RETRIEVE step from CAB.FewCustomers   by way of
     an all-rows scan with no residual conditions into Spool 2
     (all_amps), which is duplicated on all AMPs.  The size of Spool 2
     is estimated with high confidence to be 20,020 rows.  The
     estimated time for this step is 0.03 seconds. 
  5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
     all-rows scan, which is joined to CAB.ManyOrders  by way of an
     all-rows scan with no residual conditions.  Spool 2 and
     CAB.ManyOrders  are joined using a single partition hash_ join,
     with a join condition of ("C_CUSTKEY = CAB.ManyOrders .O_CUSTKEY"). 
     The input table CAB.ManyOrders  will not be cached in memory. 
     The result goes into Spool 1 (group_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with no confidence
     to be 15,008 rows.  The estimated time for this step is 1 minute
     and 11 seconds. 
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 1 minute and 11 seconds.

Coding for cached plans
Dear Carrie: I have a question about caching tactical queries. My user says only 5% of his macros are being cached. He is getting this figure from the Database Query Log (DBQL). Can you throw any light on this? We are using macros with parameterized SQL.
—Cached Out

Dear Cached: If only 5% of the CacheFlags in the DBQLogTbl indicate cached plans were used for this application, then I would look first at how the application was coded. The values must be passed from the application in a separate data parcel (external to the request itself) for the request to be cached the first time it is seen. With macros, two levels of parameters are required for this to happen:

  1. The macro level: In the SQL, within the macro itself
  2. The request level: With BTEQ, in the USING clause parenthetical expression

I ran some examples (shown below) to illustrate this. The first has no parameterization; the second has parameterization only at the SQL level; the third shows parameterization at the SQL and request levels (by means of the USING clause). Only the third request will provide the cached plan functionality.

1) No parameterization:

replace macro cabmac as
(SELECT * FROM ordertbl
WHERE o_orderdate = 980101;);

2) Parameterization only at the SQL level:

replace macro cabmac (key1 DATE) as
(SELECT * FROM ordertbl
WHERE o_orderdate = :key1;);

exec cabmac(980101);

3) Parameterization at the SQL and request levels:

replace macro cabmac (key1 DATE) as
(SELECT * FROM ordertbl
WHERE o_orderdate = :key1;);

USING (key1 DATE)
exec cabmac(:key1);

In this latter case, the actual values are sent in a file separate from the request. The USING clause tells the system to look for the value in an associated file—that usually appears in a previous ".import" command if you are using BTEQ.

Below is some additional detail, including a sample MP-RAS script that does the import and macro execution, and DBQL output from the previous three macros.

The following code, taken from a recent tactical query test that was run on a UNIX MP RAS platform, uses a different macro than the above examples.

THE OUTPUT

+---------+---------+---------+---------+---------+---------+---------
.os del tq1_25.dat
+---------+---------+---------+---------+---------+---------+---------
.os echo 178369 > tq1_25.dat
+---------+---------+---------+---------+---------+---------+---------
.import VARTEXT file=tq1_25.dat
+---------+---------+---------+---------+---------+---------+---------
USING thekey (VARCHAR(7))
    exec cabmac(:thekey);

 *** Query completed. One row found. 3 columns returned. 
 *** Total elapsed time was 1 second.

  C_CUSTKEY          C_ACCTBAL  C_MKTSEGMENT
-----------  -----------------  ------------
     178369            3028.84  BUILDING2

+---------+---------+---------+---------+---------+---------+---------

Applications other than BTEQ will use different conventions. The important thing for achieving cached plans is that the PE must receive the SQL in a different parcel than the value(s). Using two different parcels does not add transmission overhead. Both the SQL parcel and the data parcel will be sent bundled into the same message.

Excerpt from Database Query Log output for the three examples
The following output from the DBQL shows that the only case where the plan was immediately cached was when parameterization was used in the request and the macro. As indicated by the cache flag, all other variants of the request did not lead to cached plans.

CacheFlag
QueryText
 
begin query logging on DBC;
 
SELECT c_custkey , c_acctbal , c_mktsegment FROM
tpcd.customer_view WHERE c_custkey = 178369 ;
 
EXEC cabmac ( 178369 ) ; 
 
database tpcd;
T
USING thekey (VARCHAR(7))exec cabmac(:thekey);
 
end query logging on DBC;

Appropriate use of queue tables
Dear Carrie: Files from different sources come from the TIBCO adapter and land on our server about every five minutes. After a group of these files becomes available, we load them into a target table. Because there are so many of these files, the job is eating a lot of CPU. I am wondering if, as a better-performing alternative, I can use a queue table to load these files.
—Wondering

Dear Wondering: You are probably looking for ways to minimize the utility startup overhead, which can become disproportionately expensive when frequently loading a very small number of rows. While queue tables are persistent database objects, they were developed to support things like event processing rather than mass loading.

Here are some reasons I can think of that would make queue tables less desirable for large loads:
The queue table is a conduit, not a destination, so if you loaded into a queue table, another process would need to be available at the other end to read from the queue and further process the rows. This adds complexity to the process.
A queue-insertion timestamp (QITS) column is required for each queue table row, adding extra space per row loaded.
An in-memory FIFO cache on each parsing engine (PE) is required for every queue table, which adds some overhead. This cache is a list of QITS and row ID pairs for rows to be taken off the queue and sorted by QITS.
Neither FastLoad nor MultiLoad supports loading into queue tables.

While you can load rows into queue tables using multi-statement requests (MSR) or TPump, neither method will guarantee that these rows will be consumed in the order they appeared, since the entire MSR will have the same QITS.

The ideal queue table is small, with few update and delete SQL operations. In these cases, the rate of insertion is generally close to the rate of consumption. Queue tables are a good alternative for the polling approach of registering events and are ideal for processing exceptions during a load process—but they are less useful as a staging table for all rows being loaded.

One solution might be to stream multiple files to a single load job via scripting, an access module or Teradata Parallel Transporter.

Contrast in DBQL timestamps
Dear Carrie: I am looking at DBQL detail across two tables and have found an oddity when I compare timestamps between them. When my system is very busy and I use the following calculation, I sometimes see a time lag:

(DBC.DBQLStepTbl.StepStartTime - DBC.DBQLogTbl.FirstStepTime where DBC.DBQLStepTbl.StepLev1Num = 1).

When the query sends the first step to the AMPs (FirstStepTime), shouldn't the time be the same as when the first step starts (StepStartTime)? If the system time between nodes is not in sync, could that be a contributing factor to this lag?
—Time on My Hands

Dear Time: The intent—and this occurs in most cases—is that the two timestamps will be the same. However, they come from two different DBQL tables that have their contents filled in at different times. Also, for these particular timestamps, they are set in different parts of the dispatcher code. Because these two values are set at slightly different points in the flow, a busy system could lead to them being slightly different.

Consider the step table timestamp to be more accurate than that of the default logging table. The default table timestamp is filled in prior to sending the first step, and because of that it is possible that an unexpected delay could make that assignment slightly premature. The step table timestamp is set the moment the step is dispatched.

You make an interesting point about differences with system clocks across nodes, but I think this can be discounted for the following reason: Because for a particular query all the rows that get inserted into the different DBQL tables are inserted on the same PE that the query came through, the timestamps for those related rows will reflect the single system clock on that PE's node.

Collecting random AMP samples
Dear Carrie: Does the optimizer always go out and get random AMP samples for the tables in a query even when collected statistics exist? Or are random AMP samples only obtained from the database after the optimizer finds no statistics have been collected on the table?
—Statistically Speaking

Figure 2: How the optimizer searches for statistics
enlarge
Random AMP samples are always accessed before the optimizer looks for collected statistics.

Dear Speaking: Here's how and when random AMP samples are acquired: (See figure 2, left.)

First, random AMP samples are kept in the table header that resides in memory in the dictionary cache. The optimizer first looks for the table header in the dictionary cache in order to extract the random AMP samples. If the table header is not in the cache, it is read from disk, where it can be found as a single row in subtable 0 of the base table. As part of the process of reading the table header from disk, random AMP samples are collected for that table and moved into a field in the table header when the table header is placed in the cache.

After the table header has been located and the random AMP samples have been accessed, a routine that looks for collected statistics is called. This routine will be called once for each index and/or column on a table for which the optimizer would like statistics. As part of that routine, the dictionary cache is searched for the relevant histogram.

If the statistical information of interest is not cached, an express request is issued that retrieves the collected statistics from the data dictionary tables on disk. If statistics for that column or index are found in the data dictionary, the histogram that contains the statistical information is placed in the dictionary cache for use by other queries and is used by the optimizer for the current query.

If no collected statistics exist and the column is a primary or non-unique secondary index (NUSI), then the random AMP samples that are stored in the table header will be used. Non-indexed columns that have no statistics collected will not use random AMP samples, but will rely on static formulas to determine selectivity estimates.

Statistics are removed from the cache periodically to make sure that what is cached is reasonably current. T

Teradata Magazine-June 2007

Related Links

Reference Library

Get complete access to Teradata articles and white papers specific to your area of interest by selecting a category below. Reference Library
Search our library:
Manthan
Trillium
Protegrity
Teradata.com | About Us | Contact Us | Media Kit | Subscribe | Privacy/Legal | RSS
Copyright © 2008 Teradata Corporation. All rights reserved.