carrie_ballinger

Carrie Ballinger, Senior database analyst, Teradata Certified Master

Tech2Tech

Tech Support

Did You Know?

Teradata Certified Master assists readers by answering their technical questions.

Human knowledge is doubling every 10 years, according to Michio Kaku’s best-selling book “Visions: How Science Will Revolutionize the 21st Century.” You and I make contributions to knowledge growth every time we discover a new technical solution or apply something we’ve learned to a problem we’ve been working on.

While I find it fascinating that knowledge is expanding at such a fast rate, I personally like to focus on how it can be made more accessible to a broader audience. This column is my attempt at sharing useful tips and information about the Teradata Database.

Explain confidence levels

Dear Carrie:

Despite large amounts of statistics that have been collected on my tables, I still get 80% of my explain plans showing “low confidence.” How can I raise these levels?
—Losing Confidence

Dear Losing:

Your plans are probably fine. Confidence levels indicate the degree to which statistics are (or are not) available for a given query step. The more complete the statistics, the higher the confidence level. High and low confidence levels are both perfectly acceptable—both are treated the same for Optimizer costing purposes.

The following shows how different confidence levels are determined:

  • High confidence will appear on a step that performs database access if a single predicate exists in the query for the table being accessed. Statistics must be present on the predicate as well as on the primary index (PI). This predicate can involve multiple columns in equality conditions as long as a multi-column statistic covers them.
  • Low confidence will be displayed if usable statistics are present either from collected statistics or from a random AMP sampling. Even if all possible statistics have been collected, you’ll get low confidence if:
    • Multiple predicates appear in the query for the table being accessed
    • No PI statistics were collected
    • The step is a join step (in most cases)
    image

    Click to enlarge

    Even when complete statistics have been collected on both tables feeding into a join, the correlation between the join columns of the two tables is not known to the Optimizer, as illustrated in figure 1. Unless one side is a unique index with one value selected, the outcome of the join can be only generally predicted, and low confidence will result.

  • Index join confidence will appear in instances when an index (with or without statistics) is present on one side of the join but no index or statistics are on the other side.
  • No confidence is assigned when no usable statistics exist for a step. Be aware that when no confidence appears in one step, it will trickle down to the next dependent steps. This will occur even if good statistics are available in subsequent tables that are involved in the joins. For this reason, it is not uncommon to see steps further down in a complex query with low or no confidence, even though those steps are making use of available statistics.

    Surprisingly, confidence levels do not directly contribute to better plans. Rather, they are only indicators of how thorough the statistics collection is for a given step. Even if additional statistics are collected in order to elevate a confidence level, they do not guarantee a more optimal plan.

    In general, it is probably best not to be too concerned about the confidence levels in explains. Rather, focus on problem queries, such as ones that are running unexpectedly long, and check those explains to see whether the steps look reasonable. Use the confidence levels as an indicator of where additional statistics might be helpful on steps that directly access a table.

    If you’ve collected a very large number of statistics, you may end up with ones that are not needed on columns and indexes. Collecting statistics only on the columns and indexes that will improve plans can significantly reduce the overall collection time. This will free up system resources for other work.

    Below is an example illustrating that low confidence will appear for join steps that follow high-confidence steps. PI statistics have been collected for the tables on both sides of a join (T_Txnkey is the PI of the Txn table; P_Partkey is the PI of the Parttbl) and on the single-selection columns (T_Shipdate and P_Size). Yet the step that joins them, step 5, shows low confidence:

HELP STATISTICS Txn;

Date Time Unique Values Column Names
08/11/28 13:27:36 11,250,000 T_TXNKEY
08/11/28 13:18:38 5,996,852 T_PARTKEY
08/11/28 13:22:35 2,526 T_SHIPDATE

HELP STATISTICS Parttbl;

Date Time Unique Values Column Names
08/11/28 13:27:46 6,000,000 P_PARTKEY
08/11/26 14:50:10 50 P_SIZE

EXPLAIN

SELECT * FROM Txn,Parttbl

WHERE P_partkey = T_partkey

AND P_size = 3

AND T_shipdate = 950101

Explanation

  1. First, we lock a distinct CAB."pseudo table" for read on a RowHash to prevent global deadlock for CAB.Txn.
  2. Next, we lock a distinct CAB."pseudo table" for read on a RowHash to prevent global deadlock for CAB.Parttbl.
  3. We lock CAB.Txn for read, and we lock CAB.Parttbl for read.
  4. We do an all-AMPs RETRIEVE step from CAB.Txn by way of an all-rows scan with a condition of ("CAB.Txn.T_SHIPDATE = DATE '1995-01-01'") into Spool 2 (all_amps) (compressed columns allowed), which is redistributed by the hash code of (CAB.Txn.T_PARTKEY) to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with high confidence to be 18,636 rows (2,348,136 bytes). The estimated time for this step is 4.16 seconds.
  5. We do an all-AMPs JOIN step from CAB.Parttbl by way of a RowHash match scan with a condition of ("CAB.Parttbl.P_SIZE = 3"), which is joined to Spool 2 (Last Use) by way of a RowHash match scan. CAB.Parttbl and Spool 2 are joined using a merge join, with a join condition of ("CAB.Parttbl.P_PARTKEY = T_PARTKEY"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 18,636 rows (3,894,924 bytes). The estimated time for this step is 0.50 seconds.
  6. Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 4.66 seconds.

In the next example, three tables are joined and the following statistics collected. Note that the third table, Supplier, has no statistics collected.

HELP STATISTICS Txn;

Date Time Unique Values Column Names
08/11/26 14:41:54 45,000,000 T_TXNKEY
08/11/26 14:45:30 6,000,000 T_PARTKEY
08/11/26 14:48:52 300,000 T_SUPPKEY
08/12/03 11:17:38 2,526 T_SHIPDATE


HELP STATISTICS Parttbl;

Date Time Unique Values Column Names
08/12/01 08:55:06 6,000,000 P_PARTKEY
08/11/26 14:50:10 50 P_SIZE

No statistics are collected for the Supplier table.

EXPLAIN

SELECT * FROM Txn,Parttbl,Supplier

WHERE P_partkey = T_partkey

AND S_suppkey = T_suppkey

AND P_size = 3

AND T_shipdate = 950101

AND S_nationkey = 6

Explanation

  1. First, we lock a distinct CAB."pseudo table" for read on a RowHash to prevent global deadlock for CAB.Supplier.
  2. Next, we lock a distinct CAB."pseudo table" for read on a RowHash to prevent global deadlock for CAB.Parttbl.
  3. We lock a distinct CAB."pseudo table" for read on a RowHash to prevent global deadlock for CAB.Txn.
  4. We lock CAB.Supplier for read, we lock CAB.Parttbl for read, and we lock CAB.Txn for read.
  5. We do an all-AMPs RETRIEVE step from CAB.Txn by way of an all-rows scan with a condition of ("CAB.Txn.T_SHIPDATE = DATE '1995-01-01'") into Spool 2 (all_amps) (compressed columns allowed), which is redistributed by the hash code of (CAB.Txn.T_SUPPKEY) to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with high confidence to be 74,730 rows (9,415,980 bytes). The estimated time for this step is 16.61 seconds.
  6. We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a RowHash match scan, which is joined to CAB.Supplier by way of a RowHash match scan with a condition of ("CAB.Supplier.S_NATIONKEY = 6"). Spool 2 and CAB.Supplier are joined using a merge join, with a join condition of ("CAB.Supplier.S_SUPPKEY = T_SUPPKEY"). The result goes into Spool 3 (all_amps) (compressed columns allowed), which is redistributed by the hash code of (CAB.Txn.T_PARTKEY) to all AMPs. Then we do a SORT to order Spool 3 by row hash. The size of Spool 3 is estimated with no confidence to be 13,028 rows (2,722,852 bytes). The estimated time for this step is 0.08 seconds.
  7. We do an all-AMPs JOIN step from CAB.Parttbl by way of a RowHash match scan with a condition of ("CAB.Parttbl.P_SIZE = 3"), which is joined to Spool 3 (Last Use) by way of a RowHash match scan. CAB.Parttbl and Spool 3 are joined using a merge join, with a join condition of ("CAB.Parttbl.P_PARTKEY =T_PARTKEY"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 13,028 rows (3,804,176 bytes). The estimated time for this step is 0.47 seconds.
  8. Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 17.16 seconds.

In the above explain text, step 5, which accesses the Txn table, shows high confidence because statistics were collected on both the single predicate, T_Orderdate, and the primary index column, T_Txnkey.

Step 6, which joins Txn to Supplier shows no confidence because no statistics were collected for the Supplier table, even though the step that feeds into that join (Step 5) exhibited high confidence.

Step 7 also shows no confidence even though this step joins the spool from a previous step that accessed the Parttbl table whose collected statistics warrant high confidence. Step 7 shows no confidence because the spool the Parttbl is being joined to was built in Step 6, which showed no confidence. No confidence always has a trickle-down effect within a plan, even when a new table being joined to the already built spool file has all the right statistics.

Query demotions and reserved AWTs

Dear Carrie:

If I reserve two AMP worker tasks (AWTs) for a tactical allocation group (AG), and a running query gets demoted via a workload exception or a query milestone, does that query’s AWT still count against the reserve? I’m worried that other tactical queries get held up if an undeserving query uses too many resources but still hangs on to that reserved AWT.
—Dutiful DBA

image

Click to enlarge

Dear Dutiful:

As each query step arrives on the AMP from the dispatcher, it is assigned a work type that indicates the importance of the work that will be done. Most such query steps are assigned a type of MsgWorkNew.

However, if the AG associated with the dispatched step is an expedited AG, then that query will be eligible for a MsgWorkEight AWT, a special work type intended for tactical queries. Work coming from an expedited AG will draw AWTs from a special reserve pool. Once an AWT of a certain work type is acquired by a query, it sticks with that query until the query step ends.

If the query is moved under the control of a different, non-expedited AG (possibly because it was demoted into a different workload), then the query’s next step will be assigned to a MsgWorkNew AWT, not a reserved MsgWorkEight AWT. (See figure 2, right)

Although no mechanism is in place to change the work type of a step that is already executing on the AMPs, you can:

  • Refine the classification criteria used in your tactical workload to exclude queries with non-tactical characteristics. If you rely on estimates, check for missing statistics that could help make the estimates more accurate.
  • Increase the number of reserved AWTs that tactical work can draw from. If you are on an open systems platform, the number of reserved AWTs can be increased to as high as 20 if the former limit of five is too restrictive. Too few AWTs in the reserve pool may not support your tactical concurrency, so don’t be shy about making the pool larger if needed.

The number of partitions that must be processed, not the number of partitions defined in the table, will predict possible performance issues.

PPI table with high granularity

Dear Carrie:

I’m consulting at a site, and part of my work includes planning for some data movement. I just discovered that the client has 7,750 partitions on its transaction history table. What are the trade-offs in having such a large number of partitions?

In the past, I’ve seen partitioned primary index (PPI) strategies that use different partition widths. Closer time periods use finer partitioning—daily, for instance—versus older time periods of, say, four years and up that use weekly or monthly partitioning. Is that advisable?
—Avoiding Extremes

Dear Extremes:

A PPI table that has such a large number of partitions could present you with a couple of issues:

  • Single-AMP PI access will be degraded significantly if the PI of a PPI table is defined as a non-unique PI (NUPI). Unless the PI includes a partitioning column, it is required to be non-unique. With a NUPI, each non-eliminated partition on the AMP must be probed to see whether it contains the NUPI value specified in the query. For a single AMP, this could take a long time because the probing happens serially.
  • Joins that use the PI of a PPI table may be slower. This delay will occur unless the other table in the join is identically partitioned and the join conditions in the query include the partitioning columns. Otherwise, when the number of partitions involved in the join activity approaches or exceeds 300, the other table must be scanned multiple times when doing the usual sliding window join.

These two issues will not slow the queries down much as long as they are coded to have good partition elimination. The number of partitions that must be processed, not the number of partitions defined in the table, will predict possible performance issues.

There is no single answer as to whether your client should redefine its partitioning. However, the recommendation in the PPI Orange Book is to make sure all of the following aspects of partitioning work well together:

  • Partitioning expressions
  • Queries, specific and overall
  • Performance, access method, join strategy, partition elimination
  • Data maintenance
  • Altering partitioning
  • {ackup and restore

Check out the most current Partitioned Primary Index Usage (Single-Level and Multi-Level Partitioning) Orange Book, available from the Teradata@Your Service Web site.

AWT ResUsage table

Dear Carrie:

After upgrading my client to Teradata 12, I reworked its AWT report to point at the new ResUsage table. I’m seeing some data that I don't understand, however. WorkTypeInUse00 (MsgWorkNew counts) and WorkTypeInUse01 (MsgWorkOne counts) are higher in some cases than their theoretical limits, and also larger than InUseMax. Are you aware of any issues with these columns?
—Seeing is Not Believing

Dear Seeing:

In Teradata 12, a new table was added to ResUsage called ResUsageSAWT. This table is used to capture information about AWTs. If you are seeing numbers that are out of range in your ResUsageSAWT output, you most likely have different collection rates versus logging rates. You can tell whether this is the case by examining the CollectIntervals column in the SAWT output. This column indicates how many collect intervals are contained within each log interval for that data.

When gathering output from the SAWT table, you must factor in the CollectIntervals column and adjust your SQL accordingly. When multiple collection intervals occur in each log period, the values in the work type columns equal the sum of the in-use counts reported in each collection period.

Actual output from the ResUsageSAWT table is shown using two different SQL syntax examples. The first example ignores the CollectIntervals column; the second example accounts for the CollectIntervals column value.

  • Ignoring the CollectIntervals column in the SQL. Following is ResUsageSAWT output (see table 1) in which the logging rate was 600 seconds and the collection rate was 60 seconds. Note that the SQL statement accesses the WorkTypeInUse00 and WorkTypeInUse01 values without any adjustment to the in-use counts. The output from this SQL follows:
  • SELECT TOP 5

    thedate,thetime,nodeid,vprid,

    secs,collectintervals,

    WorkTypeInuse00 AS worknew,

    WorkTypeInuse01 AS workone,

    InUseMax

    FROM DBC.ResUsageSAWT

    ORDER BY worknew DESC;

    As you can see in table 1, the number reported for worknew (216) exceeds the hard limit of 50 for MsgWorkNew work types. In addition, the InuseMax column value (44- 48) is less than the worknew and workone values combined. Both situations are impossible.

  • Incorporating the CollectIntervals column into the SQL. The same SQL as above is used, but the statistics columns are divided by the collect interval value to provide an accurate reporting of in-use counts, as seen in table 2:
  • SELECT TOP 5

    thedate,thetime,nodeid,vprid,secs,

    collectintervals,

    WorkTypeInuse00/collectintervals (FORMAT '99.99') AS worknew,

    WorkTypeInuse01/collectintervals (FORMAT '99.99') AS workone,

    InUseMax

    FROM DBC.ResUsageSAWT

    ORDER BY worknew DESC;
image

Click to enlarge

Losing data during DBQL maintenance

Dear Carrie:

Each day just before offloading the Database Query Log (DBQL) data to a history database, we issue an END LOGGING statement followed by a BEGIN LOGGING statement. But we’re losing in-flight data. Is there any way to minimize this lost data?
—In-Flight Turbulence

Dear In-Flight:

The END LOGGING command forces the DBQL buffers to be written to disk. The problem is that if a query completes after END LOGGING but before BEGIN LOGGING, DBQL won’t record that query.

In-flight data is lost only when the END LOGGING command is issued against the users who are doing the in-flight work. A way to minimize lost data is to issue the END LOGGING on a user ID that is not being used, or on a user ID that is set up just for this purpose. In fact, one company site uses SystemFE as its target user ID.

Just so you know, it can take a little time after the END LOGGING command for every parsing engine to receive the message and write their caches. Leaving a little time between the END LOGGING command and when you start copying to the history tables provides a small buffer to accommodate possible delays. This is especially true of large systems with a lot of activity.


twitter
StumbleUpon
Delicious
Facebook
Google
Technorati
Reddit
Your Comment:
  
Your Rating:

Comments
 
awesome explanation.. Thanks a Lot for publishing this article, Please do keep posting this kind of study . Make us also part to share your wealth of knowledge. Thanks, S

8/4/2009 2:37:03 AM
— Anonymous