Register | Log in


Subscribe Now>>
ARCHIVE: Vol. 6, No. 4
Home News Tech2Tech Features Viewpoints Facts & Fun Teradata.com
Tech Support
Download PDF|Send to Colleague

Enlightening questions, expert answers

A Teradata Certified Master answers readers’ technical queries.

by Carrie Ballinger, Senior Technical Consultant, Teradata Certified Master V2R5

I’ve got this quote by French playwright Eugène Ionesco hanging on my wall at work that reads: “It’s not the answer that enlightens, but the question.” Those words are particularly relevant to this column, which is composed of technical questions I have received from people such as yourselves.

When I first started this column in September 2004, I thought it would be simple to just make up questions as I went along based on answers I already knew. However, I soon discovered that it’s difficult to come up with questions that are knotty enough to simulate real production issues, so I quickly discarded that approach. The main reason I’ve continued to fill this column with actual inquiries is that I have come to agree with Eugène: It’s the questions themselves, the ones you ask me, that are enlightening.

When an AG becomes inactive
Dear Carrie: I’m having a heated disagreement within my department about a function of priority scheduler. Specifically, how does resource allocation change when one allocation group (AG) becomes inactive? We don’t agree on whether the resources freed up by the inactivity go to other groups within the same resource partition (RP) first, or if the CPU belonging to the recently inactive AG is reallocated across all RPs equally.
—Take My Side

Figure 1: When an allocation group becomes inactive
Figure 1
enlarge

Dear Take My Side: I’ll answer your question using figure 1. All work entering the system is assigned to a priority scheduler AG, which carries the run-time priority by which that work will be controlled. “Inactive” is a status given to an AG that has had no activity for a period of equal to the “active interval” (usually 61 seconds). So here’s what happens when an AG becomes inactive:

First, let’s assume that AG10, AG11 and AG12 are AGs that all belong to RP1, and AG20 belongs to RP2. As figure 1 shows, both RPs and AGs have relative weights, with the sum of the relative weights of the AGs being equal to their RP’s relative weight. Now suppose that after a period of time, the last query under the control of AG10 completes and AG10 becomes idle. It remains idle for 61 seconds and is then flagged as inactive.

When an AG becomes inactive, relative weights for the other active AGs within the same RP will be recalculated. The remaining two AGs in RP1 (AG11 and AG12) will end up with larger relative weights than before because there are fewer AGs within RP1 to share RP1’s relative weight. Other AGs from other RPs, such as AG20, will have the same relative weights as before.

Because they have higher relative weights, AG11 and AG12 have more CPU available to them to consume, if they have a need for it. But AG20 from RP2 will be entitled to the same level of CPU as before since its relative weight has not changed.

In reality, most of the higher-priority AGs aren’t greedy and consume less CPU than their relative weights. This means that there is often unused CPU available for other active AGs—that is, groups that are able to consume more than their relative weight.

All active AGs are allowed to consume that unused resource no matter what their RP of origin. Allocation groups that came from the same RP as the inactive AG are not favored in the dispersal of spare CPU that may result from unconsumed CPU.

The higher the relative weight of the other active AGs, the more of the unused CPU they are offered. This happens no matter what RP they belong to.

Complex mathematics and column alias names
Dear Carrie: Our site is very keen on column aliases because it makes the code easier to maintain. We have extremely complex formulae dealing with longitude and latitude, point-to-point distances accounting for the curvature of the Earth. Does Teradata remember the value of the column alias and substitute the value for each subsequent use of the alias? Or does Teradata perform the original calculation for each subsequent use of the alias? Here’s an example of the kind of thing we are doing:

SELECT HEADING_ID
  , RECNUM
  , OPS_ID
  , LTNP
  , COS(LOCHDG*3.141562654/180) ALIAS1
  , ALIAS1 + ALIAS1 + ALIAS1 + ALIAS1 + 
    ALIAS1 HDGTOTAL
FROM LOCTABLE

The subject is important because any small performance gain or loss will matter when the incoming daily data reaches billions of rows per day.
—Calculating in the U.K.

Dear Calculating: The Teradata optimizer has the ability to use what is called “common sub-expression elimination,” which would apply in this case. If a calculation appears more than once in a query, and each appearance is semantically the same, the optimizer will recognize that and the calculation will be executed once. The result of that first calculation will be remembered and applied to subsequent appearances of that calculation.

In this case, when the five iterations of the alias are totaled, the result of the calculation behind each alias is already known, and that result will be summed to get the “HDGTOTAL” value for each row. This will perform better than if you were summing a different calculation each time.

Empty partitions
Dear Carrie: Will there be a problem if we have too many empty partitions in a partitioned primary index (PPI) table? We are planning to partition a table based on date and plan for a very broad range of dates out into the future to avoid maintenance. It’s likely we’ll have empty partitions from previous years as well, as data gets purged out of old partitions. We do collect stats on the partitioning column. Are there any problems I should be aware of with this approach?
—Partitioning in Pennsylvania

Dear Partitioning: Collecting statistics on the partitioning column is always a good idea. However, it may not be enough if you have a large number of empty partitions because it doesn’t tell the optimizer whether a partition is empty or not. So, with a large number of empty partitions there is a greater risk of less-than-optimal query plans. This is especially true for join plan costing, where the number of rows in the table could be misrepresented due to the empty partitions.

In V2R6.1 you are able to collect statistics on the system-derived column PARTITION, which will tell the optimizer how many partitions are empty and other statistical information about the partitions. Collecting statistics on PARTITION is a very quick operation because rather than reading all the base table rows, it usually just scans the cylinder index for that PPI table. Under some conditions, when partitions start or end in the middle of a data block, the statistics collection process may have to read a few of the base table data blocks. But the number of data blocks read will never be more than the number of partitions plus one. I would definitely recommend taking advantage of that enhancement as soon as you can.

Monitoring memory usage
Dear Carrie: I’ve got a simple question for you: How can I find out how memory is being used on my system and if I have any memory problems?
—Looking for Bottlenecks

Figure 2: Physical memory on a Teradata node
Figure 2
enlarge

Dear Looking: In Teradata, physical memory is separated into the file segment (FSG) cache (used by the Teradata file system for database work) and so-called “free memory” (used by the operating system to manage the execution of code). What you really care about is watching free memory. Free memory is used the various caches, including dictionary and request caches, AMP and parsing engine tasks and by various buffers and other applications (see figure 2).

When available free memory dips too far below 100MB (with 40MB considered the minimum floor), some sites have experienced performance issues. Available free memory can be seen in the ResUsage SPMA table, column MemFreeKB. Monitoring MemFreeKB can tell you whether or not there is a shortage of memory. Increasing memory per node, adding nodes or reducing the FSG cache (which is tunable) are the alternatives to take if memory shortages exist.

Paging/swapping is the main symptom of memory shortage. The following ResUsage columns in the ResMemMgmtByNode macro help identify swapping: “Aloc Fail %” (percent of memory allocation attempts that failed), “# Proc Swp” (current number of processes that are swapped out), “Pg Drps/Sec” (average number of text pages dropped from memory per second in order to increase the amount of available memory).

Monitoring memory usage usually comes down to observing its behavior over time in ResUsage.

It’s becoming more important to monitor memory today because many of the new features recently added to Teradata require more memory. In V2R5, when Teradata went from 14 parser tasks per parsing engine (PE) to 120, there was the potential for more memory buffers to be in use. Database Query Log (DBQL), introduced around the same time, uses in-memory buffers for its caches. Both Value List Compression and PPI require larger table headers, which are held in the memory of each AMP. Now the request text buffer is 1MB whereas it used to be 64KB, and the dictionary cache has a larger default. Hash joins eliminate costly row redistributions by doing more work in memory. All of these enhancements provide good benefits but, in combination, they should alert you to the need to keep your eye on this critical resource.

If you believe you have a memory shortage issue, simple tuning steps are available involving adjustments to the FSG cache percent. It’s best to work with your local support team if you think you have an issue in this area.

Also, I want to remind you that Teradata on 64-bit platforms will open up memory opportunities for Teradata users. Today, Teradata is running on 64-bit LINUX platforms; Windows 64-bit is coming soon as well.

All-AMP operations within group-AMP queries
Dear Carrie: I’m looking at an explain statement on one of our tactical group-AMP queries and am surprised to see that the first retrieve step, which is a single-AMP step, includes an all-AMP row redistribution. I had been told that a group-AMP query avoided all AMP operations.

EXPLAIN
SELECT SP_ CURRQTY
   ,S_NAME
FROM SUPPLIER_PRODUCT
   ,SUPPLIER
WHERE SP_SUPPKEY = S_SUPPKEY
AND SP_PRODUCTKEY = 395884;

Explanation 
1)  First, we do a single-AMP RETRIEVE step from RMPROD.supplier_product 
    by way of the primary index "RMPROD.supplier_product.SP_ productkey = 
    395884" with no residual conditions into Spool 2 (group_amps), which is 
    redistributed by hash code to all AMPs. Then we do a SORT to order Spool 
    2 by row hash. The size of Spool 2 is estimated with high confidence to 
    be 4 rows. The estimated time for this step is 0.01 seconds.  
2)  Next, we do a group-AMPs JOIN step from RMPROD.supplier by way of a 
    RowHash match scan with no residual conditions, which is joined to Spool 
    2 (Last Use) by way of a RowHash match scan. RMPROD.supplier and Spool 2 
    are joined using a merge join, with a join condition of ("SP_SUPPKEY = 
    MPROD.supplier.S_SUPPKEY"). The result goes into Spool 1 (group_amps), 
    which is built locally on that AMP. The size of Spool 1 is estimated with 
    low confidence to be 4 rows. The estimated time for this step is 0.05 seconds.  

In the above query, SP_PRODUCTKEY is the non-unique primary index (NUPI) of the SUPPLIER_PRODUCT table, which has one row for every combination of product and supplier. There are usually four rows, and never more than 10 rows with the same NUPI value. Each row returned in the first step of the plan contains a foreign key to join into the primary index (PI) of the supplier table.

Is the explain text inaccurate, or is there actually an all-AMP operation being performed within this group-AMP query?
—Grouping Quandary

Dear Grouping: First, just a reminder that group-AMP functionality is designed to replace all-AMP plans when processing involves more than one, but less than all, AMPs in the system. Group-AMP minimizes the scope of the query and the resources required to support it. Group-AMP plans, such as the one above, do not have all-AMP table lock steps or all-AMP access steps, and their END TRANSACTION steps only go to the AMPs that actually performed work. That’s a large resource savings for a short query such as this.

But because it is not known in Step 1 of the plan which specific AMPs will be involved in receiving redistributed rows, the optimizer has to generate an all-AMP row redistribution. As a result, every AMP will receive a WorkOne (spawned work) message and will have to acquire an AMP worker task (AWT) to be ready for any redistributed rows that land there. In point of fact, only four AMPs will receive redistributed rows in this case, but receivers are set up on all AMPs.

After those four AMPs receive their rows, the other AMPs that are waiting to receive rows are told to clean up and release their AWTs. Only the AMPs that actually received rows will be placed on the list to do subsequent group-AMP processing.

I simulated your query and logged it to the DBQL step table (see table below). Step-table logging will record how many AMPs each step actually touches (in V2R6.0 or above releases). The DBQL step-table output on my 20-AMP system confirms that in Step 1 of this query all the AMPs were accessed.

Next time you are wondering about something like this, try using the DBQL step table to validate what’s really happening. T

Table: DBQL step table

StepLev1
Num
Step
Name
CPU
Time
IO
Count
EstRow
Count
Row
Count
NumOf
ActiveAMPS
1
RET
0
78
4
4
20
2
JIN
0
28
4
4
4
3
Edt
0
0
0
1
1

Teradata Magazine-December 2006

Related Link

Reference Library

Get complete access to Teradata articles and white papers specific to your area of interest by selecting a category below. Reference Library
Search our library:

Teradata.com | About Us | Contact Us | Media Kit | Subscribe | Privacy/Legal | RSS
Copyright © 2008 Teradata Corporation. All rights reserved.