Teradata Magazine Cover Teradata Magazine Online  
Register Help Password
Password:
Quick Links
Current Issue
Archives
Teradata.com
Teradata Magazine Rss Feed
ARCHIVES Search Teradata Magazine Online:  
TECH2TECH

Tech2Tech
table of contents

Ask the expert
Teradata Warehouse 8.0 meets the performance challenge with new functionality.

Quest for quality
Poor quality data can eat into your profit margins, but you don't have to settle for less.

Who's driving?
Let Teradata Warehouse 8.0's event-based features take the wheel.

Now playing
Teradata CRM 5.1 offers new capabilities for the customer-driven enterprise.

Flex your muscles
Now it's easier to create the right function for the job at hand.

Tech support
Hear the voice of experience! A Teradata Certified Master shares great tech tips.


Printable versionPrintable version Send to a colleagueSend to a colleague

Looking for the right solution?

A Teradata Certified Master is here to help.

Seems like hardly a workday goes by that I don't bump up against one thought-provoking technical inquiry or another. Because good technical tips deserve to be recycled, I'd like pass on a few interesting Q & As.

Teradata Dynamic Query Manager and full table scans
Q. When I specify 'No Full Table Scans Allowed' under a Teradata Dynamic Query Manager (TDQM) Query Resource rule, TDQM rejects queries that performed a row hash match scan involving that table. Am I doing something wrong or is TDQM functioning correctly?
A. There may be situations where restrictions on table scans are enforced by TDQM beyond just the obvious full-table scans. Teradata can, for example, interpret joins and row hash match scans as requiring all rows to be read and, therefore, fall under the TDQM full-table scan restriction umbrella.

However, spool files are not considered against full-table scan restrictions. If the tables are spooled first, a row hash match scan of two spool files will not trigger query rejection. Further, retrieving rows from a base table with a partitioned primary index in the cases where partition elimination is taking place will not be interpreted as a table scan by TDQM, nor will such a query be a candidate for rejection on that basis alone.

PPI and synchronized table scans
Q. I have an Explain that shows a retrieve step against a partitioned primary index (PPI) with only a single partition accessed. The step says that this PPI table is eligible for synchronized scanning.What is the advantage of sync scan if similar queries are accessing different partitions of the same PPI table?
A. The PPI table is eligible for sync scan from the optimizer's point of view, which is why you are seeing that text in the Explain. But the AMP (specifically, the file system) will not initiate a sync scan if partition elimination has been selected for the step.

Remember that the optimizer is just expressing eligibility for a specific optimization, not mandating it be used. Even if the AMP decides to support a sync scan in a particular case, there might not be anything to scan with at that time, or the other scanners might be working at different paces. A number of factors come into play before eligibility is translated into reality.

There may be little value in trying to synchronize scans targeted to different partitions. However, if it turns out to be desirable to provide sync scan with partition elimination, such as in cases where there is repetitive access to the same partition, then Teradata might consider adding that feature to a future release.

Value list compression and load time
Q. I've heard value list compression (VLC) provides a performance advantage during the load process. VLC can dramatically reduce space needs, but I don't understand how it can speed up loads.
A. I've witnessed performance boosts during loading from two V2R5 features.

With VLC, if enough columns are compressed in rows prior to loading, then more of these rows can be placed into a single data block. Therefore, the load utility is doing less physical I/O to store more rows. Less I/O demand often translates to faster loads. In addition, ARCMAIN may perform better with VLC in place as well, as there will be fewer blocks to back up.

If the table you are loading is defined with a PPI and you are loading into one or a few partitions only (e.g. loading one day's transactions into a PPI table partitioned by day), then the data blocks will be so few in number and the data so concentrated that they will often stay in the cache. This reduces the I/O required for the load job.

In one test my group ran, loading 120,000 rows into a single partition of a PPI table was 20% faster than loading the same number of rows into an equivalent non-PPI table. Wait I/O for the PPI load was noted to be one-third what it was for the non-PPI case, indicating reduced I/O demand.

Usage resource rundown
Q. Can Teradata Priority Scheduler resource usage data be directly related to either ResUsage or AMPUsage data?
A. Neither ResUsage nor AMPUsage are the same as the Teradata Priority Scheduler Monitor output in terms of their slant on CPU usage.

ResUsage captures resource usage for an entire node, including some activity that is not under the control of the Teradata Database (gateway overhead and operating system CPU, for example). So ResUsage will never line up exactly with either AMPUsage or Teradata Priority Scheduler Monitor output, both of which only reflect work within Teradata.

AMPUsage will never look the same as Teradata Priority Scheduler Monitor output, either, because collections are at the AMP level, and the AMPs don't report AMPUsage consumption until the end of the AMP step. This means that long-running queries won't attribute their consumption to the points in time that they were actually consumed in AMPUsage.

Teradata Priority Scheduler Monitor output, on the other hand, gives you a snapshot of who is using how much CPU at one point in time on a node. It reports usage for each active Allocation Group over the most recent age interval (usually 60 seconds), but it does not accumulate total usage, as does AMPUsage.

In Teradata Database V2R5.1, Database Query Log (DBQL) offers an alternative for viewing resource usage that is similar to AMPUsage because it represents an accumulation of usage over the life of a query.

Table-level statistics recollection
Q. If you already have column and index statistics collected on a table and you recollect stats at the table level, will one single scan be performed with all stats recalculated in one sweep?
A. Recollection at the table level does not combine the scans. Individual collection and table-level recollection are essentially the same, in terms of performance.

If you do an Explain of COLLECT STATISTICS at the table level, you'll see a separate (and serial) COLLECT STATISTICS step for each column or index. However, the steps share a final spool, and a single update is made to the dictionary tables.

Producing statistics is similar to a very intense aggregation and requires very high CPU usage. Different trade-offs exist for statistics collection compared to query execution, where parallel steps are often a performance advantage. Independent statistics collection (with some overlap in final processing) has proven to be a more balanced approach in Teradata than trying to build all of them at once. However, this approach is continually reevaluated with every new Teradata release.

Need tech support? Review the FAQ archives on teradatamagazine.com or e-mail your questions to teradata.query@teradata-ncr.com. T

© Teradata Magazine-September 2004



back to top



Copyright by Teradata Corporation 2001-2007.