Looking for the right solution?
A Teradata Certified Master is here to help.
by Carrie Ballinger
Seems like hardly a workday goes by that I don't bump up against one thought-provoking
technical inquiry or another. Because good technical tips deserve to be
recycled, I'd like pass on a few interesting Q & As.
Teradata Dynamic Query Manager and full table scans Q.
When I specify 'No Full Table Scans Allowed' under a Teradata Dynamic Query
Manager (TDQM) Query Resource rule, TDQM rejects queries that performed
a row hash match scan involving that table. Am I doing something wrong or
is TDQM functioning correctly?
A. There may be situations
where restrictions on table scans are enforced by TDQM beyond just the obvious
full-table scans. Teradata can, for example, interpret joins and row hash
match scans as requiring all rows to be read and, therefore, fall under
the TDQM full-table scan restriction umbrella.
However, spool files are not considered against full-table scan restrictions.
If the tables are spooled first, a row hash match scan of two spool files
will not trigger query rejection. Further, retrieving rows from a base table
with a partitioned primary index in the cases where partition elimination
is taking place will not be interpreted as a table scan by TDQM, nor will
such a query be a candidate for rejection on that basis alone.
PPI and synchronized table scans Q. I
have an Explain that shows a retrieve step against a partitioned primary
index (PPI) with only a single partition accessed. The step says that this
PPI table is eligible for synchronized scanning.What is the advantage of
sync scan if similar queries are accessing different partitions of the same
PPI table?
A. The PPI table is eligible for sync
scan from the optimizer's point of view, which is why you are seeing that
text in the Explain. But the AMP (specifically, the file system) will not
initiate a sync scan if partition elimination has been selected for the
step.
Remember that the optimizer is just expressing eligibility for a specific
optimization, not mandating it be used. Even if the AMP decides to support
a sync scan in a particular case, there might not be anything to scan with
at that time, or the other scanners might be working at different paces.
A number of factors come into play before eligibility is translated into
reality.
There may be little value in trying to synchronize scans targeted to different
partitions. However, if it turns out to be desirable to provide sync scan
with partition elimination, such as in cases where there is repetitive access
to the same partition, then Teradata might consider adding that feature
to a future release.
Value list compression and load time Q.
I've heard value list compression (VLC) provides a performance advantage
during the load process. VLC can dramatically reduce space needs, but I
don't understand how it can speed up loads.
A. I've
witnessed performance boosts during loading from two V2R5 features.
With VLC, if enough columns are compressed in rows prior to loading, then
more of these rows can be placed into a single data block. Therefore, the
load utility is doing less physical I/O to store more rows. Less I/O demand
often translates to faster loads. In addition, ARCMAIN may perform better
with VLC in place as well, as there will be fewer blocks to back up.
If the table you are loading is defined with a PPI and you are loading into
one or a few partitions only (e.g. loading one day's transactions into a
PPI table partitioned by day), then the data blocks will be so few in number
and the data so concentrated that they will often stay in the cache. This
reduces the I/O required for the load job.
In one test my group ran, loading 120,000 rows into a single partition of
a PPI table was 20% faster than loading the same number of rows into an
equivalent non-PPI table. Wait I/O for the PPI load was noted to be one-third
what it was for the non-PPI case, indicating reduced I/O demand.
Usage resource rundown Q. Can Teradata Priority
Scheduler resource usage data be directly related to either ResUsage or
AMPUsage data?
A. Neither ResUsage nor AMPUsage are
the same as the Teradata Priority Scheduler Monitor output in terms of their
slant on CPU usage.
ResUsage captures resource usage for an entire node, including some activity
that is not under the control of the Teradata Database (gateway overhead
and operating system CPU, for example). So ResUsage will never line up exactly
with either AMPUsage or Teradata Priority Scheduler Monitor output, both
of which only reflect work within Teradata.
AMPUsage will never look the same as Teradata Priority Scheduler Monitor
output, either, because collections are at the AMP level, and the AMPs don't
report AMPUsage consumption until the end of the AMP step. This means that
long-running queries won't attribute their consumption to the points in
time that they were actually consumed in AMPUsage.
Teradata Priority Scheduler Monitor output, on the other hand, gives you
a snapshot of who is using how much CPU at one point in time on a node.
It reports usage for each active Allocation Group over the most recent age
interval (usually 60 seconds), but it does not accumulate total usage, as
does AMPUsage.
In Teradata Database V2R5.1, Database Query Log (DBQL) offers an alternative
for viewing resource usage that is similar to AMPUsage because it represents
an accumulation of usage over the life of a query.
Table-level statistics recollection Q.
If you already have column and index statistics collected on a table and
you recollect stats at the table level, will one single scan be performed
with all stats recalculated in one sweep?
A. Recollection
at the table level does not combine the scans. Individual collection and
table-level recollection are essentially the same, in terms of performance.
If you do an Explain of COLLECT STATISTICS at the table level, you'll see
a separate (and serial) COLLECT STATISTICS step for each column or index.
However, the steps share a final spool, and a single update is made to the
dictionary tables.
Producing statistics is similar to a very intense aggregation and requires
very high CPU usage. Different trade-offs exist for statistics collection
compared to query execution, where parallel steps are often a performance
advantage. Independent statistics collection (with some overlap in final
processing) has proven to be a more balanced approach in Teradata than trying
to build all of them at once. However, this approach is continually reevaluated
with every new Teradata release.
Need tech support? Review the
FAQ
archives on teradatamagazine.com or e-mail your questions to
teradata.query@teradata-ncr.com.
T
Carrie Ballinger, a Teradata Certified Master V2R5, is
a senior technical consultant in Teradata's Active Data Warehouse Center
of Expertise, working in the El Segundo, Calif., office.
© Teradata Magazine-September
2004