Making the most of it
Sometimes small solutions make a big difference.
by Carrie Ballinger
One of my guilty pleasures is working jigsaw puzzles. Drop by our house, and chances are you'll see the pieces of one puzzle or another spread out on a corner of our dining room table. It's fun to work on things in bits and pieces.
Interestingly, my job is often that way as well. I get more interruptions than I would like—one-off questions, the odd appeal for help, ad hoc brainstorming—but I actually welcome those breaks in the routine. Putting together small solutions or coming up with novel answers—these mini-accomplishments are as rewarding to me as completing a puzzle. And sometimes it's the small technical additives that complete a business's big picture.
Partitioned primary index table and packdisk
Q: Currently we run packdisk with freespace=3 every other week on the transaction tables. Since these are PPI tables, partitioned by date, would it be beneficial to run a packdisk with freespace=0 instead?
A: I like your approach, but I think you might be doing some unnecessary work. For those readers unfamiliar with the command, packdisk is a utility that packs cylinders on disk, leaving a percentage of free space (if requested) within each cylinder for growth.
As you have already concluded, freespace=3 is doing nothing for you, because with date partitioning everything except your current partition has remained unchanged since the last packdisk. Even with freespace=0, there's no advantage to continually repacking a PPI table (and you don't have the option of just packing one partition at a time). Additionally, if your purpose in running packdisk is to get more efficient full-cylinder read operations, be aware that PPI tables are currently not candidates for full-cylinder reads.
I suggest only doing a packdisk, when necessary, on the interval of time that matches the partition boundary. If it's a monthly partition, wait until one, two or three months have passed before doing the packdisk, and use freespace=0.
Try to avoid doing a packdisk when the inserts into the current partition are in mid-stream, because it's a good bet you'll get further blocksplits and cylsplits from the remaining inserts into that partition. Inserts into a date-partitioned PPI table are very intensely focused on a limited set of blocks and cylinders; for that reason, PPI tables often have fewer holes between data blocks than a non-PPI table, reducing the need to run packdisk.
Keep in mind: Packdisk is cylinder-based and works on each AMP separately. You need to consider what the average number of cylinders/partitions/AMPs are in your PPI table to determine whether packdisk will add any value at a particular point in time. For example, if your partitions are so fine or your inserts so low a volume that you only have two or three cylinders/partition/AMP, then there will probably be less benefit in using packdisk as each partition fills up.
Using Database Query Log LIMIT THRESHOLD
Q: I'm on Teradata Database V2R5 and using Database Query Log (DBQL) summary logging with the LIMIT THRESHOLD option. If I introduce a Teradata Dynamic Query Manager (Teradata DQM) workload limit rule to limit concurrency for the same group of users, what will happen if a query is delayed and the delay causes the query to exceed the DBQL threshold?
A: I'm glad to see you are using LIMIT THRESHOLD—I'm a big fan of that option. When you enable DBQL logging you can define a threshold, which in Teradata Database V2R5 is the number of seconds of execution time. All queries that execute faster than that threshold are logged to the summary table where they are counted; conversely, queries that run longer cause an entire row to be logged to the DBQLogTbl. In Teradata Database V2R6 that threshold has been expanded to include CPU or I/Os used by the query.
DBQL uses the entire execution time (StartTime to LastRespTime) to determine whether a query will be counted in the summary table or written to the default logging table. DBQL does not consider whether the query was sitting in the delay queue for some of that time. As a result, delay time counts against you.
Figure 1 helps illustrate a solution. Define your Teradata DQM workload limit (known as a "throttle" in Teradata Database V2R6) using the step threshold option, making that step threshold a number of seconds that correlates to the DBQL limit threshold value.
If your DBQL limit threshold is set at five seconds, your Teradata DQM throttle step threshold might be set at one or two seconds. Because DBQL is considering actual execution time, whereas Teradata DQM is looking at estimated processing time, the two values are usually different. To keep these settings closely aligned, you might want to crosscheck them. Things like concurrency levels and priority of execution will impact actual execution times reported by DBQL, but they are not taken into consideration in explain-step estimates used by Teradata DQM.
Advice on query milestones
Q: I've set up query milestones for my ad hoc users so long-running queries get bumped down. But milestones don't seem to be working for me. The really complex queries overwhelm the short queries. Any advice?
A: Query milestones frequently require some trial and error. For readers unfamiliar with query milestones, they are a feature of Priority Scheduler that will automatically demote the priority of a running query, based on the query reaching a threshold of CPU usage on a node. The threshold number of seconds, how far a drop in priority they experience, and how many times you demote can all influence how well the milestone scheme works for you.
Here's the advice I usually pass on to those starting out with query milestones, with regard to those three factors:
- If you are instituting query milestones to protect short queries, use a very tight CPU usage threshold, even as low as a fraction of a second. Since the threshold is enforced at the node level, the more nodes in the configuration, the more appropriate a lower number of seconds will be. The smaller the threshold, the sooner the "not-short" queries are moved out of the way.
- Keep a healthy contrast in priority between levels. I like to see at least a 4:1 ratio in the assigned weight between the first allocation group (AG) and the one demoted into. For example, if the first AG carries a weight of 20, give the next one a weight of five. Strong priority differences will help to protect the short-running work, so be aggressive.
- Fewer demotion levels usually provide better results. All active AGs within a resource partition (RP) share that RP's priority (technically, they share its relative weight). The more active AGs in that RP, the fewer relative weight points each gets. You may be seriously diluting the priority of the first level AG if you've got more than one or two demotion levels in place. And if you've got three or four levels, you may not be able to get that strong priority contrast I mentioned above, inadvertently slowing down the short query response times.
Re-collecting statistics at the table level
Q: I've collected statistics on several columns and indexes on my largest table. If I re-collect statistics at the table level, will that allow all the defined statistics on that table to be gathered with a single scan of the table?
A: I can understand why you might think so, but re-collection at the table level does not combine the scans. The time involved for you to use the two approaches (collection of each statistic individually or re-collection at the table level) should be essentially the same.
Run an EXPLAIN on the table level command. You'll see a separate (and serial) COLLECT STATISTICS step for each column or index for which stats already exist. Notice that they do share their final spool in common and a single update is made to the dictionary tables.
Here's why they don't share a scan. The effort to produce statistics is similar to a very intense aggregation, with very high CPU usage required. This is a different animal from query execution, where parallel steps are usually a performance advantage. To help you understand this, I just ran a statistics collection job on a single column in a 75 million-row table, on a quiet 2-node system. The CPU usage was between 70% and 80% most of the time. Another point to consider: If there are a number of columns or indexes involved, the amount of spool space could be quite large if they were generated at the same time.
Performing the collection of stats independently (with some overlap in final processing) has proven to be a more balanced approach than trying to build all of them at one time. Let me add, though, that this approach is being continually re-evaluated with each new release of the Teradata Database.
Reserving resources
Q: I was told that Teradata Priority Scheduler has a mechanism to reserve 2% of the machine for single-AMP queries and 98% for other queries. How would you implement this?
A: Teradata is biased toward maximizing throughput and so does not directly support reserving resources, such as CPU, ahead of time. However, there are two different Teradata Priority Scheduler features that are sometimes credited with doing just that: CPU limits and reserved AMP worker tasks.
CPU limits (see figure 2) set a ceiling on the percentage of CPU that can be consumed at any point in time by an allocation group, a resource partition or the entire system. Setting such a limit does not fence off a given percentage of CPU for that particular work or for any other category of work active at the same time. Instead, the CPU limit throttles back queries that are under its control when they try to consume more CPU than the defined limit allows.
It's like having a governor on your car that won't let you drive faster than 65 mph. At 50 mph, the governor has no impact. It only kicks in when you exceed 65 mph.
While you cannot reserve 2% of CPU for your tactical queries ahead of time, you can make sure those queries meet expectations by giving them a high contrast in priority. With a high priority, well-tuned single-AMP queries can get quick access to CPU when they need it but not prevent others from using that CPU when they don't.
Teradata Priority Scheduler's reserve AMP worker task (AWT) functionality sets up a small pool of AWTs on each AMP that can only be used to service AMP work coming from selected allocation groups—most frequently, those containing single-AMP tactical queries. This reserve only provides a benefit when the general pool of AWTs, which all incoming queries draw from first, is exhausted. A reserve pool of AWTs is similar to a reserve tank of gasoline in your car. It serves no purpose unless the main tank runs out of gas.
Although reserved AWTs can pave the way for quick tactical query turnaround, this reserve only addresses AWTs. It does not reserve CPU or other platform resources ahead of time. I usually recommend against creating such a reserve pool unless you have established that a shortage of AWTs actually exists and that this shortage is contributing to tactical query inconsistency. T
Teradata senior technical consultant Carrie Ballinger works in the Active Data Warehouse Center of Expertise in El Segundo, Calif.