

THE WILD WORLD OF MIXED WORKLOAD
Priorities and resources learn to get along

by Carrie Ballinger

Imagine that you are a zookeeper, only instead of managing a large park with distinct areas for different animal species, you watch over a single cage containing all the animals.

The challenges are obvious: How do you keep the larger animals from eating food intended for smaller animals? How do you adjust feeding schedules to reflect each animal's seasonal and daily needs? How do you ensure each type of animal has adequate water for drinking, bathing and playing? How do you limit the population so that it doesn't exceed the cage's finite capacity?

The idea of this single-cage zoo highlights the challenges of active data warehousing, which must contend with many types of work running on a single platform. DBAs have to understand users' needs and consider how to allocate resources so that the system meets those needs in a timely fashion. Mixed workloads also challenge DBAs to consider query priorities and determine which queries should have first dibs on the system and which can wait. At the same time, they need the ability to dynamically adjust the system's parameters to respond to current needs.

Essentially, an active data warehouse is an ecosystem full of diverse activities, each with different profiles and resource needs. And because the needs often conflict, DBAs must understand how different expectations can coexist, and then impose some level of guidance in order to sustain the system. In the Teradata Warehouse, this guidance takes the form of workload management.

All creatures great and small
Teradata offers a number of facilities to support the data warehouse, but when it comes to managing mixed workloads, Teradata Priority Scheduler is one of the key offerings. This facility uses weight assignments that allow administrators to mandate a high priority for the most important and the most response-sensitive work, while assigning a lower priority to less time-dependent activities. One result of this weighting system is that high-priority work is continually offered a large share of the platform resources, even if the requests themselves use very little of it.

In one Teradata implementation, for example, the highest priority work is an application that does cost validation. Because its queries are quick and streamlined, this application consumes less than 1% of the total CPU. However, to make sure all cost validation queries get to the CPU immediately upon arrival, this work has a weight assignment that entitles it to nearly 90% of the CPU. Running at the same time are data mining and business intelligence applications, which together consume almost 80% of the platform resources. Yet because speed of response is less critical, these applications have a much lower weight assignment, and thus a much lower priority. This pushes their initial allocation of system resources down into the single digits and reduces their frequency of CPU access.

This pattern, repeated at almost every Teradata active data warehouse site today, gives high-priority queries very generous access to resources, even though they will use only a fraction of what is offered, and then gives the lower-priority work the leftovers. In the fictional zoo cage, this is the equivalent of offering 90% of the food to the birds and only 10% to the lions.

Although it seems odd, this intentional over-allocation of resources works well on Teradata systems for achieving fast access to CPU for things such as tactical queries. That's because tactical query applications are only one of many applications active on the platform, and by themselves they simply can't use all the resources they are being offered, just as birds can't possibly consume 500 pounds of raw meat a day. In the active data warehouse, unconsumed resources flow immediately to the lower priority work.
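The over-allocation pattern can be illustrated with a small simulation. The sketch below is a minimal model, assuming that each active group is offered CPU in proportion to its weight and that any share a group cannot use is re-offered to the remaining groups, again by weight. It is not Priority Scheduler's internal algorithm; the function `allocate_cpu`, the group names, and the weights (90 for cost validation, 5 each for data mining and business intelligence) are hypothetical, chosen only to echo the cost-validation example.

```python
from typing import Dict

def allocate_cpu(weights: Dict[str, float], demand: Dict[str, float],
                 capacity: float = 100.0) -> Dict[str, float]:
    """Weighted 'water-filling' split of `capacity` CPU units (hypothetical model).

    Each still-hungry group is offered a slice of the remaining CPU in
    proportion to its weight. A group never takes more than it demands;
    whatever it leaves unused is re-offered to the others on the next pass.
    """
    alloc = {g: 0.0 for g in weights}
    active = {g for g in weights if demand.get(g, 0.0) > 0}
    remaining = capacity
    while remaining > 1e-9 and active:
        total_weight = sum(weights[g] for g in active)
        offers = {g: remaining * weights[g] / total_weight for g in active}
        # Groups whose outstanding demand fits inside their offer are satisfied.
        satisfied = {g for g in active if demand[g] - alloc[g] <= offers[g]}
        if not satisfied:
            # Every group wants more than it is offered: hand out all remaining CPU.
            for g in active:
                alloc[g] += offers[g]
            break
        for g in satisfied:
            alloc[g] = demand[g]
        active -= satisfied
        remaining = capacity - sum(alloc.values())
    return alloc

# Hypothetical weights and demands (in % of CPU) echoing the cost-validation site.
weights = {"cost_validation": 90, "data_mining": 5, "business_intel": 5}
demand = {"cost_validation": 1, "data_mining": 60, "business_intel": 60}
print(allocate_cpu(weights, demand))
# {'cost_validation': 1, 'data_mining': 49.5, 'business_intel': 49.5}
```

With these inputs the cost-validation group is offered 90 units but takes only the 1 unit it needs; the 99 units it leaves behind are split evenly between the two lower-weight groups.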

Carnivorous queries
Most data warehouses have a query category that soaks up resources like a sponge. This might be a query that performs a large number of CPU operations for each row it reads, or one that is simply poorly written. Because such queries can consume CPU for long stretches without pausing to wait on I/O, unlike more balanced work, they often appear to dominate platform resources. With Teradata Priority Scheduler, these queries can be, and often are, placed in a lower priority group. If the users who submit them can be identified, those users can be assigned their own low-priority group.

A retailer running a Teradata Warehouse takes another approach. It gives all its MicroStrategy users the same high priority, then uses Teradata Priority Scheduler's milestone limits to demote the longer-running queries. Milestone limits accumulate CPU usage by session or by query; at this site, each query may use up to one second of CPU at the higher priority level before it is demoted to a lower priority. As a result, all queries, short or long, enjoy a performance benefit when they first begin, and only queries that need more than one second of CPU experience the drop in performance that comes with the lower priority.
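The milestone behavior can be modeled in a few lines. This is a minimal sketch, assuming per-query CPU accounting and a single one-second threshold as at this retailer; the `Query` class, the `charge_cpu` function, and the group names are hypothetical. It only models the effect of demotion and is not how milestone limits are actually configured in Priority Scheduler.

```python
from dataclasses import dataclass

# Hypothetical group names; the one-second threshold mirrors the retailer example.
HIGH, LOW = "high", "low"
MILESTONE_CPU_SECONDS = 1.0

@dataclass
class Query:
    query_id: int
    group: str = HIGH
    cpu_used: float = 0.0

def charge_cpu(q: Query, seconds: float) -> None:
    """Record CPU consumed by a query and demote it once it passes the milestone."""
    q.cpu_used += seconds
    if q.group == HIGH and q.cpu_used > MILESTONE_CPU_SECONDS:
        q.group = LOW  # one-way: the query finishes at the lower priority

short, long_ = Query(1), Query(2)
for _ in range(4):
    charge_cpu(short, 0.1)  # 0.4 s in total: stays at high priority
    charge_cpu(long_, 0.5)  # 2.0 s in total: demoted after passing 1.0 s
print(short.group, long_.group)  # high low
```

Every query starts in the high group, so short queries never notice the rule; only the CPU-hungry ones cross the threshold and drop down.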

Seasonal feeding patterns
A mixed workload is truly mixed—a hodgepodge of variety and change. In fact, it is the nature of mixed workloads to be constantly changing in terms of balance, content and workload priority. Just as animal eating patterns can be linked to seasons or times of day, patterns in the type of work running on an active data warehouse platform can be linked to business events that take place at regular, predictable intervals. Reports that might have low priority throughout the week could become the most important work in the entire company on Monday morning.

Many Teradata users have analyzed the shifting patterns of resource usage and priority on their platforms and developed multiple "processing windows," each representing a different priority profile. A simple example of a set of processing windows designed by a Teradata customer is shown in "Window of opportunity."
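A processing window is essentially a lookup from the calendar to a set of priority weights. The sketch below assumes that windows are checked in order, most specific first, and that any time not covered falls back to an off-hours profile. The window names, times, and weights are hypothetical placeholders (the Monday-morning window simply echoes the weekly-report example), not the values used by the customer in "Window of opportunity."

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Dict, FrozenSet, List

@dataclass
class ProcessingWindow:
    name: str
    weekdays: FrozenSet[int]  # datetime.weekday(): 0 = Monday ... 6 = Sunday
    start: time               # inclusive
    end: time                 # exclusive
    weights: Dict[str, int]   # relative weight per priority group

# Hypothetical windows, listed most specific first. Illustrative values only.
WINDOWS: List[ProcessingWindow] = [
    ProcessingWindow("monday-morning-reports", frozenset({0}), time(6), time(12),
                     {"weekly_reports": 60, "tactical": 30, "data_mining": 10}),
    ProcessingWindow("business-day", frozenset(range(5)), time(6), time(18),
                     {"tactical": 70, "data_mining": 20, "weekly_reports": 10}),
]
OFF_HOURS: Dict[str, int] = {"batch_load": 60, "data_mining": 30, "tactical": 10}

def weights_for(now: datetime) -> Dict[str, int]:
    """Return the priority-group weights of the first window covering `now`."""
    for window in WINDOWS:
        if now.weekday() in window.weekdays and window.start <= now.time() < window.end:
            return window.weights
    return OFF_HOURS

print(weights_for(datetime(2024, 1, 1, 9, 0)))  # a Monday at 09:00
# {'weekly_reports': 60, 'tactical': 30, 'data_mining': 10}
```

Listing the Monday window before the general business-day window lets the more specific profile win whenever both apply.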

Population control
Just as animals reproduce, so do queries in a successful data warehouse. Teradata Dynamic Query Manager offers something similar to birth control for data warehouse queries. By differentiating between queries that are short and sweet and ones that are slower and more complex, the DBA can place a limit on how many massive queries can be active at any point in time. Restricting concurrency levels for these queries allows the other part of the mix (low-volume, quick turnaround work) to be amplified and hastened. Most mixed work running on Teradata has some percentage of these slow-moving queries; as such, population control may be desirable for maintaining overall balance.

Teradata users sometimes like to keep a low head count for queries running in a very low priority group. Substantive queries that are slowed down by having a very low priority may become glacier-like, inching along in slow motion while at the same time holding on to system resources. Restricting the number of very large and very slow-moving queries that can be active at the same point in time is a technique for increasing overall throughput in the system. It also helps ensure that the low-priority work already running is turned around without undue delay. Population control is immediately reversible by removing the restriction, giving the administrator a clear edge over the zookeeper in this regard.

Using Teradata Dynamic Query Manager, you can set limits for any priority group. For example, you can keep the data mining priority group down to only three or four queries active, or you can restrict very complex decision support queries to 10 at a time, if you wish.
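Population control amounts to a per-group counter with a waiting line. The sketch below is a minimal model, assuming that a query over its group's limit is delayed in first-in, first-out order rather than rejected; the `ConcurrencyThrottle` class and the group names are hypothetical, and the limits (three data mining queries, ten complex decision support queries) come from the examples above.

```python
from collections import deque
from typing import Deque, Dict, Optional

class ConcurrencyThrottle:
    """Hypothetical per-group throttle: admit a query if its group is under its
    limit, otherwise hold it in a FIFO delay queue until a running query ends."""

    def __init__(self, limits: Dict[str, int]):
        self.limits = limits  # groups not listed here are unlimited
        self.active: Dict[str, int] = {g: 0 for g in limits}
        self.waiting: Dict[str, Deque[str]] = {g: deque() for g in limits}

    def submit(self, group: str, query_id: str) -> bool:
        """Return True if the query starts now, False if it was delayed."""
        if group not in self.limits:
            return True
        if self.active[group] < self.limits[group]:
            self.active[group] += 1
            return True
        self.waiting[group].append(query_id)
        return False

    def finish(self, group: str) -> Optional[str]:
        """Mark one running query in `group` done; start and return the next delayed one."""
        if group not in self.limits:
            return None
        if self.active[group] == 0:
            raise ValueError(f"no running query in group {group!r}")
        self.active[group] -= 1
        if self.waiting[group]:
            self.active[group] += 1
            return self.waiting[group].popleft()
        return None

throttle = ConcurrencyThrottle({"data_mining": 3, "complex_dss": 10})
started = [throttle.submit("data_mining", f"dm{i}") for i in range(1, 5)]
print(started)                         # [True, True, True, False]
print(throttle.finish("data_mining"))  # dm4
```

Removing a group's entry from `limits` lifts the restriction, which mirrors how easily population control can be reversed.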

Balanced ecosystem
The challenges faced by a single-cage zookeeper and the manager of an active data warehouse are similar. Both need to be sensitive to the different needs—and potential conflicts—of those under their control. Both need to be aware of the tempo and direction of change, and both need to make sure appropriate processes are in place to monitor and re-balance the environment.

But unlike the zookeeper, the active data warehouse manager has ever-improving tools and facilities at his service to enable responsible and proactive administration of the platform. With workload management routines firmly in place, you'll never have to crack your whip again.

Traffic cops keep the work moving

When bicycles are the only vehicles out on the streets, you don't need speed limits and carpool lanes. But when trucks, buses, cars and bicycles compete for space, it would be difficult to keep traffic flowing without controls in place. That's precisely why an active data warehouse requires greater workload management than its predecessor, the traditional decision support system.

Teradata Dynamic Query Manager (DQM) and Teradata Priority Scheduler manage mixed workloads in two ways. Teradata DQM evaluates queries before they start and chooses to run, delay or reject them based on administrator-specified criteria. In contrast, Teradata Priority Scheduler divides up the available resources among work that is already running.

The manager
Teradata DQM lets you manage access to the Teradata Warehouse by controlling which queries, and what portion of them, will be allowed to begin execution. It does this by examining log-on and query requests, and comparing the objects referenced in the requests to DBA-defined restrictions.

DBAs can place limits on the concurrency of specified groups of users, similar to metered freeway access, where stoplights at the top of on-ramps manage the rate at which new traffic is allowed to merge. Administrators can also enact object restrictions based on static criteria such as user IDs, tables, or date/time. Beyond these static rules, Teradata DQM can decide which query requests run and which do not based on dynamic, query-dependent criteria such as the expected number of rows returned or the estimated processing time. In addition, the DBA can schedule requests to run periodically or during off-hours.
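The pre-execution checks can be sketched as a single function that returns run, delay, or reject. This is a minimal illustration, assuming the optimizer's row and time estimates are available before the query starts; the `Request` and `Rules` types, the thresholds, and the three-way logic are hypothetical and do not reproduce Teradata DQM's actual rule definitions. Date/time restrictions and scheduling are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet

class Decision(Enum):
    RUN = "run"
    DELAY = "delay"
    REJECT = "reject"

@dataclass(frozen=True)
class Request:
    user: str
    tables: FrozenSet[str]
    est_rows: int       # optimizer's estimated result rows (assumed available)
    est_seconds: float  # optimizer's estimated processing time (assumed available)

@dataclass
class Rules:
    # Hypothetical thresholds chosen for illustration.
    blocked_users: FrozenSet[str] = frozenset()
    restricted_tables: FrozenSet[str] = frozenset()
    max_rows: int = 1_000_000
    max_seconds_now: float = 600.0        # longer work waits for off-hours
    max_seconds_ever: float = 4 * 3600.0  # anything bigger is refused outright

def evaluate(req: Request, rules: Rules) -> Decision:
    # Static criteria: who is asking and which objects are referenced.
    if req.user in rules.blocked_users or req.tables & rules.restricted_tables:
        return Decision.REJECT
    # Dynamic, query-dependent criteria based on the optimizer's estimates.
    if req.est_rows > rules.max_rows or req.est_seconds > rules.max_seconds_ever:
        return Decision.REJECT
    if req.est_seconds > rules.max_seconds_now:
        return Decision.DELAY
    return Decision.RUN

rules = Rules(blocked_users=frozenset({"temp_user"}),
              restricted_tables=frozenset({"payroll"}))
print(evaluate(Request("analyst", frozenset({"sales"}), 5_000, 30.0), rules))     # Decision.RUN
print(evaluate(Request("analyst", frozenset({"sales"}), 5_000, 1_800.0), rules))  # Decision.DELAY
```

Static checks run first because they are cheap and do not depend on the optimizer; the estimate-based checks then separate work that can start now from work that should wait or be refused.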

The scheduler
Using a weight-based approach to prioritization, Teradata Priority Scheduler allows you to control the frequency of access and amount of CPU allocated to different applications. This facility does not attempt to control what work is allowed to begin execution; rather, it prioritizes work once it is under way. Traffic engineers in San Francisco can manipulate the number of lanes on the Golden Gate Bridge that serve outbound versus inbound traffic in such a way that morning rush hour has more lanes coming into the city, and evening rush hour has more lanes leaving. In the same way, Teradata Priority Scheduler has facilities to shift or reassign the same pool of resources, based on changing patterns throughout the business day.

Some Teradata users rely on Teradata Priority Scheduler to contain applications that are capable of consuming very large amounts of resources, such as data mining, so that they have less influence on the performance of other active work. Other users have chosen to give a high priority to short-running, tactical queries, so that they will complete quickly. Still others use Teradata Priority Scheduler to divvy up resources based on subdivisions within the data warehouse community.

Carrie Ballinger, a Teradata senior technical consultant, works in the Active Data Warehouse Center of Expertise in El Segundo, Calif.




Copyright by Teradata Corporation 2001-2007.