
An active data warehouse is its own ecosystem, home to a variety of diverse activities, each with different profiles and resource needs.

THE WILD WORLD OF MIXED WORKLOAD
Priorities and resources learn to get along
by Carrie Ballinger
Imagine that you are a zookeeper,
only instead of managing a large park with
distinct areas for different animal species, you watch over
a single cage containing all the animals.
The challenges are obvious: How do you keep the larger animals
from eating food intended for smaller animals? How do you
adjust feeding schedules to reflect each animal's seasonal
and daily needs? How do you ensure each type of animal has
adequate water for drinking, bathing and playing? How do you
limit the population so that it doesn't exceed the cage's
finite capacity?
The idea of this single-cage zoo highlights the challenges
of active data warehousing, which must contend with many types
of work running on a single platform. DBAs have to understand
users' needs and consider how to allocate resources
so that the system meets those needs in a timely fashion.
Mixed workloads also challenge DBAs to consider query priorities
and determine which queries should have first dibs on the
system and which can wait. At the same time, they need the
ability to dynamically adjust the system's parameters
to respond to current needs.
Essentially, an active data warehouse is an ecosystem full
of diverse activities, each with different profiles and
resource needs. And because the needs often conflict, DBAs
must understand how different expectations can coexist, and
then impose some level of guidance in order to sustain the
system. In the Teradata Warehouse, this guidance takes the
form of workload management.
All creatures great and small
Teradata offers a number of facilities to support the data
warehouse, but when it comes to managing mixed workloads,
Teradata Priority Scheduler is one of the key offerings. This
facility uses weight assignments that allow administrators
to mandate a high priority for the most important and the
most response-sensitive work, while assigning a lower priority
to less time-dependent activities. One result of this weighting
system is that high-priority work is continually offered a
large share of the platform resources, even if the requests
themselves use very little of it.
In one Teradata implementation, for example, the highest
priority work is an application that does cost validation.
Because its queries are quick and streamlined, this application
consumes less than 1% of the total CPU. However, to make sure
all cost validation queries get to the CPU immediately upon
arrival, this work has a weight assignment that entitles it
to nearly 90% of the CPU. Running at the same time are data
mining and business intelligence applications, which together
consume almost 80% of the platform resources. Yet because
speed of response is less critical, these applications have
a much lower weight assignment, and thus a much lower priority.
This pushes their initial allocation of system resources down
into the single digits and reduces their frequency of CPU
access.
This pattern, repeated at almost every Teradata active data
warehouse site today, offers very high access to resources
for high-priority queries, even though they will only use
a fraction of what is offered, then gives the lower priority
work the leftovers. In the fictional zoo cage, this is the
equivalent of offering 90% of the food to the birds and only
10% to the lions.
Although it seems odd, this intentional over-allocation of
resources works well on Teradata systems for achieving fast
access to CPU for things such as tactical queries. That's
because tactical query applications are only one of many applications
active on the platform, and by themselves they simply can't
use all the resources they are being offered, just as birds
can't possibly consume 500 pounds of raw meat a day.
In the active data warehouse, unconsumed resources flow immediately
to the lower priority work.
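The over-allocation idea can be sketched in a few lines of Python. This is only an illustrative model, not how Teradata Priority Scheduler is implemented: the function name `allocate`, the group names, the 90/5/5 weights and the demand figures are all assumptions chosen to echo the example above. Each group is offered CPU in proportion to its weight, takes only what it needs, and whatever it leaves behind is offered again to the groups that still want more.

```python
def allocate(weights, demand, capacity=100.0):
    """Share `capacity` by weight; unused entitlement flows to groups still wanting CPU.

    Hypothetical model for illustration only.
    """
    alloc = {group: 0.0 for group in weights}
    active = {group for group in weights if demand[group] > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        total_weight = sum(weights[g] for g in active)
        satisfied = set()
        handed_out = 0.0
        for g in active:
            offer = remaining * weights[g] / total_weight  # weighted share of what is left
            take = min(offer, demand[g] - alloc[g])        # a group never takes more than it needs
            alloc[g] += take
            handed_out += take
            if alloc[g] >= demand[g] - 1e-9:
                satisfied.add(g)
        remaining -= handed_out
        if not satisfied:
            break  # every active group took its full offer, so nothing is left over
        active -= satisfied
    return alloc


# Assumed figures: the tactical work is entitled to 90% but needs about 1%.
weights = {"cost_validation": 90, "data_mining": 5, "bi": 5}
demand = {"cost_validation": 1, "data_mining": 60, "bi": 60}
print(allocate(weights, demand))
```

In this sketch the cost validation group ends up with about 1% of the CPU, and the two low-weight groups split the remaining 99% between them, even though their weights entitle them to only 5% each.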
Carnivorous queries
Most data warehouses have a query category that soaks up resources
like a sponge. This might be a query that performs a high
number of CPU operations for each row read or one that is
just poorly written. Because such queries can hold the CPU
longer between I/O operations than more balanced work can,
they often appear to dominate platform resources. With
Teradata Priority Scheduler, these queries can be, and often
are, placed in a lower priority group. If the users who submit
them can be identified, those users can be assigned their own
low-priority group.
Another approach is being used at a retailer that uses a
Teradata Warehouse. This retailer gives all its MicroStrategy
users the same high priority, but then uses Teradata Priority
Scheduler's milestone limits to demote the longer-running
queries. Milestone limits accumulate CPU usage by session
or by query. At this site, each query is allowed
to use up to one second of CPU at the higher priority level
before being demoted to a lower priority. This allows all
queries, short or long, to enjoy a performance benefit when
they first begin, while only queries that need more than one
second of CPU experience the drop in performance that comes
with the lower priority.
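The retailer's one-second rule can be pictured with a small Python sketch. The `Query` class, the `charge_cpu` function and the "high" and "low" labels are hypothetical names introduced here for illustration; only the one-second milestone comes from the example above, and the real facility is configured rather than coded this way.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    priority: str = "high"    # every query starts at the higher priority
    cpu_seconds: float = 0.0  # CPU accumulated so far by this query


def charge_cpu(query, seconds, milestone=1.0, demoted_priority="low"):
    """Add CPU time to a query and demote it once it passes the milestone."""
    query.cpu_seconds += seconds
    if query.priority != demoted_priority and query.cpu_seconds > milestone:
        query.priority = demoted_priority
    return query.priority


short = Query("store_lookup")
long_running = Query("quarterly_rollup")
print(charge_cpu(short, 0.3))         # stays at the starting priority
print(charge_cpu(long_running, 0.6))  # still under one second
print(charge_cpu(long_running, 0.6))  # now past one second, so demoted
```

Short queries finish before they ever reach the milestone, so they never see the lower priority.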
Seasonal feeding patterns
A mixed workload is truly mixed—a hodgepodge of variety
and change. In fact, it is the nature of mixed workloads to
be constantly changing in terms of balance, content and workload
priority. Just as animal eating patterns can be linked to
seasons or times of day, patterns in the type of work running
on an active data warehouse platform can be linked to business
events that take place at regular, predictable intervals.
Reports that might have low priority throughout the week could
become the most important work in the entire company on Monday
morning.
Many Teradata users have analyzed the shifting patterns of
resource usage and priority on their platforms and developed
multiple "processing windows," each representing
a different priority profile. A simple example of a set of
processing windows designed by a Teradata customer is shown
in "Window of opportunity."
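One way to picture processing windows is as a lookup from time of day to a set of weights. The Python sketch below is a guess at the general shape only: the window names, hours, group names and weights are invented for illustration and are not the customer's actual settings.

```python
from datetime import time

# Hypothetical windows: (name, start, end, weights per priority group).
WINDOWS = [
    ("business_day", time(8, 0), time(18, 0),
     {"tactical": 60, "reports": 30, "batch": 10}),
    ("overnight", time(18, 0), time(8, 0),
     {"tactical": 20, "reports": 10, "batch": 70}),
]


def in_window(now, start, end):
    """True if `now` falls in [start, end), allowing windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def weights_at(now, windows=WINDOWS):
    """Return the (window name, weights) pair in force at the given time of day."""
    for name, start, end, weights in windows:
        if in_window(now, start, end):
            return name, weights
    raise ValueError("no processing window covers this time")


print(weights_at(time(9, 30)))
print(weights_at(time(23, 0)))
```

A fuller version might key on the day of the week as well, so that Monday-morning reports could be promoted for a few hours.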

Population control
Just as animals reproduce, so do queries in a successful data
warehouse. Teradata Dynamic Query Manager offers something
similar to birth control for data warehouse queries. By differentiating
between queries that are short and sweet and ones that are
slower and more complex, the DBA can place a limit on how
many massive queries can be active at any point in time. Restricting
concurrency levels for these queries allows the other part
of the mix (low-volume, quick turnaround work) to be amplified
and hastened. Most mixed work running on Teradata has some
percentage of these slow-moving queries; as such, population
control may be desirable for maintaining overall balance.
Teradata users sometimes like to keep a low head count for
queries running in a very low priority group. Substantive
queries that are slowed down by having a very low priority
may become glacier-like, inching along in slow motion while
at the same time holding on to system resources. Restricting
the number of very large and very slow-moving queries that
can be active at the same point in time is a technique for
increasing overall throughput in the system. It also helps
ensure that the low-priority work already running is turned
around without undue delay. Population control is immediately
reversible by removing the restriction, giving the administrator
a clear edge over the zookeeper in this regard.
Using Teradata Dynamic Query Manager, you can set limits
for any priority group. For example, you can keep the data
mining priority group down to only three or four queries active,
or you can restrict very complex decision support queries
to 10 at a time, if you wish.
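The head-count idea can be sketched in Python as a simple throttle. The class `ConcurrencyThrottle` and its methods are hypothetical and stand in for whatever Teradata Dynamic Query Manager does internally; the limit of three for data mining is taken from the example above, and groups with no limit are assumed to run unrestricted.

```python
from collections import deque


class ConcurrencyThrottle:
    """Caps how many queries per priority group may be active at once (illustrative only)."""

    def __init__(self, limits):
        self.limits = dict(limits)  # group -> maximum active queries
        self.running = {}           # group -> currently active count
        self.delayed = {}           # group -> queue of waiting query ids

    def submit(self, group, query_id):
        limit = self.limits.get(group)
        if limit is not None and self.running.get(group, 0) >= limit:
            self.delayed.setdefault(group, deque()).append(query_id)
            return "delayed"
        self.running[group] = self.running.get(group, 0) + 1
        return "running"

    def finish(self, group):
        """Mark one query in the group as done; release the next delayed query, if any."""
        self.running[group] -= 1
        queue = self.delayed.get(group)
        if queue:
            released = queue.popleft()
            self.running[group] += 1
            return released
        return None


throttle = ConcurrencyThrottle({"data_mining": 3})
for n in range(1, 5):
    print(n, throttle.submit("data_mining", f"dm{n}"))
print(throttle.finish("data_mining"))  # the waiting query is released
```

Removing a group's entry from `limits` lifts the restriction, which mirrors the point that population control is easy to reverse.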
Balanced ecosystem
The challenges faced by a single-cage zookeeper and the manager
of an active data warehouse are similar. Both need to be sensitive
to the different needs—and potential conflicts—of
those under their control. Both need to be aware of the tempo
and direction of change, and both need to make sure appropriate
processes are in place to monitor and re-balance the environment.
But unlike the zookeeper, the active data warehouse manager
has ever-improving tools and facilities at his service to
enable responsible and proactive administration of the platform.
With workload management routines firmly in place, you'll
never have to crack your whip again.
Traffic cops keep the work moving
When bicycles
are the only vehicles out on the streets, you
don't need speed limits and carpool lanes.
But when trucks, buses, cars and bicycles compete
for space, it would be difficult to keep traffic
flowing without controls in place. That's
precisely why an active data warehouse requires
greater workload management than its predecessor,
the traditional decision support system.
Teradata Dynamic Query Manager
(DQM) and Teradata Priority Scheduler manage mixed
workloads in two ways. Teradata DQM evaluates
queries before they start and chooses to run,
delay or reject them based on administrator-specified
criteria. In contrast, Teradata Priority Scheduler
divides up the available resources among work
that is already running.
The manager
Teradata DQM lets you manage access to the Teradata
Warehouse by controlling which queries, and what
portion of them, will be allowed to begin execution.
It does this by examining log-on and query requests,
and comparing the objects referenced in the requests
to DBA-defined restrictions.
DBAs can place limits on the
concurrency of specified groups of users, similar
to metered freeway access where stoplights at
the top of on-ramps manage the rate at which new
traffic is allowed to merge in. Administrators
can also enact object restrictions based on static
criteria such as user IDs, tables, or date/time.
Teradata DQM controls which query requests run
and which do not, based on dynamic, query-dependent
criteria such as expected number of rows being
returned or estimated processing time. In addition,
the DBA can schedule requests to run periodically
or at off-hours.
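As a rough illustration of the run, delay or reject decision, consider the Python sketch below. The `Request` fields, the `decide` function and the threshold values are assumptions made for this example, not Teradata DQM's actual rule syntax: a blocked user ID stands in for a static object restriction, and the row and time estimates stand in for the dynamic, query-dependent criteria.

```python
from dataclasses import dataclass


@dataclass
class Request:
    user: str
    est_rows: int       # estimated rows returned
    est_seconds: float  # estimated processing time


def decide(req, blocked_users=frozenset(), max_rows=1_000_000, max_seconds=600.0):
    """Return 'run', 'delay' or 'reject' for a request before it starts (hypothetical rules)."""
    if req.user in blocked_users:
        return "reject"  # static restriction on who may run work
    if req.est_rows > max_rows or req.est_seconds > max_seconds:
        return "delay"   # too big for now; hold for a quieter period
    return "run"


print(decide(Request("analyst1", 500, 2.0)))
print(decide(Request("analyst2", 5_000_000, 2.0)))
print(decide(Request("intern", 10, 1.0), blocked_users={"intern"}))
```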
The scheduler
Using a weight-based approach to prioritization,
Teradata Priority Scheduler allows you to control
the frequency of access and amount of CPU allocated
to different applications. This facility does
not attempt to control what work is allowed to
begin execution; rather, it prioritizes work once
it is under way. Traffic engineers in San Francisco
can manipulate the number of lanes on the Golden
Gate Bridge that serve outbound versus inbound
traffic in such a way that morning rush hour has
more lanes coming into the city, and evening rush
hour has more lanes leaving. In the same way,
Teradata Priority Scheduler has facilities to
shift or reassign the same pool of resources,
based on changing patterns throughout the business
day.
Some Teradata users rely on
Teradata Priority Scheduler to contain applications
that are capable of consuming very large amounts
of resources, such as data mining, so that they
have less influence on the performance of other
active work. Other users have chosen to give a
high priority to short-running, tactical queries,
so that they will complete quickly. Still others
use Teradata Priority Scheduler to divvy up resources
based on subdivisions within the data warehouse
community.
Carrie Ballinger,
a Teradata senior technical consultant, works in the Active
Data Warehouse Center of Expertise in El Segundo, Calif.