Mixed workload management directs job traffic for more efficient processing.
by Neil Raden
Analytical and operational processes are converging. This is changing the
mission of data warehouses by adding the requirement to absorb data in near real
time and by diversifying the types of queries in the workload. Fresher data
enables dynamic operational processes, but adding near real-time loading to all
of the legacy analytical functions is not trivial. Balancing a mixed workload of
high-volume simple queries, long-running complex queries, near real-time updates
and other functions is a huge challenge.
The implications are clear. A fast database and good DBAs, who are always in
short supply, no longer suffice. DBAs simply can't respond quickly enough.
Their role is changing too, from manual intervention to modeling the workload
so algorithmic processes can take over in real time. A people-machine
cooperative is required to manage the mixed workload with optimal performance,
with humans defining the problem and its variables, and computers firing the
rules.
Alternate routes
An optimization model's usefulness is measured by how well it identifies the
variables of the problem and the dependencies between them. For a single
database application, the complexity and rate of change don't allow time to
build a good optimization model. Instead, organizations usually deploy one or
more of the following alternatives:
- Excess capacity. Creating capacity a safe margin above expected peak
  usage provides some assurance that a spike in use will not cause poor
  performance. This is expensive and not completely reliable.
- Limitations. Data governors limit access to resources, place time
  constraints on queries or even scale back the hours when access is available.
  For most organizations, this is unacceptable.
- Isolation. Separating processes onto different platforms is a common
  method for managing workloads. Individual applications are more manageable in
  isolation, which allows for more accurate planning and better performance;
  mixing multiple applications on the same platform often results in
  unpredictable combinations. This is true even in data warehousing, where many
  enterprise data warehouse (EDW) implementations separate the EDW from the data
  marts that drive business intelligence (BI) operations. In these scenarios,
  the excess capacity and isolation methods are combined. With separate data
  marts, each subject area gets a server of its own, resulting in higher overall
  costs, greater complexity and vulnerability from multiple points of failure.
While isolation works for a single application, data warehouses exist to serve
the varied needs of people in organizations; thus, multiple applications are
inherent. Clear drawbacks to the isolation approach include a constellation of
servers and database licenses, data inconsistency, data latency and additional
application interfaces, all of which cost time and money.
The operational data store (ODS) was devised to give users more timely data,
especially operational data that was not integrated into the data warehouse.
For example, a common ODS application might provide live data for customer
touchpoints such as a call center or Web site. All of the drawbacks of the
isolation approach are endemic to the ODS approach: expense, interfaces,
inconsistency and latency.
The real-time, fine-grained operational data handled by an ODS can be
accommodated in an EDW that has adequate mixed workload management
capabilities. The solution is a database platform that can provide predictable
performance and scalability while handling a variety of user types, query types
and loads. (See figure.) A data warehouse designed to accommodate the most
detailed data and allow for required periodic, intra-day updates can eliminate
the need for ODS structures and associated programs.
Save space, time and money
Certainly, managing data loads and job queues is important for an efficient
and productive system, but the underlying benefits of a mixed workload
management solution include more accurate data, better time management and
greater return on investment.
Mixed workload management facilitates the consolidation of many separate
database servers and interface programs. In addition, both logical and physical
database schemas can be consolidated into a single model. This eliminates the
need for costly maintenance and enhancement activities in an environment with
many strong dependencies. A significant portion of a maintenance budget can be
exhausted in researching, developing and testing changes in complicated
workload-isolation environments.
With data warehouses expected to provide 24x7 performance and data load cycles
disappearing, performance bottlenecks can be problematic. For an organization
to maintain its service level agreements (SLAs), it needs warnings of these job
holdups, coupled with programmatic suggestions for resolving them. DBAs no
longer have the time to study logs and query system tables to analyze
performance problems. A dynamic, model-based approach to workload management
not only helps ensure that SLAs are met, but also enables SLA negotiations based
on realistic expectations.
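The idea of an early warning paired with a suggested fix can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: it assumes the database reports each job's elapsed time and fraction complete, extrapolates total runtime linearly, and compares that projection with a per-job SLA. The `Job` fields, the 90% warning margin and the suggestion strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    elapsed_s: float  # seconds the job has run so far
    progress: float   # fraction complete, 0.0-1.0 (assumed to be reported by the engine)
    sla_s: float      # hypothetical SLA: maximum allowed total runtime in seconds


def projected_runtime(job: Job) -> float:
    """Extrapolate total runtime linearly from progress so far."""
    return job.elapsed_s / job.progress


def sla_warnings(jobs, margin=0.9):
    """Return (job name, suggestion) pairs for jobs at risk of missing their SLA."""
    warnings = []
    for job in jobs:
        if job.progress <= 0:
            continue  # nothing to extrapolate from yet
        projected = projected_runtime(job)
        if projected > job.sla_s:
            warnings.append((job.name, "projected to miss SLA: raise priority"))
        elif projected > margin * job.sla_s:
            warnings.append((job.name, "close to SLA limit: monitor"))
    return warnings


jobs = [
    Job("nightly_load", elapsed_s=600, progress=0.5, sla_s=1800),
    Job("exec_dashboard", elapsed_s=30, progress=0.25, sla_s=100),
    Job("ad_hoc_report", elapsed_s=50, progress=0.5, sla_s=105),
]
for name, suggestion in sla_warnings(jobs):
    print(name, "->", suggestion)
```

Linear extrapolation is the simplest possible model; a real workload manager would presumably use richer estimates, but the shape of the loop (project, compare, suggest) is the same.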
Managing workloads can also help the company's bottom line. Because mixed
workload management moves jobs through the system efficiently and
systematically, a company can likely leverage its resources more fully and
predict system expansions more accurately, without expensive buffers of excess
hardware or usage restrictions. In addition, dynamic optimization capabilities,
especially in mixed workload data warehouses, can improve hardware utilization
and avoid costly upgrades.
Dynamic navigations
The alternative to the isolation approach is a dynamic mixed workload management
tool, but this too presents an optimization problem.
All relational databases have query optimizers, but they are limited to finding
the most efficient way to resolve one query without considering the entire
system. It's not unlike finding the fastest route home without considering the
traffic, weather or time of day. When a data warehouse serves only a single
purpose, such as production reporting or feeding data marts, query optimizers
often suffice. The system workload is more or less predictable and the
optimization schemes are basically alike. Systems tuners and DBAs can configure
a database to perform well and predictably in these cases because of the
homogeneity of the workload.
However, data warehouses used for BI purposes typically handle multiple
applications and are relied on by a variety of users. So a narrowly focused
query optimizer is insufficient for the different query types and sizes. To be
effective, the workloads must be processed based on criticality, size or time
of day.
Current workload management processes consider the running and queued jobs by
their pre-assigned priorities, and they weigh that against in-process
utilization. In data warehousing, true dynamic mixed workload management
assigns resources based on the type of work being performed and reassigns
resources, making almost instantaneous judgments based on the model and
corresponding rules.
Because only the database platform understands the characteristics of the work
in progress and in the queue, it must detect and address problems autonomously,
without assistance from console operators, if it is to be useful. A successful
optimization scheme operates as a closed loop when:
- A model of the expected workload is developed
- Rules apply that model to the environment as it operates, determining resource
  entitlements and priorities
- Workload metrics are gathered, such as service levels, uptime percentages,
  turnaround times and any metrics the system operators determine are useful in
  demonstrating data compliance and diagnosing problems
- Metrics are evaluated and modifications are made to the model and rules
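The four steps of the closed loop can be illustrated with a toy controller. This is a sketch under stated assumptions, not a description of any product: the "model" is a dictionary of relative weights per workload, the "rules" turn weights into resource shares, the only "metric" is response time, and a stand-in function (`simulate`, hypothetical) plays the part of the running system. The 20% adjustment step and the "over-met at half the target" threshold are likewise invented for illustration.

```python
def closed_loop(weights, targets, observe, iterations=3, step=0.2):
    """Run a few model -> rules -> metrics -> evaluation cycles.

    weights: workload name -> relative weight (the workload model)
    targets: workload name -> target response time in seconds
    observe: callable taking entitlements and returning measured response times
    """
    weights = dict(weights)  # don't mutate the caller's model
    for _ in range(iterations):
        # Rules: turn the model into resource entitlements (shares summing to 1)
        total = sum(weights.values())
        entitlements = {w: weight / total for w, weight in weights.items()}
        # Metrics: measure how each workload performed under those entitlements
        measured = observe(entitlements)
        # Evaluation: adjust the model where service goals are missed or over-met
        for w, target in targets.items():
            if measured[w] > target:
                weights[w] *= 1 + step
            elif measured[w] < 0.5 * target:
                weights[w] *= 1 - step
    return weights


# Toy stand-in for the real system: response time = demand / share of resources
DEMAND = {"tactical": 0.4, "reporting": 3.0}


def simulate(entitlements):
    return {w: DEMAND[w] / share for w, share in entitlements.items()}


model = closed_loop({"tactical": 1.0, "reporting": 1.0},
                    {"tactical": 0.5, "reporting": 10.0},
                    simulate)
print(model)
```

With these toy numbers, the tactical workload misses its 0.5-second goal in each of the three cycles, so its weight grows from 1.0 to about 1.73, while reporting stays within its goal and keeps a weight of 1.0.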
Next in line
Mixed workload management is an optimization solution to a difficult computer
science problem. It removes several technical barriers to having one database
perform multiple tasks as efficiently as possible.
Blended analytical and operational processing means more timely access and use
of information. More users vying for answers provided through BI means a
greater need for data warehouse scalability and diversity. A dynamic mixed
workload management capability creates value in two directions: cost savings
and enablement of other high-value applications that had not been possible
before. Managing a mixed workload effectively is the first step in closing the
loop between analysis and action.
Dynamic mixed workload management is a departure from the data warehouse's
original concept of isolation. Isolation works well when there is time to spare,
but in a busy environment, there is none. Mixed workload management helps
organizations control time, or at least make better use of it. And that spells
success.
Many ways to manage workloads
Dynamic workload management solutions help prevent runaway queries or system
overloads with rules that can filter jobs or throttle them up or down. For
starters, when queries or load jobs are submitted, they are identified as
belonging to an allocation group (AG). These fairly coarse-grained groups can be
set up for finance, marketing or sales, and each has a pre-assigned "relative
weight" (i.e., priority) inside the relational database management system. AGs
work well with a wide range of queries; however, there are generally fewer than
a dozen groups, so always classifying users into the same AGs limits them to one
priority, even when they may have widely varying queries to submit.
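A small Python sketch makes the limitation of allocation groups concrete. The group names echo the examples in the text, but the weights, the user-to-group mapping and the function names are hypothetical; this is not the Teradata interface. The point is that classification looks only at who submitted the request, so a quick lookup and a heavy aggregation from the same user inherit the same relative weight.

```python
# Hypothetical allocation groups and their relative weights (illustrative values)
ALLOCATION_GROUPS = {"finance": 40, "marketing": 20, "sales": 20, "default": 10}

# Hypothetical user-to-group mapping maintained by the administrator
USER_GROUP = {"alice": "finance", "bob": "marketing"}


def classify(user: str, sql: str) -> str:
    """Coarse-grained classification: the group depends only on who submits.

    The SQL text is deliberately ignored, which is the limitation of
    allocation groups: a user's quick lookup and huge join get the same priority.
    """
    return USER_GROUP.get(user, "default")


def relative_share(group: str) -> float:
    """Fraction of resources a group is entitled to when every group is busy."""
    return ALLOCATION_GROUPS[group] / sum(ALLOCATION_GROUPS.values())


quick = classify("alice", "SELECT balance FROM accounts WHERE id = 42")
heavy = classify("alice", "SELECT region, SUM(amount) FROM sales GROUP BY region")
print(quick, heavy, round(relative_share("finance"), 3))  # finance finance 0.444
```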
When setting up the workload management rules, the system administrator can
utilize these AGs. Among other actions, the rules can limit the number of
concurrent queries allowed from a single AG, either absolutely or based on the
time of day; restrict the size of answer sets; and stop queries from initiating a
full-table scan.
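The three kinds of rules listed above (a per-group concurrency limit that varies by time of day, an answer-set size cap and a ban on full-table scans) can be expressed as a simple admission check. This is a hedged sketch: the `Request` fields assume the optimizer can supply an estimated row count and whether the plan scans a whole table, and the peak hours, limits and return values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    group: str             # allocation group the request was classified into
    est_rows: int          # estimated answer-set size (assumed available from the optimizer)
    full_table_scan: bool  # whether the plan scans an entire table (assumed known)


# Hypothetical rule settings, for illustration only
PEAK_HOURS = range(8, 18)  # 08:00 to 17:59
CONCURRENCY_LIMITS = {"marketing": {"peak": 2, "off_peak": 10}}
MAX_ANSWER_ROWS = 1_000_000


def admit(req: Request, hour: int, running: dict) -> str:
    """Decide whether a request runs now, waits in the queue, or is rejected."""
    if req.full_table_scan:
        return "reject"  # rule: no full-table scans
    if req.est_rows > MAX_ANSWER_ROWS:
        return "reject"  # rule: cap answer-set size
    limits = CONCURRENCY_LIMITS.get(req.group)
    if limits is not None:
        period = "peak" if hour in PEAK_HOURS else "off_peak"
        if running.get(req.group, 0) >= limits[period]:
            return "delay"  # rule: per-group concurrency limit, by time of day
    return "run"


busy = {"marketing": 2}
print(admit(Request("marketing", 500, False), hour=10, running=busy))  # delay
print(admit(Request("marketing", 500, False), hour=20, running=busy))  # run
```

Returning "delay" rather than rejecting the over-limit request mirrors the throttling idea in the text: the work is queued until a slot frees up instead of being lost.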
This automated exception handling does what a team of experts cannot—it makes
changes hundreds of times per second. And because it's a dynamic optimization,
the workload manager can allocate 100% of the resources to low-priority tasks
when nothing else is running, then switch resource allocations when high-priority
tasks arrive.
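The behavior described here, where idle capacity flows to whatever work is present, is what schedulers call work-conserving allocation. A minimal sketch, assuming shares are simply each active group's weight divided by the total weight of all active groups (the group names and weights are hypothetical):

```python
def allocate(weights: dict, active: set) -> dict:
    """Split 100% of resources among the groups that currently have work.

    Idle groups receive nothing, so a lone low-priority group gets everything;
    as soon as a higher-weight group becomes active, the shares are recomputed.
    """
    busy = {group: w for group, w in weights.items() if group in active}
    total = sum(busy.values())
    if total == 0:
        return {}
    return {group: w / total for group, w in busy.items()}


weights = {"tactical": 80, "batch": 20}          # hypothetical relative weights
print(allocate(weights, {"batch"}))              # {'batch': 1.0}
print(allocate(weights, {"batch", "tactical"}))  # {'tactical': 0.8, 'batch': 0.2}
```

Recomputing shares only from the active set is what lets the low-priority batch work use the whole machine overnight and still yield 80% of it the moment tactical requests arrive.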
A technique helpful to companies with stringent service level goals is allowing
high-priority tasks a "fast path" through the system. The workload manager
analyzes each database request before and during execution, and it assigns
priorities based on the rules. As "fast path" jobs appear in the queue, the
workload manager assigns them the highest priority and sets lower priorities for
reporting and data.
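A fast path can be modeled as a priority queue in which fast-path requests always sort ahead of other work. The sketch below uses Python's standard `heapq` module; the priority levels and job names are hypothetical, and it covers only ordering in the queue before execution, not the reprioritization during execution that the text also mentions.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower number is served first
PRIORITY = {"fast_path": 0, "reporting": 5, "load": 9}


class WorkloadQueue:
    """Priority queue in which fast-path requests jump ahead of other work."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO within a priority level

    def submit(self, job: str, kind: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._order), job))

    def next_job(self) -> str:
        return heapq.heappop(self._heap)[2]


q = WorkloadQueue()
q.submit("daily_sales_report", "reporting")
q.submit("hourly_etl", "load")
q.submit("call_center_lookup", "fast_path")
print([q.next_job() for _ in range(3)])
# ['call_center_lookup', 'daily_sales_report', 'hourly_etl']
```

The sequence counter keeps requests of equal priority in arrival order, so a burst of fast-path work cannot reorder itself arbitrarily.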
The Priority Scheduler solution from Teradata is a good example of workload
management in action. Another example, Teradata Active System Management, takes
a more active approach and classifies tasks based on the actual query. This
fine-grained classification means a user (or another application) can submit a
complicated query and then a second query, with each task assigned to a
different priority queue.
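Query-based classification can be contrasted with the user-based kind in a short sketch. This does not reproduce how Teradata Active System Management works internally; it only illustrates the general idea, assuming an optimizer cost estimate and the set of tables touched are available. The thresholds, the `customer` table and the queue names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Query:
    user: str
    est_cost: float    # optimizer cost estimate (assumed available; units arbitrary)
    tables: frozenset  # tables the query touches


def classify_query(q: Query) -> str:
    """Choose a priority queue from the query itself, not from who submitted it.

    The thresholds and table names are hypothetical.
    """
    if q.est_cost < 10 and "customer" in q.tables:
        return "tactical"  # short lookup, e.g. a call-center screen
    if q.est_cost < 1_000:
        return "standard"
    return "heavy"


complex_q = Query("alice", 50_000, frozenset({"sales", "orders"}))
lookup_q = Query("alice", 2, frozenset({"customer"}))
print(classify_query(complex_q), classify_query(lookup_q))  # heavy tactical
```

Because `user` plays no part in the decision, the same submitter lands in two different queues, which is the flexibility that allocation groups alone cannot provide.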
—N.R.
Neil Raden, co-founder of Smart (enough) Systems, is a consultant and industry
influencer in BI, analytics and decision services. His book, "Smart (Enough)
Systems: How to Deliver Competitive Advantage by Automating Hidden Decisions,"
was co-written with James Taylor and released by Prentice Hall. He can be
reached at neil@smartenoughsystems.com.
Teradata Magazine, December 2008