Insider's Warehouse

Taking turns

Mixed workload management helps route jobs for more efficient processing.

by Neil Raden

The gap between analytical and operational processes is closing. This is changing the mission of data warehouses by adding the requirement of absorbing data in near real time and diversifying the types of queries in the workload. Fresher data enables dynamic operational processes, but adding that to all of the legacy analytical functions is not trivial. Balancing the mixed workload of high-volume simple queries, long-running complex queries, near real-time updating and other functions is a huge challenge.

[Figure: Traffic Circle]

The implications are clear. A fast database and good DBAs, who are always in short supply, no longer suffice. DBAs simply can't respond quickly enough. Their role is changing too, from manual intervention to modeling the workload so algorithmic processes can take over in real time. A people-machine cooperative is required to manage the mixed workload with optimal performance, with humans defining the problem and its variables, and computers firing the rules.

Alternate routes
An optimization model's usefulness is measured by how well it identifies the variables of the problem and the dependencies between them. For a single database application, the complexity and rate of change don't allow time to build a good optimization model. Instead, organizations usually deploy one or more of the following alternatives:
Excess capacity. Provisioning capacity at a safe margin above expected peak usage provides some assurance that a spike in use will not cause poor performance. This is expensive and not completely reliable.
Limitations. Data governors limit access to resources, place time constraints on queries or even scale back the times when access is available. For most organizations, this is unacceptable.
Isolation. Separating processes on different platforms is a common method for managing workloads. Individual applications are more manageable in isolation and allow for more accurate planning and better performance. Mixing multiple applications on the same platform often results in unpredictable combinations. This is true even in data warehousing, where many enterprise data warehouse (EDW) implementations separate the EDW from the data marts that drive the business intelligence (BI) operations. In these scenarios, the excess capacity plan and isolation method are combined. With separate data marts, each subject area gets a server of its own, resulting in increased overall costs, heightened complexity and vulnerability to multiple points of failure.

While isolation works for a single application, data warehouses exist to serve the varied needs of people in organizations; thus, multiple applications are inherent. Clear drawbacks to the isolation approach include a constellation of servers and database licenses, data inconsistency, data latency and additional application interfaces, all of which cost time and money.

The operational data store (ODS) was devised to provide users more timely data, especially operational data that was not integrated into the data warehouse. For example, a common ODS application might provide live data for customer touchpoints such as a call center or Web site. All of the drawbacks of an isolation optimization solution are endemic in the ODS approach: expense, interfaces, inconsistency and latency.

The real-time, fine-grained operational data handled by an ODS can be accommodated in an EDW that has adequate mixed workload management capabilities. The solution is a database platform that can provide predictable performance and scalability while handling a variety of user types, query types and loads. (See figure.) A data warehouse designed to accommodate the most detailed data and allow for required periodic, intra-day updates can eliminate the need for ODS structures and associated programs.

Save space, time and money
Certainly, managing data loads and job queues is important for an efficient and productive system, but more accurate data, better time management and greater return on investment are some underlying benefits of a mixed workload management solution.

Mixed workload management facilitates the consolidation of many separate database servers and interface programs. In addition, both logical and physical database schemas can be consolidated into a single model. This eliminates the need for costly maintenance and enhancement activities in an environment with many strong dependencies. A significant portion of a maintenance budget can be exhausted in researching, developing and testing changes in complicated workload isolation environments.

With data warehouses expected to provide 24x7 performance and data load cycles disappearing, performance bottlenecks can be problematic. For an organization to maintain its service level agreements (SLAs), warnings of these job holdups coupled with programmatic suggestions for resolving them are necessary. DBAs no longer have the time to study logs and query system tables to analyze performance problems. A dynamic, model-based approach to workload management not only ensures SLAs will be met, but also enables SLA negotiations based on realistic expectations.

Managing workloads can also help the company's bottom line. Since mixed workload management maneuvers jobs efficiently and systematically, a company's resources can likely be more fully leveraged and system expansions more accurately predicted, without the need for expensive buffers of excess hardware or for usage restrictions. In addition, dynamic optimization capabilities, especially in mixed workload data warehouses, can provide for better utilization of hardware and avoid costly upgrades.

Dynamic navigations
The alternative to managing jobs through isolation is a dynamic mixed workload management tool, but this also presents an optimization problem.

All relational databases have query optimizers, but they are limited to finding the most efficient way to resolve one query without considering the entire system. It's not unlike finding the fastest route home without considering the traffic, weather or time of day. When a data warehouse serves only a single purpose, such as production reporting or feeding data marts, query optimizers often suffice. The system workload is more or less predictable and the optimization schemes are basically alike. Systems tuners and DBAs can configure a database to perform well and predictably in these cases because of the homogeneity of the workload.

However, data warehouses used for BI purposes typically handle multiple applications and are relied on by a variety of users. So a narrowly focused query optimizer is insufficient for the different query types and sizes. To be effective, the workloads must be processed based on criticality, size or time of day.

Current workload management processes consider the running and queued jobs by their pre-assigned priorities, and they weigh that against in-process utilization. In data warehousing, true dynamic mixed workload management assigns resources based on the type of work being performed and reassigns resources, making almost instantaneous judgments based on the model and corresponding rules.

Because only the database platform understands the characteristics of the work in progress and in queue, to be useful, it must detect and address problems autonomously, without assistance from console operators. A successful optimization scheme operates as a closed loop when:
A model of the expected workload is developed
Rules apply that model to the environment as it operates, determining resource entitlements and priorities
Workload metrics are gathered, such as service levels, uptime percentages, turnaround times and any metrics the system operators determine are useful in demonstrating data compliance and diagnosing problems
Metrics are evaluated and modifications are made to the model and rules
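
The four steps above can be sketched in a few lines of Python. This is only an illustration of the loop, not any vendor's implementation: the `WorkloadModel` class, the workload names, the response-time targets and the fixed adjustment step are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class WorkloadModel:
    # Step 1 (hypothetical model): a response-time target and a
    # relative resource share for each class of work.
    target_seconds: Dict[str, float]
    share: Dict[str, float]


def apply_rules(model: WorkloadModel) -> Dict[str, float]:
    """Step 2: turn the model into resource entitlements that sum to 1."""
    total = sum(model.share.values())
    return {name: s / total for name, s in model.share.items()}


def evaluate(model: WorkloadModel,
             observed_seconds: Dict[str, float],
             step: float = 0.1) -> WorkloadModel:
    """Steps 3 and 4: compare gathered metrics with the targets and
    return an adjusted model. The original model is left unchanged."""
    new_share = dict(model.share)
    for name, observed in observed_seconds.items():
        if observed > model.target_seconds[name]:
            # This class missed its service level, so raise its share.
            new_share[name] += step
    return WorkloadModel(model.target_seconds, new_share)


model = WorkloadModel(
    target_seconds={"tactical": 1.0, "reporting": 60.0},
    share={"tactical": 0.5, "reporting": 0.5},
)
# One pass around the loop: tactical queries ran slower than their target.
model = evaluate(model, {"tactical": 2.5, "reporting": 40.0})
entitlements = apply_rules(model)  # tactical now holds the larger share
```

A production system would run this loop continuously and use far richer metrics; the point here is only that the model, the rules and the measurements feed one another.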

Next in line
Mixed workload management is an optimization solution to a difficult computer science problem. It removes several technical barriers to having one database performing multiple tasks as efficiently as possible.

Blended analytical and operational processing means more timely access and use of information. More users vying for answers provided through BI means a greater need for data warehouse scalability and diversity. A dynamic mixed workload management capability creates value in two directions: cost savings and enablement of other high-value applications that had not been possible before. Managing a mixed workload effectively is the first step in closing the loop between analysis and action.

Dynamic mixed workload management is a departure from the data warehouse's original concept of isolation. Isolation works well when there is time to spare, but in a busy environment, there is none. Mixed workload management helps organizations control time, or at least make better use of it. And that spells success.

Many ways to manage workloads

Dynamic workload management solutions help prevent runaway queries or system overloads with rules that can filter jobs or throttle them up or down. For starters, when queries or load jobs are submitted, they are identified as belonging to an allocation group (AG). These fairly coarse-grained groups can be set up for finance, marketing or sales and have a pre-assigned "relative weight" (i.e., priority) inside the relational database management system. AGs work well with a wide range of queries; however, there are generally fewer than a dozen groups, so always classifying users into the same AGs limits them to one priority, even when they may have widely varying queries to submit.

When setting up the workload management rules, the system administrator can utilize these AGs. Among other actions, the rules can limit the number of concurrent queries allowed from a single AG, either absolutely or based on the time of day; restrict the size of answer sets; and stop queries from initiating a full-table scan.
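
As a rough illustration, the three rule types just described (a concurrency cap per AG, an answer-set size limit and a full-table-scan filter) might be expressed as follows. The class names, fields and thresholds are hypothetical and do not reflect any product's rule syntax; time-of-day variation is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Query:
    group: str             # allocation group the query was classified into
    estimated_rows: int    # estimated size of the answer set
    full_table_scan: bool  # whether the plan scans an entire table


@dataclass
class GroupRules:
    max_concurrent: int    # concurrent queries allowed from this AG
    max_answer_rows: int   # largest answer set allowed
    allow_full_scan: bool  # whether full-table scans may start


def admit(query: Query,
          rules: Dict[str, GroupRules],
          running: Dict[str, int]) -> Tuple[bool, str]:
    """Apply the AG's rules to one query and return (admitted, reason)."""
    rule = rules[query.group]
    if query.full_table_scan and not rule.allow_full_scan:
        return False, "full-table scan not allowed"
    if query.estimated_rows > rule.max_answer_rows:
        return False, "answer set too large"
    if running.get(query.group, 0) >= rule.max_concurrent:
        return False, "concurrency limit reached"
    return True, "admitted"


rules = {"marketing": GroupRules(max_concurrent=2,
                                 max_answer_rows=1_000_000,
                                 allow_full_scan=False)}
print(admit(Query("marketing", 500, False), rules, {"marketing": 1}))
# prints (True, 'admitted')
```

A real workload manager might delay a query that hits the concurrency limit rather than reject it; the sketch reports only the decision.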

This automated exception handling does what a team of experts cannot—it makes changes hundreds of times per second. And because it's a dynamic optimization, the workload manager can allocate 100% of the resources to low-priority tasks when nothing else is running, then switch resource allocations when high-priority tasks arrive.
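
One simple way to picture this behavior is to divide resources in proportion to the relative weights of whichever groups currently have work. The sketch below assumes that scheme; the group names and weights are invented for the example, and real schedulers are considerably more elaborate.

```python
from typing import Dict, Set


def allocate(weights: Dict[str, float], active: Set[str]) -> Dict[str, float]:
    """Split 100% of resources among the active groups only,
    in proportion to their relative weights."""
    active_weights = {g: w for g, w in weights.items() if g in active}
    total = sum(active_weights.values())
    if total == 0:
        return {}  # nothing is running, so nothing is allocated
    return {g: 100.0 * w / total for g, w in active_weights.items()}


# Hypothetical relative weights for three allocation groups.
weights = {"tactical": 60, "reporting": 30, "batch": 10}

print(allocate(weights, {"batch"}))  # prints {'batch': 100.0}
shares = allocate(weights, {"batch", "tactical"})
# Once tactical work arrives, batch drops to 10/70 of the system
# and tactical takes the remaining 60/70.
```

Because the shares are recomputed from the set of active groups, no capacity sits idle waiting for high-priority work that has not yet arrived.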

A technique helpful to companies with stringent service level goals is allowing high-priority tasks a "fast path" through the system. The workload manager analyzes each database request before and during execution, and it assigns priorities based on the rules. As "fast path" jobs appear in the queue, the workload manager assigns them the highest priority and sets lower priorities for reporting and data.
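
The queueing side of a "fast path" can be approximated with an ordinary priority queue. The sketch below uses Python's standard `heapq` module; the priority levels and job names are assumptions for illustration, and it covers only the ordering of queued work, not the re-evaluation of requests during execution.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower number is served first.
FAST_PATH, REPORTING, LOAD = 0, 1, 2


class WorkQueue:
    def __init__(self):
        self._heap = []
        # The counter breaks ties so that jobs of equal priority
        # leave the queue in arrival order.
        self._counter = itertools.count()

    def submit(self, name: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_job(self) -> str:
        """Remove and return the highest-priority job."""
        return heapq.heappop(self._heap)[2]


queue = WorkQueue()
queue.submit("monthly report", REPORTING)
queue.submit("nightly load", LOAD)
queue.submit("call-center lookup", FAST_PATH)
print(queue.next_job())  # prints call-center lookup
```

Although the lookup was submitted last, it is dequeued first, which is the effect a fast path is meant to achieve.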

The Priority Scheduler solution from Teradata is a good example of workload management in action. Another example, Teradata Active System Management, is a more active process that classifies tasks based on the actual query. This fine-grained classification means a user (or another application) can first submit a complicated query, then submit a second query, with each task assigned to a different priority queue.

—N.R.

Neil Raden, co-founder of Smart (enough) Systems, is a consultant and industry influencer in BI, analytics and decision services. His book, "Smart (Enough) Systems: How to Deliver Competitive Advantage by Automating Hidden Decisions," was co-written with James Taylor and released by Prentice Hall. He can be reached at neil@smartenoughsystems.com.

Teradata Magazine-December 2008

Copyright © 2008 Teradata Corporation. All rights reserved.