Teradata Dual Active Solution ensures a smooth ride.
by Stan Mlynarczyk, Rick Stellwagen and Imad Birouty
Today's data warehouses are expected to have much higher availability than
those of the past. No longer simply back-end systems supporting a few power
users, data warehouses are now operational systems that support hundreds,
sometimes thousands, of users in their daily tasks. Companies have come to
realize just how important their data warehouses are to their businesses, not
only for strategic decisions but for daily tactical decisions as well. Service
disruptions to the data warehouse can translate into serious disruption to the
organization.
As a result, a high level of continuous availability to users is now a
requirement, and more and more companies are moving to dual-system environments
to ensure that their service level agreements (SLAs) are met. Additionally,
companies employing dual systems want to maximize the use of both systems for
production workloads, which requires workload balancing and system-to-system
capacity adjustments. With dual systems becoming mainstream, it is important to
have a product that allows powerful yet simplified monitoring, administration
and control of this environment.
Coordinating efforts
In a single-system environment from Teradata, many tools allow administrators
to monitor the system's status and take action as necessary. Here, "system" is
defined as a set of coordinated servers, software and processes that accomplish
a common goal. The same is true in a dual-system environment. While each system
runs independently of the other, the goal of a dual-system environment from
Teradata is to have coordination between the two—and their supporting servers
and processes—to accomplish a common goal of meeting pre-defined service levels
for availability and disaster recovery. To achieve this level of coordination,
the systems must have these capabilities:
Monitoring. The two Teradata systems must be monitored along
with their supporting servers and processes; in effect, the entire data
warehouse environment. This includes tracking the operational state of each
system/server, the start/stop/completion status of processes such as data
loading and the currency of data between the two Teradata systems. It is
important to monitor these and other parameters to understand whether a given
system is operational and has the required data to service a query. Knowing
what is happening in the environment is essential before any action can be
taken.
Administration. Invoking changes to query routing rules, for
example, is required to maintain SLAs for certain classes of users. It does not
matter whether these changes are manual or automated, as long as the ability to
administer the changes exists.
Control. A dual-system environment must be able to start
and stop processes, take a system down for planned maintenance and re-engage a
system after an unplanned outage.
While monitoring, administration and control are described as distinct
capabilities, they are in reality used in a coordinated fashion to understand
what is happening within the dual-system environment and, consequently, to take
the desired actions.
Automated management
Teradata Multi-System Manager is part of the Teradata Dual Active Solution and,
as its name suggests, is designed to handle multiple Teradata systems. The
following set of features enables coordination between systems and allows an
organization to manage its data warehouse environment from a single console at
the system and application levels:
A unified view of a multi-system environment. A multi-system environment
includes supporting servers for data loading, query routing and applications. A
central console enables a view of the system's status and key events within the
environment—everything from a summary view of the components to a detailed
drill-down of which processes are running or have been completed on each
system. (See figure 1, above.) In addition, the summary view provides
feedback on the health and operational status of all components.
Monitoring of servers, processes and applications. Beyond a unified view,
intelligent monitoring of numerous events within the environment is provided,
including detection of system operational status and the per-system completion
status of processes such as load jobs.
Simple point-and-click controls. Features include a graphical user interface
portal with easy-to-use controls to manage query routing rules, engage or
disengage a system for planned maintenance, or bring a system back online and
up to date after an unplanned outage.
User routing based on application readiness state. Sophisticated controls
manage which applications and users can access which system based on its
particular data currency and process completion. This capability ensures that a
system is available and has current data before allowing application and user
access. (See figure 2.)
Monitoring thresholds and discrepancy reporting. Managing multiple loosely
coupled systems requires control over how current the data on each system is
relative to the other. Depending on how the systems are used and which users are
accessing them, some amount of data drift can be tolerated. Drift beyond
this acceptable amount requires the synchronization lag to be tightened.
Alerts indicate when the thresholds are crossed. The system can also be
programmed to take pre-defined action when this occurs.
Graceful failover and failback during planned and unplanned outages. When the
goal is continuous access for users, one of the most valuable features in
managing a dual-system environment is the orchestration of events to handle
system outages. This requires shifting workloads and users between systems
during both planned and unplanned outages. Each of these scenarios has its own
requirements, and the orchestration must handle every permutation.
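The readiness-based routing and drift tolerance described in the features above can be illustrated with a minimal sketch. This is not Teradata Multi-System Manager code. The names `SystemState` and `eligible_systems`, and the idea of expressing tolerance as a lag in seconds, are assumptions made purely for illustration. A system is offered to an application only if it is online, its required loads have completed, and its data lag is within what that application tolerates.

```python
from dataclasses import dataclass


@dataclass
class SystemState:
    """Hypothetical snapshot of one system as a monitor might record it."""
    name: str
    online: bool
    loads_complete: bool      # required load jobs have finished
    data_lag_seconds: float   # assumed measure of how stale the data is


def eligible_systems(systems, max_lag_seconds):
    """Return the names of systems an application may be routed to.

    max_lag_seconds is the data drift this application tolerates
    (an assumed unit; the article does not specify one).
    """
    return [
        s.name
        for s in systems
        if s.online and s.loads_complete and s.data_lag_seconds <= max_lag_seconds
    ]


systems = [
    SystemState("A", online=True, loads_complete=True, data_lag_seconds=30.0),
    SystemState("B", online=True, loads_complete=True, data_lag_seconds=900.0),
]

# An application with a tight tolerance is limited to the more current system;
# an application that tolerates more drift may use either one.
tactical = eligible_systems(systems, max_lag_seconds=60.0)
strategic = eligible_systems(systems, max_lag_seconds=3600.0)
```

Keeping the tolerance as a per-application parameter reflects the point that acceptable drift depends on how the systems are used and by whom.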
Flying high
To accomplish these tasks, Teradata Multi-System Manager works with other
dual-system products and processes such as Teradata Query Director and dual
loading of data. It controls the Teradata Query Director profiles and makes
modifications on the fly to accommodate system status changes. For example, if
System A encounters an unplanned outage, the Query Director profile is changed,
which, in turn, will engage a new set of routing rules. The new rules might,
for instance, limit routing to only users with strict SLAs.
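The outage example can be sketched as a lookup from system state to a routing profile. The profile names, the SLA class labels and the functions `choose_profile` and `may_route` below are hypothetical placeholders; they are not actual Teradata Query Director profiles or APIs, and real routing rules would be richer than a set of allowed classes.

```python
def choose_profile(a_up: bool, b_up: bool) -> str:
    """Pick a routing profile name from the two systems' operational states.

    Profile names are hypothetical, chosen only for this illustration.
    """
    if a_up and b_up:
        return "dual_active"
    if a_up:
        return "a_only"
    if b_up:
        return "b_only"
    return "all_down"


# Hypothetical rules per profile: which SLA classes may still be routed.
PROFILE_RULES = {
    "dual_active": {"strict", "standard", "best_effort"},
    "a_only": {"strict"},
    "b_only": {"strict"},
    "all_down": set(),
}


def may_route(user_sla_class: str, a_up: bool, b_up: bool) -> bool:
    """True if a user of the given SLA class is routed under the current profile."""
    return user_sla_class in PROFILE_RULES[choose_profile(a_up, b_up)]


# System A suffers an unplanned outage: the profile changes, and only
# users with strict SLAs continue to be routed (to System B).
profile_after_outage = choose_profile(a_up=False, b_up=True)
```

Separating the choice of profile from the rules attached to it mirrors the description above, where a status change swaps the profile and the profile in turn engages a new set of rules.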
During the dual-loading process, Multi-System Manager monitors the start/stop/completion
status of load jobs. It keeps track of which jobs have completed and which have
not, and it runs synchronization checks on the tables to ensure that all of the
data was loaded properly and that the data between systems is in sync.
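A rough sketch of this bookkeeping follows. The article does not say how the synchronization checks are performed, so comparing per-table row counts is an assumption made here for simplicity, and the function names and job names are hypothetical.

```python
def pending_jobs(expected_jobs, completed_by_system):
    """Map each system to the set of expected load jobs it has not completed."""
    return {
        system: set(expected_jobs) - set(done)
        for system, done in completed_by_system.items()
    }


def out_of_sync_tables(counts_a, counts_b):
    """Return sorted names of tables whose row counts differ between systems.

    A table present on only one system is also reported. Row counts are an
    assumed stand-in for whatever check a real deployment would use.
    """
    tables = set(counts_a) | set(counts_b)
    return sorted(t for t in tables if counts_a.get(t) != counts_b.get(t))


expected = ["load_sales", "load_inventory"]
completed = {
    "A": ["load_sales", "load_inventory"],
    "B": ["load_sales"],
}
pending = pending_jobs(expected, completed)

mismatched = out_of_sync_tables(
    {"sales": 1000, "inventory": 250},
    {"sales": 1000, "inventory": 240},
)
```

In this example system B still has `load_inventory` outstanding, which is consistent with its `inventory` table trailing system A.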
Teradata Multi-System Manager can also be used in a single Teradata production
environment to monitor workflows among the many application and load processes.
This will benefit Teradata customers who plan to move from a single production
system to a dual-system environment in the future.
With dual systems becoming mainstream, it is important to have a solid product
that simplifies monitoring, administration and control of the environment.
Teradata Multi-System Manager provides these capabilities with powerful
features that are efficient, dependable and simple to use.
Relying on autopilot
Monitoring and control are critical elements of a dual system, but no product previously existed that met
the requirements of this type of environment. So the Teradata Professional Services team created
Dual System Monitor and Control—the foundational architecture for Teradata Multi-System Manager.
In a dual-system environment, Dual System Monitor and Control can be compared to the instrument panel in
an aircraft's cockpit—both monitor and manage the system, both display multitudes of information, and
when problems occur, both activate autopilot.
The solution allows customers to monitor the status of the dual-system environment, including load jobs,
system alerts, and data and application states. It provides control over the software, processes and
servers that comprise the dual-system environment, and sets limits on the users and applications
that route between the systems.
Various levels of detailed information enable users to see that their applications are operational
and performing as expected. A customer's load architecture is analyzed to determine how and where to
inject instrumentation points. Data loads are then instrumented to automatically broadcast their
progress. An alert is sent to operations when the data latency between the two systems is above
tolerance, and database administrators can be notified when data consistency issues are identified.
The system's critical threshold levels are based on service level agreements for application
availability, data latency and load and response times. When these thresholds are crossed,
appropriate alerts are generated and status changes are visually displayed.
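As an illustration only, threshold-driven alerting of this kind might look like the sketch below. The metric names, the two-level warning and critical limits and the numeric values are assumptions, not details of Dual System Monitor and Control. An alert is produced only when a metric's status changes, so a metric that stays above its limit does not alert repeatedly.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical warning and critical limits for a metric (higher is worse)."""
    warning: float
    critical: float


def status_for(value, threshold):
    """Classify a reading as 'ok', 'warning' or 'critical'."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "ok"


def check_metrics(readings, thresholds, previous_status):
    """Return (new_status, alerts); alert only when a metric's status changes."""
    new_status, alerts = {}, []
    for metric, value in readings.items():
        old = previous_status.get(metric, "ok")
        new = status_for(value, thresholds[metric])
        new_status[metric] = new
        if new != old:
            alerts.append(f"{metric}: {old} -> {new} (value={value})")
    return new_status, alerts


# Assumed SLA-derived limits, in seconds.
thresholds = {
    "data_latency_s": Threshold(warning=300, critical=900),
    "response_time_s": Threshold(warning=5, critical=15),
}
readings = {"data_latency_s": 450, "response_time_s": 2}

status, alerts = check_metrics(readings, thresholds, previous_status={})
# A second pass with unchanged readings produces no new alerts.
status_again, alerts_again = check_metrics(readings, thresholds, status)
```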
These automated features make it possible for customers to easily handle a complex dual-system
data warehouse that must be duplicated for reliability and consistency while at the
same time remaining query-accessible.
So, like pilots who rely on their instrument panel, users can feel assured that the Teradata
Multi-System Manager will help control and enhance the performance of their
dual-system environment.
—Stan Mlynarczyk, director, Enterprise Architecture Center of Excellence, Teradata Professional Services
—Mark Mitchell, principal consultant, Teradata Professional Services
Stan Mlynarczyk is director of the Enterprise Architecture Center of Excellence
in Teradata Professional Services.
Rick Stellwagen is the engineering architect for Teradata multi-system
solutions.
Imad Birouty is program marketing manager for the high-availability solutions
and data mart consolidation program at Teradata.
Teradata Magazine, September 2008