Teradata Magazine Online
TECH2TECH

Data warehouse availability

Survival of the prepared.

A real-time enterprise (RTE) uses current facts from all aspects of the business to make operational and strategic decisions and to drive delays out of the processes that run the business. An active data warehouse is an integral part of the RTE IT infrastructure, delivering decision-making services to users in all areas of the enterprise and beyond.

But what happens if the data warehouse is not available? Will the business come to a screeching halt? Will important decisions be delayed or sales opportunities missed? Worse yet, will customers become dissatisfied and take their business elsewhere? Clearly, the business value is at risk if a failure occurs without a proper disaster recovery plan.

How available does a data warehouse need to be?
There is no single answer to this question. The business criticality of each application or group of users must be evaluated and protected at an appropriate level. If a basic reporting application slips occasionally, it may not have a negative impact on the business. But as a traditional data warehouse matures into an active data warehouse, required availability levels rise quickly. Business-critical operational processes demand that the warehouse be available continuously.

(For more details on data warehouse availability requirements, see "Out of hiding: Active Data Warehousing comes front and center" by Todd Walter.)

How are transactional systems different from a data warehouse?
Transactional applications tend to have a single, unchanging availability-level requirement. A successful data warehouse is continuously growing and changing, so it requires regular review to ensure the appropriate protection for the business. Steadily increasing requirements will cause the data warehouse implementation to move to new availability levels, often with different availability requirements for different applications.

How does Teradata address availability?
Today, expectations are high even for a traditional reporting data warehouse. Teradata's goal is to maximize single-system availability and to provide a multi-system approach when a single system cannot meet the enterprise's requirements. Teradata provides a foundation of built-in, automatic high-availability features and builds upon it with optional features that balance system cost against the desired availability. Teradata's Dual System offer provides continuous operation for critical applications that cannot be offline even under extreme disaster conditions. (See figure 1 for a description of Teradata's different availability levels.)

What are the built-in availability features?
Redundant components throughout the server platform are included in the base configuration and are fully utilized during normal operation to maximize system performance. Software automatically manages component failover. Redundant components can be replaced while the system is running, minimizing the impact of repairs.

Automatic failover for the software units of parallelism minimizes the impact of compute node failures. This "vproc migration" facility is standard and requires no extra software, installation or management.

Software installation is automated and performed with the system online except for a reboot to begin running the new version. Hardware upgrades are automated and downtime is repaid by the immediate and fully scalable utilization of the new capacity. If there is a failure, applications can automatically reconnect and utilities can use checkpoint logic to automatically restart.

(For more details on Teradata's built-in availability features, see "Teradata: It's there when you need it" by Todd Walter.)

What are the optional features for increasing platform and database availability?
There are two recent additions in this area:

Large cliques (up to eight nodes) reduce the overall performance impact when one node fails by distributing the down node's workload across a greater number of surviving nodes.

Hot standby node (HSN) allows an idle node to be configured within each clique. When another node fails, the HSN takes on all of its work, eliminating the extra burden on other nodes in the same clique. Further, when the failed node is repaired and returned to the configuration, it becomes the HSN, removing the need for a restart cycle to return the failed node to the configuration.
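
The effect of clique size on a node failure can be put into rough numbers. The following is a minimal sketch, not a Teradata sizing tool: it assumes the failed node's work is spread evenly across the surviving working nodes, and the function name `survivor_load_increase` is made up for illustration.

```python
def survivor_load_increase(working_nodes: int, hot_standby: bool = False) -> float:
    """Return the fractional extra work each surviving node absorbs
    when one working node in a clique fails.

    Assumption (illustration only): the failed node's work is spread
    evenly over the surviving working nodes.
    """
    if working_nodes < 2:
        raise ValueError("a clique needs at least two working nodes")
    if hot_standby:
        # The idle standby node takes over all of the failed node's work,
        # so the other nodes in the clique carry no extra burden.
        return 0.0
    return 1.0 / (working_nodes - 1)


if __name__ == "__main__":
    for size in (2, 4, 8):
        extra = survivor_load_increase(size)
        print(f"{size}-node clique: each survivor takes on {extra:.0%} more work")
```

Under the even-distribution assumption, the survivor in a two-node clique must double its work, while each survivor in an eight-node clique picks up roughly one-seventh more. With a hot standby node configured, the surviving working nodes pick up nothing extra.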

On any database system, major faults can occur in the I/O subsystem, leading to damaged or destroyed data and significant offline time. Teradata's unique "fallback" feature allows recovery from this type of rare, yet major, system failure to be a simple restart cycle and redistribution of work rather than an extended offline event.

These features are not without costs:

  • Large cliques require fiber channel-switch components
  • HSN requires the switches plus an extra node in each clique
  • Fallback requires room to store a second copy of the data and additional cost during update operations

Just keep in mind that the return on these investments comes from the additional system availability provided to users even in the face of serious failures.

(For more details on Teradata's optional features for increasing platform and database availability, read the white paper "Single System Availability Features" by Lynn Hedegard.)

With all these built-in features and options, why is anything else required?
Low-probability events such as natural disasters or multi-point concurrent failures can and do happen. A disaster protection and recovery plan must be put into place.

Options include but are not limited to the following:

  • Keeping a copy of the data offsite
  • Implementing a full-scale backup, archive and restore (BAR) plan
  • Maintaining multiple, geographically distributed systems

Whatever plan you choose must be based upon the business criticality and acceptable recovery time for users and applications within the enterprise.

When is BAR appropriate?
Protecting the data with a backup copy is acceptable for systems that can tolerate several days of offline time should an unlikely disaster or data loss occur. (This period of time may be significantly longer if new equipment must be acquired and installed.) An offsite copy can protect from building-level or area-level disasters. BAR is often used in conjunction with other high-availability solutions because it can also help businesses recover from application failures or human error.

When is a single system with backup not good enough?
Long system outages are either planned or unplanned. Planned outages can include power servicing, physical system relocation or major upgrades. Unplanned outages can include floods, earthquakes, storms, fires, plumbing failures and power outages.

Understanding the business criticality of the system, knowing the outage time and estimating the time it would take to recover from a major outage, whether planned or unplanned, will make it clear when the business can no longer tolerate risk from these low-probability events. When that time comes, a dual- or multi-system implementation is required to provide continuity to the business.

How is the cost of business continuity balanced with the cost of the business impact?
There are many options for increasing availability beyond those that a single system can deliver. While outages that destroy the physical system generally require weeks or months to recover, some outages are recovered in hours or a few days. These include outages that leave the system and data intact but prevent them from being accessed.

If the business can tolerate a brief outage but cannot risk being offline for a week or more, then an offsite disaster recovery plan is indicated. When a disaster is declared, BAR tapes can be brought to a disaster recovery center, which then operates the data warehouse until the damaged or destroyed system is operational.

If the business cannot tolerate the risk of several days of outage but can tolerate some number of hours offline, a "warm" dual system option may be appropriate. A warm system is not used for normal operations but does include a copy of the data and is kept current and offsite. A warm system may be somewhat behind the state of the data on a production system and will require some catch-up to bring the data current. If a disaster occurs or a planned outage is approaching, the business-critical workload is switched over to the warm system; otherwise, it can be used as a certification or development system.

If the business cannot tolerate even a few hours of outage, a dual active system is indicated. A dual active system, located offsite, maintains an up-to-date copy of the data at all times. If a disaster, failure or planned outage occurs, the workload is routed to the redundant system.

Users may see failure of one piece of work or may see hesitation while the system is switching over, but otherwise the failure is masked from business-critical users. Extreme availability requirements may require more than two systems to provide the necessary business-critical service.
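
The tiers described above amount to a lookup from tolerable outage time to a protection option. The sketch below captures that mapping in Python. The article gives no numeric boundaries, so the hour thresholds (`WEEK`, `SEVERAL_DAYS`, `FEW_HOURS`) are illustrative assumptions, as are the `Option` and `recommend` names. Each business would substitute its own limits.

```python
from enum import Enum


class Option(Enum):
    BAR_ONLY = "single system with backup, archive and restore"
    OFFSITE_DR = "offsite disaster recovery center restored from BAR tapes"
    WARM_DUAL = "warm dual system"
    DUAL_ACTIVE = "dual active system"


# Illustrative thresholds in hours. These numbers are assumptions;
# the text only speaks of "a week or more", "several days" and "a few hours".
WEEK = 7 * 24
SEVERAL_DAYS = 2 * 24
FEW_HOURS = 4


def recommend(tolerable_outage_hours: float) -> Option:
    """Map the longest outage the business can tolerate to a protection tier."""
    if tolerable_outage_hours < 0:
        raise ValueError("tolerable outage cannot be negative")
    if tolerable_outage_hours >= WEEK:
        return Option.BAR_ONLY
    if tolerable_outage_hours >= SEVERAL_DAYS:
        return Option.OFFSITE_DR
    if tolerable_outage_hours >= FEW_HOURS:
        return Option.WARM_DUAL
    return Option.DUAL_ACTIVE


if __name__ == "__main__":
    for hours in (240, 72, 8, 1):
        print(f"{hours:>4} h tolerable -> {recommend(hours).value}")
```

The ordering matters more than the exact numbers: the less downtime the business can absorb, the further down the list the recommendation moves and the higher the cost of protection.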

(For more details on Teradata's disaster recovery options, see "Are you ready for disaster?" by Margaret Mills.)

How is a business-critical data warehouse different from a transactional system?
Transactional systems typically support a single application or group of users, making it easier to determine the business criticality and the protection level. A mature data warehouse supports many applications and many groups of users who may have widely varying service-level and availability requirements. Each application should be reviewed independently and the underlying shared data identified and protected at the highest required availability level of the applications that use it.
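
The rule of protecting shared data at the highest level required by any application that uses it can be sketched in a few lines of Python. The level names and the example applications and tables are hypothetical. Only the "take the maximum" rule comes from the text.

```python
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical availability levels, ordered from least to most protected.
    BAR_ONLY = 1
    OFFSITE_DR = 2
    WARM_DUAL = 3
    DUAL_ACTIVE = 4


def required_data_levels(
    app_levels: dict[str, Level],
    app_tables: dict[str, set[str]],
) -> dict[str, Level]:
    """Return, for each table, the highest level of any application using it."""
    levels: dict[str, Level] = {}
    for app, tables in app_tables.items():
        app_level = app_levels[app]
        for table in tables:
            # Shared data inherits the strictest requirement seen so far.
            levels[table] = max(levels.get(table, app_level), app_level)
    return levels


if __name__ == "__main__":
    apps = {"monthly_reporting": Level.BAR_ONLY, "call_center": Level.DUAL_ACTIVE}
    usage = {
        "monthly_reporting": {"sales_history", "customer"},
        "call_center": {"customer", "recent_orders"},
    }
    for table, level in sorted(required_data_levels(apps, usage).items()):
        print(f"{table}: {level.name}")
```

In this made-up example the `customer` table is used by both applications, so it is protected at the call center's level even though the reporting application alone would not need it.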

In a transactional system, the workload is primarily performing data change operations, capturing the transactions of the enterprise. A dual system protecting a transactional system simply duplicates this work, providing no value beyond the protection it renders during a disaster.

What benefits come from these differences?
Because it is rare that all of the warehouse data must be protected at business-critical levels, a dual system can be implemented quite differently to protect only business-critical data, such as recent history.

One of the dual systems may be configured at a fraction of the size and capacity of the larger system, providing just enough resources to service all of the business-critical applications during a disaster. The configurations can also grow independently, based upon the requirements of the workloads.

Even when not being used during a disaster, a dual-system implementation provides significant additional business value. Because only a portion of the system resources, typically 20% to 25%, is utilized for data maintenance (unlike in a transactional system), additional data warehouse query workload can use the remainder of the system.

What does Teradata provide for a dual system implementation?
A dual (multiple) system implementation requires a number of components that have not typically been part of building a data warehouse in the past: query routing and data synchronization, for example.

Teradata Query Director, first seen in Teradata Warehouse 7.1, delivers the ability to route queries evenly among multiple Teradata systems, by percentage or by application. Teradata Replication Services will enable direct data synchronization with a number of options, including scope.
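
To make the two routing styles concrete, here is a minimal sketch of routing by application and by percentage. It is not Teradata Query Director's interface or implementation. The `QueryRouter` class, its parameters and the system names are all hypothetical and show only the general idea.

```python
import random


class QueryRouter:
    """Toy router: pin some applications to a system, spread the rest by weight."""

    def __init__(self, weights, app_rules=None, rng=None):
        if not weights or any(w < 0 for w in weights.values()):
            raise ValueError("weights must be non-negative and non-empty")
        if sum(weights.values()) <= 0:
            raise ValueError("at least one system needs a positive weight")
        self.weights = dict(weights)            # system name -> share of traffic
        self.app_rules = dict(app_rules or {})  # application -> pinned system
        self.rng = rng or random.Random()

    def route(self, application: str) -> str:
        # Routing by application: an explicit rule always wins.
        if application in self.app_rules:
            return self.app_rules[application]
        # Routing by percentage: weighted random choice among the systems.
        systems = list(self.weights)
        shares = [self.weights[s] for s in systems]
        return self.rng.choices(systems, weights=shares, k=1)[0]


if __name__ == "__main__":
    router = QueryRouter(
        weights={"system_a": 50, "system_b": 50},
        app_rules={"call_center": "system_a"},
    )
    for app in ("call_center", "ad_hoc", "ad_hoc", "reporting"):
        print(app, "->", router.route(app))
```

Equal weights give an even split over time. During a disaster or a planned outage, the same idea covers switchover: set the unavailable system's weight to zero and repoint any application rules that name it.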

Other familiar technologies, such as table copy, triggers, archive/restore and journals, will continue to be available and are useful for supporting less critical needs for dual copies of the data.

For very high-volume incoming data, a dual-load architecture is indicated. Customers utilizing an extract, transform and load (ETL) tool can specify the workflow for dual-loading directly in the tool, e.g., split the feed, execute two load processes to the two Teradata systems and handle the error and recovery cases. Those using the Teradata load utilities directly will require some steps prior to the execution of the utility to manage the dual data streams.
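
The dual-load workflow (split the feed, load each system, handle errors and recovery) can be outlined as follows. This is a minimal sketch under stated assumptions: the loader callables stand in for whatever ETL job or load utility invocation a site actually uses, the `dual_load` name and retry count are illustrative, and the two loads run one after the other for simplicity rather than as parallel processes.

```python
from typing import Callable, Dict, Iterable, List

# A loader takes one batch of rows and raises an exception on failure.
Loader = Callable[[List[dict]], None]


def dual_load(
    rows: Iterable[dict],
    loaders: Dict[str, Loader],
    max_attempts: int = 3,
) -> Dict[str, bool]:
    """Send the same batch to every system and report which loads succeeded."""
    batch = list(rows)  # materialize once so each system receives identical rows
    results: Dict[str, bool] = {}
    for system, load in loaders.items():
        succeeded = False
        for _ in range(max_attempts):
            try:
                load(list(batch))  # hand each loader its own copy of the batch
                succeeded = True
                break
            except Exception:
                # Broad catch is for the sketch only; a real job would log
                # the error and distinguish retryable from fatal failures.
                continue
        results[system] = succeeded
    return results


if __name__ == "__main__":
    def load_primary(batch: List[dict]) -> None:
        print(f"primary loaded {len(batch)} rows")

    def load_secondary(batch: List[dict]) -> None:
        print(f"secondary loaded {len(batch)} rows")

    status = dual_load(
        [{"id": 1}, {"id": 2}],
        {"primary": load_primary, "secondary": load_secondary},
    )
    print(status)
```

A system reported as `False` has fallen behind the other copy. The recovery step is then to resynchronize it before business-critical work is routed there again.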

(For more on Teradata's Dual Active Solution, see "Don't wait for a quake" by Margaret Mills and Bob Manning.)

Is there help available to get started or take the next step?
Teradata provides a number of services in the availability area, such as BAR evaluation, configuration, implementation and best practices. Usage of the data warehouse can be evaluated and business criticality of applications and data can be determined. Teradata provides a service for hosted-site disaster recovery. When a dual system is indicated, implementation services can deliver a solution, including customized management components.

Be prepared, or else have your resume in offline storage!
A successful data warehouse becomes a required part of the IT infrastructure as more applications and users come to count on it. It becomes a business-critical component, especially when active data warehouse applications are deployed for the RTE.

A data warehouse team must be aware of the business criticality of the applications they support and must consistently review and update their implementation to match. Failure to do so will be a disaster not only for the warehouse but also for the team, the users and the business.

Teradata builds high availability into every aspect of the product. Built-in features provide a core, high-availability solution for all customers. Optional features match the availability of the system to the required availability of the applications while balancing the cost with the business risk.

© Teradata Magazine, June 2005

RELATED LINKS:

Teradata Warehouse Solutions
Single System Availability Features


Copyright by Teradata Corporation 2001-2007.