Teradata Magazine Online

Teradata: It’s there when you need it

by Todd Walter

When I started in the data warehouse field, expectations were low. A successful warehouse had a small number of users who were doing long-range analysis. If the system was unavailable, a few people were unhappy, but the business as a whole did not suffer.

Data warehousing has evolved to a very different level of visibility. Today, a successful active data warehouse has tens of thousands of users with business-critical access to the warehouse. The organization’s tactical decision making—the daily or even up-to-the-minute interactions with customers, supply chain, personnel and financials—depends on data from the warehouse. The technology managing the data and supplying the organization with the answers it needs to do business must deliver a very high level of reliability and availability.

How does Teradata do this?
Teradata has two basic philosophies that drive the product and its functions.

First, we acknowledge that parts fail. If we simply say, “the system failure couldn’t be avoided because the part failed,” then the system fails whenever any of its parts fails, and its overall failure rate is roughly the sum of the failure rates of all of its parts. Since we acknowledge and expect failure, we design the appropriate level of redundancy, failover and recovery into the areas where failures occur, so that the failure of a single part rarely causes a system failure.
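The arithmetic behind this philosophy can be sketched in a few lines of Python. The model is the standard textbook one and is an assumption here, not a Teradata formula: independent parts in series must all be up, so their availabilities multiply, while a redundant pair is down only when both copies are down at once. The 99.9% per-part figure is invented for illustration.

```python
from math import prod


def series_availability(parts):
    """Every part must be up, so availabilities multiply."""
    return prod(parts)


def redundant_availability(a, copies=2):
    """Service survives unless all independent copies are down at once."""
    return 1 - (1 - a) ** copies


# Hypothetical per-part availabilities (illustrative only).
parts = [0.999, 0.999, 0.999, 0.999, 0.999]

no_redundancy = series_availability(parts)
with_redundancy = series_availability([redundant_availability(a) for a in parts])

print(f"series, no redundancy:  {no_redundancy:.6f}")
print(f"series, each part dual: {with_redundancy:.6f}")
```

With five parts at 99.9% each, the series system is up about 99.5% of the time; duplicating every part raises that to about 99.9995%, which is why redundancy goes into the areas where failures occur.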

Second, we consider availability from the user’s point of view. System vendors have traditionally viewed and measured availability of each major component of the system separately—platform, disks, database, interfaces and utilities. Users don’t care which component isn’t working if they can’t get to their data. End-to-end availability of the Teradata components is measured to ensure that we see what the users see.
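One way to measure availability the way users see it is to pool the outage windows of every component and count any minute in which at least one of them was down. The following Python sketch does this by merging overlapping windows; the component names, function names and outage log are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) outage windows."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def end_to_end_availability(outages_by_component, window):
    """Fraction of the window in which no component at all was down."""
    all_outages = [iv for ivs in outages_by_component.values() for iv in ivs]
    downtime = sum(end - start for start, end in merge_intervals(all_outages))
    return 1 - downtime / window


# Hypothetical outage log, in minutes from the start of a 30-day window.
window = 30 * 24 * 60
outages = {
    "platform": [(100, 110)],
    "database": [(105, 115), (5000, 5003)],
    "utilities": [(20000, 20030)],
}
print(f"{end_to_end_availability(outages, window):.5f}")
```

Merging matters: the platform and database outages overlap, so adding them up per component would count 20 minutes where users actually lost 15.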

Is Teradata fault tolerant?
Fault tolerance is defined as the ability to provide continuous service by hiding the failure of one or more components (Gray & Reuter). By that definition, Teradata is not 100% fault tolerant. However, the hardware is built with fault-tolerant parts, and the software is designed to be fault tolerant. Teradata is becoming more fault tolerant while maintaining a reasonable balance between cost and system availability. Teradata is also fault resilient, so recovery from a system outage is automatic and fast.

What parts are protected by redundancy?
Dual controllers and dual point-to-point cabling protect the disk I/O paths from component failure. BYNET is fully fault tolerant, reconfiguring the network automatically around any failure so messages still get delivered. And there are always two BYNETs in a configuration so that one can fail without any interruption in service. All redundant components in a sub-system are 100% active, delivering maximum performance during normal operation.
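The active-active idea can be illustrated with a toy Python model: traffic alternates across two healthy interconnects, and when one fails the survivor carries everything with no pause in delivery. The class and network names are hypothetical, and this is not the actual BYNET routing protocol.

```python
from itertools import cycle


class DualNetwork:
    """Toy model of two redundant interconnects that are both active."""

    def __init__(self, names=("net0", "net1")):
        self.healthy = {name: True for name in names}
        self.delivered = {name: 0 for name in names}
        self.log = []                # (network, message) pairs, in send order
        self._rotation = cycle(names)

    def fail(self, name):
        self.healthy[name] = False

    def send(self, message):
        # Try each network at most once, continuing the round-robin rotation.
        for _ in range(len(self.healthy)):
            name = next(self._rotation)
            if self.healthy[name]:
                self.delivered[name] += 1
                self.log.append((name, message))
                return name
        raise RuntimeError("no healthy network available")


net = DualNetwork()
for i in range(4):
    net.send(f"msg{i}")        # alternates net0, net1, net0, net1
net.fail("net1")
for i in range(4, 8):
    net.send(f"msg{i}")        # every message now goes over net0
print(net.delivered)
```

During normal operation both networks carry half the traffic, so the redundancy adds capacity rather than sitting idle.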

The cabinets are also reliable. Redundant fans and power supplies ensure that little things don’t cause big problems. Power is a leading cause of system failures; dual AC power from separate sources minimizes this risk.

What happens if a processing node fails?
Teradata’s parallel architecture means that part of every request executes on every node. When a processing node fails, all work is affected. This is an area where Teradata is fault resilient rather than fault tolerant. Our nodes now achieve such a long mean time between failures (MTBF) that this is a rare occurrence.

If a node fails, the rest of the system is immediately notified and cycles through a recovery process. All requests are halted and rolled back. All applications and users are notified of the status of their outstanding requests so that appropriate action can be taken. Teradata initiates a system reallocation process called “VPROC migration,” which moves units of parallelism from the failed node to an operational one. This fully automatic procedure requires only two to three minutes on current-release hardware and software, after which all data is available to all users for all tasks.

How does Operations manage all of these components?
The Teradata platform comes with an administrative workstation (AWS). The AWS brings together command and control for all components of the system in a single user interface that displays any level of system status, from overall system health at the highest level down to an individual failed component. By providing a single system view of all components, the AWS eliminates the need to manage the parallel processing platform as a set of individual servers. Further, the AWS can communicate what it knows to the rest of the world: other server management tools, operations staff or DBAs via pager, the Teradata service center or the organization’s service method of choice.
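A single-system view amounts to a component tree whose top-level status summarizes everything beneath it, with a drill-down to the exact failed part. A minimal Python sketch follows; the hierarchy, class name and methods are invented for illustration and are not the AWS software or its data model.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    failed: bool = False
    children: list = field(default_factory=list)   # child Component objects

    def needs_attention(self) -> bool:
        """True if this component or anything beneath it has failed."""
        return self.failed or any(c.needs_attention() for c in self.children)

    def failed_paths(self, prefix: str = ""):
        """Yield the full path of every failed component in this subtree."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        if self.failed:
            yield path
        for child in self.children:
            yield from child.failed_paths(path)


system = Component("system", children=[
    Component("cabinet1", children=[
        Component("node1"),
        Component("node2", children=[Component("fan_a"),
                                     Component("fan_b", failed=True)]),
    ]),
    Component("disk_array1"),
])

# A redundant fan has failed: the system is still running, but the
# top-level view flags it, and the drill-down names the exact part.
print(system.needs_attention())
print(list(system.failed_paths()))
```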

How does VPROC migration work?
Processing nodes in a Teradata configuration are organized into groups called “cliques.” A clique is a set of processing nodes that have full I/O interconnection to a set of disk arrays. In the case of a node failure, other processing nodes within the clique have full connectivity to the disk storage that was utilized by the failed node. A typical clique is two to four nodes. Larger systems are made up of multiple cliques.

VPROC migration takes advantage of the clique structure. During recovery and reallocation, the units of parallelism (AMPs) that were running on the failed node are re-instantiated on the surviving nodes in the clique. AMPs are reallocated evenly across the other nodes to balance the performance impact of the failed node. In a four-node clique, for instance, the system experiences a 25% loss of processing power, because each of the three surviving nodes takes on one-third of the failed node’s processing.
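The even reallocation can be sketched in Python: the failed node's AMPs are dealt round-robin to the surviving nodes of its clique. The node names, the six-AMPs-per-node layout and the round-robin rule are assumptions for illustration; the text does not describe Teradata's actual placement logic.

```python
def migrate_amps(clique, failed_node):
    """Reassign the failed node's AMPs round-robin to the surviving nodes.

    `clique` maps node name -> list of AMP ids. Returns a new mapping
    that no longer contains the failed node.
    """
    survivors = [n for n in clique if n != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node in clique")
    new_layout = {n: list(clique[n]) for n in survivors}
    for i, amp in enumerate(clique[failed_node]):
        new_layout[survivors[i % len(survivors)]].append(amp)
    return new_layout


# Hypothetical four-node clique with six AMPs per node.
clique = {f"node{n}": [n * 6 + a for a in range(6)] for n in range(4)}
after = migrate_amps(clique, "node3")
print({node: len(amps) for node, amps in after.items()})
```

Each survivor goes from six AMPs to eight, one-third more work apiece, so the clique runs at three-quarters of its normal capacity, in line with the 25% figure in the text.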

VPROC migration is often confused with clustering or simple failover implemented at the operating system level. It is actually quite different because it is implemented within the DBMS itself. This architecture ensures that recovery is fully automatic and fully integrated with database recovery. Several solutions on the market offer only single-node failover, resulting in 50% performance degradation for as long as the failed node is out of the configuration. All of the other solutions depend on separate operating system clustering or failover functionality, requiring detailed system-level setup and administration at significant additional cost.
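The 50% and 25% figures follow from the same arithmetic, assuming the failed node's work is spread evenly over the survivors and that throughput scales with the number of working nodes. If a failure is absorbed by a group of $n$ nodes (a single-node failover pair has $n = 2$, a four-node clique $n = 4$):

```latex
\[
\text{load per survivor} = \frac{n}{n-1},
\qquad
\text{capacity lost} = 1 - \frac{n-1}{n} = \frac{1}{n}
\]
\[
n = 2:\ \text{capacity lost} = \tfrac{1}{2} = 50\%,
\qquad
n = 4:\ \text{capacity lost} = \tfrac{1}{4} = 25\%
\]
```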

What about disk failure?
Data warehouses require many disks to store the vast amounts of data required by today’s organizations. The lifetime of a single disk is very long, but when systems have a large number of disks, disk failure is to be expected. We rely upon our strong disk array partners LSI Logic and EMC, both leaders in supplying high-availability disk array solutions to customers. RAID technology allows single disks to fail without impact to any user. These vendors also provide many other data integrity and availability features that ensure data on disk is protected from loss.
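The point about disk counts is simple arithmetic. Assuming a constant failure rate of 1/MTBF per disk (a common simplification), the expected number of failures per year grows linearly with the number of disks. The 1,000,000-hour MTBF in this Python sketch is a hypothetical figure, not a vendor specification.

```python
HOURS_PER_YEAR = 24 * 365


def expected_failures_per_year(unit_count: int, mtbf_hours: float) -> float:
    """Expected failures per year across identical units, assuming each
    fails at a constant rate of 1 / MTBF."""
    return unit_count * HOURS_PER_YEAR / mtbf_hours


MTBF_HOURS = 1_000_000  # hypothetical per-disk MTBF

for disks in (10, 1_000, 10_000):
    rate = expected_failures_per_year(disks, MTBF_HOURS)
    print(f"{disks:>6} disks: {rate:6.2f} failures/year")
```

At that MTBF a single disk would fail on average about once every 114 years, yet a 10,000-disk system should expect close to 90 disk failures a year.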

What if a disk array or RAID 1 pair fails?
This type of failure is unlikely but not impossible. For business-critical data, all other vendors cover this eventuality with dual systems. Teradata offers a unique feature called fallback that allows protection from these events to be built into a single configuration. Fallback can be chosen as an option for the entire system or for any table or set of tables that require extreme protection. Failure of even an entire disk array will result in only a two to three minute outage, after which all fallback protected data will be available to all users for all tasks and applications. Recovery after the failed components are replaced is automatic and contained within the system, without the need for tapes or extended complex procedures.

How does fallback work?
All system configurations have two hash location maps: one to determine the primary location for each row, and a fallback map, which determines a location for each fallback row. When the fallback option is chosen for a table, each row is written twice in the system as directed by the two different maps (with no change or effort in the ETL procedures). Map layout ensures that the fallback copy is on completely separate hardware (disk array, controller, node and clique) from the primary copy. If storage or accessibility of the primary copy fails, the fallback copy is still accessible. Teradata software knows the current configuration status. If some of the data is inaccessible, Teradata automatically answers questions from and routes updates to the fallback copy.
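The two-map scheme can be sketched in Python. Everything concrete here is an assumption for illustration: the 8-AMP, two-clique layout, the 64 hash buckets, CRC-32 as the row hash, and the rule that places each fallback bucket on the AMP half the system away. The text says only that the fallback map puts the copy on separate hardware, not how the maps are built.

```python
import zlib

# Hypothetical layout: 2 cliques x 2 nodes x 2 AMPs per node.
AMPS = [{"amp": a, "node": a // 2, "clique": a // 4} for a in range(8)]
NUM_BUCKETS = 64  # illustrative; real systems use many more hash buckets


def bucket_of(primary_index_value: str) -> int:
    """Hash the row's primary index value to a hash bucket."""
    return zlib.crc32(primary_index_value.encode()) % NUM_BUCKETS


# Primary map: spread buckets evenly over all AMPs.
PRIMARY_MAP = [b % len(AMPS) for b in range(NUM_BUCKETS)]
# Fallback map: the AMP half the system away, which in this layout is always
# in the other clique, so node, disks and controllers all differ too.
FALLBACK_MAP = [(p + len(AMPS) // 2) % len(AMPS) for p in PRIMARY_MAP]


def write_targets(key: str) -> tuple[int, int]:
    """A fallback table writes each row to both of these AMPs."""
    b = bucket_of(key)
    return PRIMARY_MAP[b], FALLBACK_MAP[b]


def read_from(key: str, down_amps: set[int]) -> int:
    """Route a read to the primary copy, or to the fallback if it is down."""
    primary, fallback = write_targets(key)
    if primary not in down_amps:
        return primary
    if fallback not in down_amps:
        return fallback
    raise RuntimeError("both copies of the row are unavailable")


primary, fallback = write_targets("customer-42")
print(primary, fallback, read_from("customer-42", down_amps={primary}))
```

Because routing depends only on the maps and the set of unavailable AMPs, the loading job never needs to know fallback exists.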

Ensuring that the fallback copy is on other hardware also protects the data from failures other than the disks or disk array. Controllers, cables, I/O drivers and even file system software can fail in destructive ways. A completely separate I/O stack ensures that the fallback copy is not equally corrupted.

If a critical failure occurs and the system switches to the fallback copy, a special recovery log is automatically invoked. When the components are repaired, the fallback copy and the recovery log are used to recreate the primary copy with any updates that occurred during the failure. This recovery is done automatically with the system online, with minimum oversight and without backups or other off-system media to return the system to full operation.
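The fallback-plus-log sequence can be modeled in a small Python class. The class and its methods are hypothetical, and the sketch assumes the primary copy is intact but inaccessible during the outage, so only rows changed in the meantime need rebuilding; it does not model rebuilding a primary copy from scratch.

```python
class FallbackTable:
    """Toy model of primary and fallback copies plus a recovery log."""

    def __init__(self):
        self.primary = {}
        self.fallback = {}
        self.primary_up = True
        self.recovery_log = []   # keys changed while the primary was down

    def write(self, key, value):
        self.fallback[key] = value
        if self.primary_up:
            self.primary[key] = value
        else:
            self.recovery_log.append(key)

    def read(self, key):
        source = self.primary if self.primary_up else self.fallback
        return source[key]

    def fail_primary(self):
        self.primary_up = False

    def repair_primary(self):
        # Replay the keys touched during the outage, taking each current
        # value from the fallback copy; no tapes or backups involved.
        for key in self.recovery_log:
            self.primary[key] = self.fallback[key]
        self.recovery_log.clear()
        self.primary_up = True


t = FallbackTable()
t.write("a", 1)
t.fail_primary()
t.write("a", 2)        # served by the fallback copy and logged
t.write("b", 3)
print(t.read("a"))
t.repair_primary()
print(t.primary)
```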

Fallback is not free, of course. Each fallback table needs a full second copy of its data, doubling the disk space it occupies. Updating or inserting a row costs more (but not double). Queries, however, experience no performance impact from having fallback enabled. The return on this investment is that a class of infrequent but very high-impact failures becomes a very low-impact event from the user’s point of view.

What happens if the software fails?
VPROC migration, Teradata’s response to node failure, also covers operating system failure, an event often requiring a significant recovery and reboot time. Teradata initiates a quick VPROC migration to avoid the extended outage that would otherwise be experienced.

For database failures, every effort is made to isolate the impact to a single user or job. If the impact of the failure can be confined to a single query, Teradata will abort that request, clean up any residual effects and inform the user that the query requires the attention of the system administrator, all the while continuing to execute all other work on the system. If this is not possible, a full system reset is executed, rolling back all outstanding requests, informing all users of the status of their requests and bringing the system back to full operation automatically within two to three minutes.
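The policy described here, abort one request when the damage is confined and reset only when it is not, can be sketched in Python. The exception classes and function are hypothetical, and requests run one after another for simplicity, whereas the real system runs them concurrently.

```python
class QueryFault(Exception):
    """Hypothetical: a failure whose effects are confined to one request."""


class SystemFault(Exception):
    """Hypothetical: a failure that cannot be isolated to one request."""


def run_batch(requests):
    """Run named requests in order and report what happened to each."""
    outcome = {}
    names = list(requests)
    for i, name in enumerate(names):
        try:
            outcome[name] = ("completed", requests[name]())
        except QueryFault as exc:
            # Confined failure: abort and clean up this request only,
            # then carry on with the rest of the work.
            outcome[name] = ("aborted, refer to administrator", str(exc))
        except SystemFault:
            # Cannot be confined: a full reset rolls back every request
            # still outstanding, and each user is told its status.
            for pending in names[i:]:
                outcome[pending] = ("rolled back by reset", None)
            break
    return outcome


def bad_plan():
    raise QueryFault("bad plan")


print(run_batch({"q1": lambda: 1, "q2": bad_plan, "q3": lambda: 3}))
```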

What planned outages does Teradata require?
The repair process of a failed component must not make the system unavailable. Any redundant part is also hot-swappable (it can be replaced or repaired without taking the system offline), minimizing planned downtime for hardware repair.

Teradata utilizes a dual-boot environment to minimize software-install outages. All software distribution, loading, package adds and install verification are done against an inactive, secondary copy of the system software, without affecting the running system. Once all installation work is complete, a system reset is required. Resets after database code changes require only two to three minutes. Operating system updates require a reboot (15 to 30 minutes). No other downtime is required. Further, if there is an issue with the new software, returning to the previous version requires the same reset time as going forward, and it’s not necessary to reinstall the software.
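The dual-boot process behaves like a two-slot (A/B) scheme, sketched here in Python. The class, slot names and version strings are hypothetical; the sketch shows only the sequencing the text describes: stage on the inactive copy, switch with a reset, and roll back with the same switch.

```python
class DualBoot:
    """Toy A/B model: install into the inactive copy, switch on reset."""

    def __init__(self, version):
        self.slots = {"A": version, "B": None}
        self.active = "A"

    @property
    def inactive(self):
        return "B" if self.active == "A" else "A"

    @property
    def running(self):
        return self.slots[self.active]

    def install(self, version):
        # Staging touches only the inactive copy; the running system is unaffected.
        self.slots[self.inactive] = version

    def reset(self):
        # The only outage: switch which copy is active.
        if self.slots[self.inactive] is None:
            raise RuntimeError("nothing installed on the inactive copy")
        self.active = self.inactive


sw = DualBoot("v1")
sw.install("v2")      # staged while v1 keeps running
print(sw.running)
sw.reset()            # move forward to v2
print(sw.running)
sw.reset()            # roll back: same switch, no reinstall
print(sw.running)
```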

Hardware additions or upgrades require a reconfig outage. Upon completion, added components are fully utilized to deliver scalability to users with no additional DBA work. Recent software changes have shortened the average elapsed time significantly, and more work is in progress.

Teradata long ago eliminated many planned database outages. By eliminating entire concepts like reorgs and index rebuilds, Teradata returned large amounts of available access time to users. Teradata utilities have always run with the system online: Load, export, bulk update, continuous update and backup are all online utilities that do not require planned outages. The utilities have checkpoint restart code built in so that if any of them are affected by any failure, they start from where they left off and finish the job—no downtime to clean up and recover the process.
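Checkpoint restart can be sketched in Python: the job records its progress at regular intervals, and a rerun resumes from the last checkpoint instead of starting over. The function name, JSON checkpoint file and 100-row interval are assumptions; the text does not say how Teradata's utilities store their restart points.

```python
import json
import os


def load_with_checkpoints(rows, apply_row, checkpoint_path, every=100):
    """Apply rows in order, recording progress every `every` rows.

    If a previous run was interrupted, resume after its last checkpoint.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_row"]

    for i in range(start, len(rows)):
        apply_row(rows[i])
        if (i + 1) % every == 0:
            with open(checkpoint_path, "w") as f:
                json.dump({"next_row": i + 1}, f)

    # Job finished: clear the checkpoint so the next job starts fresh.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
```

Rows processed after the last checkpoint are applied a second time on restart, so the per-row operation should be safe to repeat (an upsert, for example).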

What is the future of Teradata system availability?
First, we will continue to remove or reduce all sources of planned and unplanned unavailability time for single-system Teradata instances. There are always more things we can do in both the platform and the software to ensure that a single Teradata system delivers the highest possible overall availability.

Beyond that, there are some fundamental changes occurring in the industry as the warehouse becomes more mission critical. The applications and tools surrounding the warehouse are beginning to understand lessons long ago learned by operational systems. Applications and tools need to be fault resilient or fault tolerant. Even if the warehouse is up, users still do not have service when the application server is down. More warehouse applications will be built in the style of operational applications, using EAI middleware, redundant servers and failover capabilities.

Finally, the most mission-critical warehouses will require dual, geographically separate systems to cover extremely low probability but extremely high-impact disaster scenarios. These will have to be dual active to justify their cost. The technologies ensuring that data is up-to-date on both systems and that users get the service they require will have to be extremely robust to deal with the update volumes and the user volumes of the active data warehouse.

Conclusion
The active data warehouse brings together users from all areas of the business who rely on an enormous amount of up-to-date data. Users are dependent on getting any answer to any question at any hour of the day or night on any day of the week. The technology used to embrace the mountain of data and the crush of users must be capable of delivering service at all times. Many leading-edge companies have entrusted business-critical functions to their Teradata Warehouses. Teradata has been designed from the ground up to deliver service at this level and is continuing to grow with the requirements of the active data warehouse.

E-MAIL ME
Looking for answers to life’s mysteries? Or would you just like to know more about the Teradata Warehouse and related applications? Ask the Expert! E-mail questions and comments to Todd at: todd.walter@ncr.com

Photo by Alex Hayden




Copyright by Teradata Corporation 2001-2007.