How does Teradata do this?
Teradata has two basic philosophies that drive the product
and its functions.
First, we acknowledge that parts fail. If we simply say,
"The system failure couldn't be avoided because
the part failed," then the system fails every time any part
fails, and its failure rate is the sum of the failure rates
of all of its parts. Because we acknowledge and expect failure,
we design the appropriate level of redundancy, failover
and recovery into the areas where failures occur, so that
the failure of a single part rarely causes a system failure.
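The arithmetic behind this point can be sketched in a few lines of Python. The sketch assumes independent parts with constant failure rates, which is a simplification, and the part count and rate are hypothetical. With no redundancy, the system fails whenever any part fails, so the part failure rates add up and the system MTBF shrinks as parts are added.

```python
import math

def series_failure_rate(part_rates):
    """Failure rate of a system that fails whenever any one part fails.

    Assumes independent parts with constant failure rates (failures per hour),
    so the system rate is simply the sum of the part rates.
    """
    return sum(part_rates)

def mtbf_hours(failure_rate):
    """Mean time between failures for a constant failure rate."""
    return math.inf if failure_rate == 0 else 1.0 / failure_rate

# Hypothetical example: 100 parts, each failing once per 100,000 hours on average.
parts = [1 / 100_000] * 100
system_rate = series_failure_rate(parts)
print(round(mtbf_hours(system_rate)))  # 1000
```

In this model, a system built from 100 parts that each last 100,000 hours on average fails about every 1,000 hours, and doubling the part count halves that figure again.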
Second, we consider availability from the user's point
of view. System vendors have traditionally viewed and measured
the availability of each major component of the system separately:
platform, disks, database, interfaces and utilities. Users don't
care which component isn't working if they can't
get to their data. We measure the end-to-end availability of the
Teradata components to ensure that we see what the users
see.

Is Teradata fault tolerant?
Fault tolerance is defined as the ability to provide continuous
service by hiding the failure of one or more components
(Gray & Reuter). By that definition, Teradata is not
100% fault tolerant. However, the hardware is built with
fault-tolerant parts, and the software is designed to be
fault tolerant. Teradata is becoming more fault tolerant
while maintaining a reasonable balance between cost and
system availability. Teradata is also fault resilient, so
recovery from a system outage is automatic and fast.

What parts are protected by redundancy?
Dual controllers and dual point-to-point cabling protect
the disk I/O paths from component failure. BYNET is fully
fault tolerant, reconfiguring the network automatically
around any failure so messages still get delivered. And
there are always two BYNETs in a configuration so that one
can fail without any interruption in service. All redundant
components in a sub-system are 100% active, delivering maximum
performance during normal operation.
The cabinets are also reliable. Redundant fans and power
supplies ensure that little things don't cause big
problems. Power is a leading cause of system failures; dual
AC power from separate sources minimizes this risk.
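A similar sketch shows why duplicating a component helps. It assumes the copies fail independently (which is what separate power sources and two BYNETs aim for) and uses a hypothetical 99.9% availability for a single copy; the function name is illustrative only.

```python
def redundant_availability(a, copies=2):
    """Availability of `copies` independent, identical components where any one suffices.

    The group is down only if every copy is down at the same time,
    so unavailability is (1 - a) raised to the number of copies.
    """
    return 1.0 - (1.0 - a) ** copies

# Hypothetical: a single network or power feed that is up 99.9% of the time.
single = 0.999
print(f"{single:.6f}")                          # 0.999000
print(f"{redundant_availability(single):.6f}")  # 0.999999
```

The independence assumption carries most of the weight: two feeds from the same power source share a failure mode, which is why the text stresses separate sources.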

What happens if a processing node fails?
Teradata's parallel architecture ensures that part
of every request is executing on every node, so when a processing
node fails, all work is affected. This is an area where
Teradata is fault resilient rather than fault tolerant.
Our nodes now achieve such a long mean time between failures
(MTBF) that this is a rare occurrence.
If a node fails, the rest of the system is immediately
notified and cycles through a recovery process. All requests
are halted and rolled back. All applications and users are
notified of the status of their outstanding requests so
that appropriate action can be taken. Teradata initiates
a system reallocation process called VPROC migration,
which moves units of parallelism from the failed node to
an operational one. This fully automatic procedure requires
only two to three minutes on current-release hardware and
software, after which all data is available to all users
for all tasks.

How does Operations manage all of these components?
The Teradata platform comes with an administrative workstation
(AWS). The AWS brings together command and control for all
components of the system into a single user interface that
displays any level of system status, from overall system
health at the highest level down to an individual
failed component. The AWS eliminates the need to
manage the parallel processing platform as a set of individual
servers by providing a single system view of all components.
Further, the AWS can pass what it knows to the outside world:
other server management tools, operations staff or DBAs via
pager, the Teradata service center, or the organization's
service method of choice.
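The combination of a single top-level view with drill-down can be modeled as a tree of components. This is a hypothetical sketch, not the AWS implementation: the component names, the status values and the "attention" roll-up rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """One monitored component; children are the parts it contains (hypothetical model)."""
    name: str
    status: str = "ok"
    children: list = field(default_factory=list)

def overall_health(comp):
    """Top-level view: 'ok' only if every component in the subtree is ok."""
    if comp.status != "ok":
        return "attention"
    return "ok" if all(overall_health(c) == "ok" for c in comp.children) else "attention"

def problem_paths(comp, prefix=""):
    """Drill-down view: the full path and status of every component that is not ok."""
    path = f"{prefix}/{comp.name}"
    found = [(path, comp.status)] if comp.status != "ok" else []
    for child in comp.children:
        found.extend(problem_paths(child, path))
    return found

system = Component("system", children=[
    Component("cabinet1", children=[Component("node1"), Component("fan2", status="failed")]),
    Component("bynet0"),
])
print(overall_health(system))  # attention
print(problem_paths(system))   # [('/system/cabinet1/fan2', 'failed')]
```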

How does VPROC migration work?
Processing nodes in a Teradata configuration are organized
into groups called cliques. A clique is a set
of processing nodes that have full I/O interconnection to
a set of disk arrays. In the case of a node failure, other
processing nodes within the clique have full connectivity
to the disk storage that was utilized by the failed node.
A typical clique is two to four nodes. Larger systems are
made up of multiple cliques.
VPROC migration takes advantage of the clique structure.
During recovery and re-allocation, units of parallelism
(AMPs) that were running on the failed node are re-instantiated
on surviving nodes in the clique. AMPs are reallocated evenly
across the other nodes to balance the performance impact
of the failed node. In a four-node clique, for instance,
each of the three surviving nodes takes on an equal share
of the failed node's processing, so the system loses 25%
of its processing power.
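The even reallocation can be sketched as follows. The data layout (a dictionary from node name to AMP identifiers), the six AMPs per node and the least-loaded placement rule are all assumptions for illustration; this is not Teradata's migration code.

```python
def migrate_vprocs(clique, failed_node):
    """Reassign the failed node's AMPs evenly across the surviving clique nodes.

    `clique` maps node name -> list of AMP ids (hypothetical layout).
    Returns a new mapping without the failed node; each AMP goes to
    whichever survivor currently hosts the fewest AMPs.
    """
    survivors = {n: list(amps) for n, amps in clique.items() if n != failed_node}
    for amp in clique[failed_node]:
        target = min(survivors, key=lambda n: len(survivors[n]))
        survivors[target].append(amp)
    return survivors

# Hypothetical four-node clique with six AMPs per node.
clique = {f"node{n}": [f"amp{n}{i}" for i in range(6)] for n in range(1, 5)}
after = migrate_vprocs(clique, "node4")
print({n: len(a) for n, a in after.items()})  # {'node1': 8, 'node2': 8, 'node3': 8}
```

Each survivor goes from six AMPs to eight, or 4/3 of its normal load, so the clique delivers three nodes' worth of processing instead of four.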
VPROC migration is often confused with clustering or simple
failover implemented at the operating system level. It is
actually quite different because it is implemented within
the DBMS itself. This architecture ensures that recovery
is fully automatic and fully integrated with the database
recovery. Several solutions on the market offer only single-node
failover, resulting in a 50% performance degradation
during the time a failed node is out of the configuration.
All of the other solutions depend on separate operating
system clustering or failover functionality, requiring detailed
system level setup and administration at significant additional
cost.
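The difference between single-node failover and spreading work across a clique follows from simple arithmetic. The sketch assumes the failed node's work is divided evenly among the survivors and that a parallel request finishes only when its most heavily loaded node finishes. Under those assumptions, a failover pair (group size 2) runs at half speed, while a four-node clique runs at three-quarters.

```python
from fractions import Fraction

def throughput_after_failure(group_size):
    """Fraction of normal throughput when one node in a failover group fails.

    Assumes the failed node's work is spread evenly over the survivors and that
    a parallel request runs only as fast as its most heavily loaded node.
    """
    if group_size < 2:
        raise ValueError("a failover group needs at least two nodes")
    survivors = group_size - 1
    load_per_survivor = Fraction(group_size, survivors)  # relative to a normal load of 1
    return 1 / load_per_survivor

for n in (2, 3, 4):
    print(n, throughput_after_failure(n))
# 2 1/2
# 3 2/3
# 4 3/4
```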

What about disk failure?
Data warehouses require many disks to store the vast amounts
of data required by today's organizations. The lifetime
of a single disk is very long, but when systems have a large
number of disks, disk failure is to be expected. We rely
on our strong disk array partners, LSI Logic and EMC, both
leaders in supplying high-availability disk array solutions
to customers. RAID technology allows single disks to fail
without impact to any user. These vendors also provide many
other data integrity and availability features that ensure
data on disk is protected from loss.
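The claim that disk failure becomes routine at scale is easy to quantify. The sketch assumes independent disks and a hypothetical 1% annual failure probability per disk; real rates vary by drive and environment.

```python
def prob_any_failure(num_disks, annual_failure_prob):
    """Chance that at least one of `num_disks` independent disks fails in a year."""
    return 1.0 - (1.0 - annual_failure_prob) ** num_disks

def expected_failures(num_disks, annual_failure_prob):
    """Expected number of disk failures per year."""
    return num_disks * annual_failure_prob

# Hypothetical rate: each disk has a 1% chance of failing in a given year.
for disks in (1, 100, 1000):
    print(disks, round(prob_any_failure(disks, 0.01), 3), expected_failures(disks, 0.01))
```

With these assumptions, a 100-disk system has roughly a 63% chance of at least one disk failure in a year, and a 1,000-disk system can expect about ten, which is why RAID has to absorb single-disk failures without user impact.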

What if a disk array or RAID 1 pair fails?
This type of failure is unlikely but not impossible. For
business-critical data, all other vendors cover this eventuality
with dual systems. Teradata offers a unique feature called
fallback that allows protection from these events to be
built into a single configuration. Fallback can be chosen
as an option for the entire system or for any table or set
of tables that require extreme protection. Failure of even
an entire disk array results in only a two- to three-minute
outage, after which all fallback-protected data will
be available to all users for all tasks and applications.
Recovery after the failed components are replaced is automatic
and contained within the system, without the need for tapes
or extended complex procedures.

How does fallback work?
All system configurations have two hash location maps: one
to determine the primary location for each row, and a fallback
map, which determines a location for each fallback row.
When the fallback option is chosen for a table, each row
is written twice in the system as directed by the two different
maps (with no change or effort in the ETL procedures). Map
layout ensures that the fallback copy is on completely separate
hardware (disk array, controller, node and clique) from
the primary copy. If storage or accessibility of the primary
copy fails, the fallback copy is still accessible. Teradata
software knows the current configuration status. If some
of the data is inaccessible, Teradata automatically answers
questions from and routes updates to the fallback copy.
Ensuring that the fallback copy is on other hardware also
protects the data from failures other than the disks or
disk array. Controllers, cables, I/O drivers and even file
system software can fail in destructive ways. A completely
separate I/O stack ensures that the fallback copy is not
equally corrupted.
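The two-map idea can be sketched with a toy configuration. Everything here is hypothetical: eight AMPs split across two cliques, SHA-256 as a stand-in for Teradata's row hash, and a fallback rule that simply picks an AMP in the other clique. Only clique separation is modeled, not the disk array, controller and node separation the text describes.

```python
import hashlib

# Hypothetical configuration: AMP id -> clique. Real systems have many more AMPs.
AMP_CLIQUE = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B", 7: "B"}

def row_hash(key):
    """Stable 64-bit hash of a row key (SHA-256 used only for illustration)."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big")

def primary_amp(key):
    """Primary map: row hash -> AMP that stores the primary copy."""
    return row_hash(key) % len(AMP_CLIQUE)

def fallback_amp(key):
    """Fallback map: an AMP in a different clique from the primary copy."""
    home = AMP_CLIQUE[primary_amp(key)]
    candidates = sorted(amp for amp, clique in AMP_CLIQUE.items() if clique != home)
    # Use different hash bits so one AMP's rows spread over several fallback AMPs.
    return candidates[(row_hash(key) >> 16) % len(candidates)]

def amp_for_read(key, unavailable_amps):
    """Answer from the primary copy, or from the fallback copy if the primary is down."""
    primary = primary_amp(key)
    return fallback_amp(key) if primary in unavailable_amps else primary

down = {primary_amp("order-42")}
print(amp_for_read("order-42", down) == fallback_amp("order-42"))  # True
```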
If a critical failure occurs and the system switches to
the fallback copy, a special recovery log is automatically
invoked. When the components are repaired, the fallback
copy and the recovery log are used to recreate the primary
copy with any updates that occurred during the failure.
This recovery is done automatically with the system online,
with minimum oversight and without backups or other off-system
media to return the system to full operation.
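The recovery log can be sketched with a toy table holding a primary and a fallback copy. The class and method names are hypothetical, deletes are ignored, and the log is modeled as a set of changed keys. If the primary's storage was replaced, the sketch rebuilds the whole primary from the fallback copy; otherwise it replays only the logged changes.

```python
class FallbackTable:
    """Toy model of a fallback-protected table (hypothetical, not Teradata code)."""

    def __init__(self):
        self.primary = {}
        self.fallback = {}
        self.primary_available = True
        self.recovery_log = set()  # keys updated while the primary was unavailable

    def write(self, key, value):
        # The fallback copy is always written; the primary only when reachable.
        self.fallback[key] = value
        if self.primary_available:
            self.primary[key] = value
        else:
            self.recovery_log.add(key)

    def read(self, key):
        # Answer from the primary when possible, otherwise from the fallback copy.
        source = self.primary if self.primary_available else self.fallback
        return source[key]

    def fail_primary(self):
        self.primary_available = False

    def repair_primary(self, storage_replaced=False):
        """Recreate the primary copy, then resume normal operation."""
        if storage_replaced:
            self.primary = dict(self.fallback)  # full rebuild onto new hardware
        else:
            for key in self.recovery_log:       # replay only what was missed
                self.primary[key] = self.fallback[key]
        self.recovery_log.clear()
        self.primary_available = True
```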
Fallback is not free, of course. Each fallback table needs
a full second copy of its data, which doubles its disk space.
Updating or inserting a row costs more (but not double).
Queries, however, experience no performance impact from
having fallback enabled. The return for this investment
is that a class of infrequent but very high impact failures
is reduced to a very low impact from the user point of view.

What happens if the software fails?
VPROC migration, Teradata's response to node failure,
also covers operating system failure, an event often requiring
a significant recovery and reboot time. Teradata initiates
a quick VPROC migration to avoid the extended outage that
would otherwise be experienced.
For database failures, every effort is made to isolate
the impact to a single user or job. If the impact of the
failure can be confined to a single query, Teradata will
abort that request, clean up any residual effects and inform
the user that the query requires the attention of the system
administrator, all the while continuing to execute all other
work on the system. If this is not possible, a full system
reset is executed, rolling back all outstanding requests,
informing all users of the status of their requests and
bringing the system back to full operation automatically
within two to three minutes.
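The two outcomes can be sketched as a single decision. The Request type, its states and the "contained" flag are hypothetical simplifications; deciding whether a failure can be contained is the hard part, and here it is simply passed in.

```python
from dataclasses import dataclass

@dataclass
class Request:
    request_id: int
    user: str
    state: str = "running"

def handle_database_failure(requests, faulty_id, contained):
    """Apply the two cases described above and return the user notifications.

    If the failure is contained, abort only the faulty request and leave all
    other work running; otherwise roll back every outstanding request (full reset).
    """
    notices = []
    for req in requests:
        if req.state != "running":
            continue
        if contained and req.request_id != faulty_id:
            continue  # other work keeps executing
        req.state = "aborted" if contained else "rolled back"
        notices.append((req.user, req.request_id, req.state))
    return notices
```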

What planned outages does Teradata require?
Repairing a failed component must not make the
system unavailable. Any redundant part is also hot-swappable
(it can be replaced or repaired without taking the system
offline), minimizing planned downtime for hardware repair.
Teradata utilizes a dual-boot environment to minimize software-install
outages. All software distribution, loading, package adds
and install verification are done against an inactive, secondary
copy of the system software, without affecting the running
system. Once all installation work is complete, a system
reset is required. Resets after database code changes require
only two to three minutes. Operating system updates require
a reboot (15 to 30 minutes). No other downtime is required.
Further, if there is an issue with the new software, returning
to the previous version requires the same reset time as
going forward, and it's not necessary to reinstall
the software.
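The dual-boot scheme amounts to an A/B arrangement of two software copies. The sketch below is hypothetical (class name, slot names, version strings) and models only which copy is active; it shows why returning to the previous version costs the same single reset as moving forward.

```python
class DualBootSystem:
    """Hypothetical model of an active and an inactive copy of the system software."""

    def __init__(self, version):
        self.slots = {"A": version, "B": None}
        self.active = "A"

    @property
    def inactive(self):
        return "B" if self.active == "A" else "A"

    @property
    def running_version(self):
        return self.slots[self.active]

    def install(self, version):
        """Stage new software on the inactive copy; the running system is untouched."""
        self.slots[self.inactive] = version

    def reset(self):
        """Switch to the other copy; this is the only step that interrupts service."""
        if self.slots[self.inactive] is None:
            raise RuntimeError("nothing installed on the inactive copy")
        self.active = self.inactive

system = DualBootSystem("v1")
system.install("v2")  # v1 keeps running during the install
system.reset()        # now running v2
system.reset()        # back to v1 without reinstalling anything
```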
Hardware additions or upgrades require a reconfig outage.
Upon completion, added components are fully utilized to
deliver scalability to users with no additional DBA work.
Recent software changes have significantly shortened the average
elapsed time of a reconfig, and more work is in progress.
Teradata long ago eliminated many planned database outages.
By eliminating entire concepts like reorgs and index rebuilds,
Teradata returned large amounts of available access time
to users. Teradata utilities have always run with the system
online: load, export, bulk update, continuous update and
backup are all online utilities that do not require planned
outages. The utilities have checkpoint restart code built
in, so that if any of them is affected by a failure, it
restarts from where it left off and finishes the job, with no
downtime to clean up and recover the process.
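Checkpoint restart can be sketched as a batch loop that records how far it got. The function name, the dictionary standing in for durable restart state, and the simulated failure are hypothetical, and the sketch assumes each batch and its checkpoint are committed together, as a real utility would need to guarantee.

```python
def run_load(rows, target, checkpoint, batch_size=100, fail_at_row=None):
    """Load rows in batches, recording a checkpoint after each committed batch.

    `checkpoint` is a dict standing in for durable restart state; on restart the
    job resumes at the recorded row instead of starting over.
    """
    start = checkpoint.get("next_row", 0)
    for i in range(start, len(rows), batch_size):
        if fail_at_row is not None and i >= fail_at_row:
            raise RuntimeError(f"simulated failure at row {i}")
        # Assumption: the batch and its checkpoint are committed together.
        target.extend(rows[i:i + batch_size])
        checkpoint["next_row"] = i + batch_size
    checkpoint["done"] = True

rows, target, checkpoint = list(range(1000)), [], {}
try:
    run_load(rows, target, checkpoint, fail_at_row=500)  # fails halfway through
except RuntimeError:
    pass
run_load(rows, target, checkpoint)  # resumes at row 500, no rows repeated
```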

What is the future of Teradata system availability?
First, we will continue to remove or reduce all sources
of planned and unplanned unavailability time for single-system
Teradata instances. There are always more things we can
do in both the platform and the software to ensure that
a single Teradata system delivers the highest possible overall
availability.
Beyond that, there are some fundamental changes occurring
in the industry as the warehouse becomes more mission critical.
The applications and tools surrounding the warehouse are
beginning to learn lessons that operational systems learned
long ago. Applications and tools need to be fault resilient
or fault tolerant. Even if the warehouse is up, users still
do not have service when the application server is down.
More warehouse applications will be built in the style of
operational applications, using EAI middleware, redundant
servers and failover capabilities.
Finally, the most mission-critical warehouses will require
dual, geographically separate systems to cover extremely
low probability but extremely high-impact disaster scenarios.
These will have to be dual active to justify their cost.
The technologies ensuring that data is up-to-date on both
systems and that users get the service they require will
have to be extremely robust to deal with the update volumes
and the user volumes of the active data warehouse.

Conclusion
The active data warehouse brings together users from all
areas of the business who rely on an enormous amount of
up-to-date data. Users are dependent on getting any answer
to any question at any hour of the day or night on any day
of the week. The technology used to embrace the mountain
of data and the crush of users must be capable of delivering
service at all times. Many leading-edge companies have entrusted
business critical functions to their Teradata Warehouses.
Teradata has been designed from the ground up to deliver
service at this level and is continuing to grow with the
requirements of the active data warehouse.
E-MAIL ME
Looking for answers to life's mysteries?
Or would you just like to know more about the Teradata Warehouse
and related applications? Ask the Expert! E-mail
questions and comments to Todd at: todd.walter@ncr.com
Photo by Alex Hayden