Tips to determine the proper availability for your mission-critical data warehouse.
by Imad Birouty with contributions from Christine Percopo
The role of the data warehouse and its importance to the daily operations of a business have evolved over time. The data warehouse not only
continues to deliver analytics to support strategic decisions but has also become an integral part of daily decision making by providing the
analytics necessary for front-line workers to make operational decisions. This evolution has brought with it a change in the availability
expectations of the data warehouse.
The data warehouse has traditionally been viewed as a non-mission-critical system in which system outages measured in days were not
catastrophic. Operational systems that serve front-line workers, on the other hand, have been treated as mission-critical, making outages
of any length unacceptable.
As companies compete to maximize their profits, streamline their operations and serve their customers, they are recognizing that the data
warehouse is mission-critical for business decisions of all kinds. As a result, companies now need to measure the impact of data
warehouse downtime in hours or minutes.
Every company must determine proper availability targets for its data warehouse based on business need. While there is no universal right
answer, there is one universal guideline: Data warehouse systems serve people who use applications and business processes to do their jobs.
Each user or group of users has availability requirements; when the system falls short of them, those users cannot do their jobs effectively
and business productivity suffers. As such, every data warehouse environment needs high availability.
Let's be clear: High availability does not necessarily equal 24x7. You can most likely take your system offline once a quarter for maintenance,
once a month for software patches or every night for batch loads. However, you probably cannot afford to have your system unavailable during
your prime business hours, when key decisions are made, customers are served or revenue is generated. That is why you do need high availability.
The first step in determining the needed level of availability for your data warehouse environment is working with your users to understand
and document their workflow requirements and how those requirements affect the business. With some data warehouses supporting hundreds of
applications and tens of thousands of users, this is clearly not a simple task, but it is the best place to start.
Figure 1: Two separate systems can have the same availability percentage yet very different availability patterns. User-experienced availability follows the operational state of a single system.
Different groups of users will have different requirements; some will be measured in days, others in hours and still others in minutes. This
level of detail is valuable because it lets you apply the right technology selectively, where it is needed. For example, you may
choose to enable fallback protection only for selected tables that serve high-value applications, consciously consuming system resources
in proportion to business value.
Availability can be measured at many levels, including the operating system, database, application and user. The user is the ultimate consumer
of system resources, and it is at this level that availability affects productivity and poses the greatest challenges. From an operational
perspective, there are several potential failure points outside the data warehouse, such as client hardware, client software, network
connections and the application itself. Potential failure points exist inside the data warehouse as well, including failures of
hardware, disk arrays, connectivity and software.
Expressing availability requirements
The goal of providing high availability in the data warehouse environment is to ensure user productivity and uninterrupted business operations.
Thus, availability requirements should take into account the user, group of users or application, and time of day as well as the business
tolerance per individual incident.
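One way to capture these dimensions together is a small per-group requirement record that can be checked against individual outages. The sketch below is hypothetical: `AvailabilityRequirement`, `minutes_in_window` and `violates` are illustrative names, and it assumes each group's critical hours form a single daily window that does not cross midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass
class AvailabilityRequirement:
    """Hypothetical requirement for one user group or application."""
    group: str
    window_start: time      # start of the group's critical hours (same day as window_end)
    window_end: time        # end of the critical hours; assumed later than window_start
    max_outage: timedelta   # business tolerance for a single incident


def minutes_in_window(req: AvailabilityRequirement, start: datetime,
                      duration: timedelta) -> float:
    """Return how many minutes of one outage fall inside the group's daily window."""
    end = start + duration
    total = timedelta()
    day = start.date()
    # Walk each calendar day the outage touches and add its overlap with that day's window.
    while datetime.combine(day, time.min) < end:
        win_start = datetime.combine(day, req.window_start)
        win_end = datetime.combine(day, req.window_end)
        overlap = min(end, win_end) - max(start, win_start)
        if overlap > timedelta():
            total += overlap
        day += timedelta(days=1)
    return total.total_seconds() / 60


def violates(req: AvailabilityRequirement, start: datetime, duration: timedelta) -> bool:
    """True if the in-window part of this single incident exceeds the group's tolerance."""
    return minutes_in_window(req, start, duration) > req.max_outage.total_seconds() / 60


if __name__ == "__main__":
    front_line = AvailabilityRequirement("front-line", time(8), time(18), timedelta(minutes=15))
    nightly_load = datetime(2007, 6, 1, 1, 0)
    daytime_fault = datetime(2007, 6, 1, 9, 0)
    print(violates(front_line, nightly_load, timedelta(hours=3)))      # False
    print(violates(front_line, daytime_fault, timedelta(minutes=30)))  # True
```

Checking the same incident against each group's record shows which outages actually matter: a three-hour nightly load is harmless to this hypothetical front-line group, while a 30-minute fault during business hours is not.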
The industry has traditionally used a single metric as a generally acceptable measure of availability as expressed in nines notation
(e.g., 99.xx%, with "xx" representing one or more digits of increasing accuracy). If no other metrics are available, this is an acceptable
place to start. However, we need to recognize this metric for what it represents and understand its limitations. This metric is a percentage
derived from a fraction. Thus, a yearly total availability of 99.7% is really the fraction 99.7/100.0. Applied to a full year, the denominator
represents the total number of hours in a year (8,766, based on 365.25 days of 24 hours) and the numerator represents the number of hours the
system was operational and available for processing. The resulting fraction would be 8,740/8,766 (99.7% of 8,766 hours, rounded to the nearest hour).
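The arithmetic in this paragraph can be reproduced in a few lines. This is a minimal sketch; the function names are illustrative, and the 8,766-hour year is the 365.25-day average used in the text.

```python
# Average hours per year (365.25 days x 24 hours), the denominator used above.
HOURS_PER_YEAR = 365.25 * 24  # 8766.0


def availability_pct(hours_available: float, hours_total: float = HOURS_PER_YEAR) -> float:
    """Availability as a percentage: hours available divided by total hours."""
    return 100.0 * hours_available / hours_total


def downtime_hours(pct: float, hours_total: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime implied by an availability percentage."""
    return hours_total * (1.0 - pct / 100.0)


if __name__ == "__main__":
    print(f"{availability_pct(8740):.2f}%")  # 99.70%
    print(f"{downtime_hours(99.7):.1f} h")   # 26.3 h
```

The exact downtime implied by 99.7% is about 26.3 hours; the text rounds this to 8,740 available hours and 26 hours of downtime.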
This tells us that during the course of the year, the system was unavailable for processing for 26 hours, but it lacks important detail. Was
it 26 one-hour outages or one 26-hour outage? (See figure 1, above.) The former is annoying but may be tolerable, while the latter
can be devastating to a business. This summary-level metric provides no information about the specific incidents that compose it and provides
no indication of the company's tolerance per outage. As such, we need to accept it for what it is: a simple metric that provides simple
information.
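The limitation is easy to demonstrate: two outage histories with the same total downtime produce the same percentage but look very different per incident. In this sketch, `summarize` is an illustrative helper and the 4-hour per-incident tolerance is a hypothetical value chosen for the example, not a recommendation.

```python
def summarize(outages_h: list[float], tolerance_h: float, hours_total: float = 8766) -> dict:
    """Summarize a year of outages (durations in hours) against a per-incident tolerance."""
    downtime = sum(outages_h)
    return {
        "availability_pct": round(100 * (hours_total - downtime) / hours_total, 2),
        "incidents": len(outages_h),
        "longest_outage_h": max(outages_h, default=0.0),
        "over_tolerance": sum(1 for o in outages_h if o > tolerance_h),
    }


many_short = [1.0] * 26   # 26 one-hour outages
one_long = [26.0]         # a single 26-hour outage

if __name__ == "__main__":
    for name, outages in (("26 x 1 h", many_short), ("1 x 26 h", one_long)):
        print(name, summarize(outages, tolerance_h=4.0))
```

Both histories report 99.7% availability, but only the per-incident fields reveal that the single 26-hour outage breaches the tolerance while none of the one-hour outages do.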
Two other measures have traditionally been associated with disaster recovery planning but, when used together, are quite useful for capturing
and articulating user availability requirements. These measures state user requirements in terms of a recovery time objective (RTO) and a
recovery point objective (RPO).
RTO: How long it takes to return a system to normal operation
RPO: How current the data is once the system is returned to operation
Figure 2: Dual systems can provide continuous availability to users.
Because these measures are termed "objectives," they are understood as forward-looking requirements. These measures are also understood as
"per incident" requirements. Using these measures gives a much clearer picture of how the final data warehouse environment must recover during
each incident and aids in the selection of technologies and services to accomplish this.
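Because RTO and RPO are per-incident objectives, each incident can be checked against them individually. The sketch below is hypothetical: the `Incident` and `Objectives` types, the `meets_objectives` helper and the two user groups are illustrative, and it assumes data currency is measured back from the moment of failure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    failed_at: datetime
    restored_at: datetime          # system returned to normal operation
    data_current_as_of: datetime   # latest point in time reflected by the restored data


@dataclass
class Objectives:
    rto: timedelta  # maximum acceptable time to return to normal operation
    rpo: timedelta  # maximum acceptable data staleness, measured back from the failure


def meets_objectives(incident: Incident, obj: Objectives) -> tuple[bool, bool]:
    """Return (meets_rto, meets_rpo) for a single incident."""
    recovery_time = incident.restored_at - incident.failed_at
    data_lag = incident.failed_at - incident.data_current_as_of
    return recovery_time <= obj.rto, data_lag <= obj.rpo


if __name__ == "__main__":
    incident = Incident(
        failed_at=datetime(2007, 6, 1, 10, 0),
        restored_at=datetime(2007, 6, 1, 10, 45),
        data_current_as_of=datetime(2007, 6, 1, 9, 50),
    )
    analysts = Objectives(rto=timedelta(hours=4), rpo=timedelta(hours=24))
    front_line = Objectives(rto=timedelta(minutes=15), rpo=timedelta(minutes=5))
    print(meets_objectives(incident, analysts))    # (True, True)
    print(meets_objectives(incident, front_line))  # (False, False)
```

The same 45-minute incident satisfies one group's objectives and violates another's, which is the kind of per-group, per-incident detail a single yearly percentage cannot express.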
A highly available data warehouse environment
High availability can be achieved in numerous ways: through a single system, through add-on features and through multiple systems.
(See figure 2.) A data warehouse environment can combine any of these approaches. However, a data warehouse
environment that delivers continuous availability must be designed with multiple systems and should:
Have no single point of failure
Eliminate planned and unplanned downtime
Protect against incidents due to local, regional, geographical, technological and human factors
Be transparent to users and applications
Allow users to experience consistent performance and guaranteed response times, even following a failure
Maintain a single view of the business
Provide an environment suitable for hosting mission-critical applications
Rising to the challenge
Designing a high-availability data warehouse takes determination and dedication to fully understand user and business requirements and to
select the appropriate metrics for measuring them. Fully understanding these needs is also the first step to selectively applying technologies
and services to cost-effectively satisfy availability requirements.
One thing is certain: The role of the data warehouse in companies has changed and will continue to change. Companies must design
high-availability capabilities into the data warehouse environment as a forethought, not an afterthought. Many companies have already done
this, and many more are en route.
Views on data warehouse availability
Ever wondered how your data warehouse availability compares with others'? Users were polled at the 2006 Teradata
PARTNERS Conference & Expo and shared their views on availability in a data warehouse environment. The results of the
survey, which was conducted at the Dual Active station in the Teradata booth, are made available here.
Customers answered the following four questions (note: multiple answers were allowed):
Imad Birouty is program marketing manager for Teradata's high-availability solutions, including the Dual Active Solution from Teradata.
Imad would like to thank Christine Percopo, VP Single System Availability, Teradata, for her contribution to this article.
Teradata Magazine-June 2007