Teradata Magazine Online

THE DATA STORAGE EVOLUTION

Has disk capacity outgrown its usefulness?

by Ron Yellin

Disk storage capacities double every 12 to 18 months as density rapidly increases, and the amount of available storage grows exponentially with each increase in disk capacity (figure 1). Users looking to decrease their storage costs would typically view higher-capacity drives as a positive industry trend; however, it is critical to understand the performance implications of rapidly increasing disk capacity.

Like capacity, disk performance is increasing, but at a much slower rate. While capacity has almost doubled every 12 months, disk-seek times (the time to position the disk head on the required track) have decreased by only about 12% during that time. Disk transfer rates (the rate at which data is moved to or from the disk) have increased by approximately 40% over the same period. As disk capacity evolved from 2GB to 73GB (a 36-fold increase), average I/O access times (seek, plus latency, the time for the disk to revolve, plus transfer) decreased from 13.25 milliseconds to 6.5 milliseconds, just a two-fold improvement (figure 2).
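
To make the gap concrete, the figures quoted above can be run through a few lines of Python. This is only a back-of-the-envelope sketch; the variable names are illustrative, and the inputs are the 2GB-to-73GB and 13.25-to-6.5-millisecond numbers from the text.

```python
# Figures quoted in the article (figure 2).
old_capacity_gb, new_capacity_gb = 2, 73
old_access_ms, new_access_ms = 13.25, 6.5

capacity_gain = new_capacity_gb / old_capacity_gb  # 36.5
access_gain = old_access_ms / new_access_ms        # about 2.04

# How much more data each unit of access performance must now serve.
gap = capacity_gain / access_gain

print(f"capacity grew {capacity_gain:.1f}x")
print(f"access time improved {access_gain:.2f}x")
print(f"capacity outpaced access performance by roughly {gap:.0f}x")
```

The last figure is the one that matters for configuration: each disk holds far more data than before, but can reach any given piece of it only about twice as fast.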

In the past six years, Teradata has seen improvements in disk capacity, disk array bandwidth (throughput in MB/sec.), bandwidth per disk and bandwidth per GB of storage capacity.

Disk capacities in the Teradata environment have increased approximately eight-fold. The disk array bandwidth has seen greater than a five-fold improvement during this same period.

With capacity growing more quickly than disk bandwidth, the bandwidth per GB of storage capacity has actually decreased by 50% (figure 3).

Given this trend, it is challenging to take advantage of abundant storage capacity while maintaining required performance levels.

Managing the workload
As capacity and performance increase at different rates, the actual impact on performance depends on the workload and the nature of the data being transferred. In a data warehousing environment, I/O bandwidth is the critical performance metric, whereas I/Os/sec. may be more relevant in an operational system or OLTP environment.

In the Teradata environment, a typical workload generates an extremely random disk access pattern. I/O sizes vary, with an average of approximately 48 kilobytes. Cache in a disk array is not effective since any data likely to be reused is already cached within the memory of a Teradata compute node, which eliminates the need to issue that I/O in the first place.

Given this workload, the disk performance is the limiting factor in the Teradata environment. Here, a 15,000-rpm disk can generate approximately 5.5 MB/sec. of bandwidth. Based on the I/O bandwidth requirement of each generation of Teradata compute node, the number of disks required can be easily calculated.

As disk capacities increase, businesses could reduce the number of disks they use, but with a finite amount of I/O bandwidth available per disk, they often find it necessary to configure more disks for performance than are required for capacity.
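
The sizing logic described above can be sketched in Python. The 5.5 MB/sec. per-disk figure and the 73GB disk size come from the article; the node bandwidth requirement and data volume below are hypothetical values chosen only for illustration, and the sketch ignores RAID overhead.

```python
import math

DISK_BANDWIDTH_MB_S = 5.5  # per 15,000-rpm disk, as quoted in the article


def disks_for_performance(node_bandwidth_mb_s, disk_bandwidth_mb_s=DISK_BANDWIDTH_MB_S):
    """Disks needed to deliver the node's required I/O bandwidth."""
    return math.ceil(node_bandwidth_mb_s / disk_bandwidth_mb_s)


def disks_for_capacity(data_gb, disk_capacity_gb):
    """Disks needed simply to hold the data."""
    return math.ceil(data_gb / disk_capacity_gb)


# Hypothetical node: needs 220 MB/sec. and must hold 1,000GB on 73GB disks.
perf_disks = disks_for_performance(220.0)
cap_disks = disks_for_capacity(1000.0, 73)

configured = max(perf_disks, cap_disks)
excess_gb = configured * 73 - 1000.0

print(f"disks for performance: {perf_disks}")
print(f"disks for capacity:    {cap_disks}")
print(f"configured: {configured} disks, leaving {excess_gb:.0f}GB unused")
```

Under these assumed numbers, performance dictates the disk count, and most of the configured capacity sits idle. That idle capacity is the subject of the rest of this article.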

This divergence between performance and capacity is a problem not only in data warehousing environments but also in transactional system environments (e.g., OLTP), which are more sensitive to the number of I/Os per second that can be performed. Regardless of capacity, a 15,000-rpm disk can perform approximately 154 small-block (512-byte) random I/Os per second (based on the 6.5-millisecond access time shown in figure 2).

As disk capacity doubles, disk performance (I/Os per second) remains essentially static, so the same number of I/Os per second must now service twice as much data. This is leading a number of OLTP vendors to recommend that some of the capacity on the larger disks not be used.
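
The I/O rate quoted above follows directly from the access time, and a short Python sketch shows how the I/Os available per GB shrink as disks grow. Only the 6.5-millisecond access time and the 73GB disk size come from the article; the other capacities in the loop are illustrative.

```python
def random_iops(access_time_ms):
    """Random I/Os per second a single disk can perform at a given access time."""
    return 1000.0 / access_time_ms


def iops_per_gb(iops, capacity_gb):
    """I/Os per second available for each GB stored on the disk."""
    return iops / capacity_gb


iops = random_iops(6.5)  # about 154, matching the figure in the text
print(f"random I/Os per second: {iops:.0f}")

# 73GB is the article's figure; 36GB and 146GB are illustrative neighbors.
for capacity_gb in (36, 73, 146):
    print(f"{capacity_gb:>4}GB disk: {iops_per_gb(iops, capacity_gb):.2f} I/Os per second per GB")
```

Each doubling of capacity halves the I/Os per second available per GB, which is why some vendors advise leaving part of a large disk empty.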

Balancing the technology
Three methods address these disk trends. First, you can do everything possible to extract maximum performance from a disk. Second, you can use advanced database technologies to reduce I/O demand. Third, and inevitably, you can identify good uses for the onslaught of disk capacity. Let’s examine each of these methods.

1) How does Teradata extract the maximum performance possible from a disk?
Teradata uses high-performance disk array controllers and storage interconnects to deliver the disk’s performance capabilities without intervening bottlenecks.

Teradata recommends using RAID-1 instead of RAID-5 redundancy. RAID-5 has a high I/O overhead for updates and writes of intermediate tables. Furthermore, RAID-5 exacerbates the capacity problem.

Teradata uses high-performance enterprise-class disks with high rpm, low seek times and fast transfer rates. Avoid high-capacity, low-rpm, non-enterprise PC-class SATA disks; they waste performance because they require extra revolutions to acquire data in the face of normal vibration.

Teradata tracks external and internal form factor reductions in enterprise-class disks to apply maximum spindles and disk actuators to disk capacity.

Where appropriate, Teradata uses larger transfer sizes, via large block sizes or Teradata Database Cylinder Read to access maximum data with minimum overhead.

2) How does Teradata use advanced database technologies to reduce I/O demand?
The partitioned primary index (PPI) is a Teradata Database V2R5 feature that organizes data such that examining only a small portion of data can satisfy many queries. When applied to range query workloads, Teradata customers have achieved a 10-fold to 100-fold reduction in I/O per query.

Secondary indexes in general are an excellent way to trade ample capacity for reduction in I/O to base tables. The Teradata Index Wizard can identify opportunities for the use of indexing.

The Teradata Database V2R5 sparse join index feature is not only highly efficient on capacity, but it can also be very effective in reducing I/O.

Teradata Tools and Utilities, such as Teradata Dynamic Query Manager, Teradata Priority Scheduler and Teradata Manager, manage the workload and, by extension, the system resources. These capabilities can identify waste and inefficiency, tuning opportunities, and workloads that can be shifted to low-demand periods.

3) How can you mitigate the inevitable onslaught of capacity?
Despite our best efforts to extract maximum disk performance and reduce I/O demand, the disk capacity wave is unstoppable. Shall we throw in the towel like OLTP vendors and suggest that customers use only a fraction of each disk drive? Absolutely not! There is one more solution: Find good, low-access uses for the excess capacity.

Tempering the trend
Temperature is used to represent frequency of data access—the more frequently accessed the data, the hotter its temperature. Likewise, as access frequency decreases, so does the temperature. The Teradata Multi-Temperature Warehouse describes an implementation where the Teradata Database manages both the active (hot) data and the inactive (cool) data, which traditionally has not been kept in the data warehouse. By sharing and managing the storage across temperature ranges, the active data can maintain its required performance levels while rarely accessed data becomes available to users.

To combat the capacity-versus-performance issue, the Teradata Multi-Temperature Warehouse allows you to direct the performance of the disk at the hot data while using the remaining capacity to store and access cool data. (See figure 4.)

Industry-specific requirements, government regulations (e.g., Sarbanes-Oxley Act) and unique business needs generate an endless supply of cool data that can be introduced into the data warehouse.

As data ages, its temperature typically cools. Some companies already maintain some historic data within the data warehouse, but many find a strategic value in increasing the amount of history they maintain. Where six to 18 months’ worth of history might have been typical for some customers, many now find value in keeping several years’ worth of historical data online.

Another possible use for excess capacity is to trade it for higher system and data availability. The Teradata Fallback feature duplicates Teradata objects on an independent hardware domain within the system to maximize single-system availability. The Fallback copy receives only updates, whereas the primary copy receives both updates and reads. So, by definition, the Fallback copy is cooler in nature. Fallback systems can tolerate a large set of unexpected catastrophic failure scenarios. To learn more about Fallback, download the white paper “Single System Availability Features” from the Library section of teradata.com.

Large objects supported in Teradata Database V2R5.1 offer yet another potential source of cool data because Teradata can support up to 2GB character or binary objects for applications requiring data types such as video, picture and audio.

Using excess capacity
High-capacity disks, Teradata Priority Scheduler, Teradata Dynamic Query Manager, PPI and multi-value compression are a few of the technologies that enable you to manage and access multi-temperature data.

High-capacity disks provide a cost-effective means of storing a mix of hot and cool data. Since higher-capacity disks offer a lower overall storage cost per MB, their higher initial cost is negligible.

Teradata Dynamic Query Manager can control query issuance. For example, users accessing cool data may be limited to a small number of queries executing at one time (known as low query concurrency), thus preserving resources for users accessing hot data. Other controls can be implemented based on user role, time of day or query cost.

Teradata Priority Scheduler allocates system resources among the various workload constituents once queries are issued. Low-priority, full-table scans of all cool data may occur in the background, allowing resources to focus on the active data warehouse queries against hot data.

PPI allows a hash-distributed table to be physically partitioned for storage on each virtual AMP according to the partitioning columns. In practice, customers will use time as the partitioning key and store both lightly accessed historic data and more heavily accessed recent data together on high capacity disks. PPI allows the resources to be focused on more recent data. When a customer query is issued with a date-range specification, the Teradata Database Optimizer eliminates the need to scan partitions known not to contain the date range desired by the query.
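
The idea of partition elimination can be illustrated with a toy model in Python. This is not how the Teradata Database implements PPI; it is a minimal sketch in which rows are grouped by month and a date-range query skips every group that cannot contain matching rows. All names and the sample data are hypothetical.

```python
from datetime import date


def month_key(d):
    """Partitioning expression: one partition per calendar month."""
    return (d.year, d.month)


def build_partitions(rows):
    """Group rows into partitions keyed by month of sale_date."""
    partitions = {}
    for row in rows:
        partitions.setdefault(month_key(row["sale_date"]), []).append(row)
    return partitions


def range_query(partitions, start, end):
    """Return matching rows and the number of rows actually scanned."""
    lo, hi = month_key(start), month_key(end)
    hits, scanned = [], 0
    for key, rows in partitions.items():
        if key < lo or key > hi:
            continue  # partition eliminated: it cannot hold dates in range
        scanned += len(rows)
        hits.extend(r for r in rows if start <= r["sale_date"] <= end)
    return hits, scanned


# Hypothetical table: three sales per month for one year (36 rows).
rows = [
    {"sale_date": date(2003, m, d), "amount": m * d}
    for m in range(1, 13)
    for d in (5, 15, 25)
]
partitions = build_partitions(rows)
hits, scanned = range_query(partitions, date(2003, 11, 10), date(2003, 12, 20))
print(f"rows in table: {len(rows)}, rows scanned: {scanned}, rows returned: {len(hits)}")
```

In this sketch, a query for recent dates touches only the two most recent monthly partitions, so the older, cooler history stored alongside them costs the query nothing.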

Multi-value compression, like higher capacity disks, is a cost-effective way to add more data into the warehouse by liberating capacity while improving performance. Liberated capacity can be used for other purposes, such as deeper history or performance enhancing indexes. Compression can result in smaller rows and, hence, more rows per data block. Increased rows-per-data-block causes fewer disk I/Os for scan-oriented query workloads.
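
The link between row size and scan I/O can be shown with simple arithmetic in Python. The block size, row sizes and row count below are hypothetical and chosen only to illustrate the effect; actual savings depend on the data and the columns compressed.

```python
import math


def rows_per_block(block_bytes, row_bytes):
    """Whole rows that fit in one data block."""
    return block_bytes // row_bytes


def blocks_to_scan(total_rows, block_bytes, row_bytes):
    """Data blocks a full scan must read for a table of total_rows rows."""
    return math.ceil(total_rows / rows_per_block(block_bytes, row_bytes))


BLOCK_BYTES = 64 * 1024   # hypothetical data block size
TOTAL_ROWS = 1_000_000    # hypothetical table size

uncompressed = blocks_to_scan(TOTAL_ROWS, BLOCK_BYTES, row_bytes=200)
compressed = blocks_to_scan(TOTAL_ROWS, BLOCK_BYTES, row_bytes=150)

saving = 1 - compressed / uncompressed
print(f"blocks scanned without compression: {uncompressed}")
print(f"blocks scanned with compression:    {compressed}")
print(f"scan I/O reduced by {saving:.0%}")
```

Under these assumptions, shrinking each row by a quarter cuts the number of blocks a full scan reads by about the same proportion.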

With disk capacity growing more rapidly than disk performance, it becomes challenging to configure high-performing data warehousing systems without requiring excess storage capacity.

The Teradata Multi-Temperature Warehouse addresses this industry-wide trend. Through this implementation, customers can configure their systems for high performance while utilizing the excess capacity to increase business value.

Ron Yellin, director of Storage Product Management at Teradata, has focused on storage for the past six years, primarily managing Teradata's external disk storage solutions and planning future offers. E-mail him at ron.yellin@teradata-ncr.com.

PHOTO BY CHARLIE SWICK




Copyright by Teradata Corporation 2001-2007.