Take a quantum leap with super-linear scalability

Reduce resource demands while increasing performance.

by Dan Higgins, Director of Teradata Warehouse sales support

As a child, I enjoyed Madeleine L'Engle's book A Wrinkle In Time, in which the characters traveled through the universe by bending time. Growing up in the 1960s, I also often watched Star Trek, the science-fiction television series in which warp drives made faster-than-light travel possible.


Most physicists claim warp travel is impossible. Too bad. But as an IT employee, if you could shorten the time it takes to process even more workloads, you would feel like you've been given free hours.

In our real world of computing, and specifically in data warehousing, we constantly strive for improved performance while simultaneously providing more capabilities to more users using more data. This is referred to as scalability. Taking it a few steps further, super-linear scalability is the ability to deliver consistent and predictable performance, simplified capacity planning and easier system management while growing data warehouses to tens or hundreds of terabytes and thousands of users. (See figure, below.)

Linear scalability has always been a key differentiator between Teradata and its competitors. While many vendors have claimed to provide scalability or linear scalability, not all claims are justified. Furthermore, we need to go beyond linear scalability, because linear scalability alone is no longer sufficient to meet the ever-growing demands of decision support.

Effects of scalability on performance
A system's performance is determined by its level of scalability. After balanced capacity is added, a system with super-linear scalability can outperform systems that scale linearly or sub-linearly, especially when tuning and optimization capabilities are used.

Linear scalability
Linear scalability in decision support systems (DSSs) can take two basic forms. In the first case, consider those queries (or reports) that interrogate large volumes of data. With these queries, performance is directly related to data volume. If the volume of data used by the query is doubled (e.g., from one year to two years of history, or from 1 million to 2 million customer records) then the query execution times will often double.

But what if you want to maintain the same performance when doubling the volume of data? If you have a linearly scalable system, then all you need do is double the system capacity. (Note that it is usually best to grow the system in a balanced manner, maintaining the relationships among CPU, memory, disk and I/O bandwidth capacity.)
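The relationship described above can be written as a small model. This is only an illustrative sketch: the function name `scan_time` and the per-node throughput figure are assumptions made up for the example, and the model idealizes a system in which work is spread perfectly evenly across nodes.

```python
def scan_time(data_volume_tb: float, nodes: int, tb_per_node_hour: float = 1.0) -> float:
    """Hours needed to scan data_volume_tb on an idealized, perfectly linear system.

    tb_per_node_hour is a made-up throughput figure used only for illustration.
    """
    return data_volume_tb / (nodes * tb_per_node_hour)


baseline = scan_time(10, nodes=4)       # 2.5 hours
doubled_data = scan_time(20, nodes=4)   # 5.0 hours: twice the data, twice the time
doubled_both = scan_time(20, nodes=8)   # 2.5 hours: capacity doubled along with the data
```

In this model, doubling the data alone doubles the elapsed time, while doubling the data and the node count together leaves the elapsed time unchanged.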

But what about tactical or operational decision support where the queries interrogate very small amounts of data, leverage one or more forms of indexes and engage only one unit of parallelism (access module processors [AMPs] in the Teradata system)? An example would be a primary index query in Teradata. Increases in overall data volume have negligible impact on the performance of these queries. Can you still have linear scalability? Yes. In this case, when the system is expanded it provides a linear increase in the number of these queries that can be processed concurrently.
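A rough sketch of this second form of linear scalability follows. The hash used here (`zlib.crc32`) is only a stand-in for whatever row-hashing the database actually performs, and the function names and the lookups-per-second figure are hypothetical. The sketch also assumes that keys are spread evenly, so that no single AMP becomes a hot spot.

```python
import zlib


def amp_for_key(key: str, num_amps: int) -> int:
    """Map a primary index value to exactly one AMP.

    crc32 is an illustrative stand-in, not the database's real hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_amps


def tactical_throughput(num_amps: int, lookups_per_amp_per_sec: float = 100.0) -> float:
    """Concurrent single-AMP lookups per second, assuming evenly spread keys."""
    return num_amps * lookups_per_amp_per_sec


# Each lookup touches one AMP, so the response time of a single lookup does not
# depend on system size, but doubling the AMPs doubles how many run at once.
small_system = tactical_throughput(8)
large_system = tactical_throughput(16)
```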

Sub-linear scalability
Scalability is not linear when doubling the system's capacity does not yield a corresponding doubling in system performance. Sub-linear scalability refers to a system that is not functioning at its potential capacity because of system bottlenecks, overhead from managing shared resources or insufficient use of parallel processing.

For example, sub-linear scalability occurs when doubling the number of CPUs in a shared-memory parallelism system does not yield a doubling in performance because of memory access contention. Another example is in a shared-disk cluster when doubling the number of nodes does not result in a corresponding doubling in performance because of the overhead associated with the management of shared data.
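One common way to reason about this effect is Amdahl's law, which the article itself does not mention and which is used here only as a simplified model. It treats contention and shared-resource management as a fraction of the work that cannot be spread across processors. The 10 percent serialized fraction below is an arbitrary assumption.

```python
def speedup(processors: int, serial_fraction: float) -> float:
    """Amdahl's law: speedup over one processor when serial_fraction of the
    work (for example, contention on shared memory) cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


# With 10 percent of the work serialized, doubling from 8 to 16 processors
# raises the speedup from about 4.7x to 6.4x, well short of doubling it.
gain_from_doubling = speedup(16, 0.10) / speedup(8, 0.10)

# With nothing serialized, the same doubling yields exactly twice the speedup.
ideal_gain = speedup(16, 0.0) / speedup(8, 0.0)
```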

With sub-linear scalability, data warehouse growth is inhibited, the ability to service DSS workloads is limited and IT staff are burdened with additional complexity and effort.

To tune or not to tune

The use of indexes, partitioning and other performance tuning methods is often debated. Some database administrators (DBAs) claim that their architectures make all tuning unnecessary. Conversely, other DBAs create thousands of indexes and use other tuning structures, ending up with a huge, complex burden.

The fact is that tuning always has a role and is necessary for optimal performance. Why would you scan a 1 billion-row table when you really need only a few hundred rows? Rather than debating whether tuning is necessary, ask how much tuning is required and whether it can be done with built-in, automatic features such as indexes, partitioning and materialized views.
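The scan-versus-index point can be illustrated with a toy in-memory table. Everything here is hypothetical (the table, the `region` column, the function names), and a Python dictionary is only a stand-in for a database index, but the difference in rows touched is the same idea at a much smaller scale.

```python
from collections import defaultdict

# A toy "table" of 100,000 rows spread across 1,000 regions.
rows = [{"id": i, "region": f"r{i % 1000}"} for i in range(100_000)]


def full_scan(table, region):
    """Examine every row and keep the matches; returns (matches, rows touched)."""
    touched = 0
    hits = []
    for row in table:
        touched += 1
        if row["region"] == region:
            hits.append(row)
    return hits, touched


# Build the index once: region value -> positions of the matching rows.
index = defaultdict(list)
for pos, row in enumerate(rows):
    index[row["region"]].append(pos)


def index_lookup(table, idx, region):
    """Fetch only the rows the index points to; returns (matches, rows touched)."""
    positions = idx.get(region, [])
    return [table[p] for p in positions], len(positions)


scan_hits, scan_touched = full_scan(rows, "r7")
idx_hits, idx_touched = index_lookup(rows, index, "r7")
```

Both calls return the same 100 rows, but the scan touches all 100,000 rows while the indexed lookup touches only the 100 it needs.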

Also consider whether the tuning requires manual modifications to the queries, physical schema and system configuration.

The key is to carefully consider your tuning options and to use them only when necessary. Whenever possible, avoid tuning approaches that:
> Limit scalability
> Assume that users have intimate knowledge of the physical schema
> Place undue restrictions on end users
 
—D.H.

Super-linear scalability
As the demands on DSS have increased, linear scalability alone has become inadequate to meet performance requirements. With super-linear scalability, however, you can take a quantum leap in performance.

Super-linear scalability can be achieved in multiple, complementary ways. They all build on a foundation of linear scalability while significantly reducing system resource demands and increasing workload performance. The primary means of achieving super-linear scalability is by radically reducing resource use through indexes, data compression and partitioning.
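To make the partitioning case concrete, here is a minimal sketch of partition elimination. The data, the monthly partitioning scheme and the function names are all invented for illustration; the point is only that a query restricted to one month reads one partition rather than the whole table.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales rows: 28 days for each of the 12 months of 2007.
rows = [
    {"sale_date": date(2007, month, day), "amount": 10}
    for month in range(1, 13)
    for day in range(1, 29)
]


def build_partitions(table):
    """Group rows by (year, month) so that each month can be read on its own."""
    parts = defaultdict(list)
    for row in table:
        key = (row["sale_date"].year, row["sale_date"].month)
        parts[key].append(row)
    return parts


def query_month(parts, year, month):
    """Total sales for one month; returns (total, rows read)."""
    selected = parts.get((year, month), [])
    return sum(row["amount"] for row in selected), len(selected)


partitions = build_partitions(rows)
total, rows_read = query_month(partitions, 2007, 9)
```

Here the September query reads 28 of the 336 rows, and adding more months of history to the table would not change that number.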

Another type of super-linear scalability occurs when the system exploits concurrent workloads by servicing multiple requests with a single operation. A good example of this involves Teradata's Syncscan feature in which the Teradata software combines data requests from multiple, concurrent users into a single I/O operation.
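The idea of serving several requests from one pass over the data can be sketched as follows. This is a toy model of a shared scan, not a description of how Syncscan is implemented; the block layout, the predicates and the function names are assumptions chosen for illustration.

```python
def separate_scans(blocks, predicates):
    """Each query reads every block on its own; returns (results, block reads)."""
    reads = 0
    results = []
    for pred in predicates:
        matched = []
        for block in blocks:
            reads += 1
            matched.extend(v for v in block if pred(v))
        results.append(matched)
    return results, reads


def shared_scan(blocks, predicates):
    """Each block is read once and handed to every waiting query."""
    reads = 0
    results = [[] for _ in predicates]
    for block in blocks:
        reads += 1
        for out, pred in zip(results, predicates):
            out.extend(v for v in block if pred(v))
    return results, reads


# Ten "disk blocks" of ten values each, and three concurrent queries.
blocks = [list(range(start, start + 10)) for start in range(0, 100, 10)]
predicates = [lambda v: v % 2 == 0, lambda v: v > 90, lambda v: v < 5]

separate_results, separate_reads = separate_scans(blocks, predicates)
shared_results, shared_reads = shared_scan(blocks, predicates)
```

The three queries get identical answers either way, but the separate scans perform 30 block reads while the shared scan performs 10, and the saving grows with the number of concurrent queries.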

Using workload management tools to balance and offset complementary workloads is another means of improving performance beyond simply increasing system size. These features are like adding "warp drives" to a linearly scalable platform.

When assessing your system's performance and return on investment (ROI), add these qualifiers to your checklist:
> Are you benefiting from super-linear scalability?
> Are you taking advantage of Teradata's various tuning features?
> Are your applications scalable?
> Are there bottlenecks and choke-points in your IT or business processes?
> Are you leveraging Teradata's workload management capabilities to get the most work out of your investment?

If you want to increase your system's capability while limiting or avoiding increases in cost, you can benefit from super-linear scalability in many ways.

Warp speed
It might not be possible to travel faster than the speed of light or bend time to accommodate a busy schedule, but using the super-linear scalability capabilities in your data warehouse can help decrease the time it takes to process workloads. And that can save you time and money.

Five ways to improve scalability

Scalable technology has allowed data warehousing to go further than many IT experts thought possible a decade ago. But scalable technology by itself is not enough. The system consists not only of technology but also of applications, processes (IT and business) and organizations. If any of these is not scalable, a bottleneck or choke-point in the system will limit system throughput, reduce performance and result in wasted resources.

Michael McIntire, in his presentation on infinite scalability given at the 2006 Teradata PARTNERS User Group Conference, discussed the need for super-linear scalability to meet growing system demand while reducing cost. McIntire, principal architect at eBay, also outlined the importance of designing an overall architecture, user applications and IT and business processes that scale according to need. For application design, he offers the following five recommendations:

  1. Move functionality to the data
  2. Use SQL Set operations wherever possible
  3. Avoid third-party applications that do not scale
  4. Minimize or avoid order-dependent design that serializes processing
  5. Leverage Teradata features that add scalability, such as partitioned primary indexes, compression and indexes
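The second recommendation, preferring set operations, can be illustrated with a small comparison. The sketch uses Python's built-in sqlite3 module purely as a convenient stand-in for a database, and the `orders` table and the 50 percent discount are invented for the example. On a parallel system, the single set-based statement also gives the database the whole job to distribute, whereas the loop serializes it in the client.

```python
import sqlite3


def make_db():
    """Create an in-memory table of 1,000 hypothetical orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, discounted REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (id, amount) VALUES (?, ?)",
        [(i, float(i)) for i in range(1, 1001)],
    )
    return conn


def row_by_row(conn):
    """Fetch every row and update it individually; returns statements issued."""
    statements = 1
    rows = conn.execute("SELECT id, amount FROM orders").fetchall()
    for order_id, amount in rows:
        conn.execute(
            "UPDATE orders SET discounted = ? WHERE id = ?",
            (amount * 0.5, order_id),
        )
        statements += 1
    return statements


def set_based(conn):
    """Express the same change as one set operation; returns statements issued."""
    conn.execute("UPDATE orders SET discounted = amount * 0.5")
    return 1


def discounted_values(conn):
    return conn.execute("SELECT discounted FROM orders ORDER BY id").fetchall()


loop_db, set_db = make_db(), make_db()
loop_statements = row_by_row(loop_db)
set_statements = set_based(set_db)
```

Both approaches leave the table in the same state, but the loop issues 1,001 statements where the set-based version issues one.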

—D.H.

Why Teradata

Although Teradata's shared-nothing architecture and parallelism have always yielded linear scalability, Teradata has developed multiple enhancements over the years to optimize the use of system resources and improve performance. Below are a few of the more significant improvements and features that make super-linear scalability possible:
> Indexing (secondary indexes, join indexes, aggregate join indexes, covering indexes, sparse indexes) yields dramatic reductions in resource use and query response time.
> Partitioned Primary Indexes (PPIs) significantly decrease the amount of data that must be read from disk to service a query.
> Workload management manages queries and workloads to ensure predictable performance and improve overall system utilization.
> Syncscan improves performance as the number of users increases. When queries concurrently scan the same table, they share the disk read I/Os, thereby eliminating the need to read those disk blocks for each query.
> Multi-generation coexistence maintains balance and scalability when Teradata is running on a mixed-generation configuration. Because Teradata's shared-nothing architecture is software-based, the number of access module processors running on a particular generation of nodes can be set to match the performance characteristics of that node.
> System monitoring, management and utilization reporting tools monitor the system to ensure balanced resource use, which is critical to achieving linear scalability.
> Multi-valued compression notably reduces the cost of storing increasing amounts of data.
> Parallel utilities exploit Teradata's parallelism with load utilities to enable an overall scalable decision-support environment.
> Scalable user session management linearly increases the manageable number of concurrent sessions by adding nodes. While some architectures use a single node to handle all concurrent user sessions, Teradata's parsing engine processes run on each Teradata node.
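As a rough illustration of the compression item in the list above, the sketch below replaces the most frequent values in a column with short codes into a small shared dictionary and stores everything else as is. This is a generic dictionary-encoding example under assumed names and data; it is not a description of how Teradata stores compressed values internally.

```python
from collections import Counter


def compress_column(values, max_dict_size=3):
    """Replace the most common values with codes into a shared dictionary.

    Returns (dictionary, encoded), where encoded holds ("code", n) for values
    found in the dictionary and ("raw", value) for everything else.
    """
    common = [value for value, _ in Counter(values).most_common(max_dict_size)]
    code_of = {value: code for code, value in enumerate(common)}
    encoded = [
        ("code", code_of[v]) if v in code_of else ("raw", v)
        for v in values
    ]
    return common, encoded


def decompress_column(common, encoded):
    """Rebuild the original column from the dictionary and the encoded entries."""
    return [common[item] if kind == "code" else item for kind, item in encoded]


# A hypothetical state column dominated by a few values.
states = ["CA"] * 5 + ["NY"] * 3 + ["TX"] * 2 + ["WY"]
dictionary, encoded_states = compress_column(states, max_dict_size=2)
raw_entries = sum(1 for kind, _ in encoded_states if kind == "raw")
```

With a two-entry dictionary, eight of the eleven values shrink to a code and only three are stored in full, and the saving grows as the table does because the dictionary is stored once.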

—D.H.

Photograph by Phillip and Karen Smith/Getty

Teradata Magazine-September 2007

Copyright © 2008 Teradata Corporation. All rights reserved.