Reduce resource demands while increasing performance.
by Dan Higgins, Director of Teradata Warehouse sales support
As a child, I enjoyed Madeleine L'Engle's book A Wrinkle in Time, in which the characters traveled through the universe by bending
time. Growing up in the 1960s, I also often watched Star Trek, the science-fiction television series in which warp drives made it possible
to travel faster than the speed of light.
Most physicists claim warp travel is impossible. Too bad. But as an IT employee, if you could shorten the time it takes to process even more
workloads, you would feel like you've been given free hours.
In our real world of computing—specifically, data warehousing—we constantly strive for improved performance while simultaneously providing more
capabilities to more users using more data. This is referred to as scalability. Taking it a few steps further, super-linear scalability is the
ability to deliver consistent and predictable performance, simplified capacity planning and easier system management while growing a data
warehouse to tens or hundreds of terabytes and thousands of users. (See figure, below.)
Linear scalability has always been a key differentiator between Teradata and its competitors. While many vendors have claimed to provide
scalability or linear scalability, not all claims are justified. Furthermore, we need to go beyond linear scalability, because linear
scalability alone is no longer sufficient to meet the ever-growing demands of decision support.
Figure: A system's performance is determined by its level of scalability. After balanced capacity is added, a system with super-linear
scalability can outperform the others, especially if tuning and optimization capabilities are used.
Linear scalability
Linear scalability in decision support systems (DSSs) can take two basic forms. In the first case, consider those queries (or reports) that
interrogate large volumes of data. With these queries, performance is directly related to data volume. If the volume of data used by the query
is doubled (e.g., from one year to two years of history, or from 1 million to 2 million customer records) then the query execution times will
often double.
But what if you want to maintain the same performance when doubling the volume of data? If you have a linearly scalable system, then all you
need do is double the system capacity. (Note that it is usually best to grow the system in a balanced manner, maintaining the relationships
among CPU, memory, disk and I/O bandwidth capacity.)
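The relationship described above can be sketched as a small model. This is only an idealized illustration: the function name, the notion of a "capacity unit" and the throughput figure are hypothetical and do not come from any Teradata tool.

```python
def scan_time_seconds(rows: int, capacity_units: int,
                      rows_per_unit_per_second: float = 1_000_000.0) -> float:
    """Idealized elapsed time for a large scan on a perfectly linear system.

    All three parameters are hypothetical modeling knobs, not Teradata settings.
    """
    if capacity_units <= 0:
        raise ValueError("capacity_units must be positive")
    return rows / (capacity_units * rows_per_unit_per_second)


baseline = scan_time_seconds(rows=1_000_000_000, capacity_units=10)
doubled_data = scan_time_seconds(rows=2_000_000_000, capacity_units=10)
doubled_data_and_capacity = scan_time_seconds(rows=2_000_000_000, capacity_units=20)

print(baseline, doubled_data, doubled_data_and_capacity)  # 100.0 200.0 100.0
```

In this model, doubling the data doubles the elapsed time, and doubling the capacity as well brings it back to the baseline. A real system behaves this way only if it is grown in a balanced manner.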
But what about tactical or operational decision support, where the queries interrogate very small amounts of data, leverage one or more forms
of indexes and engage only one unit of parallelism (an access module processor [AMP] in the Teradata system)? An example would be a primary
index query in Teradata. Increases in overall data volume have negligible impact on the performance of these queries. Can you still have
linear scalability? Yes. In this case, when the system is expanded it provides a linear increase in the number of these queries that can be
processed concurrently.
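A minimal sketch of why throughput grows with system size for these queries follows. It assumes that each primary index value is hashed to exactly one AMP. The CRC-32 hash is only a stand-in chosen for illustration, since the article does not describe Teradata's actual hashing, and the function names and customer keys are hypothetical.

```python
import zlib
from collections import Counter


def amp_for_key(key: str, amp_count: int) -> int:
    """Map a primary-index value to exactly one AMP (illustrative hash only)."""
    return zlib.crc32(key.encode("utf-8")) % amp_count


def busiest_amp_load(keys, amp_count: int) -> int:
    """Number of single-row lookups the most heavily loaded AMP must serve."""
    loads = Counter(amp_for_key(key, amp_count) for key in keys)
    return max(loads.values())


# 10,000 concurrent single-row lookups, each touching one AMP.
lookups = [f"customer-{i}" for i in range(10_000)]
load_on_4_amps = busiest_amp_load(lookups, amp_count=4)
load_on_8_amps = busiest_amp_load(lookups, amp_count=8)
print(load_on_4_amps, load_on_8_amps)
```

Because each lookup lands on a single AMP, adding AMPs spreads the same set of lookups more thinly. When the distribution is reasonably even, the busiest AMP has roughly half the work after the AMP count doubles, so the system can serve about twice as many such queries in the same time.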
Sub-linear scalability
Scalability is not linear when doubling the system's capacity does not yield a corresponding doubling in system performance. Sub-linear
scalability refers to a system that is not functioning at its potential capacity because of system bottlenecks, overhead from managing shared
resources or insufficient use of parallel processing.
For example, sub-linear scalability occurs when doubling the number of CPUs in a shared-memory parallel system does not double
performance because of memory access contention. Another example occurs in a shared-disk cluster, when doubling the number of nodes does not
yield a corresponding doubling in performance because of the overhead of managing shared data.
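One way to put numbers on this effect is a simple contention model. The sketch below uses Amdahl's law as a stand-in; the article does not name a model, and the contention value of 0.05 is an arbitrary assumption rather than a measurement of any real system.

```python
def relative_throughput(processors: int, contention: float) -> float:
    """Throughput relative to one processor under Amdahl's law.

    contention is the hypothetical fraction of work serialized on a shared
    resource (memory bus, shared-disk locking); 0.0 means perfectly linear.
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    if not 0.0 <= contention <= 1.0:
        raise ValueError("contention must be between 0 and 1")
    return processors / (1.0 + contention * (processors - 1))


def gain_from_doubling(processors: int, contention: float) -> float:
    """Factor by which throughput improves when the processor count doubles."""
    return (relative_throughput(2 * processors, contention)
            / relative_throughput(processors, contention))


print(gain_from_doubling(8, 0.0))   # 2.0: linear scalability
print(gain_from_doubling(8, 0.05))  # less than 2.0: sub-linear scalability
```

With no contention, doubling the processors doubles the throughput. Even a small serialized fraction pulls the gain well below 2, and the shortfall grows as the system gets larger.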
With sub-linear scalability, data warehouse growth is inhibited, the ability to service DSS workloads is limited and IT staff are burdened with
additional complexity and effort.
To tune or not to tune
The use of indexes, partitioning and other performance tuning methods is often debated. Some database
administrators (DBAs) would claim that their architectures do not require tuning, arguing that all tuning is
unnecessary. Conversely, other DBAs create thousands of indexes and use other tuning structures, ending up with
a huge, complex burden.
The fact is that tuning always has a role and is necessary for optimal performance. Why would you scan a 1 billion-row
table when you really need only a few hundred rows? The debate should not be about whether tuning is necessary but about
how much tuning is required and whether it can be done with built-in, automatic features such as indexes, partitioning
and materialized views.
Also consider whether the tuning requires manual modifications to the queries, physical schema and system
configuration.
The key is to carefully consider your tuning options and to use them only when necessary. Whenever possible,
avoid tuning approaches that:
- Limit scalability
- Assume that users have intimate knowledge of the physical schema
- Place undue restrictions on end users
—D.H.
Super-linear scalability
As the demands on DSS have increased, linear scalability alone has become inadequate to meet performance requirements. With super-linear
scalability, however, you can take a quantum leap in performance.
Super-linear scalability can be achieved in multiple, complementary ways. They all build on a foundation of linear scalability while
significantly reducing system resource demands and increasing workload performance. The primary means of achieving super-linear scalability is
by radically reducing resource use through indexes, data compression and partitioning.
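As an illustration of how partitioning reduces resource use, the sketch below compares a full scan with a scan that skips irrelevant partitions. It is a simplified model with hypothetical table contents and function names, not a description of how Teradata implements partitioned primary indexes.

```python
from collections import defaultdict
from datetime import date


def build_monthly_partitions(rows):
    """Group rows by (year, month) of their first column, the sale date."""
    partitions = defaultdict(list)
    for row in rows:
        sale_date = row[0]
        partitions[(sale_date.year, sale_date.month)].append(row)
    return partitions


def query_full_scan(rows, year, month):
    """Read every row and keep the matches; returns (rows_read, matches)."""
    matches = [r for r in rows if (r[0].year, r[0].month) == (year, month)]
    return len(rows), matches


def query_with_partition_elimination(partitions, year, month):
    """Read only the one partition that can contain matching rows."""
    matches = list(partitions.get((year, month), []))
    return len(matches), matches


# Hypothetical sales table: one row per day, 28 days in each month of 2007.
sales = [(date(2007, m, d), m * 100 + d) for m in range(1, 13) for d in range(1, 29)]
partitions = build_monthly_partitions(sales)

read_full, found_full = query_full_scan(sales, 2007, 9)
read_part, found_part = query_with_partition_elimination(partitions, 2007, 9)
print(read_full, read_part)  # 336 28
```

Both queries return the same 28 rows, but the partitioned version reads one-twelfth of the data. The saving does not depend on adding hardware, which is what distinguishes it from linear scaling.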
Another type of super-linear scalability occurs when the system exploits concurrent workloads by servicing multiple requests with a single
operation. A good example of this involves Teradata's Syncscan feature in which the Teradata software combines data requests from multiple,
concurrent users into a single I/O operation.
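The idea of sharing one scan among several queries can be sketched as follows. This is a deliberately simplified model that assumes all queries start at the same time. The function names and data are hypothetical, and the article does not describe how Syncscan coordinates scans internally.

```python
def scan_separately(blocks, predicates):
    """Each query reads every block on its own; returns (block_reads, results)."""
    block_reads = 0
    results = []
    for predicate in predicates:
        hits = []
        for block in blocks:
            block_reads += 1  # one physical read per query per block
            hits.extend(value for value in block if predicate(value))
        results.append(hits)
    return block_reads, results


def scan_shared(blocks, predicates):
    """Each block is read once and offered to every query."""
    block_reads = 0
    results = [[] for _ in predicates]
    for block in blocks:
        block_reads += 1  # one physical read serves all queries
        for hits, predicate in zip(results, predicates):
            hits.extend(value for value in block if predicate(value))
    return block_reads, results


# Hypothetical table of 20 values stored in 5 blocks of 4 values each.
blocks = [list(range(start, start + 4)) for start in range(0, 20, 4)]
queries = [lambda v: v % 2 == 0, lambda v: v >= 15, lambda v: v % 5 == 0]

separate_reads, separate_results = scan_separately(blocks, queries)
shared_reads, shared_results = scan_shared(blocks, queries)
print(separate_reads, shared_reads)  # 15 5
```

The answers are identical in both cases, but the shared scan performs one-third of the block reads for three concurrent queries. In this model the I/O saving grows with the number of users scanning the same table.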
Using workload management tools to balance and offset complementary workloads is another means of improving performance beyond simply
increasing system size. These features are like adding "warp drives" to a linearly scalable platform.
When assessing your system's performance and return on investment (ROI), add these questions to your checklist:
- Are you benefiting from super-linear scalability?
- Are you taking advantage of Teradata's various tuning features?
- Are your applications scalable?
- Are there bottlenecks and choke-points in your IT or business processes?
- Are you leveraging Teradata's workload management capabilities to get the most work out of your investment?
If you want to increase your system's capability while limiting or avoiding increases in cost, you can benefit from super-linear scalability
in many ways.
Warp speed
It might not be possible to travel faster than the speed of light or bend time to accommodate a busy schedule, but utilizing the super-linear
scalability capabilities in your data warehouse can help decrease the time it takes to process workloads. And that can save you time and money.
Five ways to improve scalability
Scalable technology has allowed data warehousing to go further than many IT experts thought possible a decade ago.
But scalable technology by itself is not enough. The system consists not only of technology but also of
applications, processes (IT and business) and organizations. If any of these is not scalable, a bottleneck or
choke-point in the system will limit system throughput, reduce performance and result in wasted resources.
Michael McIntire, in his presentation on infinite scalability given at the 2006 Teradata PARTNERS User Group
Conference, discussed the need for super-linear scalability to meet growing system demand while reducing cost.
McIntire, principal architect at eBay, also outlined the importance of designing an overall architecture, user
applications and IT and business processes that scale according to need. For application design, he offers the
following five recommendations:
- Move functionality to the data
- Use SQL set operations wherever possible
- Avoid third-party applications that do not scale
- Minimize or avoid order-dependent design that serializes processing
- Leverage Teradata features that add scalability, such as partitioned primary indexes, compression and indexes
—D.H.
Why Teradata
Although Teradata's shared-nothing architecture and parallelism have always yielded linear scalability, Teradata has
developed multiple enhancements over the years to optimize the use of system resources and improve performance. Below
are a few of the more significant improvements and features that make super-linear scalability possible:
- Indexing (secondary indexes, join indexes, aggregate join indexes, covered indexes, sparse indexes) yields dramatic
  reductions in resource use and query response time.
- Partitioned Primary Indexes (PPIs) significantly decrease the amount of data that must be read from disk to service
  a query.
- Workload management manages queries and workloads to ensure predictable performance and improve overall system
  utilization.
- Syncscan improves performance as the number of users increases. When queries concurrently scan the same table, they
  share the disk read I/Os, thereby eliminating the need to read those disk blocks for each query.
- Multi-generation coexistence maintains balance and scalability when Teradata is running on a mixed-generation
  configuration. Because Teradata's shared-nothing architecture is software-based, the number of access module
  processors running on a particular generation of nodes can be set to match the performance characteristics of that
  node.
- System monitoring, management and utilization reporting tools monitor the system to ensure balanced resource use,
  which is critical to achieving linear scalability.
- Multi-valued compression uses Teradata's data compression to notably reduce the cost of storing increasing amounts
  of data.
- Parallel utilities exploit Teradata's parallelism with load utilities to enable an overall scalable
  decision-support environment.
- Scalable user session management linearly increases the manageable number of concurrent sessions by adding nodes.
  While some architectures use a single node to handle all concurrent user sessions, Teradata's parsing engine
  processes run on each Teradata node.
—D.H.
Photograph by Phillip and Karen Smith/Getty
Teradata Magazine-September 2007