Viewpoints

Why Teradata

Bursting at the Seams

A data warehouse must expand to accommodate growth.

Planning for the success of the data warehouse involves planning for growth. As a business develops, it's inevitable that the amount of data will increase. And if a developing business wants to truly succeed, satisfy its customers and beat its competition, it will use this data to help make smart analytic decisions.

To accommodate its goals, an organization must have a data warehouse that is flexible and adaptable—along with a plan for expanding it to meet growing needs. This entails preparing not only for the technical and architectural evolution that can be forecast but also for the sporadic, unforeseen and sometimes dramatic development that can occur.

GROWING PAINS

Many factors affect the expansion of the data warehouse. Some dimensions are quite predictable, such as:

  • The increase of core data in the data warehouse related to the organic development of the business
  • A need to add subject areas on the data warehouse roadmap

However, as this new data is integrated and additional capabilities are provided to the business, the resulting effects on the data warehouse and its capacity demands become increasingly difficult to forecast.

What's more, many business events are difficult or impossible to anticipate. Take, for example:

  • Technology innovations or new competitors entering the market can change how the organization's data is viewed. In its quest to remain competitive, an organization may see an upswing in its own data needs, or a sudden surge of data may accompany the organization's counter-innovation.
  • A merger or acquisition can double the user demand on the data warehouse.
  • Evolution of the industry may mean significant changes to the business model, driving new and different analytic needs.
  • Legal decisions can quickly affect regulatory requirements and the associated reporting.

Although such events cannot be known in advance, how the data warehouse technology and architecture must grow to support them can be predetermined.

EXPANSION PLANS

A best practice is the execution of a thorough and consistent capacity-planning process. In this way, businesses can regularly measure and report system CPU and data storage consumption over time based on a variety of applications, subject areas and user groups.

Using historical trends as a foundation, capacity projections estimate the size and timing of system expansions. However, future needs do not necessarily mirror prior events. The more dynamic the environment, the less reliable the estimate, because the future is never identical to the past. Flexible, easy and incremental platform growth options mitigate the risk that capacity forecasts will not match actual system usage.
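As a rough illustration of a trend-based projection, the sketch below fits a straight line to periodic storage measurements and estimates how many periods remain before the line reaches installed capacity. The function names, the sample figures and the choice of a simple linear fit are all assumptions made for this example; real growth is rarely this regular, which is exactly the limitation described above.

```python
from typing import Optional, Sequence, Tuple


def fit_linear_trend(usage: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through the points (0, usage[0]), (1, usage[1]), ...

    Returns (slope, intercept), with slope in units per period.
    """
    n = len(usage)
    if n < 2:
        raise ValueError("need at least two measurements to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(usage) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(usage))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_full(usage: Sequence[float], capacity: float) -> Optional[float]:
    """Periods after the last measurement until the trend line reaches capacity.

    Returns None when the fitted trend is flat or shrinking, and 0.0 when a
    growing trend has already reached capacity.
    """
    slope, intercept = fit_linear_trend(usage)
    if slope <= 0:
        return None
    last_x = len(usage) - 1
    x_full = (capacity - intercept) / slope
    return max(0.0, x_full - last_x)


# Hypothetical monthly storage measurements, in terabytes.
monthly_tb = [40.0, 42.0, 44.0, 46.0, 48.0]
print(periods_until_full(monthly_tb, capacity=60.0))  # 6.0 months at this rate
```

A projection like this is only a starting point. Rerunning it as each new measurement arrives shows how quickly the estimate shifts when the environment changes.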

Also, once a system has reached its maximum capacity, upgrades and expansions are no longer available. Plans must therefore be made to avoid reaching this point.

When planning for growth, it is crucial to review and understand the various system options that are dictated by the capabilities of the platform in place. The table to the right provides a breakdown of the advantages and disadvantages of each growth strategy, while the following sections describe these options in greater detail:

System replacement

To meet increasing business-capacity needs, a forklift upgrade must be performed to replace the current system with one that has the necessary capacity. Systems that do not have multi-node parallel processing are the most likely to require a system replacement.

Because this has a significant budget impact, opting to switch out systems involves a lengthy decision and approval process. If the forklift is not implemented well in advance of the need, the business may suffer limited capabilities for an extended period of time.

Multiple systems

A second production system can be deployed that supports a portion of the capacity requirements. This option is most viable when the data, applications and users can be easily and appropriately distributed between two systems.

With the enterprise-wide nature of a data warehouse system, some data overlap is inevitable. Therefore, a certain measure of data redundancy is required, which, in turn, necessitates data movement and synchronization. Because maintaining two systems will incur additional effort and costs, careful analysis must be done to decide whether this option is cost- and time-effective. It may be more efficient to deploy a single large system than to maintain two smaller ones.

System over-provision

To avoid the effort and disruption of system expansions, some organizations choose to over-provision. These companies budget for a system that will accommodate the expected growth requirements for an extended period of time, which means buying a system that is larger than necessary to support their current data requirements.

The advantages of this approach are a single decision and budget cycle, and the ease and stability of accommodating expansion as it occurs. An obvious drawback is budgetary: the organization pays in advance for capacity it does not yet need. This approach also depends upon accurately matching the development rate with the system capacity, which creates the risk of over- or undershooting the future capacity needs of the business.

Capacity on demand

Gaining popularity in recent years is the capacity-on-demand method, in which a system is sized and installed based upon its anticipated use for an extended period of time.

This method allows for ease of expansion and has extra budgetary benefits: The initial acquisition cost is reduced in line with the initial capacity requirement, and remaining expenditures are incurred as additional capacity is used.

The model is typically based on utilized data capacity and does not restrict the availability of key system CPU and input/output resources. This is an important point to note, because system performance is at its maximum at all times and does not increase as additional data and users are added to the system.
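To make the budgetary difference concrete, the following sketch compares cumulative spend under over-provision and under capacity on demand. Everything in it is hypothetical: the function names, the flat price per terabyte and the usage figures are assumptions for illustration, and actual commercial terms are not described here and will differ.

```python
from typing import List, Sequence


def overprovision_spend(installed_capacity: float, unit_price: float,
                        periods: int) -> List[float]:
    """Cumulative spend per period when all capacity is paid for up front."""
    return [installed_capacity * unit_price] * periods


def on_demand_spend(used_capacity: Sequence[float], unit_price: float) -> List[float]:
    """Cumulative spend per period when capacity is paid for as it is first used.

    Assumes capacity that has been activated is never refunded, so the
    paid-for amount only ratchets upward.
    """
    spend = []
    activated = 0.0
    for used in used_capacity:
        activated = max(activated, used)
        spend.append(activated * unit_price)
    return spend


# Hypothetical figures: 100 TB installed, 10 currency units per TB,
# and data usage measured over four budget periods.
usage_tb = [40.0, 50.0, 45.0, 80.0]
print(overprovision_spend(100.0, 10.0, len(usage_tb)))  # [1000.0, 1000.0, 1000.0, 1000.0]
print(on_demand_spend(usage_tb, 10.0))                  # [400.0, 500.0, 500.0, 800.0]
```

Under these assumptions both approaches install the same hardware. The difference lies only in when the expenditure is incurred.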

Multi-node parallel systems

The incremental and vast expansion capabilities offered through multi-node parallel systems make them a popular choice for data warehousing. As business requirements evolve, better performance and more capacity are achieved simply by adding nodes and storage.

The most advanced platforms include sophisticated facilities to minimize the effort and disruption to expand, contract or alter the system configuration. System expansion normally occurs incrementally over a period of years as node and storage technologies evolve.

The most adaptive systems can take advantage of innovations by seamlessly blending the new technology with the old. In this way, expanded capacity is delivered while leveraging the price/performance and reliability attributes of the latest advances.

SIDEBAR: Why the Teradata system for a flexible and growing environment?

Teradata is designed to easily provide for data warehouse growth to whatever degree is necessary to support an organization's business goals. Whether the evolution is moderate and consistent or explosive and sporadic, Teradata offers a straightforward, cost-effective solution. The native shared-nothing parallel architecture has stood the test of time and readily takes advantage of technology advances.

  • Multi-node architecture matches system performance and capacity to business needs by incrementally adding nodes and storage.
  • Co-existence of multiple generations of node/storage technology in a single system delivers flexible growth options while protecting the existing investment and leveraging the improved price/performance of the latest platform models.
  • Advanced reconfiguration utility automatically re-balances data across the new configuration during the expansion process. This method enables the removal of older nodes and storage without the need for unloading and loading data. Not only does this automation greatly simplify the expansion process but it also quickly puts the new performance and capacity to work.
  • Linear scalability provides predictable performance increases proportional to the added hardware capacity, thereby eliminating guesswork and reducing the risk of accommodating growth.
  • Cost-based Optimizer automatically selects the most efficient execution plan for each query. This enables the system to take advantage of expanded hardware configurations and the accompanying performance value without changing applications or load processes.
  • Teradata Virtual Storage, available in Teradata 13, provides additional configuration flexibility that matches the platform performance and capacity to a broader set of business requirements. By blending disks of different capacity and performance characteristics and optimizing data placement based upon their usage, Teradata Virtual Storage reduces the cost of managing very large but less frequently accessed data sets. The solution also enables expansion of an existing system by adding storage capacity independent of nodes.
  • Teradata Purpose-Built Platform Family provides a broad range of systems designed to meet unique business requirements. An organization can select a platform as a starting point to match its analytic environment, or it can augment an existing environment by adding multiple systems to address specific data warehousing needs.
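The linear scalability point in the list above lends itself to simple arithmetic. The sketch below estimates how many nodes a target workload would need if throughput scaled in exact proportion to node count. The function name, the workload metric and the figures are hypothetical, and perfectly linear scaling is an idealized assumption for the example, not a measured result.

```python
import math


def nodes_for_target(current_nodes: int, current_throughput: float,
                     target_throughput: float) -> int:
    """Estimate the node count needed for a target throughput.

    Assumes throughput grows in exact proportion to node count and never
    suggests fewer nodes than are already installed.
    """
    if current_nodes <= 0 or current_throughput <= 0:
        raise ValueError("current nodes and throughput must be positive")
    per_node = current_throughput / current_nodes
    return max(current_nodes, math.ceil(target_throughput / per_node))


# Hypothetical system: 4 nodes sustaining 1,000 queries per hour today.
print(nodes_for_target(4, 1000.0, 1800.0))  # 8
```

When scaling is predictable in this way, a capacity plan reduces to a proportion. That is the sense in which linear scalability removes guesswork from expansion decisions.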

FULLY DEVELOPED

Many factors will influence the growth rate of a data warehouse, and some are difficult or impossible to predict. When planning for capacity expansion, not only should likely scenarios be considered but system development plans should also be made to accommodate growth that is dramatically higher or lower than anticipated. Examining the full realm of possible scenarios and their impact may lead to different data warehouse platform or architectural choices.

With the rapid changes and ongoing innovation in the data warehouse industry, an informed company has more choices and flexibility than ever to ensure its systems meet its current and future data needs.