Viewpoints
Why Teradata
Bursting at the Seams
A data warehouse must expand to accommodate growth.
by Keith Rimmereid
Planning for the success of the data warehouse involves planning for growth. As a business develops, it's inevitable that the amount of data will increase. And if a developing business wants to truly succeed, satisfy its customers and beat its competition, it will use this data to help make smart analytic decisions.
To accommodate its goals, an organization must have a data warehouse that is flexible and adaptable—along with a plan for expanding it to meet growing needs. This entails preparing not only for the technical and architectural evolution that can be forecast but also for the sporadic, unforeseen and sometimes dramatic development that can occur.
GROWING PAINS
Many factors affect the expansion of the data warehouse. Some dimensions are quite predictable, such as:
- The increase of core data in the data warehouse related to the organic development of the business
- A need to add subject areas on the data warehouse roadmap
However, as this new data is integrated and additional capabilities are provided to the business, it becomes increasingly difficult to forecast how the new data and the resulting capacity demands will affect the data warehouse.
What's more, many business events are difficult or impossible to anticipate. Take, for example:
- Technology innovations or new competitors that enter the market can change how the organization's data is viewed. In its quest to remain competitive, an organization may realize an upswing in its own data needs, or a sudden surge of data may accompany the organization's counter-innovation.
- A merger or acquisition can double the user demand on the data warehouse.
- Evolution of the industry may mean significant changes to the business model, driving new and different analytic needs.
- Legal decisions can quickly affect regulatory requirements and the associated reporting.
Although such things cannot be known in advance, how the data warehouse technology and architecture must grow to support them can be predetermined.
EXPANSION PLANS
A best practice is the execution of a thorough and consistent capacity-planning process. In this way, businesses can regularly measure and report system CPU and data storage consumption over time based on a variety of applications, subject areas and user groups.
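As a rough illustration of the measure-and-report step, the sketch below totals CPU and storage consumption per month and user group. The record layout, field names and sample figures are hypothetical; a real process would read these values from the platform's own usage logs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageSample:
    """One usage measurement. All field names are illustrative assumptions."""
    month: str         # reporting period, e.g. "2024-01"
    user_group: str    # could equally be an application or subject area
    cpu_hours: float   # CPU consumed by this workload in the period
    storage_gb: float  # storage consumed by this workload in the period


def summarize(samples):
    """Return total (cpu_hours, storage_gb) per (month, user_group), sorted by key."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for s in samples:
        bucket = totals[(s.month, s.user_group)]
        bucket[0] += s.cpu_hours
        bucket[1] += s.storage_gb
    return {key: tuple(vals) for key, vals in sorted(totals.items())}


# Made-up sample data, for illustration only.
samples = [
    UsageSample("2024-01", "finance", 120.0, 800.0),
    UsageSample("2024-01", "marketing", 60.0, 300.0),
    UsageSample("2024-01", "finance", 30.0, 50.0),
    UsageSample("2024-02", "finance", 170.0, 900.0),
]

for (month, group), (cpu, storage) in summarize(samples).items():
    print(f"{month} {group:<10} cpu_hours={cpu:7.1f} storage_gb={storage:8.1f}")
```

Keeping the grouping key as a tuple makes it easy to report by application or subject area instead of user group without changing the aggregation logic.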
Using historical trends as a foundation, capacity projections assess the size and timeframe of system expansions. However, many forecast needs are not necessarily reflections of prior events. The more dynamic the environment, the less reliable the estimate, because the future is never identical to the past. Flexible, easy and incremental platform growth options mitigate the risk that capacity forecasts will not reflect actual system usage.
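One simple way to turn a usage history into a projection is a straight-line trend. The sketch below fits a least-squares line to monthly storage figures and estimates how many months remain before the line reaches installed capacity. The function names and figures are hypothetical, and a linear trend is only one possible assumption; as noted above, a dynamic environment can make any such estimate unreliable.

```python
def fit_line(ys):
    """Least-squares slope and intercept for the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_full(history_tb, capacity_tb):
    """Months after the last observation until the trend line reaches capacity.

    Returns None when the trend is flat or declining, and 0.0 when the
    trend line is already at or above capacity.
    """
    slope, intercept = fit_line(history_tb)
    if slope <= 0:
        return None
    last_x = len(history_tb) - 1
    x_full = (capacity_tb - intercept) / slope
    return max(0.0, x_full - last_x)


# Made-up history: storage used (TB) at the end of four consecutive months.
history = [10.0, 11.0, 12.0, 13.0]
print(months_until_full(history, capacity_tb=20.0))
```

Rerunning the projection each reporting cycle, rather than once a year, shows quickly when actual usage starts to depart from the trend.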
Also, once a system has reached its maximum capacity, further upgrades and expansions are no longer available. Plans must therefore be made to avoid reaching this point.
When planning for growth, it is crucial to review and understand the various system options that are dictated by the capabilities of the platform in place. The table to the right provides a breakdown of the advantages and disadvantages of each growth strategy, while the following sections describe these options in greater detail:
System replacement
To meet increasing business-capacity needs, a forklift upgrade must be performed to replace the current system with one that has the necessary capacity. Systems that do not have multi-node parallel processing are the most likely to require a system replacement.
Because this has a significant budget impact, opting to switch out systems involves a lengthy decision and approval process. If the forklift is not implemented well in advance of the need, the business may suffer limited capabilities for an extended period of time.
Multiple systems
A second production system can be deployed that supports a portion of the capacity requirements. This option is most viable when the data, applications and users can be easily and appropriately distributed between two systems.
With the enterprise-wide nature of a data warehouse system, some data overlap is inevitable. Therefore, a certain measure of data redundancy is required, which, in turn, necessitates data movement and synchronization. Because maintaining two systems will incur additional effort and costs, careful analysis must be done to decide whether this option is cost- and time-effective. It may be more efficient to deploy a single large system than to maintain two smaller ones.
System over-provision
To avoid the effort and disruption of system expansions, some organizations choose to over-provision. Budgeting for a system that will accommodate the expected growth requirements for an extended period of time, these companies buy a system that is larger than necessary to support their current data requirements.
The advantages of this approach are a single decision and budget cycle, and the ease and stability of accommodating expansion as it occurs. An obvious drawback is budgetary: the organization pays in advance for capacity it does not yet need. This approach also depends upon accurately matching the growth rate with the system capacity, which creates a risk of overshooting or undershooting the future capacity needs of the business.
Capacity on demand
Gaining popularity in recent years is the capacity-on-demand method, in which a system is sized and installed based upon its anticipated use for an extended period of time.
This method allows for ease of expansion and has extra budgetary benefits: The initial acquisition cost is reduced in line with the initial capacity requirement, and remaining expenditures are incurred as additional capacity is used.
The model is typically based on utilized data capacity and does not restrict the availability of key system CPU and input/output resources. This is an important point to note, because system performance is at its maximum at all times and does not increase as additional data and users are added to the system.
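The budgetary difference between paying up front and paying as capacity is used can be sketched with a few lines of arithmetic. The example below compares cumulative spend under the two approaches. The price per terabyte, the usage figures and the assumption that both models charge the same unit price are all hypothetical, and the sketch ignores discounting and support costs.

```python
def cumulative_spend_upfront(installed_tb, price_per_tb, years):
    """Over-provision: the full installed capacity is paid for in the first year."""
    return [installed_tb * price_per_tb] * years


def cumulative_spend_on_demand(used_tb_by_year, price_per_tb):
    """Capacity on demand: pay only for the peak capacity activated so far."""
    spend, peak = [], 0.0
    for used in used_tb_by_year:
        peak = max(peak, used)
        spend.append(peak * price_per_tb)
    return spend


# Made-up figures: a 100 TB system whose data grows to fill it over four years.
used = [40.0, 55.0, 70.0, 100.0]
price = 1000.0  # hypothetical currency units per TB

print(cumulative_spend_upfront(100.0, price, years=len(used)))
print(cumulative_spend_on_demand(used, price))
```

Under these assumptions both approaches reach the same total once the system is full; the difference lies in when the money is spent.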
Multi-node parallel systems
The incremental and vast expansion capabilities offered through multi-node parallel systems make them a popular choice for data warehousing. As business requirements evolve, better performance and more capacity are achieved simply by adding nodes and storage.
The most advanced platforms include sophisticated facilities to minimize the effort and disruption to expand, contract or alter the system configuration. System expansion normally occurs incrementally over a period of years as node and storage technologies evolve.
The most adaptive systems can take advantage of innovations by seamlessly blending the new technology with the old. In this way, expanded capacity is delivered while leveraging the price/performance and reliability attributes of the latest advances.
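A back-of-the-envelope sizing for this kind of incremental growth might look like the sketch below, which counts how many new-generation nodes are needed to reach a target capacity alongside the existing nodes. The per-node capacities are hypothetical, and the sketch assumes capacity is simply additive across node generations, which is an idealization; real platforms balance work across mixed hardware in their own ways.

```python
import math


def nodes_to_add(existing_node_tb, new_node_tb, target_tb):
    """Number of new nodes needed so that total capacity reaches target_tb.

    existing_node_tb: capacity of each node already installed (any generation)
    new_node_tb:      capacity of one node of the new generation
    """
    if new_node_tb <= 0:
        raise ValueError("new_node_tb must be positive")
    shortfall = target_tb - sum(existing_node_tb)
    if shortfall <= 0:
        return 0
    return math.ceil(shortfall / new_node_tb)


# Made-up example: four older 10 TB nodes, new nodes hold 25 TB each.
print(nodes_to_add([10.0, 10.0, 10.0, 10.0], new_node_tb=25.0, target_tb=100.0))
```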
FULLY DEVELOPED
Many factors will influence the growth rate of a data warehouse, and some are difficult or impossible to predict. When planning for capacity expansion, not only should likely scenarios be considered but system development plans should also be made to accommodate growth that is dramatically higher or lower than anticipated. Examining the full realm of possible scenarios and their impact may lead to different data warehouse platform or architectural choices.
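Examining a range of scenarios can be as simple as projecting the same starting point under several growth rates. The sketch below applies compound annual growth for low, expected and high cases and reports the first year in which each would exceed installed capacity. The growth rates, starting size and capacity are hypothetical, and compound growth is itself an assumption.

```python
def project(start_tb, annual_growth, years):
    """Data size at the end of each year 0..years under compound growth."""
    return [start_tb * (1 + annual_growth) ** y for y in range(years + 1)]


def first_year_over(start_tb, annual_growth, capacity_tb, horizon_years):
    """First year in which projected size exceeds capacity, or None within the horizon."""
    for year, size in enumerate(project(start_tb, annual_growth, horizon_years)):
        if size > capacity_tb:
            return year
    return None


# Made-up scenarios: annual growth rates for a 50 TB warehouse on a 100 TB system.
scenarios = {"low": 0.05, "expected": 0.25, "high": 0.60}

for name, rate in scenarios.items():
    year = first_year_over(50.0, rate, capacity_tb=100.0, horizon_years=5)
    print(f"{name:<9} growth={rate:.0%} exceeds capacity in year: {year}")
```

A wide spread between the low and high cases is a signal that incremental expansion options matter more than the accuracy of any single forecast.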
With the rapid changes and ongoing innovation in the data warehouse industry, an informed company has more choices and flexibility than ever to ensure its systems meet its current and future data needs.
Keith Rimmereid is a senior consultant with Teradata Data Warehousing Sales Support in San Diego.