Fresh Perspectives
Prove it!

Questions you need to answer when planning a proof of concept.

by Rob Armstrong, Director of data warehouse support at Teradata

When I was in Europe recently, a customer asked me for advice on how to plan a test that would help him evaluate various vendors and what to include in a proof of concept (POC). I asked him what concept he was trying to prove and what would satisfactorily prove it. He had no firm answers.

Many people do not have the answers to the first part of the question, let alone the second. Most will say they want to prove the data warehouse concept or its performance or the technology. These ambiguous answers must be more focused and specific. With the expense and time it takes to execute a POC, make sure your efforts are well-defined and will drive you to a usable outcome.

The concept to prove
The data warehouse is used to gain insight into how to modify your business and processes to reach your corporate goal while at the same time facilitating changes in your data and information needs. Your solution should be able to handle these changes.

So when designing a POC for your organization, look for ways to ensure that the environment is adaptable. But how do you determine what types of change are probable? No matter what happens in your organization, you can be assured of the following:

Data volumes will grow
This is a given. Expect your data to grow anywhere from doubling yearly to increasing by orders of magnitude every few years. Growth is typically faster up front, as critical subject areas are loaded. Base tables will increase in both size and number over time as the diversity of your data expands.

It is necessary to benchmark your system periodically, every six months or so, to ensure it is properly handling your new data and business changes. When testing, include both types of growth, larger tables and a wider diversity of data, because growth in the size of existing tables is easier to manage than spreading diversity. In any case, as your data changes, make sure your queries take into account the greater depth of history and wider cross-functionality.

Workload mix will vary
It is insufficient to test only one workload, as every system—even a single department-focused data mart—has a variety of users and usage. Include all of your workloads in your test, as well as a mix of known reporting, pattern analysis and complex data mining, and change the percentage mix to see how the various workloads affect one another. The POC should also include tactical queries and continuous data loading, as this is the reality of your future. Your testing should ensure that your environment will handle these strategies today so it can evolve as your business needs evolve.
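As a rough illustration of varying the percentage mix, the sketch below turns a target mix into per-workload query counts for one test run. The function name, the workload labels and the numbers are hypothetical and serve only to show the arithmetic; they are not part of any vendor tool.

```python
def workload_counts(total_queries, mix):
    """Split a test run of `total_queries` across workloads.

    `mix` maps a workload label to a whole-number percentage; the
    percentages are assumed to sum to 100.
    """
    if sum(mix.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    # Integer share for each workload, rounded down.
    counts = {name: total_queries * pct // 100 for name, pct in mix.items()}
    # Any queries lost to rounding go to the workload with the largest share.
    remainder = total_queries - sum(counts.values())
    largest = max(mix, key=mix.get)
    counts[largest] += remainder
    return counts


# Hypothetical mix for a first run; change the percentages and rerun.
first_run = workload_counts(
    200, {"reporting": 50, "pattern_analysis": 30, "data_mining": 20}
)
```

Repeating the run with a different mix, for example 20/30/50, is one way to see how each workload affects the others.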

System availability will change
In the world of 24x7 work processes, it is vital not only that your data warehouse environment be available to users but also that it be functioning correctly. A faulty system will produce faulty results. To ensure your users can consistently and successfully access the data warehouse, test the system hardware and software. Actually pull a disk or turn off a node to see how a hardware failure affects the user. Simulate a software glitch or cancel a large update job, and see what happens to the user's query. Finally, identify whether common maintenance processes, such as indexing or backups, keep the user from accessing the data warehouse.

Vendor evaluation
When you are ready to check into vendors, consider these options:
Analyst reports. To decide on a vendor, review analysts' research on what is currently happening in the arena and on what they predict for the vendors' future. The better you can define your company's vision and objectives, the better these reviews can narrow your data warehousing options.
Customer references. Define your workload and tools, then ask the vendor to connect you with customers whose needs match those of your organization. It is not critical to talk with people in the same industry—and chances are you won't because of competitive reasons. Look for companies that have been with the vendor at least three years, and have grown the data warehouse at least once. Talk with their IT and business communities separately. The main objective is to speak with organizations that have succeeded in creating and managing change in their environment.
Vendor benchmark. Your organization is investing in a data warehouse system, so make sure you are getting what you pay for. Test the vendor's system using your toolsets either at your company site or under your supervision at the vendor's locale.

Increase the hardware configuration, create outages and run tests using your toolsets. Find out when and where the system breaks. Introduce change by adding tables or columns. Run mixed workloads and use "bad" SQL. Not only will you see how the system responds, but you can get a good idea of how the vendor responds as well.

To be assured the system can handle your business growth spurts, test your two-year predicted data volumes and workloads. Is the system scalable? Can it accommodate future increases in data volume and diversity expansion? Ask the vendors what might be missing from your IT system that would help make your business more productive. And, conversely, ask what your business side of the organization can do to help IT understand its needs better.

The vendor you work with should know the system as well as the data warehousing world, so make sure you are comfortable with the company, the people and the platform.

The 4 steps of a proof of concept

STEP 1 Know your priorities.
STEP 2 Start with the basics.
STEP 3 Meet your primary objectives.
STEP 4 Introduce change.

Step 1: Know your priorities
When creating a POC, identify what you consider a successful outcome. Are you trying to meet your current performance, but at less cost and with fewer personnel? Do you want a percentage boost in performance while holding cost constant? Are new functionalities and applications your proof point? Individually rank your criteria—they will most likely be met to varying degrees and should not be equally weighted.
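One simple way to rank criteria that are not equally weighted is a weighted average. The sketch below is only an illustration: the criteria names, the weights and the 0-to-10 scores are invented, and your own ranking scheme may look quite different.

```python
def weighted_score(weights, scores):
    """Weighted average of per-criterion scores.

    `weights` and `scores` are dicts keyed by the same criteria names.
    """
    if set(weights) != set(scores):
        raise ValueError("weights and scores must cover the same criteria")
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


# Hypothetical priorities: performance matters most, new functions least.
weights = {"performance": 5, "cost": 3, "new_functionality": 2}
# Hypothetical results for one vendor, each scored from 0 to 10.
vendor_a = {"performance": 8, "cost": 6, "new_functionality": 4}

score_a = weighted_score(weights, vendor_a)  # (40 + 18 + 8) / 10
```

Agreeing on the weights before the POC starts helps keep the outcome from being reinterpreted after the results are in.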

Step 2: Start with the basics
Most POC trials are run on a reduced set of data. However, many options that work with smaller amounts of data become invalid as the data grows. That's why I recommend testing the system at the data volumes you expect within 12 to 18 months. If the system cannot handle the larger volumes during the POC, it is unlikely to meet your needs in live production.
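To put a number on the volume to test, a simple compound-growth projection is one option. The sketch assumes steady growth at a fixed annual factor, which is a simplification; the function name and the 10 TB starting point are hypothetical.

```python
def projected_volume_tb(current_tb, annual_growth_factor, months):
    """Project data volume `months` ahead, assuming steady compound growth.

    annual_growth_factor=2 means the data doubles every 12 months.
    """
    return current_tb * annual_growth_factor ** (months / 12)


# Hypothetical warehouse holding 10 TB today and doubling yearly.
in_12_months = projected_volume_tb(10, 2, 12)  # 10 * 2**1.0
in_18_months = projected_volume_tb(10, 2, 18)  # 10 * 2**1.5
```

With yearly doubling, the 18-month figure comes to a little under three times today's volume, which gives a sense of how much test data the POC needs.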

You do not want to need a lot of indexes, summary tables or model changes in order to make your data warehouse work. The more of these your system has, the less resilient to change it becomes, and the more expensive it is to manage. Start your POC by proving you can run your queries against a model that preserves data relationships, regardless of function. Loading your data in a form as close as possible to the logical model, and running your queries against that model, should ensure that the data relationships are intact and will support your changing analytics.

To truly test the system, do not supply the vendor specific information about the queries you intend to run. After all, the questions you ask will constantly change, so make sure you do not have to change the data model to run queries. To ensure the focus is on flexibility, you can provide some general ideas about access—whether the current quarter or year-to-year is most accessed, or if regional versus company-wide is important—but do not give the queries themselves. The system should be able to handle whatever query or combination of queries you might have now and in the future.

Step 3: Meet your primary objectives
Once the data is modeled and loaded onto the system, run your tests in a stand-alone mode using only 70% to 80% of your queries. Hold the rest of the queries for later.
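Holding queries back can be as simple as a random split of your query list. The following is a minimal sketch; the function name, the 75% default and the fixed seed are assumptions chosen for illustration, and the query names are placeholders.

```python
import random


def split_queries(queries, baseline_fraction=0.75, seed=0):
    """Split queries into a baseline set and a withheld set.

    A seeded shuffle keeps the split repeatable between test runs.
    """
    if not 0 < baseline_fraction < 1:
        raise ValueError("baseline_fraction must be between 0 and 1")
    shuffled = list(queries)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * baseline_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder names standing in for 20 real queries.
all_queries = [f"query_{n:02d}" for n in range(1, 21)]
baseline, withheld = split_queries(all_queries)
```

A random split is only one choice; you may prefer to withhold specific queries that represent the kinds of questions you expect the business to ask next.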

Do the queries you are testing run in an acceptable time frame? Were the loads completed in your batch window? Did you get a sense of what it takes to set up the tables, users and access rights? Are the output and analytics what you were expecting?

If your objective cannot be met, then allow some tuning to give you an understanding of how much indexing, summarizing or pre-joining of the data is necessary. What you are looking for is whether the data has been modeled correctly and if the system optimizer has the ability to meet query demands without intensive IT involvement.

Step 4: Introduce change
Now that you have a baseline for the system, it is time for the real proof. Add into the mix the remaining queries you withheld earlier. Do the model and data setup allow for new analyses with no or minimal IT and user effort? If so, you're on the right track.

Or, in order for the system to function properly, do you need to create extracts and/or summaries, add new indexing or make model changes? These are major red flags. They show that the concept has not been proven and that, to introduce or respond to change, you must involve more IT resources. The danger is that if you add indexes and make model changes, you may also "break" what was previously working.

Assuming the system handled the unexpected queries with resiliency, the next change to introduce is query workload. Running a high concurrency of the same workload is one thing; running a widely varied workload is another. In your test, simultaneously mix long-running, reporting and quick tactical workloads. What happens? Can system tools be used to isolate problems, dynamically re-allocate resources and provide critical response to critical workload—all while allowing other workloads to run?

Once you are comfortable that the system can handle a variety of workloads and be managed without unreasonable effort, it is time for your next POC point. Can you add data to the environment easily? What happens if you double or triple the data volumes in the tables? Do the queries and load processes run as expected? Can you estimate the run times with the increase in data?
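One way to judge whether run times are predictable as data grows is to compare the measured slowdown with the increase in volume. The sketch below uses linear scaling as the reference point, which is an assumption for illustration and not a guarantee any system makes; the function name and timings are hypothetical.

```python
def scaling_ratio(baseline_seconds, scaled_seconds, volume_multiplier):
    """Measured slowdown divided by the increase in data volume.

    About 1.0 means run time grew in step with the data; well above
    1.0 means the query slowed down faster than the data grew.
    """
    if baseline_seconds <= 0 or volume_multiplier <= 0:
        raise ValueError("baseline_seconds and volume_multiplier must be positive")
    return (scaled_seconds / baseline_seconds) / volume_multiplier


# Hypothetical timings for one query before and after doubling the data.
in_step = scaling_ratio(100, 200, 2)      # run time doubled with the data
slower = scaling_ratio(100, 300, 2)       # run time tripled on double the data
```

Tracking this ratio per query across the doubled and tripled volumes gives you a basis for estimating run times at volumes you have not yet tested.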

For additional testing, add tables to the system, then run queries with questions that will join the new tables to the original ones. All of this helps to prove that the system can absorb change easily.

Finally, any test should include failover and data availability. Hardware breaks, so the concept is not hard to fathom. While the system is running some of the above workloads, see what happens if you remove a disk, turn off the processing cabinet power or simply restart the system. The environment should respond automatically by alerting you to the outage while still running in a diminished state.

If everything functions properly during your well-planned POC, then you can feel assured your system will be a readily adaptable, dynamic environment that is not only resilient to technical changes but will also scale with the pace of your business changes.

Teradata Magazine-September 2008

Copyright © 2008 Teradata Corporation. All rights reserved.