Tech2Tech

Ask the Experts

You be the judge

Let your rules control the appliance benchmark to get the best fit for your company.

Realizing maximum return on investment (ROI) on any product is critical for every organization. Before purchasing a new data warehouse appliance, best practice suggests determining in advance how well the appliance will handle both anticipated and unanticipated workloads. The most effective way to compare workload performance is to benchmark every appliance under serious consideration. Benchmarking greatly improves the odds that the chosen appliance is the right product for an organization’s goals and budget.

As a collaborative team, Gene Erickson, engineering director; Dan Higgins, sales support director; Alain Crolotte, senior benchmark analyst; Rick Burns, senior consultant; and Jim Morrissey, system engineer, all from Teradata, answered key questions about the benchmark process.

Q: What goals should a benchmark achieve?

A: Simply put, an effective benchmark adequately reflects an organization’s requirements for a data warehouse appliance, taking into consideration its workload volume forecast. The results will help indicate which appliance will be most successful once deployed. To achieve this, the application environment must be closely replicated.

Benchmarking should show how well an appliance performs under both anticipated and unanticipated activity, such as increases in data volume, new and varying query workloads, and additional users and applications. It should also reveal whether the appliance can effectively answer the unexpected queries that arise as business conditions change.

Q: How do I develop requirements to test?

A: First, know your problem or issue. When data warehouse appliances are being benchmarked, a working assumption is that an organization is seeking to solve a data analysis problem. It may sound simple, but a crucial part of creating requirements is an in-depth understanding of the problem, as well as the subject application.

Next, get a handle on data requirements by asking key questions: What data is involved? Is it internally owned and controlled? Does it come from outside the organization? Is it a mix of internal and external data? Can the data be scaled realistically to reflect future growth? If customer information is involved, are there multiple entries for a single entity? Will production data be used in the comparison environment? If so, it will make your results much more compelling.

Did you know?

Appliance: An integrated, complete technology stack made up of hardware and software components. The pre-assembled, pre-configured and pre-tested device is housed on its own server and runs or provides a single function.

It is important to base benchmark requirements on current workloads, including ad hoc queries. Tell the vendors how many reports are produced, how often, and how important each one is to end users. Include a breakdown of report sizes, the number of rows returned, the joins to be performed and the total rows in each answer set. When the reports run is also important; if the schedule affects other jobs in the production environment, the benchmark should reflect this.
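One way to hand every vendor the same requirements is to record each report in a structured form. The following Python sketch is a minimal illustration, not a Teradata tool; the `ReportRequirement` fields and the helper functions are hypothetical names chosen to mirror the items listed above, and the two sample reports are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportRequirement:
    """One recurring report the benchmark should reproduce (hypothetical fields)."""
    name: str
    runs_per_day: int       # how often the report is produced
    joins: int              # joins performed by the report's query
    rows_returned: int      # rows in the final answer set
    priority: int           # 1 = most important to end users
    schedule: str           # when it runs, e.g. "02:00-04:00"


def daily_executions(reports: list[ReportRequirement]) -> int:
    """Total report runs per day that the benchmark workload should include."""
    return sum(r.runs_per_day for r in reports)


def by_priority(reports: list[ReportRequirement]) -> list[ReportRequirement]:
    """Reports ordered from most to least important to end users."""
    return sorted(reports, key=lambda r: r.priority)


# Invented sample reports for illustration only.
reports = [
    ReportRequirement("daily_sales", 4, 6, 1_200, 2, "06:00-07:00"),
    ReportRequirement("churn_summary", 1, 12, 50, 1, "02:00-04:00"),
]
print(daily_executions(reports))                 # 5
print([r.name for r in by_priority(reports)])    # ['churn_summary', 'daily_sales']
```

Sorting by priority tells vendors which reports matter most, and the daily total gives the number of report executions the benchmark workload should contain.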

Q: What types of queries should I give the vendor to run?

A: Give each vendor the same test data and queries. Just as important, also give each vendor unanticipated queries to confirm that the appliance has not been tuned and prepared specifically for the benchmark. It is also useful to pose at least one unplanned business question that requires a new query to be developed to answer it.

Vary the complexity of both prepared and unanticipated queries to show how the appliance will function in real-world situations. Structure the queries so that some have few joins while others have 10 or more; this distinguishes simple queries from complex ones. Make sure the queries access different data and stress various functional capabilities, such as online analytical processing (OLAP) tasks or correlated subqueries. Avoid simplistic tests that repeat a handful of queries many times; they lend themselves to results caching (better known as “benchmark games”) and do not reflect reality. A benchmark should reflect the complexities of the customer’s actual production environment.
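The advice on query mix can be checked before the benchmark starts. The Python sketch below assumes each query is identified by a name and that its join count is known. It holds back a random share of the pool as unanticipated queries, measures how much of a planned workload a single repeated query takes up, and counts simple versus complex queries. The function names, the 20 percent holdout and the threshold of 10 joins are illustrative assumptions.

```python
import random
from collections import Counter


def split_queries(queries, holdout_fraction=0.2, seed=42):
    """Shuffle the query pool and hold back a share as unanticipated queries.

    Returns (shared_in_advance, held_out). The held-out queries are revealed to
    vendors only when the benchmark runs. At least one query is always held out.
    """
    pool = list(queries)
    random.Random(seed).shuffle(pool)  # fixed seed keeps the split reproducible
    n_holdout = max(1, int(len(pool) * holdout_fraction))
    return pool[n_holdout:], pool[:n_holdout]


def max_repeat_share(workload):
    """Fraction of the workload taken by its single most-repeated query.

    A high value means a few queries are replayed many times, which favors
    results caching rather than realistic processing.
    """
    counts = Counter(workload)
    return max(counts.values()) / len(workload)


def complexity_mix(join_counts, complex_threshold=10):
    """Count (simple, complex) queries; complex_threshold or more joins counts as complex."""
    simple = sum(1 for joins in join_counts if joins < complex_threshold)
    return simple, len(join_counts) - simple


shared, held_out = split_queries([f"q{i}" for i in range(10)])
print(len(shared), len(held_out))                  # 8 2
print(max_repeat_share(["q1", "q1", "q1", "q2"]))  # 0.75
print(complexity_mix([1, 3, 10, 14]))              # (2, 2)
```

A fixed seed makes the split repeatable for auditing, while the held-out queries stay unknown to vendors until run time.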

Finally, account for current infrastructure dependencies: each vendor’s appliance should work with your existing data management capabilities, such as extract, transform and load (ETL) tools.

Q: How can I verify each vendor is applying the same methodology?

A: To measure comparable results—and to be fair—all vendors must run the same workloads against the same data sets according to the same overall benchmark schedule. If any data issues or errors are discovered during the first vendor’s preparation, either these must be replicated for subsequent vendors, or benchmarking should be postponed until the issue has been corrected and the data is ready for testing.

Did you know?

Benchmark games: Another term for results caching in which only a few simplistic queries are run multiple times for better, but unrealistic, test results.

Because in production new data is added over time and some older data is deleted, require each vendor to at least double the amount of data so the benchmark does not mask any scaling issues. On many appliances, performance does not scale linearly; it depends on how fully loaded the appliance is.
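A simple way to look for scaling problems is to time each query at the original volume and again after the data has been doubled. The Python sketch below assumes scan-bound queries on unchanged hardware, for which a ratio near 2.0 suggests linear scaling. The 25 percent tolerance, the function names and the sample timings are hypothetical, and index-driven queries may legitimately show ratios below 2.0.

```python
def scaling_ratio(base_seconds, doubled_seconds):
    """Elapsed time after doubling the data divided by elapsed time at the base volume."""
    return doubled_seconds / base_seconds


def flag_nonlinear(results, tolerance=0.25):
    """Return queries whose scaling ratio exceeds linear (2.0) by more than `tolerance`.

    `results` maps query name -> (base_seconds, doubled_seconds). This assumes
    scan-bound queries on unchanged hardware, where about 2.0 is the linear case.
    """
    limit = 2.0 * (1 + tolerance)
    return sorted(
        query
        for query, (base, doubled) in results.items()
        if scaling_ratio(base, doubled) > limit
    )


# Invented timings: q1 scales linearly, q2 degrades, q3 is within tolerance.
timings = {"q1": (10.0, 20.0), "q2": (10.0, 31.0), "q3": (5.0, 12.0)}
print(flag_nonlinear(timings))  # ['q2']
```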

Verify that all test data is loaded by checking that the number of rows in the database is the same for each vendor. Make certain every vendor is using the same test data and queries. SQL generated by a business intelligence (BI) tool must be executed as is. Validate the accuracy of all query results. Ideally, the SQL and the results are identical across vendors; require each vendor to explain any differences.
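Row counts and result sets can be compared programmatically. This Python sketch assumes each vendor’s row counts and query results have already been exported into plain dictionaries and lists, and that values have been normalized (for example, the same numeric types and rounding) before comparison. Sorting rows before hashing makes the comparison ignore row order, which is appropriate only for queries without an ORDER BY clause. The function names and sample data are hypothetical.

```python
import hashlib


def row_count_mismatches(expected, vendor_counts):
    """Map each table whose loaded row count is wrong (or missing) to the count found."""
    return {
        table: vendor_counts.get(table)
        for table, rows in expected.items()
        if vendor_counts.get(table) != rows
    }


def result_fingerprint(rows):
    """Order-independent SHA-256 digest of a result set (values assumed already normalized)."""
    digest = hashlib.sha256()
    for line in sorted(repr(tuple(row)) for row in rows):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def vendors_to_question(reference_rows, vendor_results):
    """Vendors whose result set differs from the reference answer for a query."""
    reference = result_fingerprint(reference_rows)
    return sorted(
        vendor
        for vendor, rows in vendor_results.items()
        if result_fingerprint(rows) != reference
    )


expected = {"sales": 1_000_000, "customers": 50_000}
print(row_count_mismatches(expected, {"sales": 1_000_000, "customers": 49_998}))
# {'customers': 49998}
print(vendors_to_question([(1, "east"), (2, "west")],
                          {"A": [(2, "west"), (1, "east")], "B": [(1, "east")]}))
# ['B']
```

Any vendor returned by `vendors_to_question` is one that should be asked to explain the difference.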

Q: Besides raw performance, should I care about other benchmark metrics?

A: Certainly, a primary goal of any benchmark is to rank the performance of a vendor’s appliance. However, response time should not be the only metric measured. Most organizations also take other areas into account.

Ease of use, a non-performance metric, can be critical to the success of a data warehouse appliance. It can be difficult to quantify, but two questions help: How long does it take to load all of the data into the appliance? How long does it take to tune it for performance? If either time is excessive, it won’t matter how fast the appliance runs, because the real measure of success is how long it takes to get an answer, from the beginning to the end of the process.
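The point that total elapsed time matters more than raw query speed can be shown with simple arithmetic. In the Python sketch below, the vendor timings are invented for illustration: vendor A runs the queries faster but needs far longer to load and tune, so vendor B delivers answers sooner end to end. The `Timings` structure and function names are hypothetical.

```python
from typing import NamedTuple


class Timings(NamedTuple):
    load_hours: float   # time to load all benchmark data
    tune_hours: float   # time spent tuning for performance
    query_hours: float  # time to run the query workload


def end_to_end_hours(t: Timings) -> float:
    """Total time from raw data to answers, not just query execution time."""
    return t.load_hours + t.tune_hours + t.query_hours


def rank_by_end_to_end(vendors: dict[str, Timings]) -> list[str]:
    """Vendors ordered from shortest to longest end-to-end time."""
    return sorted(vendors, key=lambda name: end_to_end_hours(vendors[name]))


# Invented numbers: A queries faster, B loads and tunes faster.
vendors = {"A": Timings(10.0, 40.0, 0.5), "B": Timings(6.0, 4.0, 1.5)}
for name in rank_by_end_to_end(vendors):
    print(name, end_to_end_hours(vendors[name]))
# B 11.5
# A 50.5
```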

Look for ease-of-use capabilities such as “load and go” features that automatically optimize performance during a benchmark. Examples include partitioning, compression and materialized views.

Another non-performance-related metric is system availability. System downtime can become an organizational bottleneck. Get specifics on mean time between failures. Ask for a demonstration of the system under workload during the occurrence of common failures (drive, storage, node, communication failures). When an appliance is affected by an outage, observe and measure the time it takes for the system to recover.
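Mean time between failures (MTBF) becomes more useful when paired with the recovery times observed during the failure demonstrations. The Python sketch below uses the standard steady-state availability estimate, MTBF / (MTBF + MTTR), where MTTR is the mean time to repair or recover. The function names and sample values are illustrative; the timestamps would come from your own observations of each failure test.

```python
from datetime import datetime

HOURS_PER_YEAR = 24 * 365


def steady_state_availability(mtbf_hours, mttr_hours):
    """Standard estimate: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(availability):
    """Expected hours of downtime per (non-leap) year at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


def recovery_seconds(failure_injected_at, workload_resumed_at):
    """Observed recovery time for one failure test (e.g., a pulled drive or downed node)."""
    return (workload_resumed_at - failure_injected_at).total_seconds()


availability = steady_state_availability(mtbf_hours=999, mttr_hours=1)
print(round(availability, 3), round(annual_downtime_hours(availability), 2))  # 0.999 8.76
print(recovery_seconds(datetime(2024, 1, 1, 12, 0, 0),
                       datetime(2024, 1, 1, 12, 3, 30)))  # 210.0
```

For example, an MTBF of 999 hours with a one-hour recovery time gives an availability of 0.999, or roughly 8.8 hours of downtime per year.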

Q: What other less-apparent considerations should I be aware of?

A: Most organizations depend on third-party software. Does the appliance being considered support this software, and will it work with the existing tools? Bring a list of what is currently used on your system. If the software is critical, use it during the benchmark.

Also, how well does the appliance fit into your overall IT infrastructure? If it is a unique device with an architecture unlike most of your systems, it could pose unusual maintenance and support problems and lead to costly solutions. Not all Open Database Connectivity (ODBC) or Java Database Connectivity (JDBC) drivers are equal; the driver implementation can affect performance and functionality.

Q: How should I analyze the benchmark results?

A: Have a clear vision of your success criteria. Make a list of the top three to five goals for the appliance and weight them by importance. If a particular report must be processed within a defined time frame, the appliance must achieve that during the benchmark. If the capacity to grow and change is essential, the results for scalability and for running unanticipated queries must be favorable.
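Weighting the top goals can be done with a simple weighted average, combined with pass/fail checks for hard requirements such as a report that must finish within a set window. In this Python sketch, the goal names, weights, 0 to 10 scoring scale and time limit are hypothetical; substitute your own success criteria.

```python
def weighted_score(weights, scores):
    """Weighted average of a vendor's 0-10 scores; unscored goals count as 0."""
    total_weight = sum(weights.values())
    return sum(w * scores.get(goal, 0) for goal, w in weights.items()) / total_weight


def meets_hard_limits(run_seconds, limits):
    """True only if every must-meet report finished within its time limit."""
    return all(run_seconds.get(report, float("inf")) <= limit
               for report, limit in limits.items())


weights = {"report_performance": 5, "scalability": 3, "unanticipated_queries": 2}
scores = {"report_performance": 8, "scalability": 6, "unanticipated_queries": 9}
limits = {"nightly_sales_report": 3_600}  # must finish within one hour

print(weighted_score(weights, scores))                             # 7.6
print(meets_hard_limits({"nightly_sales_report": 3_100}, limits))  # True
print(meets_hard_limits({"nightly_sales_report": 4_200}, limits))  # False
```

Keeping hard limits separate from the weighted score prevents a vendor with a high average from masking a missed must-meet requirement.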

Beyond the benchmark

The goal of a benchmark is to determine which system works best for the customer, everything else being equal. Quantitative data is the easiest to compare but is not the only criterion for choosing a high-performance data warehouse appliance. Before making an investment, take these additional steps to ensure that the appliance selected will fit your organization’s needs:

  • Request a breakdown of each vendor’s tools. Teradata has the right tools to help customers derive maximum value from their data. For example, Teradata Viewpoint is for system management, and Teradata Visual Explain graphically represents queries and can be used to compare multiple queries. By drilling down to the details such as database object definitions, data demographics, parallel optimizer costs and other factors, Teradata Visual Explain literally shows application developers, DBAs and users how to construct the best queries.
  • Verify advanced database capabilities. Certainly an appliance must be able to deliver stellar load rates and scan performance. But what do you do when a specific performance need arises? Tuning may not be significant during a benchmark, but it is advisable for a data warehouse appliance once it is in production. Before finalizing your selection, examine the tools available for tuning the appliance for new tasks or for squeezing better performance out of existing applications. Teradata’s simple “Set & Go” indexing techniques provide dramatic performance advantages for certain query types with minimal effort.
  • Validate time-to-value features. Time to value is always critical, especially for a data warehouse appliance. Tasks such as getting data loaded, files scanned, tables defined and indexes created take time. Teradata offers a range of modeling and load services, including a “Load N Go” service to get your data loaded and query-ready within 30 days of system delivery.

These are just a few capabilities that should be considered when an organization is selecting a new appliance. When carefully executed, benchmarking is a valuable tool to discriminate among competing technologies. Ultimately, success depends upon selection of a proven technology to meet your business needs.

—Gene Erickson

Knowing beforehand precisely what you want the appliance to do and how well you want it to perform will help you better evaluate the benchmark information. First, verify each vendor’s results from the different queries. If production data is being used in the testing process, the vendors’ query results should match the pre-existing answers.

Also ask the following: Were all of the queries run in the same manner? That is, did each vendor test query X under identical conditions? Were the same concurrent queries run with the same number of users against identical data volumes? Was the data denormalized for some subset of the workload?

To ensure consistency and fairness among the vendors, inspect the system logs for each run. Do not accept performance results verbally or via PowerPoint. Review the logs to verify that all data was loaded during the test. If it was not, and the appliance ran a query against a subset of the data, the response time would artificially favor that vendor, and the false numbers would undercut the benchmark process, rendering the results useless. Vendors frequently run multiple system configurations, which can make comparisons confusing. In all cases, reported results must identify the specific system configuration and test conditions.
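Once run details have been extracted from each vendor’s logs, consistency checks like these can be automated. The `RunRecord` structure and its fields are hypothetical; real appliance logs differ by vendor and would need to be parsed into this form first. The Python sketch flags queries run at different concurrency levels, runs against less than the full data set, and runs that do not identify a system configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    vendor: str
    query_id: str
    concurrent_users: int
    rows_loaded: int   # total rows present when the query ran
    config: str        # system configuration the vendor reports for this run


def consistency_problems(runs, expected_rows):
    """List (query_id, problem) pairs that would make vendor results incomparable."""
    problems = []
    users_by_query = {}
    for run in runs:
        users_by_query.setdefault(run.query_id, set()).add(run.concurrent_users)
    for query_id, users in users_by_query.items():
        if len(users) > 1:
            problems.append((query_id, "concurrency differs across vendors"))
    for run in runs:
        if run.rows_loaded < expected_rows:
            problems.append((run.query_id, f"{run.vendor} ran against partial data"))
        if not run.config:
            problems.append((run.query_id, f"{run.vendor} reported no configuration"))
    return problems


# Invented run records for illustration only.
runs = [
    RunRecord("A", "q1", 10, 1_000_000, "4 nodes"),
    RunRecord("B", "q1", 10, 1_000_000, "8 nodes"),
    RunRecord("B", "q2", 20, 900_000, ""),
    RunRecord("A", "q2", 10, 1_000_000, "4 nodes"),
]
for problem in consistency_problems(runs, expected_rows=1_000_000):
    print(problem)
```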

Q: Do you have any other words of advice?

A: Check with each vendor’s customers on how reliable that vendor’s benchmark results proved to be. Any serious player in the data warehouse appliance market will have customers willing to discuss their experience with the benchmark process and the appliance.

Executing a controlled benchmark correctly is expensive and time consuming, but it is a worthwhile process to make good investment decisions. Working with each vendor to assure that the benchmark is rigorous and consistent will provide results that enable you to select the best data warehouse appliance to suit your business needs.

