ARCHIVE: Vol. 7, No. 1
Applied Solutions

From information to insight

Make informed predictions with enterprise analytic data sets.

by Bill Franks

Using statistical, mathematical and other algorithmic techniques, advanced analytics builds models and executes analyses to determine which actions will drive the optimal outcomes. The resulting recommended actions, along with supporting information, are delivered to the systems or people that can effectively implement them.

Often referred to as predictive or behavioral modeling, these efforts help businesses make better decisions by providing an empirical, objective and consistent method of evaluating transactions, accounts or customers. These evaluations predict account activity or customer behavior; for example, how likely accounts are to become delinquent, or which customers might purchase certain products.

Figure 1: Building and using an ADS
An enterprise analytic data set (ADS) standardizes the processes that make up 60% to 80% of the total time and effort required to build a model or analysis. The solution focuses on reducing lengthy data preparations.

The traditional analytic methodology
Analytic models require that the data being modeled be presented in a particular format, known as an analytic data set (ADS). The ADS contains all of the attributes of each subject being modeled (often a customer, household or account) in a single row of data.

To generate the ADS, past data is assembled to identify subjects that did, or did not, have a behavior of interest; for example, customers who have terminated their cellular phone service in the last six months. Other behavioral attributes and demographic data are added, creating a wide, single row of information for each subject. These data elements make up the ADS. An analytical modeling tool can then determine how the characteristics combine to best predict and/or describe the behavior of interest. Traditionally, an ADS has been built for each model as part of the process of generating that model. However, as this article discusses, there is an attractive alternative.
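As a rough illustration of the rollup described above, the following Python sketch turns detail records into one wide row per customer. All names and values here (`calls`, `demographics`, `churned`, `build_ads`) are hypothetical and chosen only for illustration; a real ADS would be built in the database from far more data.

```python
from collections import defaultdict

# Hypothetical detail data: one row per call (customer_id, minutes, dropped).
calls = [
    ("c1", 12.0, False),
    ("c1", 3.5, True),
    ("c2", 40.0, False),
    ("c3", 7.0, True),
    ("c3", 2.0, True),
]
# Hypothetical demographic attributes, keyed by customer.
demographics = {
    "c1": {"region": "east"},
    "c2": {"region": "west"},
    "c3": {"region": "east"},
}
# Hypothetical behavior of interest: customers who terminated service.
churned = {"c3"}


def build_ads(calls, demographics, churned):
    """Roll detail rows up to a single wide row per customer."""
    rollup = defaultdict(
        lambda: {"call_count": 0, "total_minutes": 0.0, "dropped_calls": 0}
    )
    for customer_id, minutes, dropped in calls:
        row = rollup[customer_id]
        row["call_count"] += 1
        row["total_minutes"] += minutes
        row["dropped_calls"] += int(dropped)

    ads = []
    for customer_id in sorted(demographics):
        row = {"customer_id": customer_id}
        row.update(demographics[customer_id])   # demographic attributes
        row.update(rollup[customer_id])         # behavioral rollups
        row["churned"] = int(customer_id in churned)  # target flag
        ads.append(row)
    return ads


ads = build_ads(calls, demographics, churned)
for row in ads:
    print(row)
```

Each resulting row carries the behavioral rollups, the demographic attributes and the target flag side by side, which is the shape a modeling tool expects.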

Advantages of an enterprise ADS
As models are built over time, certain standard metrics and manipulations become readily apparent. At the same time, any cleansing or recoding of the detailed data needed to support such rollups becomes constant once the right analytic procedures are established.

The enterprise ADS architecture takes the standard data rollups that are used in a variety of analytic tasks and centralizes their generation through a series of tables, views and processes. (See figure 1.) Instead of each analyst or process incorporating all of the logic and consuming all of the processing time needed to derive data for each analysis, standard metrics are created in an automated fashion on a regular schedule and made available to all analysts and processes. This eliminates the redundancy of re-creating these metrics each time the analysis is run. Any entity that is the focus of a wide range of analytics is a candidate for this solution. Examples include Customer, Location, Product, Employee and Vendor.
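The "compute once, share with everyone" idea can be sketched in a few lines. The example below is only an illustration: it uses Python's built-in SQLite module as a stand-in for the warehouse, and the table and function names (`purchases`, `customer_ads`, `refresh_customer_ads`) are assumptions, not part of any Teradata product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical detail table with a few sample rows.
conn.executescript("""
CREATE TABLE purchases (customer_id TEXT, amount REAL);
INSERT INTO purchases VALUES
    ('c1', 20.0), ('c1', 30.0), ('c2', 5.0), ('c3', 100.0);
""")


def refresh_customer_ads(conn):
    """Rebuild the shared metrics table; intended to run once per schedule cycle."""
    conn.executescript("""
    DROP TABLE IF EXISTS customer_ads;
    CREATE TABLE customer_ads AS
    SELECT customer_id,
           COUNT(*)    AS purchase_count,
           SUM(amount) AS total_spend,
           AVG(amount) AS avg_spend
    FROM purchases
    GROUP BY customer_id;
    """)


refresh_customer_ads(conn)

# Two separate analyses read the stored metrics instead of re-deriving them.
high_value = [r[0] for r in conn.execute(
    "SELECT customer_id FROM customer_ads "
    "WHERE total_spend >= 50 ORDER BY customer_id")]
repeat_buyers = [r[0] for r in conn.execute(
    "SELECT customer_id FROM customer_ads "
    "WHERE purchase_count > 1 ORDER BY customer_id")]
print(high_value, repeat_buyers)
```

Neither analysis contains any rollup logic of its own, so a change to how a metric is defined only has to be made in the refresh step.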

The advantages of this methodology are clear:
Consistency is assured in the methodology used by various analysts and processes to generate their ADSs. In addition, chances of errors due to the omission or altering of appropriate logic are removed.
Placing models into production is easy, since the same data structures used for building the models contain all of the data needed to deploy them.
Overall system processing cycles are greatly reduced now that variables requiring repeated processing will be computed just once and then stored and shared, rather than being run again and again.
Once the information is available, new uses for the data will be found that were not previously practical. For example, results can now be incorporated into standard reports.
Analysts can focus on more value-added work, rather than repeat the same basic preparatory tasks.
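To make the deployment point concrete, here is a minimal sketch of scoring customers straight from ADS rows. The coefficients are invented for illustration and stand in for a fitted logistic model; the column names (`dropped_calls`, `total_minutes`) are likewise assumptions.

```python
import math

# Hypothetical ADS rows, shaped exactly like the rows used to build the model.
ads_rows = [
    {"customer_id": "c1", "dropped_calls": 0, "total_minutes": 100.0},
    {"customer_id": "c2", "dropped_calls": 5, "total_minutes": 10.0},
]

# Hypothetical coefficients standing in for a previously fitted model.
COEFFICIENTS = {"intercept": -1.0, "dropped_calls": 0.6, "total_minutes": -0.01}


def score(row):
    """Apply the logistic model to one ADS row and return a probability."""
    z = COEFFICIENTS["intercept"] + sum(
        weight * row[name]
        for name, weight in COEFFICIENTS.items()
        if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))


scores = {row["customer_id"]: score(row) for row in ads_rows}
print(scores)
```

Because the scoring step reads the same columns the model was trained on, no separate data preparation is needed to put the model into production.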

Eliminate data duplication challenges
Perhaps the single biggest enabler of an enterprise ADS is the ability of a Teradata Database to process massive amounts of data in a scalable and timely fashion. Many other systems are not capable of handling the degree of data manipulation, complex joins and full-table scan processing that are required for the generation and updating of such a data set. For this reason, many organizations using these other architectures regularly pull data off of their systems and process it with an external tool such as SAS. This approach introduces a wide range of challenges, including the need to:
Develop procedures for pulling data off of the host system.
Possess the required network bandwidth to execute the transfer, as well as the required disk space on the receiving system.
Create a duplicate copy of core data, which can quickly become obsolete. This copy can also develop its own data quality and integrity issues.
Frequently rely on samples—as opposed to using all the data—due to the architecture's inability to scale.
Replicate in the original environment any results found in the extracts.

When leveraging the Teradata Database for analytics, these challenges are eliminated. Using Teradata Warehouse Miner or hand coding Teradata SQL makes it possible to explore the data and generate all of the logic required for an enterprise ADS directly in the Teradata Database. Teradata Warehouse Miner eliminates the need to extract even the final data set from the system for the majority of common analysis and modeling techniques, including decision trees, regression, factor analysis and clustering. These common techniques can be executed directly against the enterprise ADS in the Teradata system without moving any data.
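The general idea of running an analysis where the data lives, rather than extracting it first, can be illustrated with a simple least-squares fit expressed entirely in SQL. This is a sketch only: it uses Python's built-in SQLite module in place of a Teradata system, it does not represent how Teradata Warehouse Miner works internally, and the table and column names (`customer_ads`, `tenure_months`, `monthly_spend`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical ADS with two numeric attributes per customer.
conn.executescript("""
CREATE TABLE customer_ads (customer_id TEXT, tenure_months REAL, monthly_spend REAL);
INSERT INTO customer_ads VALUES
    ('c1', 1, 12), ('c2', 2, 14), ('c3', 3, 16), ('c4', 4, 18);
""")

# Simple linear regression of monthly_spend on tenure_months, computed from
# aggregate sums inside the database; only two numbers leave the system.
slope, intercept = conn.execute("""
WITH s AS (
    SELECT COUNT(*)                           AS n,
           SUM(tenure_months)                 AS sx,
           SUM(monthly_spend)                 AS sy,
           SUM(tenure_months * monthly_spend) AS sxy,
           SUM(tenure_months * tenure_months) AS sxx
    FROM customer_ads
)
SELECT (n * sxy - sx * sy) / (n * sxx - sx * sx) AS slope,
       (sy - (n * sxy - sx * sy) / (n * sxx - sx * sx) * sx) / n AS intercept
FROM s
""").fetchone()

print(slope, intercept)
```

The database scans the full table and returns only the fitted coefficients, so no extract, duplicate copy or sample is required.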

Time to update
Once an organization's rules for processing data have stabilized, and the same core set of metrics becomes common, it is time to consider an enterprise ADS. Leveraging the enterprise ADS architecture makes sense for organizations with a strong history of analytics and a solid understanding of their underlying detailed data. When a given data source is first analyzed, it is necessary to identify and account for data anomalies, which can be a somewhat lengthy process. As time passes and more projects are completed, however, the data becomes better understood, certain metrics begin to repeatedly appear, and many of the same computations begin to surface again and again.

Having a centralized, standardized repository of key metrics like an enterprise ADS will add both consistency and speed to analytic efforts. It does not take long for the amount of analysis to rise to the point where this architecture is valuable. For example, an organization implementing new customer relationship management initiatives will most likely expect to execute many sophisticated analyses on its customers.

Even if an organization strongly feels that only real-time or near real-time data should be analyzed, the architecture's concepts still apply. In these cases, a series of views, macros or stored procedures physically create the standardized data sets at run time. Most advantages of an enterprise ADS are retained; the only advantage lost is the "compute once, use many" aspect.
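A view-based variant might look like the following sketch, again using Python's built-in SQLite module as a stand-in for the warehouse. The names `purchases`, `customer_ads_v` and `read_ads` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (customer_id TEXT, amount REAL);
INSERT INTO purchases VALUES ('c1', 20.0), ('c2', 5.0);

-- The standardized logic lives in the view and is evaluated on every read.
CREATE VIEW customer_ads_v AS
SELECT customer_id,
       COUNT(*)    AS purchase_count,
       SUM(amount) AS total_spend
FROM purchases
GROUP BY customer_id;
""")


def read_ads(conn):
    """Return the current ADS rows as computed by the view at query time."""
    return conn.execute(
        "SELECT customer_id, purchase_count, total_spend "
        "FROM customer_ads_v ORDER BY customer_id"
    ).fetchall()


before = read_ads(conn)
conn.execute("INSERT INTO purchases VALUES ('c1', 30.0)")  # new detail arrives
after = read_ads(conn)
print(before)
print(after)
```

The second read reflects the new purchase immediately, with the same standardized definitions, at the cost of recomputing the rollup on each query.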

Market leaders in a wide range of industries have begun implementing this architecture. When they begin executing more analyses as a result, they will further distance themselves from their competition, forcing other organizations to play catch-up. Rarely will there be complaints about using fewer resources and less time to generate more value.

Enterprise analytic data sets in action
Figure A: For organizations not utilizing an enterprise ADS, the time to build and deploy models can take from weeks to months.


Figure B: Deploying additional models from the enterprise analytic data set (ADS) can reduce the time to build and deploy models to as little as a few days.

> A major cellular phone company creates a 450-variable customer enterprise analytic data set (ADS) in its Teradata Database, cutting the development of new models from weeks to days.
> A leading financial services company utilizes a Teradata enterprise ADS with 1,400 variables. This enabled its employees to complete an emergency analysis on their financial exposure to Hurricane Katrina within hours, while their competitors took weeks.
> A well-known retailer has a Teradata enterprise ADS with 1,200 variables. The data set was implemented as one component of an initiative that, in most cases, shortened model development from weeks to days.
> A major Internet player maintains a large enterprise ADS in its Teradata Warehouse that allows the company to access the data needed for new models within hours, and to develop response models for new campaigns within days of execution.

Figures A and B show the benefits of deploying an enterprise ADS, such as a shortened timeline in figure B, and could be applied to any of the above scenarios.

Bill Franks, Partner, Teradata Advanced Business Analytics, oversees the Teradata Advanced Business Analytics Center of Expertise, assisting Teradata's retail, travel, hospitality and gaming customers with applying analytical and data mining techniques to their businesses.

Teradata Magazine-March 2007

Copyright © 2008 Teradata Corporation. All rights reserved.