Ade Atobatele
The Race to Real Time: Operationalizing the Data Warehouse
Without careful attention to new challenges such as e-commerce and CRM, your company may be in danger of losing the competitive edge your data warehouse provided when it was new. Heres how to make sure your data warehouse gives you years of happy returns.
The Active Lexicon
Many companies that have implemented data warehouses are pleased to discover that the implementation saved them millions of dollars. Over the years, theyve integrated the data warehouse so deeply into their business processes that its become a mission-critical system. But as e-commerce brings a new set of business challenges to the fore, its not enough that the data warehouse is giving you the ROI you expected when you built it. Its time to ask, What has the data warehouse done for me lately?
The demands of e-commerce are changing customer expectations and presenting a host of new challenges. And the success of Amazon.coms simple cross-selling technique has made the potential clear. Knowing your customer well enough to intelligently offer additional products and services can increase sales and loyalty.
You may already use data mining to discover propensities among your customers. But building and testing models can take weeks. The new challenge is to assess propensities, recognize your customers, and act on the information you know about them in real timewhether via the Web, phone, or mail.
A real-time data warehouse is one that is updated in real time rather than periodically (usually overnight or weekly). Teradata, with its parallel processing power, can support real-time data warehousing, but the technology isnt the only element needed. You also need an organizational structure that enables you to take advantage of the technology and applications that are designed to feed information back to your data warehouse in real time.
While a complete, real-time enterprise data warehouse might still be the Holy Grail, there are some things you can do to enable your data warehouse to react in real time to changing customer needs, supply chain demands, and financial concerns. The solution involves creating an operational data warehouse (ODW). An ODW facilitates proactive problem management by providing mechanisms for creating just-in-time (JIT) systems and for feeding results from these systems back into the data warehouse. It provides a data warehousing infrastructure that can address the security and data integrity needs of the organization while giving users the flexibility to turn on a dime when new business opportunities arise.
Take, for example, the case of a financial institution in the UK that counted many farmers among its customers. On a Friday evening in March of 1996after everyone had gone home for the daythe British government announced that as many as 10 people had died as a result of eating beef from cows infected with bovine spongiform encephalopathy, more commonly known as mad cow disease. Anticipating the financial difficulties farmers would face as a result of the international panic, bank employees needed to make careful decisions about agricultural loans in order to minimize the banks risk. To do so, they first had to identify all the farmers that actually had loan portfolios with them and what kind of herds each owned. That information, easily gleaned from the data warehouse, made up 95 percent of the information they would need to create a risk-management system. The other five percent consisted of new information, such as any government compensation to affected farmers, the likelihood the farmers would be able to continue to pay their loans, and so on.
In order to react quickly enough, the bank employees had to create a new systemseparate from the banks traditional loan management systemto identify the specific farmers likely to default and notify branch managers Monday morning to make a decision about what to do with their portfolios. Although 95 percent of the data in the system was drawn from the data warehouse, the bank could never have predicted needing to combine that data with the new five percent of data specific to the mad cow announcement.
And thats a benefit of the operational data warehouse. It allows you to create and populate new systems, based on the data in the warehouse, in a very short timeframe (less than 24 hours). Another key real-time capability that the ODW enables is active data warehousingthe ability to react to warehouse information in real or near time rather than waiting for periodic reports.
There are currently three main kinds of operational data warehouses that help create competitive advantage and ROI:
1. Customer: This category contains two main strategies: product profiling and customer profiling. Made popular by Amazon.com, product profiling allows you to cross-sell related products in real time by querying the data warehouse product subject areas for related sales opportunities. E-businesses also use statistically determined customer profiles to personalize the online customer experience, aided by the services of companies such as Personify. Others are using profiling in help-desk environments to understand what type of customer is being serviced and what level of help they might need. It can also be used to market items that expire, such as last-minute airline tickets and hotel rooms, to suitable customers. Other companies in this market space include Vignette, which deals up relevant content based on user profiles, and BroadVision and Verix (now owned by Inference), which operate in the customer relationship management (CRM) market to provide real-time, predictive customer help based on data mining-generated user profiles.
2. Supply chain/manufacturing: A number of companies are using aggregated customer information to provide infomediary services, acting as a middleman between sellers and buyers. Such companies do this by aggregating the needs of either the buyers or sellers (or, if they are really proficient, both) and negotiating those needs with the other party. For example, one UK utility uses infomediary services to help manage its power-generation and selling operations to the national electricity grid. As both a generator and purchaser of electricity, the utility had a unique problem: how to maximize electricity generation and minimize electricity purchase. By aggregating its consumer information with national consumption needs to create consumption profiles, it was able to both manage the generation of electricity from their own plants and minimize the need to purchase more costly electricity from the national grid.
3. Financial/asset management: A financial institution responding to a situation that exposes its loan portfolio to an unforeseen risk, such as the mad cow disease mentioned earlier, is a good example of using the data warehouse to manage financial assets. Fraud detection is another popular use. By using data mining-generated customer profiles to detect possible fraud, a credit card company can limit its (and the customers) exposure. Often, the profile is a first step filter that can reduce the amount of computing time needed to pinpoint fraud.
In each of these examples, the data warehouse may be used for more than one kind of operation within an organization. The data in the warehouse is mission-critical, but the applications built to put that data to use may or may not be. (The longer an application is used, however, the more likely it is to become mission critical.)
So how do you create an ODW to enable JIT systems and active warehousing? Well if youve already got a data warehouse that supports timely access to detailed data in place, youve got a potential advantage: The ability to store and analyze enormous quantities of information about your customers. But to exploit this advantage when unexpected business opportunities arise, youll need several additional elements, including:
An infrastructure that serves a multiapplication environment
A problem-solving approach
A metadata system based on current standards
People with vision.
MULTIAPPLICATION INFRASTRUCTURE
The ODW represents an evolution from traditional data warehouses. In the ODW, the infrastructure acts like the central nervous system, coordinating access to information in all parts of the organization. Most data warehouses are so tightly integrated with the application they serve that there is no distinction between data and function. In the ODW, built with a data-centric rather than an application-centric approach, the data warehouse serves up data to individual application functions.
This structure allows the data warehouse to be a multiplication environment, in the same way that an operating system is made up of a kernel with many functional parts that depend on it. It also provides future-proofing by separating data from function and allowing companies to proliferate JIT applications. Why are JIT applications so important? Because they free your employees to take advantage of standardized data in an unstructured way.
A client company of mine, which sold mobile phones, provided an example of the importance of this flexibility. An employee fell sick, collapsed on the clients premises, and had to be rushed to the hospital. When company staff accessed the personnel files to find the employees next of kin, they discovered that the employee, who worked in the personnel department, had wiped this information from the database for some reason. Coworkers knew that the employees family lived in the area and that they had purchased mobile phones from the company. Sales information was kept in Lotus Notes, while personnel data was kept in an application written in-house. Fortunately, the clients ODW infrastructure allowed us to feed all information from both systems into a centralized data warehouse. By interfacing the sales data in the data warehouse to the personnel data, we tracked down the two most likely candidates in less than five minutes. This solution, delivered in Microsoft Excel, was never used again. (The company later rewrote its personnel policy to make withholding such data an offense that would lead to job termination.)
In the ODW, the data warehouse interface acts as a central source to supply information to operational systems, operational data marts, and analysis tools. It does this by providing a standardized set of protocols to enable access. Regardless of how a third party tool stores information internally, it can access warehouse data through this interface. Both rapid application development (RAD) tools, such as Microsoft Access and Lotus Notes, and other data analysis tools can obtain a single version of the truth through this interface. Being able to access the information in the data warehouse from these types of tools is the key to a successful ODW.
Instead of each tool defining profit, for example, the tools draw on a common definition of profit from the metadata repository. This ensures that the value of profit used with RAD tools, such as Microsoft Access or Sybase PowerBuilder, will be the same as the value of profit used by any other tool. Of course this can already be done in a centralized Teradata environment. But the whole point of the ODW is to use Teradata as a centralized warehouse while allowing departments to create JIT systems or data marts controlled from a centralized environment, as needed.

Figure 1. Operational data warehouse infrastructure.
The main components of the ODW infrastructure are (see Figure 1):
- The operational systems,including current OLTP feeder systems. These systems handle transactional information in the organization and can include trading partner databases or demographic data purchased from third parties.
- The data warehouse,specifically a normalized, centralized data warehouse that contains detail-level datanot aggregated OLAP dataand is not application specific. This component can provide data to operational, application-focused data marts or JIT systems that solve a single business problem or single business problem category for a whole department. This could be a marketing, human resources, or sales data mart or a JIT system to facilitate last minute ticket sales. Teradatas parallel processing features allow fast joining of large tables and queries with many tables in a join, allowing it to populate data marts from the data warehouse very rapidly.
- The OLAP generator, a module that creates OLAP data, such as star schemas, multidimensional databases, hypercubes, and TeraCubes, for individual analysis tools. The ODW allows for a decentralized data warehousing environment, where analysis tools like Cognos PowerPlay and MicroStrategy DSS Agent are common. The OLAP generator produces the particular type of data storage these products need.
- The metadata repository, which stores the analysis tool profiles that allow the OLAP generator to produce OLAP data for individual tools, the business definitions used by the business information directory (BID), which Ill discuss in more detail, and information on data warehouse changes over time.
- The programming API, an OLE automation interface into the data warehouse that allows any RAD tool with a programming language to interface directly to the data warehouse in a consistent manner.
- The tools interface, which allows the individual analysis tools to interrogate both the OLAP data marts and the normalized data warehouse. Like the programming API, this interface allows third-party tools to access the data warehouse without regard to the format the tools use to store the data internally.
- The business information directory (BID), which allows users to navigate the data warehouse in a context-sensitive mode similar to the Help function in Microsoft Windows. It delivers data in the correct format for the desktop application or departmental server that requested it. It also manages the scheduling of the data delivery, ensuring that the concerns of the data center operators are taken into account.
Once you have the enabling infrastructure in place, youre ready to support JIT systems. The steps to build such systems include:
1. Determining the business problem at hand. In our mad cow example, the problem was how to manage the banks loan portfolio risks that arose from the BSE scare.
2. Profiling what data the target application will require from the data warehouse (for example, finding out which cattle farmers held outstanding mortgages).
3. Determining what attributes of this target data set are needed to solve this specific business problem. In our example, attributes included outstanding mortgages, repayment schedules, and so on.
4. Creating a data mart populated with these attributes. (Remember, a data mart can be 1MB or 20GB, as long as it contains the data sets needed to solve the problem.) The data marts physical locationwhether within the Teradata database or notis less important than the fact that it serves only this application.
5. Determining what new information (that is, data not available in the data warehouse) is needed to handle the problem (for example, herd size, number of cows affected by BSE, and amount of government compensation).
6. Creating the new JIT application. This involves creating data entry screens and validation routines for the new data using a rapid application development tool.
7. Interfacing the new data mart with extract and transform tools. This step ensures that any newly captured or calculated data is automatically fed back into the data warehouse and prevents an island of data from forming.
8. Using the JIT system to manage the problem (in this case, to identify farmers most likely to default).
9. Throwing the system away when the problem is solved. At this point, the system is just the data entry screens and any reports that may have been written. All new data has already been migrated into the data warehouse (see step 7). Of course, you may find in the course of using these systems that they have value beyond the immediate problem. If thats the case, you can make them available to solve future problems.
If the problem isnt solved, youll have to create a more robust production system. For example, a spreadsheet is probably a single user application. If you want to roll the application out from the head office to the overseas branches, you enhance the JIT application, using the original application as a template. (In the case of a spreadsheet, you can rewrite the application using a spreadsheet tool that allows multi-user access.) You may also want to apply traditional protocols, such as version control, to its development.
PROBLEM-SOLVING APPROACH
Much data warehousing takes place at a departmental level to implement a solution to a particular business problem. For example, every company must keep in mind this simple, but vital, equation: profit equals sales minus costs. An airline, for example, might approach the problem of how to increase profits by looking for ways to reduce the number of empty seats on flights. Now, this problem affects the organization as a whole, but one department probably has responsibility for it (in this case, sales or marketing).
This departmental approach, however, promotes the development of islands of information that block the synergies that can arise from a single corporate view. Instead of islands, the ODW approach creates temporary peninsulas of information connected to the data warehouse but individual in their function. JIT applications provide the solution.
JIT applications are created and used to solve a particular problem. Then, any information the system captured during its lifetime is fed back into the data warehouse to be used for analysis or for other applications, and the JIT system is thrown away.
One of my clients, who had a system for monitoring phone sales, did just this. One day, in going over a daily exception report generated from the data warehouse, he noticed that several customers werent getting value for their money. He then used Lotus 1-2-3 to create a spreadsheet from warehouse data that identified all the customers that were on the wrong usage plan. He then used Microsoft Word to generate letters to the customers offering to change them to a more appropriate plan. The customers were happy because they thought that their service provider was looking out for them. The provider was happy because he kept his customers. After a month the Lotus 1-2-3 system was thrown away because the main sales and invoicing system was altered to ensure that customers were offered the right usage plan based on their last three months of mobile phone usage.
The first thing you have to do when creating a JIT system is to throw away all preconceptions about what applications are. Get rid of the misconception that only applications built using the traditional waterfall method or with robust tools like COBOL or CICS are computer applications. JIT systems deliver solutions using rapid application development tools such as spreadsheets, personal databases, and third-party applications.
Line managers and others responsible for producing sales (in other words, profit) will tell you that they dont want software, they want tools that give them a competitive edge. And they want them when they need them, not after a long, drawn-out application development cycle. They usually dont have preconceived notions about what will deliver that advantage, they simply want to retain or attract customers, increase sales, and so on.
With the desktop proliferation of office suites (such as Microsoft Office and Lotus SmartSuite) and collaboration tools (such as Microsoft Exchange and Lotus Notes), users already have most of the tools that they need at their fingertips. These tools can be augmented with additional functionality on an as-needed basisnot everybody needs a data-mining tool on the desktop.
A JIT application is not a language, operating system, and hardware; its a simple problem-solving application. A JIT can be anything that actually gives you that competitive advantage, including a Microsoft Excel spreadsheet, a Microsoft Access application, a Lotus Notes database, a Web page, or an off-the-shelf package populated by information from the data warehouse. A JIT system is characterized by a development speed measured in hours rather than weeks, and it produces results now rather than later.
A STANDARD METADATA SYSTEM
To date, metadata standardization has not been very successful. While the Metadata Council has produced admirable standards, very few vendors have actually implemented the standard in their tools.
Once a standard is in place that vendors will comply with, youll be able to easily populate tools from different vendors with identical information from your metadata repository. In the meantime, companies have to use proprietary standards that do not allow data migration from one vendors tool to another.
Metadata standards will help prove the benefit of the ODW. In a matter of hours rather than days, a department that once had no idea they needed to use the corporate data warehouse would be able to create its own data marts sourced from the centralized warehouse, do its own modeling, and apply the system to their own profit equals sales minus cost business problems.
A business information directory (BID) is an integral part of the metadata repository and, indeed, the data warehouse infrastructure. It is the users interface to the data warehouse or, in other words, where customers go to browse and buy information.
Building a BID, like any user interface, can be a torturous undertakingespecially when there are no BID user interface standards. But as BIDs take on the features of popular search engines, the process will become easier. The widespread adoption of intranets and the inclusion of XML facilities in Microsoft Explorer and Netscape Navigator will mean that you can use XML to create BIDS that are essentially beefed-up, XML-based search engines.
XML might even help prove the value of metadata. People are only beginning to understand what metadata is and what it can do for them; XML can make it even clearer. Once you can present metadata on an HTML page, it becomes less esotericeven users less technical than those who created the repository will be able to understand what the metadata is and how they can put it to work.
PEOPLE WITH VISION
IT managers with a vision of what access to the data warehouse can do for line managers are crucial to a companys success. For example, an IT manager at BankBoston created a data access-enabling environment based on a corporate data warehousing structure.
After attempting to track down reasons for an increase in the stock price, his board came to the conclusion that he was responsible for several points of the increase. Several financial analysts had been recommending the stock based on the flexibilitywhich would allow the bank to move on opportunities and threats must faster than the competitionthe data access-enabling infrastructure offered the bank.
A successful ODW requires IT managers who understand the importance of giving users access to the data and letting them solve their own problemsletting risk managers create new risk simulations and assessments, letting marketing personnel find new market segmentations from existing customer databases, and letting salespeople make last minutes offers to the proper customers.
And for those problems users cant solve on their own, theres always the traditional application development cycle to fall back on.
THE BOTTOM LINE
The objective of data warehousing should be to produce an information infrastructure that provides the right information to the right people at the right timeand at the right cost. The ODW architecture is an enterprise data warehouse architecture for companies that are serious about reaping the rewards that data warehousing can bring by enabling both quick, short-term problem solving and effective long-term analysis. The architecture provides organizations with a single version of the truth, while allowing the creation of peninsulasrather than islandsof information in tune with the corporate view. It enables the incremental deployment of subject-oriented data marts, which are developed and deployed according to well-identified business needs.
When implemented by IT managers who understand how freeing the data locked up in their companies can help the bottom line, the ODW facilitates new types of information delivery applications, including management information systems, trend analysis and data mining, customer profiling, and consolidated marketing systems.
Thats got to be good for the bottom lineand thats what counts above all else.
Ade Atobatele is principal data architecture consultant with Tridas International Ltd (www.tridas.com), where he specializes in data warehousing and knowledge management projects. You can reach him at ade_atobatele@hotmail.com.
The Active Lexicon |
active data warehouse \'ak-tiv 'da-t 'war-haus\ A data warehouse that uses triggers and stored procedures to react to information in real- or near-time rather than waiting for periodic reports.
bovine spongiform encephalopathy (BSE) \'bo-vin 'spnj-i-form in-'se-f-'lä-p-the\ More commonly known as mad cow disease. After the deaths of several humans were linked to the consumption of beef from BSE-diseased cows in 1996, more than 2 million head of cattle were killed in the United Kingdom.
just-in-time (JIT) system \'jst 'in 'tim 'sis-tem \ A JIT is a simple problem-solving application that can be anything that gives competitive advantage, including a Microsoft Excel spreadsheet, a Microsoft Access application, a Lotus Notes database, a Web page, or an off-the-shelf package populated by information from the data warehouse. A JIT system is characterized by a development speed measured in hours rather than weeks, and it produces results now rather than later.
operational data warehouse (ODW): \ä-pr-'ra-shnl 'da-t 'war-haus\ A data warehouse whose architecture and infrastructure facilitates proactive problem management by providing mechanisms for creating just-in-time systems and the feeding of results from these systems back into the data warehouse.
|