Teradata Magazine Online

Richard Winter

B2B Active Warehousing: Data on Demand

Bringing the data warehouse out of the back room and onto the front lines helps companies keep up with the blistering pace of business-to-business e-commerce.

Checking in over the weekend, a seminar manager discovers that an eleventh-hour surge in Web registrations means she needs 100 more binders than she has, along with all the materials to fill them. In a last-minute shopping trip, she finds 100 binders in the same size and color, but wipes out the stock at two branches of an office supply superstore.

The seminar manager solved her problem, but now the office supply stores have a problem: They're out of stock in those particular binders. And one of the main reasons customers go to superstores is to find everything they need in one trip.

But the stores aren't out of stock for long. A system at the chain headquarters reorders the out-of-stock product automatically within hours of the seminar manager's purchases. How? The transactions reached a central, active data warehouse shortly after they occurred. The warehouse then updated its store inventory model with the results of the transaction and triggered the reorder. Today, chances are that the supplier works under an agreement to restock within hours and that all communication concerning the out-of-stock item will occur over the Web.

Welcome to the world of business-to-business (B2B) active warehousing. To much of the business world - in which managers laboriously make and implement decisions one-by-one on the basis of week-old, month-old, or just plain untrustworthy data - this may sound too good to be true.

But the scenario I just described combines elements of systems in operation at several different retailers. And it's typical of the active, B2B warehouse applications in place and under near-term development at companies in many different industries. Let's look at another example.

Returning to the office after a meeting, a brand manager at a manufacturing company receives an alert from the data warehouse indicating that a product introduced last week is selling below expectations. Drilling down into sales by channel, he sees that the problem is mainly in Web-based sales to distributors. Examining the Web page that introduces the new product, he discovers that the introductory price is incorrectly stated: It doesn't show the 10 percent discount for distributors who order within the first two weeks.

He returns to the data warehouse application in which the introductory price was specified and corrects the error. The correction is stored in the warehouse immediately. Within minutes, the production Web page is corrected. A few minutes later, the change is picked up dynamically from the manufacturer's site by leading distributors.

After lunch, the brand manager checks his sales figures again and notes with satisfaction that Web sales to distributors have picked up.

Although the systems described sound futuristic, they're based on real systems now in operation - and some that will debut later this year - at 3M Corp.

To better understand what an active warehouse can do, let me first describe what an active warehouse is.

Active Warehouse

The term "active warehouse" has its roots in the "active database" concept that first surfaced in computer science research circles in the 1970s. Certain early thinkers in artificial intelligence felt database engines should do more than wait around for people to submit queries. As useful as traditional database engines may be, they are passive entities. These early thinkers argued that, because the database engine controls a huge repository of information and the metadata to interpret it, the database should be able to figure out what people need to know and tell them. In other words, they felt the database engine should become active.

This notion that an active database engine could figure things out and tell people before they ask led to several important commercial developments. The first of these was the database trigger: a database request launched by the engine itself when a certain event occurs. For example, you can define a trigger that causes something to happen when inventory falls below a given threshold. In database terms, this event occurs when the value of the quantity-on-hand field in the inventory record falls below the reorder-point field for that item. The database engine can detect the event by checking whenever either of these fields changes. Both the condition for detecting the event and the action to be taken are readily expressed in SQL.
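A minimal sketch of the inventory trigger just described, using Python's built-in `sqlite3` module as a stand-in engine (the article does not tie the idea to any product). The table and column names (`inventory`, `qty_on_hand`, `reorder_point`, `reorder_requests`) and the sample SKU are hypothetical. As in the paragraph, the trigger is re-evaluated whenever either watched column changes, and here it records an event only when a row first crosses below its reorder point.

```python
import sqlite3

# Hypothetical schema: one row per (store, item), plus a table that
# collects the events the trigger detects.
SCHEMA = """
CREATE TABLE inventory (
    store_id      INTEGER NOT NULL,
    item_id       TEXT    NOT NULL,
    qty_on_hand   INTEGER NOT NULL,
    reorder_point INTEGER NOT NULL,
    PRIMARY KEY (store_id, item_id)
);
CREATE TABLE reorder_requests (
    store_id    INTEGER NOT NULL,
    item_id     TEXT    NOT NULL,
    qty_on_hand INTEGER NOT NULL
);
-- Re-evaluated whenever either watched column is updated; records an event
-- only when the row crosses from at/above to below its reorder point.
CREATE TRIGGER low_stock
AFTER UPDATE OF qty_on_hand, reorder_point ON inventory
FOR EACH ROW
WHEN NEW.qty_on_hand < NEW.reorder_point
 AND OLD.qty_on_hand >= OLD.reorder_point
BEGIN
    INSERT INTO reorder_requests (store_id, item_id, qty_on_hand)
    VALUES (NEW.store_id, NEW.item_id, NEW.qty_on_hand);
END;
"""

def open_store() -> sqlite3.Connection:
    """Create an in-memory database with the schema and trigger installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def sell(conn: sqlite3.Connection, store_id: int, item_id: str, qty: int) -> None:
    """Record a sale by decrementing quantity on hand; the engine does the rest."""
    conn.execute(
        "UPDATE inventory SET qty_on_hand = qty_on_hand - ? "
        "WHERE store_id = ? AND item_id = ?",
        (qty, store_id, item_id),
    )

def demo() -> list:
    conn = open_store()
    conn.execute("INSERT INTO inventory VALUES (1, 'binder-2in-blue', 120, 40)")
    sell(conn, 1, "binder-2in-blue", 50)   # 70 left: no event
    sell(conn, 1, "binder-2in-blue", 50)   # 20 left: crosses below 40, trigger fires
    sell(conn, 1, "binder-2in-blue", 10)   # 10 left: already below, no new event
    return conn.execute("SELECT * FROM reorder_requests").fetchall()

if __name__ == "__main__":
    print(demo())  # prints [(1, 'binder-2in-blue', 20)]
```

Note that the engine evaluates the `WHEN` condition on every qualifying update, which is exactly the per-change cost the next paragraph weighs against a periodic query.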

Triggers as I just described them (mechanisms that require database engine work every time a value in a given column changes) are inefficient if values change frequently and you want to detect infrequent but important events. As the concept of the active database has evolved, triggers have often been supplemented or replaced by other mechanisms for event detection. In some cases, a query run at designated intervals is more efficient. For example, suppose the quantity on hand, by product, by store, in a large chain changes 100,000 times a minute, and you want to detect an out-of-stock event within 10 minutes of its occurrence. If a query that determines whether any critical product has fallen below its reorder point in any store can run to completion in one minute, you can detect the event by running that query every nine minutes: even an event that occurs just after the query has checked its row is caught by the next run, which finishes within 10 minutes. This approach is much more efficient than checking one product at a time by means of a trigger, which would have to check some 900,000 times (100,000 changes a minute for nine minutes) to produce a result only marginally more valuable.
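The trade-off above can be sketched as one set-based detection query plus the arithmetic behind the nine-minute interval. This is a minimal illustration under stated assumptions, not a production scheduler: the `inventory` table and its sample rows are hypothetical, while the figures (100,000 changes a minute, a one-minute query, a nine-minute interval) are the article's own.

```python
import sqlite3

# One set-based pass over every store and product, instead of one check per change.
DETECT_SQL = """
    SELECT store_id, item_id, qty_on_hand, reorder_point
    FROM inventory
    WHERE qty_on_hand < reorder_point
    ORDER BY store_id, item_id
"""

def detect_low_stock(conn: sqlite3.Connection) -> list:
    """The periodic event-detection query; a scheduler would call this every interval."""
    return conn.execute(DETECT_SQL).fetchall()

def worst_case_detection_min(interval_min: int, runtime_min: int) -> int:
    """An event that lands just after a run has checked its row is caught by the
    next run, which starts interval_min later and needs runtime_min to finish."""
    return interval_min + runtime_min

def trigger_evaluations(changes_per_min: int, window_min: int) -> int:
    """A per-row trigger does work on every change in the same window."""
    return changes_per_min * window_min

def sample_inventory() -> sqlite3.Connection:
    """Hypothetical sample data for trying the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inventory (store_id INTEGER, item_id TEXT, "
                 "qty_on_hand INTEGER, reorder_point INTEGER)")
    conn.executemany("INSERT INTO inventory VALUES (?, ?, ?, ?)", [
        (1, "binder-2in-blue", 0, 40),
        (1, "stapler", 75, 20),
        (2, "binder-2in-blue", 55, 40),
    ])
    return conn

if __name__ == "__main__":
    print(detect_low_stock(sample_inventory()))  # prints [(1, 'binder-2in-blue', 0, 40)]
    print(worst_case_detection_min(9, 1))         # prints 10
    print(trigger_evaluations(100_000, 9))        # prints 900000
```

The design point is that one nine-minute cycle costs a single scan, while the trigger approach does 900,000 small pieces of work in the same period to gain a few minutes of latency.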

Probably the best-known outgrowth of the active database notion is data mining: the proposition that automated analysis of the data can produce valuable new knowledge simply by finding patterns in the data. Data mining has contributed to impressive developments in such areas as store layout, credit-card fraud detection, and cross-selling. Although data mining isn't usually initiated automatically, it's based on the idea of using the knowledge in the database to determine what should happen, rather than waiting for a human to generate SQL requests.

But the offshoot of the active database causing the most excitement right now is the active warehouse, a concept in which active database technology supports a business organization that reacts very rapidly to changes in its environment.

Stephen Brobst, NCR's chief technology officer, articulated the active data warehouse concept in an article in the Fall 1999 Teradata Review: An active warehouse is event driven, reacts in a timeframe appropriate to the business need, and makes operational decisions or causes operational actions. (See Figure 1.) Each of the examples I mentioned illustrates this concept in the framework of B2B interaction.


Figure 1: B2B Interaction Framework.

The examples show the breathtaking business impact active data warehousing provides. But before embarking on an active warehouse project, it's important to understand that the technical requirements are profoundly different from those of traditional data warehousing. Let's look at the requirements for each type.

Traditional Data Warehousing

The concept that ignited the data warehouse boom in the early 1990s was strategic decision-making: the idea that a data warehouse program would enable better strategic decisions. Strategic decision-making depended on these concepts:

  • A central repository
  • Comprehensive data
  • Data successfully integrated from multiple sources
  • High data quality
  • A single version of the truth
  • Well-organized tools and applications for decision-making
  • Data, data definitions, and so on accessible to authorized users
  • An infrastructure for better decision-making
  • A platform for implementing analytic applications

Many businesses implemented these concepts by creating a data warehouse operation with a huge volume of data and a relatively small number of users. The users are usually corporate staff people: strategic, marketing, and financial planners who need to do extensive analysis, modeling, and forecasting based on historical data. This approach is extremely useful and has helped many companies to recognize trends, set priorities, allocate resources, segment markets, and take other important actions that significantly impact business performance quarter-to-quarter or year-to-year.

Timeliness, however, is one concept that is missing - or at least not much emphasized - in the traditional data warehouse. The data warehouse program usually focuses initially on analysis and decision-making for which monthly or quarterly updates are sufficient. There are three reasons for this. First, in many organizations, the data warehouse is huge. In the early years of a data warehouse program, it's hard enough to get the immense volumes of data extracted, cleansed, and integrated on a monthly or quarterly basis. More frequent updates seem overwhelming. Second, there are useful functions - mostly in the strategic planning and decision-making areas - that were not being performed well before the advent of the data warehouse and that could be done much better if the data were simply comprehensive, well integrated, and correct. And, third, most platforms could meet the requirements of data warehousing only as the traditional concept defined them. For example, if you were to shift the concept of the data warehouse in the direction of incorporating new data on a frequent or continuous basis, many products used in data warehousing could not meet this requirement while simultaneously handling large, complex queries. Thus, a "marriage of convenience" occurred between the initial business concepts behind data warehousing (strategic planning and decision-making) and the capabilities of most database engines on the market (which are not friendly to simultaneous update and complex query).

Active Warehousing Requirements

Active warehousing works on very different concepts from traditional data warehousing and imposes very different requirements on the database platform. The complexity of analysis and data access required places the typical active data warehouse query outside the scope of most transaction-processing systems. Similarly, the need to refer to large volumes of historical information (about purchase patterns, supplier performance, product quality, and so on) at the same time as new information (about stock levels, sales volumes, and imminently scheduled deliveries) is also beyond the scope of most transaction-processing systems.

Thus, the active data warehouse addresses a type of automated decision-making that has rarely been accomplished in the past. It is too complex and far reaching in its data needs to fit within the picture for operational systems; it is too time sensitive to have been attempted in most data warehouses. And yet it is emerging as the critical factor in enterprise performance as e-business takes hold and pervasive computing - including wireless and mobile devices - promises to speed the pace of many businesses even further.

Key concepts for active warehousing include:

  • Timely data. Updates must occur at least daily (and often hourly or continuously) because the data is used for operational and tactical decisions. These decisions depend in part on what happened in the last 24 hours, the last few hours, or the last few minutes.
  • Operational decision-making. By exploiting the full detail of timely information in the active warehouse, people and systems can make operational decisions faster and more effectively. Note that the operational decision-making enabled by the active warehouse is different in character from the decision-making supported by the typical transaction-processing system. The active data warehouse supports a new class of rapid decision-making that has two defining characteristics: complexity, and the integration of new information with the full detail of older or broader information.
    And these decisions can be pushed out of headquarters into the hands of the people closest to the action: groups such as store associates, truck drivers, or customer service agents. These employees often make quick, narrow-scope decisions; they need fast responses to repetitive queries, and they depend on the system minute-to-minute. By giving these operational groups decision-making capability, businesses enable the people closest to a situation to make better, faster decisions about it. Consequently, those employees feel more in control and more motivated. A more motivated employee or agent contributes to better customer service, increased sales, and higher manufacturing productivity.
  • Event-driven operation. Many activities in the active data warehouse and its dependent applications occur when a particular event takes place in the business environment rather than when it is convenient in terms of the production schedule of the data center. The data warehouse must detect these events by means of such facilities as database triggers or periodically launched event detection queries. Having detected the events, the data warehouse must then stimulate appropriate responses: The data warehouse alerts either a person, who takes appropriate action, or a business system, which takes the action automatically. Thus, in the Web-based distribution example I described, the brand manager is alerted via email that the new product is not selling in accordance with expectations. In the stock replenishment example, the data warehouse simply sends a message to a system that automatically places an order to replenish. When the action to take is obvious, it makes sense to leave the human decision maker out of the process. It is more important to get more binders on the shelves quickly than to fine-tune the decision with human judgment.
  • Detailed data. Most active data warehouse implementations require interaction with the full detail of the data (often in a nearly operational time frame): Store replenishment requires specific stock keeping units (SKUs) in specific stores; revenue problems require information about specific products, stores, and distributors; and so on. Such operational and tactical decisions can't be based on summary data alone.
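The event-driven pattern in the list above (detect an event, then either alert a person or hand the action to a business system) can be sketched as a small dispatcher. Everything here is hypothetical: the event kinds, the `dispatch` function, and the strings standing in for an email alert and a replenishment order. The one rule it encodes is the article's: when the action is obvious, skip the human.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Event:
    kind: str      # hypothetical labels, e.g. "out_of_stock", "sales_below_plan"
    subject: str   # the SKU/store or product the event concerns
    detail: str

def place_reorder(e: Event) -> str:
    # Stand-in for a message to an automated replenishment system.
    return f"REORDER {e.subject}"

def alert_person(e: Event) -> str:
    # Stand-in for an email alert to a human decision maker.
    return f"ALERT {e.subject}: {e.detail}"

# Event kinds whose correct response is obvious go straight to a business system.
AUTOMATIC: dict[str, Callable[[Event], str]] = {"out_of_stock": place_reorder}

def dispatch(e: Event) -> tuple[str, str]:
    """Route a detected event to a business system or, failing that, to a person."""
    handler = AUTOMATIC.get(e.kind)
    if handler is not None:
        return ("system", handler(e))
    return ("person", alert_person(e))

if __name__ == "__main__":
    print(dispatch(Event("out_of_stock", "store-12/binder-2in-blue", "qty 0")))
    # prints ('system', 'REORDER store-12/binder-2in-blue')
    print(dispatch(Event("sales_below_plan", "new-product", "Web distributor sales low")))
    # prints ('person', 'ALERT new-product: Web distributor sales low')
```

Keeping the "automatic" list as data makes it easy to move an event kind from human review to full automation once the right response has proved obvious.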

B2B Data Warehousing

In certain respects, B2B active warehousing involves complexities and challenges different from business-to-consumer (B2C) data warehousing. In general, B2B means more complex transactions, data flows, system interactions, and data relationships.

For example, let's look at a system in place at Wal-Mart. Wal-Mart's suppliers are responsible for managing store inventory for the products they supply. To enable this, Wal-Mart provides suppliers with access to the data warehouse, which enables them to see the complete history of sales of their products by store and day. Wal-Mart also provides the suppliers with analytical tools so that they can monitor their own effectiveness in having the right products in the right stores at the right time. As Wal-Mart moves forward with its active data warehouse program, plans call for suppliers' systems to be fully linked to Wal-Mart systems so that interaction between the organizations - on products, delivery, invoicing, payment, and so on - is highly automated. In this B2B arrangement, the retailer and the supplier enterprises will share complex links at many points; therefore, the data warehouse becomes the repository of a large and complex flow of information between the companies. This pattern differs from the interaction between an e-commerce site and its consumer clientele, in which the consumer side of the interaction ordinarily involves one person or one small group of people.
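One simple way to give each supplier the history of its own products by store and day is to scope every supplier query by a supplier key. The sketch below uses `sqlite3` with hypothetical `sales` and `products` tables, a hypothetical `supplier_id` column, and made-up sample rows; it is not a description of Wal-Mart's system, and in practice the restriction would more likely be enforced with database views and privileges than in application code.

```python
import sqlite3

# Sales of one supplier's products, rolled up by day, store, and item.
SUPPLIER_HISTORY_SQL = """
    SELECT s.sale_date, s.store_id, s.item_id, SUM(s.units) AS units
    FROM sales AS s
    JOIN products AS p ON p.item_id = s.item_id
    WHERE p.supplier_id = ?          -- restrict to this supplier only
    GROUP BY s.sale_date, s.store_id, s.item_id
    ORDER BY s.sale_date, s.store_id, s.item_id
"""

def supplier_sales_history(conn: sqlite3.Connection, supplier_id: str) -> list:
    return conn.execute(SUPPLIER_HISTORY_SQL, (supplier_id,)).fetchall()

def sample_warehouse() -> sqlite3.Connection:
    """Made-up sample data: two products from two different suppliers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (item_id TEXT PRIMARY KEY, supplier_id TEXT);
        CREATE TABLE sales (sale_date TEXT, store_id INTEGER,
                            item_id TEXT, units INTEGER);
    """)
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [("binder", "acme"), ("stapler", "other")])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        ("2001-03-05", 7, "binder", 3),
        ("2001-03-05", 7, "binder", 2),
        ("2001-03-05", 7, "stapler", 4),
        ("2001-03-06", 9, "binder", 1),
    ])
    return conn

if __name__ == "__main__":
    print(supplier_sales_history(sample_warehouse(), "acme"))
    # prints [('2001-03-05', 7, 'binder', 5), ('2001-03-06', 9, 'binder', 1)]
```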

Customer relationship management (CRM) functions in B2B data warehousing differ from those in B2C uses as well. In the B2B setting, the notion of customer can be extraordinarily complex. Although consumer businesses face the challenge of understanding individuals, families, and households, suppliers to businesses deal with customer entities that may include hundreds or even thousands of related individuals. The number of individuals who are affected by, or involved in, a single transaction may also be larger.

Technical Challenges

Combining B2B environments with active warehousing produces a distinctive set of far-reaching technical implications (see Table 1). These include:

Large data volumes. Tactical decisions are often based on knowing the full detail of a situation. If an item is out of stock in a store, for example, it's important to know whether the item is being discontinued or should be restocked. Before placing an order, it's important to know the appropriate stock level for the item. If multiple suppliers are available, you should know which can deliver quickly. If multiple suppliers can meet the delivery requirements, you need to know who delivers at the best price, on the best terms. If there is a history of quality problems with some suppliers or products, that history must be taken into account.

Some of the necessary information involves recent transactions. Some must be extracted from comprehensive historical data. In an enterprise of any size, providing this information generally involves large volumes of data. And large volumes of data require excellent optimization, high parallelism, complex query planning, and good access techniques.
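The chain of questions above (discontinued or restock, who can deliver in time, at what price, with what quality history) amounts to a filter-then-rank decision. A minimal sketch, assuming hypothetical `Quote` fields and thresholds; payment terms are left out for brevity, and a real implementation would draw these values from the detailed and historical data the paragraph describes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Quote:
    supplier: str
    lead_time_hours: int   # promised delivery time
    unit_price: float
    defect_rate: float     # hypothetical quality metric from historical data

def choose_supplier(discontinued: bool, quotes: list,
                    max_lead_time_hours: int,
                    max_defect_rate: float) -> Optional[Quote]:
    """Return the quote to order from, or None if no order should be placed."""
    if discontinued:
        return None                                    # do not restock
    eligible = [q for q in quotes
                if q.lead_time_hours <= max_lead_time_hours   # can deliver in time
                and q.defect_rate <= max_defect_rate]         # acceptable quality history
    if not eligible:
        return None
    # Best price among eligible suppliers; faster delivery breaks ties.
    return min(eligible, key=lambda q: (q.unit_price, q.lead_time_hours))

# Hypothetical quotes for one out-of-stock SKU.
SAMPLE_QUOTES = [
    Quote("A", lead_time_hours=48, unit_price=1.90, defect_rate=0.01),
    Quote("B", lead_time_hours=12, unit_price=2.10, defect_rate=0.00),
    Quote("C", lead_time_hours=8, unit_price=1.95, defect_rate=0.05),
]

if __name__ == "__main__":
    # Within 24 hours, A is too slow and C fails the quality bar, so B wins.
    print(choose_supplier(False, SAMPLE_QUOTES, 24, 0.02).supplier)  # prints B
```

Each filter corresponds to one of the questions in the paragraph, which is why the decision needs both recent transactions and long quality and delivery histories at the same moment.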

Complex data relationships. The many locations, individuals, and roles involved in a B2B transaction create a picture far more complex than that of even today's consumer households. And, in many businesses (for example, those extending credit), it is important to understand and analyze ownership relationships among apparently separate business entities. Although a consumer e-commerce transaction might typically involve the delivery of a few items to a single address, a B2B transaction could involve the delivery of 100 related products to each of 2,000 stores.
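Analyzing ownership among apparently separate business entities, as in the credit example above, is naturally a recursive query. A minimal sketch using a recursive common table expression in `sqlite3`; the `ownership` and `credit` tables, the company names, and the balances are all made up for illustration.

```python
import sqlite3

# Start from one entity and repeatedly add the entities it owns.
# UNION (not UNION ALL) drops repeats, so a cycle in the data cannot loop forever.
FAMILY_CTE = """
    WITH RECURSIVE family(entity) AS (
        SELECT ?
        UNION
        SELECT o.child FROM ownership AS o JOIN family AS f ON o.parent = f.entity
    )
"""

def corporate_family(conn: sqlite3.Connection, root: str) -> list:
    sql = FAMILY_CTE + "SELECT entity FROM family ORDER BY entity"
    return [row[0] for row in conn.execute(sql, (root,))]

def credit_exposure(conn: sqlite3.Connection, root: str) -> int:
    """Total credit extended to the root entity and everything it owns."""
    sql = FAMILY_CTE + ("SELECT COALESCE(SUM(balance), 0) FROM credit "
                        "WHERE entity IN (SELECT entity FROM family)")
    return conn.execute(sql, (root,)).fetchone()[0]

def sample_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ownership (parent TEXT, child TEXT);
        CREATE TABLE credit (entity TEXT, balance INTEGER);
    """)
    conn.executemany("INSERT INTO ownership VALUES (?, ?)", [
        ("HoldCo", "RetailCo"), ("HoldCo", "LogisticsCo"),
        ("RetailCo", "StoresWest"), ("OtherCo", "Unrelated"),
    ])
    conn.executemany("INSERT INTO credit VALUES (?, ?)", [
        ("RetailCo", 50_000), ("StoresWest", 20_000),
        ("LogisticsCo", 5_000), ("Unrelated", 99_000),
    ])
    return conn

if __name__ == "__main__":
    conn = sample_db()
    print(corporate_family(conn, "HoldCo"))
    # prints ['HoldCo', 'LogisticsCo', 'RetailCo', 'StoresWest']
    print(credit_exposure(conn, "HoldCo"))  # prints 75000
```

Looked at one account at a time, no single entity here carries more than 50,000 of credit; the recursive view shows 75,000 of exposure to one corporate family.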

Large user communities. In active warehouse applications, it is not unusual to have thousands of concurrent users because decision-making is pushed out of the back room into the front lines of the organization. In fact, in supply-chain applications, decision-making is often pushed out into the virtual enterprise to distributors or agents, resulting in even larger user communities. Large user communities mean many concurrent users, large demand peaks and sudden swings, many concurrent queries in flight, and online query workloads that dwarf those of traditional data warehousing.

High availability. The active warehouse takes on some key characteristics of an operational system; therefore, the active warehouse usually needs to be available 24 hours a day, seven days a week. If thousands of call center operators depend on the system to make decisions while customers are on the line, then the system must be available to them continuously. The tolerance for downtime associated with some traditional data warehouse operations disappears.

High performance. In many traditional data warehouses, the principal performance requirement is really a throughput requirement. That is, a certain amount of analysis and reporting must be completed on a schedule - often an overnight schedule. In the active warehouse, thousands of people in operational jobs use applications from which they expect interactive response. Many of these requests must be serviced at the same time. Response time becomes very important.

Scalability. When a company undertakes an active warehouse program, it anticipates a series of active applications. Each of these will add workload, may add data, and may add a large community of users to the system. An ongoing ability to increase capacity is necessary and assumed. To support these requirements, the platform must exhibit data, workload, and user scalability over a wide range.

Frequent or continuous update. Continuous update means that the data in the data warehouse is updated to reflect events either immediately or shortly after they occur. For example, in a continuously updated retail data warehouse, store transactions would be reflected in the warehouse shortly after checkout time (rather than in daily or weekly batch updates).

Some active warehouse operations use continuous data updates today. Many companies doing daily updates for an active warehouse plan to go to continuous update (or something close to it) in the next year or so. An alternative to continuous update would be frequent update in smaller batches (for example, every 30 minutes).
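The alternative to continuous update mentioned above, frequent update in smaller batches, can be sketched as a loader that flushes whenever a batch fills up or its oldest row has waited too long. The 30-minute default comes from the article; the class name, the size limit, and the `flush` callback (standing in for the actual warehouse load) are hypothetical, and time is passed in explicitly so the sketch is deterministic.

```python
from typing import Callable, Optional

class MicroBatchLoader:
    """Buffer incoming transactions and flush them as one small batch when
    either the batch is full or its oldest row has waited max_age_s seconds."""

    def __init__(self, flush: Callable[[list], None],
                 max_rows: int = 10_000, max_age_s: float = 30 * 60) -> None:
        self._flush = flush                     # hypothetical hook that loads a batch
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._rows: list = []
        self._oldest: Optional[float] = None    # arrival time of oldest buffered row

    def add(self, row, now: float) -> None:
        if not self._rows:
            self._oldest = now
        self._rows.append(row)
        if len(self._rows) >= self._max_rows or now - self._oldest >= self._max_age_s:
            self.flush()

    def tick(self, now: float) -> None:
        """Call from a timer so a quiet period still gets its batch loaded on time."""
        if self._rows and now - self._oldest >= self._max_age_s:
            self.flush()

    def flush(self) -> None:
        if self._rows:
            self._flush(self._rows)
            self._rows = []
            self._oldest = None

def demo() -> list:
    loaded = []
    loader = MicroBatchLoader(loaded.append, max_rows=3, max_age_s=1800)
    loader.add("t1", now=0)
    loader.add("t2", now=60)
    loader.tick(now=1799)            # oldest row has waited 1,799 s: not yet
    loader.tick(now=1800)            # 30 minutes reached: flush by age
    for i, t in enumerate(["t3", "t4", "t5"]):
        loader.add(t, now=2000 + i)  # third row fills the batch: flush by size
    return loaded

if __name__ == "__main__":
    print(demo())  # prints [['t1', 't2'], ['t3', 't4', 't5']]
```

Shrinking `max_age_s` moves the same code toward continuous update; the trade-off is more, smaller loads competing with queries.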

There are two reasons for the move toward frequent or continuous update. First, it shortens the time between the occurrence of an event and action upon that event by the organization. Second, the batch window often used for large batch updates is disappearing. Because many businesses operate 24 hours a day, operational use of the active warehouse translates into a need for continuous availability.

Mixed workload. One of the most challenging technical implications of the active warehouse is the mixed workload. The data warehouse engine must handle multiple types of work at the same time: continuous updates; interactive requests to support operational users; and the complex query, analysis, and reporting requirements of the traditional data warehouse.

The presence of both short- and long-running queries places a burden on the data warehouse to schedule and prioritize effectively. The system needs to complete interactive requests quickly without inappropriately penalizing longer running requests.
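One common way to let short interactive requests go first without starving long reports is a priority queue with aging: a request's effective priority improves the longer it waits. This is a swapped-in textbook technique, not a description of any particular engine's workload manager; the class names, priority values, and aging rate are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional

@dataclass(order=True)
class _Entry:
    key: float
    seq: int                          # FIFO tie-breaker
    name: str = field(compare=False)

class AgingScheduler:
    """Lower effective priority runs first; waiting lowers it linearly."""

    def __init__(self, aging_per_s: float) -> None:
        self._aging = aging_per_s
        self._heap: list = []
        self._seq = itertools.count()

    def submit(self, name: str, base_priority: float, now: float) -> None:
        # Effective priority at time t is base - aging * (t - now). Every entry
        # shares the same -aging * t term, so ordering by base + aging * now is
        # equivalent at all times and the heap never needs re-sorting.
        key = base_priority + self._aging * now
        heapq.heappush(self._heap, _Entry(key, next(self._seq), name))

    def next(self) -> Optional[str]:
        return heapq.heappop(self._heap).name if self._heap else None

def demo() -> list:
    s = AgingScheduler(aging_per_s=1.0)
    s.submit("nightly-report", base_priority=100, now=0)   # long-running request
    for t in (0, 10, 50, 120):
        s.submit(f"lookup@{t}s", base_priority=1, now=t)    # short interactive requests
    return [s.next() for _ in range(5)]

if __name__ == "__main__":
    print(demo())
    # prints ['lookup@0s', 'lookup@10s', 'lookup@50s', 'nightly-report', 'lookup@120s']
```

Interactive lookups jump ahead of the report, but only those submitted within about 99 seconds of it; after that, the report's accumulated waiting time puts it first.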

The presence of both updates and queries places a burden on the data warehouse engine to complete work with a high degree of concurrency. For example, it cannot permit updates to lock out queries or reports for any extended period of time. In a system in which updates are always (or nearly always) running, locks held for any length of time will reduce the query/reporting workflow to a trickle. This is a routine challenge in an operational system, and it is one of the reasons that operational systems do a minimum of secondary indexing.

However, in large-scale data warehousing, some use of secondary indexes, join indexes, and other complex data structures is often essential. Because these structures must be updated on a continuous basis, active warehousing demands that they be updated in a way that does not inhibit continuing query. This is a requirement that few database engines can meet, especially in a situation involving thousands of users and large data volumes. But this is indeed what is required for large active warehousing.
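The property asked for above (updates that do not stall concurrent queries) can be seen in miniature in SQLite's write-ahead-log (WAL) mode, swapped in here purely as a small, runnable stand-in. It says nothing about how Teradata or any large warehouse engine meets the requirement, and SQLite allows only one writer at a time. The `sales` table and file name are hypothetical. A long-running read keeps a consistent snapshot while an update commits underneath it, and neither waits for the other.

```python
import os
import sqlite3
import tempfile

def total(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COALESCE(SUM(amount), 0) FROM sales").fetchall()[0][0]

def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "dw.db")
        # isolation_level=None: autocommit, so transactions below are explicit;
        # a short timeout makes any unexpected blocking fail fast.
        writer = sqlite3.connect(path, isolation_level=None, timeout=0.1)
        writer.execute("PRAGMA journal_mode=WAL")   # WAL needs a file, not :memory:
        writer.execute("CREATE TABLE sales (amount INTEGER)")
        writer.execute("INSERT INTO sales VALUES (10)")

        reader = sqlite3.connect(path, isolation_level=None, timeout=0.1)
        reader.execute("BEGIN")                      # a long-running read begins
        before = total(reader)                       # snapshot taken here

        writer.execute("INSERT INTO sales VALUES (20)")  # commits; does not wait for reader

        during = total(reader)                       # same snapshot, unchanged
        reader.execute("COMMIT")
        after = total(reader)                        # fresh snapshot sees the update

        reader.close()
        writer.close()
        return before, during, after

if __name__ == "__main__":
    print(demo())  # prints (10, 10, 30)
```

The reader gets a consistent answer for the whole of its transaction, and the update lands without waiting for the query to finish, which is the behavior the paragraph requires at far larger scale.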

Business Implications

Active warehousing has sweeping business implications. In the retail example I discussed, active warehousing enables store shelves to be restocked shortly after they are depleted. This increases customer satisfaction: If shelves are ever empty, they are empty for a shorter period of time. It also means a given level of customer satisfaction can be maintained with less store inventory, resulting in lower carrying costs. In the manufacturing example, active warehousing means more rapid and effective distribution of new products.

The speed and precision with which an enterprise reacts to events is key in each example. This is surely one of the defining characteristics of the 21st century enterprise: Even large enterprises can, with the appropriate use of technology and new business processes, react to events with astonishing speed and accuracy. When coupled with accurate and appropriate actions and messages, speed means competitive advantage. Speed means customer satisfaction. Speed means increased financial performance. And the active data warehouse is the key.

The active warehouse brings the data warehouse out of the back room and onto the front lines and matches businesses' pace to that of e-commerce - and of 21st century change.

Richard Winter is a specialist in large database technology and implementation, and president of Waltham, Mass.-based Winter Corp. (www.wintercorp.com). You can reach him at richard.winter@wintercorp.com.

Table 1. Technical requirements of B2B active warehousing.

Requirement | Reason
--- | ---
Large data volumes | Decisions require detailed data
Complex data | Represent B2B relationships
Large user communities | Support the front lines
High availability | Moment-to-moment dependence
High performance | Interactive response
Scalability | Continuously growing usage
Frequent or continuous update | For timely and accurate decisions
Mixed workload | Update, interactive inquiry, reporting, analysis




Copyright by Teradata Corporation 2001-2007.