Active data warehousing—the ultimate
fulfillment of the operational
data store
by Dr. Claudia Imhoff
Over the years, data warehousing has gone through a number
of evolutions, from a relatively simple reporting database
to a collection of sophisticated applications capable of
analyzing customer lifetime values; conducting market basket
analyses; identifying potentially defecting customers, fraud
patterns and inventory churns; etc.
Despite the benefits, however, these static data sets could
not give us the most current and recent changes necessary
to act upon the results of business intelligence analyses.
For example, although we could identify a customer likely
to go to a competitor, we still could not view the specifics
of their current situation, such as which products the customer
uses, whether the customer is a VIP requiring special treatment
and where the customer is in the sales cycle.
This lack of insight was the result of warehouses set up
to give static snapshots of data, perhaps as recently as
last week. But last week's (or even last night's)
data is often not sufficient to react to current situations.
Things change rapidly in today's e-business economy,
and the company with the best set of integrated, current
data is the one that will not only survive but will actually
thrive in this economy.
Unfortunately, most enterprises today do not have any integrated
data other than the snapshots found in their data warehouses.
This is where the need for the operational data store (ODS)
developed. Now, you can have integrated data in the static
snapshots and in live, current records—an environment
in which both types of data and requirements can coexist.
This is called the active data warehouse.
To better understand this advance in technology, let's
examine the characteristics that make active data warehousing
and the ODS so very different from traditional data warehouses.
Let's start with an understanding of the difference
between analytical and operational applications (figure
1).
We classify the analytical applications as "business
intelligence," noting that the data warehouse supplies
data to the various analytical applications in the data
marts. The applications running against these components
use decision support interfaces (DSIs) and give great insight
into customers' demographics, buying habits, profitability,
lifetime value and more.
The operational applications, or business management components,
give the enterprise the ability to take action using its
collective intelligence and subject knowledge. They also
provide the organization with an enterprise-wide understanding
of its situation, which facilitates a transition away from
the silo business unit or functional viewpoint.
Business management is "where the action is."
It allows customer knowledge to be applied to modify customer
behavior and manage customer contact. Business management
consists of the ODS, the transaction interface (TrI), which
provides users with access to the valuable information as
well as the ability to update the ODS, and the associated
meta data that provides business and technical personnel
with information about the ODS.
The architecture used to support these important sets of
applications is called the corporate information factory
(CIF) (figure 2). The CIF is a logical or conceptual architecture
that provides an integrated view of enterprise data, enabling
both business intelligence and business management capabilities.
This architecture is a proven road map that maximizes the
success of enterprise-wide CRM implementations and e-business
strategies.
With that as background, we'll now focus our discussion
on the active data warehouse and its ability to handle not
only the historical, analytical requirements of data warehousing
but also the need for actionable information, which is found
in the ODS. We'll also explore the ODS's role
in business management.
Get to know the active data warehouse
Active data warehousing provides an integrated, consistent
data repository that drives both strategic and tactical/operational
decision support within an organization. Given that, what
are some of the characteristics that make this technology
capable of supporting not only the ODS for tactical decision-making
but also the data warehouse and data mart environment for
strategic decision-making capabilities?
Strategic and tactical query support
The workload in such an environment consists of traditional,
complex, decision-support queries, but it is able to expand
to support the short, quick queries used in the tactical
decision-making scenario. These new requirements mean new
service levels in terms of performance, scalability and
availability.
For example, the decision-support queries submitted by
a marketing manager might be used to derive patterns in
customer buying habits, model customer demographics and
determine customer profitability. Meanwhile, tactical queries
might be needed to determine the best offer or banner ad
for a customer, determine the availability of a product
or alter a campaign based on current results. To accommodate
both of these query types, restraints might have to be placed
on longer-running analytical queries to guarantee tactical
query performance.
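To make the idea of restraining longer-running queries concrete, here is a minimal sketch of a workload queue that always dispatches tactical queries ahead of strategic ones. The class name, the two priority levels and the query names are hypothetical illustrations; real workload managers are far richer than this.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical priority levels: the lower number is dispatched first.
TACTICAL, STRATEGIC = 0, 1


@dataclass(order=True)
class Query:
    priority: int
    seq: int                       # preserves submission order within a priority
    name: str = field(compare=False)


class WorkloadQueue:
    """Dispatches tactical queries before strategic ones, FIFO within each class."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def submit(self, name, priority):
        heapq.heappush(self._heap, Query(priority, next(self._seq), name))

    def next_query(self):
        return heapq.heappop(self._heap).name if self._heap else None


def drain(queue):
    """Return the names of all waiting queries in dispatch order."""
    order = []
    while (name := queue.next_query()) is not None:
        order.append(name)
    return order
```

In this sketch a marketing manager's long-running analysis waits whenever a short lookup, such as a best-offer request, is pending. That is one simple way to protect tactical response times, not the only one.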
Active data warehousing means large data volumes. Therefore,
scalability becomes critical to support the large amounts
of detailed data needed to understand business events. Scalability
also means being able to support concurrent queries like
those just described.
Availability (and thus reliability) is perhaps the most
distinguishing characteristic of technology that supports
both tactical and strategic queries. Traditional data warehouses
usually do not have to be functional 24 hours a day, 365 days a year.
Not so for ODS functions. As a result, the active data warehouse must
always be accessible, or the business simply cannot operate.
Continuous data acquisition
Because the requirement for data freshness is far more
stringent in the active data warehouse environment than
in traditional data warehouses, there is a need for a more
sophisticated data acquisition mechanism capable of gathering
data much closer to the time a business event took place.
Ideally, the acquisition mechanism will provide a continuous
feed of new or changed data into the environment without
blocking access to the very tables being updated.
The timing of data acquisition varies depending on the
class of ODS in use (more on this later). However, fresh
data, and by this we mean data only a few minutes old, is
the norm. Active data warehouses are able to handle large
volumes of changing data with ease.
Event-based database triggers
As the need for decision-making expands from only strategic
to both strategic and tactical, it makes sense that the
environment would evolve even further to event-based activities.
This requires a series of event-based database triggers
that operate on a chain of action and reaction. Triggers
are quite useful because they can automatically initiate
certain actions when specific conditions are reached.
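As an illustration of a chain of action and reaction, the sketch below uses SQLite (through Python's standard sqlite3 module) as a stand-in for the warehouse database. The table names, the stock threshold of 10 and the reorder quantity of 50 are all assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (product TEXT PRIMARY KEY, on_hand INTEGER NOT NULL);
CREATE TABLE reorder_alerts (product TEXT, on_hand INTEGER);
CREATE TABLE purchase_orders (product TEXT, quantity INTEGER);

-- Action: fires only when an update takes stock below the assumed threshold of 10.
CREATE TRIGGER low_stock AFTER UPDATE OF on_hand ON inventory
WHEN NEW.on_hand < 10 AND OLD.on_hand >= 10
BEGIN
    INSERT INTO reorder_alerts VALUES (NEW.product, NEW.on_hand);
END;

-- Reaction: each new alert raises a purchase order for an assumed quantity of 50.
CREATE TRIGGER auto_reorder AFTER INSERT ON reorder_alerts
BEGIN
    INSERT INTO purchase_orders VALUES (NEW.product, 50);
END;
""")

conn.execute("INSERT INTO inventory VALUES ('widget', 25)")
conn.execute("UPDATE inventory SET on_hand = 12 WHERE product = 'widget'")  # still above threshold
conn.execute("UPDATE inventory SET on_hand = 7 WHERE product = 'widget'")   # crosses threshold

alerts = conn.execute("SELECT product, on_hand FROM reorder_alerts").fetchall()
orders = conn.execute("SELECT product, quantity FROM purchase_orders").fetchall()
```

Here the first trigger reacts to a business event (stock falling below the threshold) and the second reacts to the first, with no application code involved in between.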
Given these characteristics, what benefits should you expect
to gain from active data warehousing? Besides the obvious
benefits of better performance, availability and scalability,
there are a few less obvious ones:
* Active data warehousing eliminates latency of action and data
redundancy. Latency of action is the time that elapses between
studying the results of a particular strategic query and
acting upon those results. With a single environment in which
both the tactical and strategic data are co-located, this
latency is almost zero. And because there is a single environment,
there is no need to replicate or duplicate some data in
physically distinct and separate environments for strategic
and tactical decision-making.
* Active data warehousing
yields a seamless infrastructure. The technology logically
incorporates a fully functioning ODS as well as the traditional
data warehouse and data marts in a single physical platform.
This means the components are easier to develop, maintain,
sustain and enhance. It also means the environment is far
more flexible in terms of usage, changes to the underlying
database and additions to the existing data.
For the active data warehouse architecture, the integration
of the data warehouse and the ODS is much simpler than if
these two components were built in separate environments.
Because there is a single instance of the overall database
and architecture, movement of data between and among the
various CIF components is much cleaner. When appropriate,
the same set of reference data can be used by all components
rather than replicated or re-created over and over.
Figure 3 illustrates the conceptual architecture of the
active data warehouse infrastructure. Notice that the overlap
in the middle is where the common dimensions, calculations,
reference data and more reside to be used by all components.
Get to know the ODS
The ODS is a subject-oriented, integrated,
current-valued and volatile collection of detailed data
that provides a true enterprise view of data by subject
area. Let's look at these defining characteristics
of the traditional ODS in more detail:
* Subject-oriented: The
ODS is organized around major data subjects of interest
to the enterprise. The primary purpose of the ODS is to
collect, integrate and distribute current information about
the data subject and to provide an enterprise view of it.
The subjects consist of any that are important to the organization.
For example, a customer-focused ODS will typically house
the most current information about a customer as well as
information on all recent customer interactions with the
organization, including product ownership and summary usage
statistics, billing or statement information, summary-level
contacts and other related information.
* Integrated: The integration
characteristic of an ODS is of key importance to e-commerce,
CRM and other fast-moving business initiatives. The ODS
represents an integrated image of a particular profile,
such as customer, product or order. Information for this
profile is pulled from any system in the organization, including
operational and decision support. While building and refreshing
an ODS, the organization integrates all the different sources
of information into a consistent view within the ODS that
is used when reacting to a particular situation or interacting
with the customer across all contact points. As the definitive
record and the consolidation point for profiles, the ODS
may also provide other systems in the organization with
this valuable information. Of great importance is the ODS's
ability to be accessed by anyone from anywhere in the organization
(or outside of it, as with your customers or partners),
at any time.
* Current-valued: The
ODS carries little or no history, much like a typical operational
system. Unlike a data warehouse, which is a series of information
snapshots used for strategic analysis, the ODS is a current
picture of the subjects in question and is used for "action."
"Currency" is relative and can be defined differently
depending on the subject matter. In any case, the ODS will
have far less history than the data warehouse, and it should
never be considered as a replacement for the warehouse.
* Volatile: ODS data
changes frequently, and these changes are typically reflected
as updates to the existing fields in a record, not snapshots
of whole records as in the warehouse. Changes to information
in the operational systems will be reflected as changes
in the ODS as well. Some types of information, such as account
ownership, order status changes, customer touch records,
product usage records and contact information, can change
quite frequently. In some cases, the ODS can be updated
directly by the users and customers, adding to its volatility.
New records might be added directly into the ODS at the
same time that new product information is placed into the
business operations systems. The customer ODS must be designed
to handle these frequent updates and changes with ease and
with appropriate referential integrity protocols.
* Detailed: The ODS carries
mostly low-level, detailed data for all profile information,
but it might have some summarized information, such as customer
contacts and products or services. The summary data existing
in the ODS is different from that found in the warehouse
in that it is dynamic in nature rather than static. That
is, summaries in the ODS can be calculated at the time of
request rather than being pre-calculated and stored.
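The contrast between the volatile, current-valued ODS and the snapshot-based warehouse can be sketched in a few lines. The class names, field names and in-memory storage below are hypothetical; they only illustrate in-place updates, appended snapshots and a summary calculated at the time of request.

```python
class CustomerODS:
    """Keeps one current record per customer; updates overwrite fields in place."""

    def __init__(self):
        self.records = {}    # customer_id -> current field values
        self.contacts = []   # detailed contact events as (customer_id, channel)

    def update(self, customer_id, **fields):
        # Only the changed fields are overwritten; no history is kept.
        self.records.setdefault(customer_id, {}).update(fields)

    def log_contact(self, customer_id, channel):
        self.contacts.append((customer_id, channel))

    def contact_summary(self, customer_id):
        # Calculated from detail at request time, not pre-calculated and stored.
        summary = {}
        for cid, channel in self.contacts:
            if cid == customer_id:
                summary[channel] = summary.get(channel, 0) + 1
        return summary


class WarehouseHistory:
    """Appends a dated snapshot of the whole record instead of overwriting it."""

    def __init__(self):
        self.snapshots = []

    def load(self, customer_id, record, as_of):
        self.snapshots.append((as_of, customer_id, dict(record)))
```

After two updates to the same customer, the ODS sketch holds a single current record, while the warehouse sketch holds one snapshot per load.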
Data currency and the ODS
Another ODS characteristic that is very relevant
to the CRM and e-commerce world is the speed at which it
is refreshed. Your organization has some choices in terms
of the information currency and update frequency.
For example, a customer might log onto your Web site and
enter his new address, phone and fax number. The new customer
contact information must be updated in the customer ODS
within a few seconds after its entry into the operational
environment. This type of ODS, known as class I,
is used when the information must be very accurate and up-to-date
at all times. For instance, when a customer service representative
is talking to a customer, he must see the most current information
no matter where it was initially entered or changed.
A class II ODS is a little more relaxed, using store
and forward techniques for data update rather than performing
synchronous updates. A class II ODS receives updates of
information, such as a customer's summary of Web purchases,
every half-hour or hour. Because service representatives
only use this information to get a feel for the customer's
product interest, relying on the product-centric ordering
systems for transaction details, class II updates are satisfactory
for these summaries. There is some trade-off between update
frequency and integration—the faster the update time,
the less time there is to perform complicated or extensive
integration routines.
A class III ODS is typically updated in batches,
most often on a daily basis. Information currency requirements
are not nearly as stringent when organizations build a class
III ODS. Because product preferences, for instance, do not
change frequently and are used to understand cross-sell
recommendations, class III updates work well.
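The three refresh-driven classes described so far can be summarized as a simple selection rule. The cut-off values below are assumptions chosen only to mirror the examples in the text (seconds for class I, half-hourly or hourly for class II, daily for class III); your own thresholds will depend on the business need.

```python
from datetime import timedelta
from enum import Enum


class ODSClass(Enum):
    I = "synchronous update, within seconds"
    II = "store and forward, every half-hour or hour"
    III = "batch, typically daily"


def choose_ods_class(max_staleness: timedelta) -> ODSClass:
    """Pick the most relaxed class that still meets the tolerated staleness.

    The 30-minute and one-day cut-offs are hypothetical, not a standard.
    """
    if max_staleness < timedelta(minutes=30):
        return ODSClass.I
    if max_staleness < timedelta(days=1):
        return ODSClass.II
    return ODSClass.III
```

For instance, contact details that a service representative must see within seconds call for class I in this sketch, while product preferences that may be a day old fall into class III.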
The fourth type of ODS, class IV, is a special case
where information provided to the ODS comes not only from
the operational systems but also from the data warehouse
or specific data marts. The information from the data warehouse
or data mart is transferred into the ODS only periodi-cally,
usually in a scheduled fashion. Small amounts of pre-aggregated
or pre-analyzed data flow from the strategic decision support
environment into the ODS for use with more tactical applications.
For example, the corporation might determine the lifetime
value of its customers through an extensive analysis of
customer data. The results of that analysis are then updated
in each customer's profile record within the ODS so
employees have ready access to this strategic CRM data while
performing operational tasks.
Once the strategic results are stored in the ODS, it is
possible to provide online real-time support of important
strategic information. In doing so, the data warehouse and
data marts can be said to support high-performance, online
data access when needed.
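The class IV flow, in which pre-analyzed results travel from the warehouse into the ODS, might look like the following sketch. The function name, the dictionary layout and the decision to skip customers without an ODS profile are all assumptions made for illustration.

```python
def publish_lifetime_values(ods_profiles, warehouse_results):
    """Copy pre-analyzed lifetime values into existing ODS profile records.

    ods_profiles:      customer_id -> profile dict held in the ODS
    warehouse_results: customer_id -> lifetime value computed in the warehouse
    Returns the customer ids whose profiles were updated.
    """
    updated = []
    for customer_id, lifetime_value in warehouse_results.items():
        profile = ods_profiles.get(customer_id)
        if profile is None:
            continue  # assumption: no profile is created for unknown customers
        profile["lifetime_value"] = lifetime_value
        updated.append(customer_id)
    return updated
```

Run on a schedule, a routine like this gives employees the strategic figure alongside the current operational data in the same profile record.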
Summary
The ODS is a key component of your technology environment
that provides business management capabilities to the organization.
Architecturally, the ODS works in conjunction with the data
warehouse and data marts by providing data into and receiving
analytical results from these components. This is the critical
process that makes the business intelligence of your organization
actionable.
The active data warehouse simplifies the overall construction
and maintenance of the CIF by creating physical or logical
components in a single instance of the database, thus:
* Maximizing flexibility
with a minimum of effort. The reuse of the data and the
ability to quickly create new applications are significant
advantages because many of the components are logical constructs.
* Creating an environment
that is efficient to maintain and enhance. Because there
is one physical environment, it's a simplified process
to enhance or change existing applications and CIF components.
* Eliminating data latency
and redundancy. Because many components are logical in nature,
the active data warehouse eliminates the time it takes traditional environments
to extract data from the warehouse, format it for various
data mart usage and then deliver it to the data mart locations.
In addition, access to current (ODS) as well as historical
(data warehouse and marts) data is easily carried out with
minimal delay.
This technology not only creates a responsive business
intelligence environment through the integration of the
data warehouse and associated marts, but now it also supports
the critical characteristics of the actionable piece of
the CIF architecture—the ODS used for business management.
This is a strategically significant technological breakthrough
and one that should be seriously considered for any enterprise
embracing the CIF architecture.
Dr. Claudia Imhoff is an internationally
recognized expert on the Corporate Information Factory,
business intelligence and CRM. She can be reached at CImhoff@IntelSols.com.