An organization eager to drive operational processes
with facts can take full advantage of the “single version
of the truth” and robust analytics in an enterprise data
warehouse without having to implement new systems and additional
copies of data. As enterprise data warehouses evolve into active
data warehouses, they can support a wide range of users and business
processes with the information necessary to make decisions. This
level of support requires an operational view of the data warehouse.
What do you mean by an operational view?
Availability, continuous operation and 24/7 access
are the most obvious attributes of an operational system. Applications
are built in different ways to support the users on the frontline
as well as those outside the company. The complex, multi-system
style of architecture for a traditional data warehouse environment
must be simplified, and new organizations must become more involved
with the daily operation of the data warehouse.
What are the pressures for simplification?
As more users require more access to more data, environments
with data marts, operational data stores, data hubs and data
cleansing areas have sprung up to support key aspects of the
overall data warehousing challenge. Complexity comes from having
data spread over many systems, processed through many steps,
partially duplicated in several places and accessed independently
by different applications.
The cost of operationalizing such a complex
environment is often significantly higher than the cost of consolidating
to a centralized view. Teradata advocates and enables a centralized
data warehouse architecture, minimizing the number of data marts,
operational data stores and other copies of the data, which
in turn minimizes the complexity and cost of the operational
processes.
Security and privacy are also major drivers
of simplification. Imagine proving the implementation of a privacy
policy to auditors when customer information is scattered among
a dozen or more separate systems comprising the data warehouse
environment. Data is dispersed, access control is difficult
to prove and accuracy of privacy-related controls is difficult
to ensure as the data is transported and transformed. Privacy
and security policies are much easier to implement, manage,
log and audit when there is only one copy of the data and one
place it is accessed.
What new organizations need to get involved?
Many IT organizations have specialized groups
to handle specific aspects of critical system operations such
as backup, maintenance, scheduling, operations and disaster
recovery. Data warehouses often do not involve these groups
or involve them only partially. An active data warehouse must
be a fully participating member in the operational IT infrastructure.
Often this requires changing the processes and tools used to
operate the data warehouse.
What are the challenges of working with these operational organizations?
Challenges flow in both directions. Operational organizations
are usually not familiar with the size, flow, usage or technology
of the data warehouse, while data warehouse teams typically
have not had to manage the data at operational service levels.
Often there is a mix of applications supported by the data warehouse,
with only some requiring operational service levels. The key
is for both types of organizations to learn from each other,
understand the business requirements for each application and
work together to meet those requirements.
How is information delivered to front-line users?
In the traditional data warehouse, strategic decision-making
access is performed through BI tools; query, OLAP, data mining
and SQL access tools play a major role. A majority of front-line
users, inside or outside the company, have never seen SQL and
do not want to; they interact with the enterprise via Web pages,
graphs and buttons.
While traditional access methods continue
to be appropriate for traditional data warehouse job functions,
new ways of accessing the data warehouse are required for front-line
users. Web access and an appropriate Web server environment
are the foundation. A component architecture is appropriate
for implementing the applications—Web services, EAI middleware
and component development environments. The components then
access the data warehouse via standard data access methods,
possibly utilizing further application logic within the DBMS
environment. Teradata supports the new access architecture through
open standard database interfaces (JDBC, ODBC, OLE DB) that are
open to any EAI or component architecture, along with adapters
for, and partnerships with, leading EAI vendors.
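As an illustration of the component approach described above, here is a minimal sketch of a front-line component that hides SQL behind a simple function call. It assumes Python's built-in sqlite3 module as a stand-in for a standard warehouse interface such as ODBC or JDBC; the table customer_orders and the function names are hypothetical and are not part of any Teradata product.

```python
import sqlite3


def open_warehouse() -> sqlite3.Connection:
    """Create an in-memory database that stands in for the data warehouse."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, for illustration only.
    conn.execute(
        "CREATE TABLE customer_orders "
        "(customer_id INTEGER, order_id INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer_orders VALUES (?, ?, ?)",
        [(1, 100, "shipped"), (1, 101, "open"), (2, 200, "open")],
    )
    conn.commit()
    return conn


def open_orders_for(conn: sqlite3.Connection, customer_id: int) -> list:
    """Component method: return a customer's open orders as plain dicts.

    A Web page or Web service calls this function; the front-line user
    never sees the SQL underneath.
    """
    rows = conn.execute(
        "SELECT order_id, status FROM customer_orders "
        "WHERE customer_id = ? AND status = 'open' ORDER BY order_id",
        (customer_id,),
    ).fetchall()
    return [{"order_id": order_id, "status": status} for order_id, status in rows]


if __name__ == "__main__":
    warehouse = open_warehouse()
    print(open_orders_for(warehouse, 1))
```

The design point is the separation: the Web layer depends only on open_orders_for, so the query behind it can change without touching the pages and buttons users see.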
How are active data warehouse applications different from those
found in enterprise data warehouses?
Requiring very current data, delivering access to the frontlines
and managing workloads are just a few of the issues that impact
application design. From a data warehouse point of view, these
applications will be very different, but to people building
operational, transactional applications the techniques are very
familiar.
How does the need for current data affect application design?
Data warehouse implementers have been loading from
bulk extracts for many years. Increasing the data freshness
levels requires changing the way data is acquired, moved, transformed
and applied to the data warehouse. In some cases, acquiring
data more frequently means making changes to other infrastructure
components. Continuous feed data must be moved using messaging,
queues, EAI tools, continuous ETL tools or a combination of
these tools. Applying the data to the data warehouse requires
a continuous update tool, or else bulk update tools must be
run repeatedly throughout the day, concurrent with the rest
of the workload on the system. Teradata supports keeping data
current with the TPump continuous update tool and workload management
that allows continuous or repeated bulk updates to run concurrently
with the data access workloads.
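The difference between a bulk load and a continuous feed can be sketched as follows. This is a simplified illustration, not how TPump works internally: it assumes a Python queue.Queue as a stand-in for the messaging layer and an in-memory SQLite table as a stand-in for the warehouse, and the names sales, make_sales_table and apply_continuous_feed are hypothetical.

```python
import queue
import sqlite3


def make_sales_table() -> sqlite3.Connection:
    """Create an in-memory stand-in for a warehouse table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount REAL)")
    return conn


def apply_continuous_feed(
    conn: sqlite3.Connection, feed: queue.Queue, batch_size: int = 2
) -> int:
    """Drain the feed in small batches, committing after each batch.

    Small, frequent commits keep the data fresh and keep each update
    short, so queries running at the same time are not held up by one
    long bulk load. A production feed would block and wait for new
    messages; this sketch simply stops when the queue is empty.
    """
    applied = 0
    while True:
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(feed.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return applied
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO sales (sale_id, amount) VALUES (?, ?)", batch
            )
        applied += len(batch)


if __name__ == "__main__":
    warehouse = make_sales_table()
    messages = queue.Queue()
    for sale_id in range(5):
        messages.put((sale_id, 10.0 * sale_id))
    print(apply_continuous_feed(warehouse, messages))
```

The batch size is the tuning knob in this sketch: smaller batches mean fresher data and shorter updates, at the cost of more commits.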
How are workloads managed?
The front-line users’ workload often has very different
service levels than traditional data warehouse applications.
The system must manage all aspects—continuous update,
tactical and event-driven decision-making mixed with the traditional
strategic query workload—each at a different service level.
Teradata supports this set of requirements with automatic, detailed
workload management tools that operate at the user- and request-level
without requiring management of the operating system. The Teradata
Priority Scheduler manages a complex workload at different service
levels for different users while fully utilizing the resources
of the configuration. The Teradata Dynamic Query Manager gates
less critical work to ensure that priority work has the system
resources it needs.
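The two ideas in this answer, ordering work by service level and gating less critical work, can be illustrated with a toy scheduler. This is only a sketch under stated assumptions: it does not reproduce the Teradata Priority Scheduler or Teradata Dynamic Query Manager, and the workload names, the PRIORITY table and the WorkloadManager class are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Lower number = higher priority. Hypothetical workload classes.
PRIORITY = {"tactical": 0, "continuous_update": 1, "strategic": 2}


@dataclass(order=True)
class Request:
    priority: int
    seq: int  # submission order, breaks ties within a priority
    name: str = field(compare=False)
    workload: str = field(compare=False)


class WorkloadManager:
    def __init__(self, strategic_limit: int = 1):
        self._heap = []
        self._seq = count()
        # Gate: at most this many strategic queries per scheduling round.
        self._strategic_limit = strategic_limit

    def submit(self, name: str, workload: str) -> None:
        request = Request(PRIORITY[workload], next(self._seq), name, workload)
        heapq.heappush(self._heap, request)

    def dispatch(self, slots: int) -> list:
        """Choose up to `slots` requests to run in this round."""
        running, deferred, strategic_running = [], [], 0
        while self._heap and len(running) < slots:
            request = heapq.heappop(self._heap)
            if request.workload == "strategic":
                if strategic_running >= self._strategic_limit:
                    deferred.append(request)  # gated until a later round
                    continue
                strategic_running += 1
            running.append(request.name)
        for request in deferred:
            heapq.heappush(self._heap, request)
        return running


if __name__ == "__main__":
    manager = WorkloadManager(strategic_limit=1)
    manager.submit("monthly_report", "strategic")
    manager.submit("trend_analysis", "strategic")
    manager.submit("call_center_lookup", "tactical")
    manager.submit("pos_feed", "continuous_update")
    print(manager.dispatch(slots=4))
    print(manager.dispatch(slots=4))
```

In the first round the tactical lookup and the continuous feed run ahead of the strategic queries even though they were submitted later, and the second strategic query is held back by the gate although a slot is free.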
How do new availability requirements change the picture?
The active data warehouse has quite different
availability expectations. When front-line users are being supported,
24/7 access is assumed, particularly in a global company. A
continuous flow of updates must be applied. Executives are expecting
information to be up to date and available; if no one knows
that the CEO depends on hourly updates during a key business
period, important expectations can be missed. And we as consumers
expect to interact with a company at any hour of any day. Traditional
data warehouse implementations have much lower requirements.
If the system is unavailable for a while, users are unhappy
but the operation of the business is not degraded. As a traditional
data warehouse begins to support operational applications, the
availability requirements need to be carefully reviewed. System
availability, data availability, disaster recovery and mixed
workload management must all be considered.
How is system availability different in an active data warehouse?
Traditional data warehouses are often built on platforms
that do not have very high availability attributes. Single points
of failure such as single servers are allowed in the configuration
to balance cost against the required availability levels. An
active data warehouse has significantly higher availability
requirements. Failover and fault tolerance need to be built
into the platform and supported by the software. Teradata supports
these requirements automatically with components such as RAID
disks, dual controllers, dual I/O paths, fully fault-tolerant
interconnect, multiple network and channel connections, fans,
UPSs and power connections. Every MPP Teradata system has built-in
failover to handle node failure automatically rather than requiring
special options, separate software and separate systems management.
Why is data availability an issue?
Simply having the hardware and software running is not a sufficient
level of availability for the active data warehouse. The data
must also be fully accessible to users so they can do their
job. Traditional data warehouses often assume that data can
be taken offline for data load, backup, data maintenance or
model changes. In an active data warehouse, data can’t
ever be offline. Teradata supports data availability in many
ways. Bulk or continuous data load and backup can be performed
concurrently with user access. Data management operations are
performed automatically and continuously so that processes like
defragmentation, reorgs, index rebuilds, repartitioning or updating
materialized views never take data offline. Model changes can
be done in place without unloading data and can be hidden completely
from applications using high-performance views.
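The last point, hiding a model change behind a view, can be shown with a small example. It is a sketch using Python's built-in sqlite3 module as a stand-in for the warehouse; the tables customer_base and customer_address, the view customer_v and the query are hypothetical, and SQLite's behavior is not meant to represent Teradata's.

```python
import sqlite3

# The application only ever issues this query against the view.
APP_QUERY = "SELECT name, city FROM customer_v WHERE id = ?"


def build_original_model(conn: sqlite3.Connection) -> None:
    """Original model: one table, exposed to applications through a view."""
    with conn:
        conn.execute(
            "CREATE TABLE customer_base "
            "(id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
        )
        conn.execute("INSERT INTO customer_base VALUES (1, 'Ada', 'Dayton')")
        conn.execute(
            "CREATE VIEW customer_v AS SELECT id, name, city FROM customer_base"
        )


def change_model(conn: sqlite3.Connection) -> None:
    """Model change: move city into its own table, then redefine the view."""
    with conn:
        conn.execute(
            "CREATE TABLE customer_address "
            "(customer_id INTEGER PRIMARY KEY, city TEXT)"
        )
        conn.execute(
            "INSERT INTO customer_address SELECT id, city FROM customer_base"
        )
        # Blank the old column to show the view no longer depends on it.
        conn.execute("UPDATE customer_base SET city = NULL")
        conn.execute("DROP VIEW customer_v")
        conn.execute(
            "CREATE VIEW customer_v AS "
            "SELECT b.id AS id, b.name AS name, a.city AS city "
            "FROM customer_base b "
            "JOIN customer_address a ON a.customer_id = b.id"
        )


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    build_original_model(warehouse)
    before = warehouse.execute(APP_QUERY, (1,)).fetchone()
    change_model(warehouse)
    after = warehouse.execute(APP_QUERY, (1,)).fetchone()
    print(before == after)
```

The application query is identical before and after the change; only the view definition knows that the city now lives in a different table.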
Why is disaster recovery necessary?
When the applications on an active data warehouse are business
critical, some form of disaster recovery is necessary. Just
like any other operational system, consideration must be given
to the business criticality of each application. It is very
likely that only a subset of the data and applications will
be business critical, a fact that often allows the disaster
recovery configuration to be considerably smaller than the primary
configuration. Various options are available, including backup
solutions, offsite-hosted options, warm standby systems normally
used for other work or a dual-active configuration.
How is overall system management different?
The best technology in the world can’t overcome every
management and control issue. Applications, their users and
their business criticality need to be understood.
Decisions that affect any aspect of availability—a
single table or the whole system—require an understanding
of their impact on business-critical applications. The backup,
load, scheduling and workload management processes sufficient
for a traditional data warehouse will not deliver the significantly
higher service levels of the active data warehouse and must
be revisited. Development staff must manage the impact of an
active data warehouse environment or move to a separate development
and test environment. Teradata tools help make the data warehouse
an operational environment. For instance, export/load of data
model and system configuration information can be used to simulate
the production environment on a smaller test and development
configuration, and sampling can move an appropriate subset of
data for execution testing. The process changes and workload
management need to be balanced against flexibility and the data
warehouse’s ability to respond to new business requirements.
E-MAIL ME
Looking for answers to life’s mysteries? Or would you
just like to know more about the Teradata Warehouse
and related applications? Ask the Expert! E-mail
questions and comments to Todd at: todd.walter@teradata-ncr.com
In summary
As enterprise data warehouses evolve into active
data warehouses, they become operational enterprise systems
requiring additional considerations and management processes
similar to those used for traditional transactional systems.
Teradata technologies support all aspects of building, managing
and operating an active data warehouse.
They’re ready when you are.
Photo by Alex Hayden