Teradata Magazine Online

OUT OF HIDING
Active data warehousing comes front and center

by Todd Walter

An “operational system” delivers extreme service levels to users who depend on it to support critical business processes. We expect operational service levels from telephone, network, Web and transactional (OLTP) systems. But traditionally, systems supporting decision-making (DSS, data marts, OLAP and data warehouses) have not been considered business critical and so have not been held to the same expectations. With a few notable exceptions, these systems have been built outside the bounds of the operational IT systems that run the business. This must change as decision-making systems are increasingly required to deliver more information to all aspects of the business.

An organization eager to drive operational processes with facts can take full advantage of the “single version of the truth” and robust analytics in an enterprise data warehouse without having to implement new systems and additional copies of data. As enterprise data warehouses evolve into active data warehouses, they can support a wide range of users and business processes with the information necessary to make decisions. This level of support requires an operational view of the data warehouse.

What do you mean by an operational view?
Availability, continuous operation and 24/7 access are the most obvious attributes of an operational system. Applications are built in different ways to support the users on the frontline as well as those outside the company. The complex, multi-system style of architecture for a traditional data warehouse environment must be simplified, and new organizations must become more involved with the daily operation of the data warehouse.

What are the pressures for simplification?
As more users require more access to more data, environments with data marts, operational data stores, data hubs and data cleansing areas have sprung up to support key aspects of the overall data warehousing challenge. Complexity comes from having data spread over many systems, processed through many steps, partially duplicated in several places and accessed independently by different applications.

The cost of operationalizing such a complex environment is often significantly higher than the cost of consolidating to a centralized view. Teradata advocates and enables a centralized data warehouse architecture, minimizing the number of data marts, operational data stores and other copies of the data, which in turn minimizes the complexity and cost of the operational processes.

Security and privacy are also major drivers of simplification. Imagine proving the implementation of a privacy policy to auditors when customer information is scattered among a dozen or more separate systems comprising the data warehouse environment. Data is dispersed, access control is difficult to prove and accuracy of privacy-related controls is difficult to ensure as the data is transported and transformed. Privacy and security policies are much easier to implement, manage, log and audit when there is only one copy of the data and one place it is accessed.

What new organizations need to get involved?
Many IT organizations have specialized groups to handle specific aspects of critical system operations such as backup, maintenance, scheduling, operations and disaster recovery. Data warehouses often do not involve these groups or involve them only partially. An active data warehouse must be a fully participating member in the operational IT infrastructure. Often this requires changing the processes and tools used to operate the data warehouse.

What are the challenges of working with these operational organizations?
Challenges flow in both directions. Operational organizations are usually not familiar with the size, flow, usage or technology of the data warehouse, while data warehouse teams typically have not had to manage the data at operational service levels. Often there is a mix of applications supported by the data warehouse, with only some requiring operational service levels. The key is for both types of organizations to learn from each other, understand the business requirements for each application and work together to meet those requirements.

How is information delivered to front-line users?
In the traditional data warehouse, strategic decision-making access is performed through BI tools; query, OLAP, data mining and SQL access tools play a major role. A majority of front-line users, inside or outside the company, have never seen SQL and do not want to; they interact with the enterprise via Web pages, graphs and buttons.

While traditional access methods continue to be appropriate for traditional data warehouse job functions, new ways of accessing the data warehouse are required for front-line users. Web access and an appropriate Web server environment are the foundation. A component architecture is appropriate for implementing the applications: Web services, EAI middleware and component development environments. The components then access the data warehouse via standard data access methods, possibly utilizing further application logic within the DBMS environment. Teradata supports the new access architecture through open standard database interfaces (JDBC, ODBC, OLE DB) and open access to any EAI or component architecture, combined with adapters for, and partnerships with, leading EAI vendors.
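
As a rough illustration of the component pattern described above, the sketch below wraps one parameterized query behind a function that a Web page could call, so the front-line user never sees SQL. It uses Python's built-in sqlite3 module purely as a stand-in for a standard database interface such as ODBC or JDBC; the table, column and function names are hypothetical and do not come from any Teradata product.

```python
import sqlite3


def open_demo_warehouse():
    """Create an in-memory stand-in for the data warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_orders (customer_id INTEGER, order_total REAL)"
    )
    conn.executemany(
        "INSERT INTO customer_orders VALUES (?, ?)",
        [(1, 120.0), (1, 80.0), (2, 45.5)],
    )
    conn.commit()
    return conn


def customer_summary(conn, customer_id):
    """Component a Web page would call; the SQL stays hidden from the user."""
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(order_total), 0.0) "
        "FROM customer_orders WHERE customer_id = ?",
        (customer_id,),  # bound parameter, never string concatenation
    ).fetchone()
    return {"customer_id": customer_id, "orders": row[0], "total_spent": row[1]}


if __name__ == "__main__":
    warehouse = open_demo_warehouse()
    print(customer_summary(warehouse, 1))
```

Keeping the query inside the component means the page layer only ever handles a small dictionary of results, which is one way to serve users who interact through Web pages, graphs and buttons.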

How are active data warehouse applications different from those found in enterprise data warehouses?
Requiring very current data, delivering access to the frontlines and managing workloads are just a few of the issues that impact application design. From a data warehouse point of view, these applications will be very different, but to people building operational, transactional applications the techniques are very familiar.

How does the need for current data affect application design?
Data warehouse implementers have been loading from bulk extracts for many years. Increasing the data freshness levels requires changing the way data is acquired, moved, transformed and applied to the data warehouse. In some cases, acquiring data more frequently means making changes to other infrastructure components. Continuous feed data must be moved using messaging, queues, EAI tools, continuous ETL tools or a combination of these tools. Applying the data to the data warehouse requires a continuous update tool, or else bulk update tools must be run repeatedly throughout the day, concurrent with the rest of the workload on the system. Teradata supports keeping data current with the TPump continuous update tool and workload management that allows continuous or repeated bulk updates to run concurrently with the data access workloads.
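
To make the contrast with bulk extracts concrete, here is a minimal sketch of a continuous feed applied in small batches. It is not TPump and does not reflect how TPump works internally; it assumes an in-process Python queue standing in for a messaging layer and sqlite3 standing in for the data warehouse. The `sales` table and the function name are hypothetical.

```python
import queue
import sqlite3


def apply_continuous_feed(conn, feed, batch_size=100):
    """Drain queued change records into the warehouse in small batches.

    Each batch is committed as its own short transaction, so the load can
    run alongside queries instead of in one long bulk window.
    Returns the number of rows applied.
    """
    applied = 0
    while True:
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(feed.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return applied
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO sales (sale_id, amount) VALUES (?, ?)", batch
            )
        applied += len(batch)


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount REAL)")
    incoming = queue.Queue()
    for sale_id in range(250):
        incoming.put((sale_id, 9.99))
    print(apply_continuous_feed(warehouse, incoming, batch_size=100))
```

In a real deployment the loop would run continuously and wait for new messages; here it simply stops when the queue is empty so the example terminates.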

How are workloads managed?
The front-line users’ workload often has very different service levels than traditional data warehouse applications. The system must manage all aspects—continuous update, tactical and event-driven decision-making mixed with the traditional strategic query workload—each at a different service level. Teradata supports this set of requirements with automatic, detailed workload management tools that operate at the user- and request-level without requiring management of the operating system. The Teradata Priority Scheduler manages a complex workload at different service levels for different users while fully utilizing the resources of the configuration. The Teradata Dynamic Query Manager gates less critical work to ensure that priority work has the system resources it needs.
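
The two ideas in this answer, prioritizing requests and gating less critical work, can be sketched in a few lines. This is a toy model only, not the Teradata Priority Scheduler or the Teradata Dynamic Query Manager; the class name, the numeric priority scheme and the threshold are assumptions made for illustration.

```python
import heapq
import itertools


class WorkloadManager:
    """Toy scheduler: lower priority number runs first; low-priority work is gated."""

    def __init__(self, max_low_priority_running=1, low_priority_threshold=5):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order per priority
        self._max_low = max_low_priority_running
        self._threshold = low_priority_threshold
        self._low_running = 0

    def submit(self, priority, name):
        """Queue a request; priorities at or above the threshold count as low."""
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def next_request(self):
        """Return (priority, name) for the next request allowed to run, else None."""
        if not self._heap:
            return None
        priority, _, name = self._heap[0]
        if priority >= self._threshold:
            # The heap is ordered, so everything still waiting is low priority too.
            if self._low_running >= self._max_low:
                return None  # gated until a running low-priority request finishes
            self._low_running += 1
        heapq.heappop(self._heap)
        return priority, name

    def finish(self, priority):
        """Report that a request has completed, freeing a low-priority slot if needed."""
        if priority >= self._threshold and self._low_running > 0:
            self._low_running -= 1


if __name__ == "__main__":
    manager = WorkloadManager(max_low_priority_running=1)
    manager.submit(9, "strategic report A")
    manager.submit(9, "strategic report B")
    manager.submit(1, "tactical lookup")
    print(manager.next_request())  # the tactical lookup is served first
```

The tactical lookup jumps ahead of both strategic reports, and only one strategic report is admitted at a time, which mirrors the mixed-service-level behavior described above in a very simplified form.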

How do new availability requirements change the picture?
The active data warehouse has quite different availability expectations. When front-line users are being supported, 24/7 access is assumed, particularly in a global company. A continuous flow of updates must be applied. Executives are expecting information to be up to date and available; if no one knows that the CEO depends on hourly updates during a key business period, important expectations can be missed. And we as consumers expect to interact with a company at any hour of any day. Traditional data warehouse implementations have much lower requirements. If the system is unavailable for a while, users are unhappy but the operation of the business is not degraded. As a traditional data warehouse begins to support operational applications, the availability requirements need to be carefully reviewed. System availability, data availability, disaster recovery and mixed workload management must all be considered.

How is system availability different in an active data warehouse?
Traditional data warehouses are often built on platforms that do not have very high availability attributes. Single points of failure such as single servers are allowed in the configuration to balance cost against the required availability levels. An active data warehouse has significantly higher availability requirements. Failover and fault tolerance need to be built into the platform and supported by the software. Teradata supports these requirements automatically with components such as RAID disks, dual controllers, dual I/O paths, fully fault-tolerant interconnect, multiple network and channel connections, fans, UPSs and power connections. Every MPP Teradata system has built-in failover to handle node failure automatically rather than requiring special options, separate software and separate systems management.

Why is availability an issue?
Simply having the hardware and software running is not a sufficient level of availability for the active data warehouse. The data must also be fully accessible to users so they can do their job. Traditional data warehouses often assume that data can be taken offline for data load, backup, data maintenance or model changes. In an active data warehouse, data can’t ever be offline. Teradata supports data availability in many ways. Bulk or continuous data load and backup can be performed concurrently with user access. Data management operations are performed automatically and continuously so that processes like defragmentation, reorgs, index rebuilds, repartitioning or updating materialized views never take data offline. Model changes can be done in place without unloading data and can be hidden completely from applications using high-performance views.
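
The point about hiding model changes behind views can be shown with a small sketch. It uses sqlite3 as a stand-in database, so it illustrates the general idea rather than Teradata's view implementation; the table, view and column names are hypothetical. The application queries only the view, the base table gains a column in place, and the application's result does not change.

```python
import sqlite3

# The application only ever issues this query against the view.
APP_QUERY = "SELECT * FROM customer_view ORDER BY customer_id"


def demo_view_stability():
    """Return the app's results before and after an in-place model change."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (customer_id INTEGER, full_name TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(1, "Ada Lovelace"), (2, "Alan Turing")],
    )
    # The view names its columns explicitly, so it defines a stable contract.
    conn.execute(
        "CREATE VIEW customer_view AS SELECT customer_id, full_name FROM customer"
    )
    before = conn.execute(APP_QUERY).fetchall()

    # Model change made in place: no unload or reload of existing rows.
    conn.execute("ALTER TABLE customer ADD COLUMN loyalty_tier TEXT DEFAULT 'standard'")
    after = conn.execute(APP_QUERY).fetchall()

    base_columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
    return before, after, base_columns


if __name__ == "__main__":
    before, after, base_columns = demo_view_stability()
    print(before == after, base_columns)
```

New applications that need the added column can be given a second view, while existing applications keep working against the original one.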

Why is disaster recovery necessary?
When the applications on an active data warehouse are business critical, some form of disaster recovery is necessary. Just like any other operational system, consideration must be given to the business criticality of each application. It is very likely that only a subset of the data and applications will be business critical, a fact that often allows the disaster recovery configuration to be considerably smaller than the primary configuration. Various options are available, including backup solutions, offsite-hosted options, warm standby systems normally used for other work or a dual-active configuration.

How is overall system management different?
The best technology in the world can’t overcome every management and control issue. Applications, their users and their business criticality need to be understood.

Decisions that affect any aspect of availability—a single table or the whole system—require an understanding of their impact on business-critical applications. The backup, load, scheduling and workload management processes sufficient for a traditional data warehouse will not deliver the significantly higher service levels of the active data warehouse and must be revisited. Development staff must manage the impact of an active data warehouse environment or move to a separate development and test environment. Teradata tools help make the data warehouse an operational environment. For instance, export/load of data model and system configuration information can be used to simulate the production environment on a smaller test and development configuration, and sampling can move an appropriate subset of data for execution testing. The process changes and workload management need to be balanced against flexibility and the data warehouse’s ability to respond to new business requirements.

E-MAIL ME

Looking for answers to life’s mysteries? Or would you just like to know more about the Teradata Warehouse and related applications? Ask the Expert! E-mail questions and comments to Todd at: todd.walter@teradata-ncr.com

In summary
As enterprise data warehouses evolve into active data warehouses, they become operational enterprise systems requiring additional considerations and management processes similar to those used for traditional transactional systems. Teradata technologies support all aspects of building, managing and operating an active data warehouse.

They’re ready when you are.

Photo by Alex Hayden




Copyright by Teradata Corporation 2001-2007.