Dimensional, normalized or hybrid data models show different levels of relationships.
by Dan Higgins, Director of data warehouse sales support for Teradata
In data warehousing, one of the most contested issues regarding the best approach for problem solving is data modeling. For years, proponents
for the dimensional, or star schema, have argued for their methods to modeling warehouse data. Others swear the normalized model is more
revealing. However, as is true with most opposing views, both have merits and challenges.
Dimensional and normalized modeling approaches have a place in data warehousing and other decision support environments. In fact, some
Teradata customers use normalized modeling exclusively; some use only dimensional modeling; many use a combination of the two; and some
develop hybrid data structures.
It's clear that organizations with successful data warehouse systems are open to redesigning and redeveloping their models based on their
needs. A company's data architects and modelers do not choose a data modeling approach simply because it is what they have always done, or
because of architectural dogmatism, or because of the limitations of a particular technology. They choose a particular approach because
it's right for their system, it's adaptable, it's scalable—and it meets the business's requirements.
To get to that point ourselves, we need to understand the roles of the different data modeling approaches and exploit the most right method,
or blending of methods, for present and future needs.
Better than normal
Some users in the IT community assert that a normalized data model is appropriate only for transaction processing systems. Others say this
model is also useful for data integration hubs, which feed the downstream, dimensionally modeled data marts that support the needs of the end
users. And some refer to dimensional modeling as data warehousing and normalized modeling as online transaction processing (OLTP).
Together with hundreds of its customers, Teradata recognizes that the normalized approach is useful in enterprise data warehousing for
modeling business data and enabling end users and applications to have direct access to the normalized data model. The benefits of normalized
data modeling include:
|
Superior cross-functional, cross-subject-area data integration
|
|
Easier data maintenance and reduced data redundancy
|
|
Richer and more extensive set of relationships among data entities
|
|
Wider range of support for applications and varying needs of different business functions
|
Another dimension
The dimensional, or star schema, data model may be more appropriate for specific decision support applications. While a dimensional approach
often eliminates many of the valuable relationships that exist among data entities, it also removes much of the complexity inherent in a
normalized model. This simpler representation of the data can enhance end users' comprehension and increase their utilization of the system.
With an underlying normalized data model, a virtual or physical semantic layer based on dimensional models is useful for hiding the complexity
of a normalized model and supporting a wide range of basic reporting and query applications. For this reason, many online analytical processing
(OLAP) business intelligence (BI) tools assume a dimensional cube structure.
Stay flexible
Many Teradata customers ask: When building an enterprise data warehouse (EDW) that will eventually support a wide range of applications and
end user needs, which approach should be used, normalized or dimensional? To provide answers based on direct subject-to-subject relationships
as well as the relationships among several entities, companies can leverage both tactics.
A normalized data model should be used for the underlying data that is integrated and shared across the enterprise. The outcome will serve as
the raw material for application-specific database views and table structures. (See figure right.)
For example, a marketing department that is planning a product promotion might use a normalized approach. Among other things, the department
would need to know the demographics of its intended consumer audience, the best salable time for that product and the most effective means to
communicate its advertising message.
Dimensional models should be used for OLAP tools, for some reporting applications and when a simpler, more straightforward presentation of the
data is required for end users who need to understand the relationships between two sets of data. Logical database views and features such as
aggregate join indexes will help minimize the maintenance of dimensional structures while balancing system performance.
The dimensional model would be the appropriate tool, for instance, when the finance department simply needs to identify the number of a
particular account sold.
Incorporating the complementary methods of normalized and dimensional data modeling into the EDW processes will deliver the greatest business
value to any organization. T
| The value of relationships |
|
The most valuable insight in decision support solutions comes not from data about "things" but from information
about the relationships among those things. For example, how customers and products relate to one another is more
valuable than information only about customers or only about products. The information about things can tell you
what happened. The information on the relationships among things can help you understand how and why it happened
and is more useful in predicting what will happen.
Normalized data models can accommodate far more relationships among data entities than is possible in a
dimensional model. This relationship capability allows greater flexibility when modeling data and, more important,
allows the model to answer a much broader range of business questions. A normalized data model also eases the data
transition from operational systems into the data warehouse and provides greater opportunity for reusing data in
new applications, as well as faster deployment of those applications.
Consider the dimensional (or star schema) and normalized data model diagrams in the figure. In both diagrams, the
lines represent relationships between data entities. The more basic dimensional model is effective for some types
of hierarchical reporting and can answer business questions that fit a specific star or cube. The normalized model
includes a far richer set of data relationships and can answer questions about the intertwining relationships.
—D.H.
|
|
| Why Teradata for all types of data modeling |
|
Teradata has long been a proponent of normalized data modeling, in part because of its suitability for enterprise
data warehousing applications but also because the Teradata platform is superior for managing and exploiting the
normalized data model. But Teradata is not exclusive to the normalized approach; rather, the design of the Teradata
platform makes it an excellent choice for the full range of data modeling methods, including applications using a
dimensional, or star schema, data model.
These features make Teradata an adaptable system for any type of modeling:
| > |
Database architecture is not biased toward any particular data modeling method.
|
| > |
System performance and scalability come from its parallelism and software-based shared-nothing
architecture.
|
| > |
Query rewrite capabilities can comfortably handle complex joins associated with a normalized data model.
|
| > |
Database views can be used to define a virtual dimensional model, thereby exposing only the
appropriate data multi-dimensional elements and relationships to a particular business intelligence
(BI) user or tool.
|
| > |
Virtual dimensional models can camouflage complexity sometimes associated with normalized data
models.
|
| > |
Referential integrity, case logic, table elimination and optimization strategies enable the
virtual dimensional model to become reality/physicalized.
|
| > |
Aggregate join indexes (AJIs) provide easy and automatic maintenance of cube structures.
|
| > |
AJIs eliminate costs associated with building cubes, such as large-server maintenance, large
data transfers, data latency and limited dimensionality for data analysis.
|
| > |
Continuous improvements made to industry-specific normalized logical data models (LDMs) minimize
years of customized design work.
|
| > |
Expansion of a comprehensive set of LDMs will include multi-dimensional data models.
|
—D.H., with contributions from Carlos Bouloy
|
|
Photography by Getty images
Teradata Magazine-June 2008
|