In a recursive structure, we can have an unlimited number of levels without knowing how many levels each member hierarchy will have. One hierarchy may be only two levels deep, and another 15. Herein lies the limitation of the star schema. The star schema demands a fixed number of levels because each level is set up by a series of fixed columns in a dimension table. Because we do not know the number of levels in a recursive structure, we cannot predefine the columns.
Often the most critical entities in the data model have recursive structures. Organizational hierarchies (e.g. internal, customer, supplier, competitor) usually are recursive. Other examples of typical recursive structures include employee tables (because employees have managers who are also employees), bill of material and project-work breakdown structures.
2. It is often argued that denormalization is better for the end user than third normal form. However, usability, when separated from performance, actually favors normalization.
Denormalized schemas certainly seem more user-friendly than the complexity of a highly generalized, normalized data model. However, denormalization is driven by known queries, so the ease is only an illusion. The normalized model will allow more flexible new analysis. In any case, users are not typically writing queries; this is often the responsibility of a tool or dedicated application developer. Moreover, views can still make the normalized data model easier to navigate. Few DBAs allow users to query base tables directly, so DBAs can create views from the base tables that look exactly like their denormalized table counterparts.
If there were no issues of performance, all denormalization could be handled in database views. Star schemas, snowflakes, summary tables, derived data, etc. could be built as virtual clusters of tables that look exactly like their physical counterparts. By handling denormalization virtually, the relationships within the underlying base tables remain intact, and referential integrity is supported regardless of how many denormalized relationships are created. This frees the DBA to create any number of denormalized database views for users while maintaining data integrity and eliminating the redundant data requirements of denormalized models.
Database views actually are better for end users than denormalized data because the DBA can create virtual, subject-oriented schemas for specific applications without destroying the underlying base-table data. Views can be created to apply security constraints for groups of business users.
A physical star schema has physical dimensions that support a physical fact table. However, for some dimensions there may be mutually exclusive substitutes for the same data. For example, an airline is interested in both point-to-point travel between segments and between the true origins and destinations (O&D) of their customers. The true O&D dimension is different from the segment O&D, although it looks the same. Moreover, the consolidation of facts is different as well, although the detailed base table data is the same. If the star schemas are physicalized, two very large schemas need to be maintained and coordinated, whereas with virtual star schemas, the data is maintained only in the base tables.
3. It also is argued that denormalization offers better performance than third normal form. However, to address performance, the first consideration should not be to sacrifice the data architecture; rather, it should be to acquire a database engine that can handle a normalized model and scale linearly. If there still is a need to improve performance, aside from scaling the database, then implement a well-planned data-propagation strategy that maintains and complements the underlying normalized base table substructure.
To begin, consider propagating denormalized data within the same database instance. Look at propagating to another environment only if there are other considerations for the target data source, such as geographic needs or the need to support proprietary data structures. In any case, the propagation is from the data warehouse and not directly from source systems.
There are three reasons for this strategy:
The full parallel capabilities of the data warehouse can be used to propagate data.
By keeping the data in the same instance of the database, hybrid queries can be implemented that take advantage of both the denormalized and normalized data. For example, large volumes of nonvolatile data from transaction detail can be propagated into new physical fact tables, and smaller volume, highly volatile dimensional data can be built into virtual dimension tables.
Administration of the complete data warehousing environment will be easier and less expensive.
4. A denormalized model creates ambiguity, so that within a denormalized table it is impossible to tell which columns are related. Normalized models maintain all relationships through the association of primary keys with foreign keys.
Consider, for example, the following denormalized table: Commodity_ID (Primary Key), Commodity_Name, Group_ID, Group_Name, Family_ID, Family_Name and Sequence.
A user viewing this table does not know the relationships among Commodity_ID, Group_ID and Family_ID. It might be that the commodity has separate relationships with groups and families, and it might be that the commodity is related to groups, which are related to families. Moreover, Sequence is ambiguous. Is it used to order commodities in groups, commodities in families or groups within families? It is surprising how often such dangling relationships appear in denormalized models. In contrast, normalized models maintain all relationships through the association of primary keys with foreign keys. Users can clearly discern the relationships simply by looking at the data model.
5. Referential integrity cannot be effectively maintained within a denormalized structure.
Denormalized structures compromise data quality. An RDBMS supports referential integrity to ensure that when foreign keys exist, the instances of the objects to which they refer also exist. Denormalized tables cannot support referential integrity. For example, business users have no guarantee that the children in a denormalized table have real parents in the same table, unless some other, more costly method of programmatically ensuring data integrity has been used.
Also, as new incremental data loads are inserted, changed fields will not be reflected in the old data. When an attribute is changed in the source system, only the new records loaded will reflect the changes. For example, suppose the marital status, last name or another field in the customer table changed. Any new loads for that customer will reflect the different data in the changed fields, causing inconsistencies in the data and false results in certain types of analysis because both the changed and unchanged attributes are stored redundantly.
6. The normalized structure with denormalized views is far more flexible in creating and managing dimensional consolidation hierarchies for on-line analytical processing (OLAP) models than the star schema.
Typically, modelers who use denormalized methods define a single set of mutually exclusive dimensions for an OLAP model. In reality, though, we can create many nonexclusive dimensional paths for a single model, which also are reusable across many models. Figure 2 shows two denormalized virtual schemas that share a number of common dimension views.