Teradata Magazine Cover Teradata Magazine Online  
Register Help Password
Password:
Quick Links
Current Issue
Archives
Teradata.com
Teradata Magazine Rss Feed
ARCHIVES Search Teradata Magazine Online:  
ENTERPRISE VIEW

Printable versionPrintable version Send to a colleagueSend to a colleague

The business value of logical data modeling

Integration ushers in the new age of data reusability.

About forty years ago, in the early days of data processing, the entire focus of system development revolved around automating tedious manual business processes, such as order processing or accounts receivable. As a result, the main effort during the system development life cycle (SDLC) was to produce a set of programs that automated a business process. In those days, the elimination of slow manual processes was the key business driver—not the delivery of critical information about the business. The data used by the programs simply "tagged along" as a byproduct of an operational system.

Just as process-oriented modeling techniques were becoming the standard of SDLC methodologies in the 1970s, Peter Chen broke the pattern by developing the entity-relationship (E/R) model, which later became known as the logical data model (LDM). Chen's first version of the E/R model represented entities (business objects) as rectangles and relationships between entities as rhombuses (diamonds) on the relationship lines. It also graphically showed attributes (data elements) as circles attached to either entities or relationships (figure 1).

E/R modeling was revolutionary in that, for the first time in data processing, data—not processing—was at the center of both business analysis and system design. The implications were enormous: Data could now become a reusable commodity, which meant that every unique data element was identified and inventoried once and only once. That provided the ability to track every person or program using the unique data elements for any purpose.

The concepts of data-driven analysis and, much later, data-driven methodologies were born as business analysts and data modeling practitioners realized that they could finally create a business model of their organizations. In other words, they could create a logical portrayal of a non-redundant view of their enterprise in terms of its data resources.

The power of relational database management systems (RDBMS) is inextricably tied to the concepts of Chen's original E/R model. At its core is the notion of separating data from both business and system processes for update (operational) as well as access (decision support) purposes. The LDM achieves this separation from a business perspective, and the physical data model implements this separation from the RDBMS perspective.

Time to get logical
So what benefits does logical data modeling bring to the business?

Process independence
Separating data from processes during logical data modeling means disregarding access paths, programming languages, SQL versions, query and reporting tools, online analytical processing (OLAP) tools and RDBMS products. All of these process-dependent variables are completely immaterial during business analysis, which focuses only on what data comprises the organization. These variables do come into play later during database design (physical data modeling), which addresses how data will be used and how data should be stored.

Business-focused data analysis
Logical data modeling facilitates business-focused data analysis, which is quite different from the usual kind of analysis we perform during an SDLC or system analysis. An old but accurate term for activities performed during system analysis is "external design" because the thought processes applied during system analysis are geared toward producing design alternatives for the database. In fact, every project in which requirement definitions for a specific application are followed by design activities omits business-focused data analysis and performs only system analysis at best.

While system analysis does produce preliminary design alternatives, business-focused data analysis is ultimately geared toward understanding enterprise-wide business rules and usage of business data, as well as uncovering existing data defects, such as business rule violations, referential integrity violations, synonyms and homo-nyms. This type of analysis doesn't consider database design or implementation method. Business-focused data analysis uses rules of normalization to build a non-redundant and fully integrated data model, which reflects a 360-degree view of a business.

Data integration
When building an enterprise model it is often necessary to integrate subsets of the corporate data. Many people confuse data integration with data consolidation. Consolidating data simply means gathering data elements that identify or describe the same business object—like customer data or product data—from multiple source files or source databases and storing them in one table or in a set of dependent tables.

Data integration goes far beyond that. In addition to consolidating data, integration enforces data uniqueness—the building blocks that enable you to reuse the same data without having to duplicate it and without the additional development and maintenance costs of managing the duplications.

Data integration requires several actions during logical data modeling:

  • Examine the definition, semantic intent and domain values of each logical entity to find potential duplicates of business objects that otherwise would not be discovered because the objects are known under different names in the systems.
  • Ensure that each entity instance has one and only one unique identifier (primary key), which is never reassigned to a new entity instance—even after the old instance expires and is deleted from the database.
  • Use the six normalization rules to put "one fact in one place"—that is, one attribute in one, and only one, owning entity. This means that an attribute can be assigned to only one entity as either an identifier of that entity or as a descriptive attribute of that entity. This modeling activity ensures that each attribute is captured once and only once and that it remains unique within the data universe of the organization.
  • The last and most important activity of integration is to capture the business actions or transactions that connect the business objects in the real world. These business actions are shown as data relationships among the entities. It is paramount to capture them from a logical business perspective (not from a reporting pattern or data access perspective) because these relationships are the basis for all potential access patterns, known and unknown, now and in the future.

Improved data quality
Because the activities of logical data modeling are solely business-focused analysis activities, they include the validation of the logical data model components (e.g. entities, attributes, relationships, definitions, domains, business rules, etc.) against the existing operational data in the source files and source databases.

The validation activities include asking probing questions, applying normalization rules to put "one fact in one place" and scrutinizing definitions, domains and semantic meanings of all entities and all attributes to ensure their uniqueness.

Performing this type of data archeology during business-focused data analysis will inevitably expose data quality problems that otherwise never would have been detected during database design activities; database designers have neither the time nor the responsibility to perform such validation activities while they are trying to design an efficient database. Therefore, logical data modeling directly contributes to improved data quality.

Enterprise LDM
The benefits of logical data modeling are more completely derived from building a 360-degree view of a business, not from designing a business-function-specific database or a reporting-pattern-specific database. But the difficulty in building a 360-degree view of a business is that the current data chaos in most organizations is so immense that it can take significant time and effort to rationalize the existing data into an integrated, non-redundant enterprise LDM. Because of the difficulty, there are conflicting opinions about how to best approach building the enterprise LDM: big-bang vs. incremental or top-down vs. bottom-up.

The physical side of things
Physical data modeling is synonymous with the term "logical database design." It is a graphic representation of the proposed tables, relationships between tables, primary and foreign keys, and columns. It differs from logical data modeling in the following ways:

Process dependence
Unlike logical data modeling, which is not process-dependent—the physical data model must be process-dependent and take into consideration access paths, programming languages, SQL versions, query and reporting tools, OLAP tools and RDBMS products. If we review the definitions for these process-dependent variables, it is easy to understand their influence on database design.

  • Access paths must be determined, analyzed and applied during database design, when the physical data model is created, to show how data will be stored in the database. For efficiency reasons, some entities may be collapsed (denormalized) to avoid excessive database joins.
  • Programming languages such as COBOL, C++ and Java have their own idiosyncratic rules about making SQL calls to relational databases. These rules have to be understood and considered during database design.
  • Query and reporting tools translate the abstract user instructions (meta data) into a form of SQL, which then executes against the RDBMS. Some query and reporting tools produce rather inefficient SQL code, and database administrators may have to replace the tool-generated SQL code with their own more efficient pass-through queries.
  • OLAP tools typically provide multi-dimensional functionalities—such as slicing, dicing and pivoting query results—which require the underlying database to be designed in a multi-dimensional schema. Furthermore, some OLAP tools have a specific preference for either star schemas or snowflake schemas, and some tools even provide their own proprietary DBMS engine. Using the wrong database design could prevent the OLAP tool from functioning properly—or at all.
  • RDBMS products compete with each other through their unique functional extensions (SQL) and optimization capabilities. Therefore, the most perfect design for one RDBMS is not necessarily the best design for another. For example, when migrating from Oracle to Teradata, the existing database designs should be reviewed and adjusted—or even redesigned—to take advantage of the different optimization capabilities of the new product.

Database design
Physical data modeling facilitates logical database design with the primary focus on performance. Thus, database designers must consider and balance all application-specific and process-dependent variables.

They must also have a good understanding of how the RDBMS product-specific optimizer will react to these variables so that they can apply the minimum amount of denormalization to achieve the maximum amount of performance.

For more on physical data modeling, read Steve Hoberman's article "Intelligent by design."

Planning for tomorrow
When moving forward in database and application design, it is helpful to remember the principles of logical data modeling as they also carry over into physical data modeling.

Because data warehouses are as much about bringing order to data chaos as they are about functional decision support capabilities, it is important to use a data-driven approach that focuses on data reusability rather than building siloed solutions that contribute to uncontrolled and costly data and program redundancies.

Data reusability requires a complete view of the business, which can be achieved through data modeling, using appropriate normalization rules to place each unique atomic data element into its owning entity once and only once. It is important to carefully plan a long-term road map and data warehouse data blueprint in the beginning to avoid rework as the business increases functionality of the data warehouse. T

© Teradata Magazine-March 2005

RELATED LINKS:

The importance of data modeling as a foundation for business insight


back to top




Copyright by Teradata Corporation 2001-2007.