|
|  | |
|
WEB-ONLY CONTENT
|
|
The "rule book" of logical data modeling
by James D. Savage
Early in data processing, it was recognized that information is created from one or several pieces of basic data. An example might be rate of pay (which is basic data) or gross pay (which is information derived by multiplying rate of pay by hours of work).
If I were to present to you the symbols M, 10 and 12345 and ask you to tell me what they represent, you would have difficulty responding. The primary reason is that data has two dimensions—facts and meanings. The symbols just mentioned denote facts that must be placed in some sort of context in order to give meaning to the data. The context or meaning of "12345" could be an amount or a zip code, which would then appear as an attribute within a logical data model.

Figure 1: Information Diagram
A logical data model (LDM) identifies the "periodic table of data" used to create information for the functions, processes and tasks being performed within an organization. The LDM is a stable, non-redundant, integrated structure. The objective for using an LDM is to establish a data-processing environment where basic data is captured once, stored once and shared among qualified systems to generate needed information.
As shown in Figure 1 — Information Diagram, information is created from one or several pieces of basic data and data consists of meanings and facts.
A logical data model identifies the meanings and relationships of the "periodic table of data" needed to support the information requirements of an organization.
The basic constructs of a LDM are:
- Things about which data is kept, e.g., people, place, thing, or event, represented by a box;
- Relationships between those things, represented by lines connecting the boxes; and
- Characteristics of those things represented by attribute names within the box.
Consider the following your "rule book" for building a normalized LDM. Use the following collection of rules and prescribed standards in your construction of semantic data models that will serve to support the management of data as an asset, the integration of information systems and the building of shared databases.
The four primary components for an LDM are:
|
A.
|
Entity—An entity represents a set of real or abstract things (people, places, things or events) that have common attributes or characteristics. An individual member of the set is referred to as an "entity instance." The entity name is a noun phrase that describes the set of things the entity represents. The noun phrase is in singular form, not plural.
|
Entity rules:
- Each entity must have a unique name, and the same meaning must always apply to that name.
- In a key-based or fully attributed model, an entity has one or more attributes that are either owned by the entity or migrated to the entity through a relationship.
- In a key-based or fully attributed model, an entity has one or more attributes whose values uniquely identify every instance of the entity.
- An entity can have any number of relationships with other entities.
- No model can contain two distinctly named entities in which the names are synonymous, i.e. one name is directly or indirectly an alias for the other or a third name exists for which both names, either directly or indirectly, are aliases.
|
|
B.
|
Attribute—An attribute is a characteristic or property associated with a set of real or abstract entities (people, places, things or events). An "attribute instance" is a specific characteristic of an individual member of the set.
|
Attribute rules:
- An attribute must have a unique name, and the same meaning must always apply to that name.
- An entity can own any number of attributes. In a key-based or fully attributed model, every attribute is owned by exactly one entity (referred to as the Single-Owner Rule).
- An entity can have any number of migrated attributes. However, a migrated attribute must be part of the primary key of a related parent entity. The basic rule is this: Primary key(s) must always migrate, and non-key attributes must never migrate.
- No entity instance can have more than one value for an attribute associated with the entity (referred to as the No-Repeat Rule or First Normal Form Rule).
- Attributes that are not part of a primary key are allowed to be null (meaning not known). This was previously called the No-Null Rule. Earlier modelers would take a non-key attribute that could have null values and make a new entity with a "zero" or "one" relationship with the parent entity. This is no longer required.
- No model can contain two distinctly named attributes in which the names are synonymous. Two names are synonymous if either is directly or indirectly an alias for the other or if there is a third name for which both names, either directly or indirectly, are aliases.
A Primary and Alternate Key Attribute is an attribute or set of attributes whose value uniquely identifies every instance of the entity.
Primary and Alternate Key Attribute rules:
- In a key-based or fully attributed model, an entity must have an attribute or combination of attributes whose values uniquely identify every instance of the entity. These attributes form the "primary key" of the entity. Every instance of an entity must have a value for every attribute that is part of its primary key.
- An entity may have any number of alternate keys.
- A primary or alternate key may consist of a single attribute or a combination of attributes.
- Attributes that form primary and alternate keys of an entity may either be owned by the entity or migrated through a relationship.
- An individual attribute may be part of more than one key, either primary or alternate.
- Primary and alternate keys must contain only those attributes that contribute to unique identification (that is, if any attribute were not included as part of the key, then every instance of the entity could not be uniquely identified—referred to as the Smallest-Key Rule).
- If the primary key is composed of more than one attribute, then the value of every non-key attribute must be functionally dependent upon the entire primary key (that is, if the primary key is known, then the value of each non-key attribute is known, and no non-key attribute value can be determined by just part of the primary key—referred to as the Full-Functional-Dependency Rule or Second Normal Form Rule).
- Every attribute that is not part of a primary or alternate key must be functionally dependent upon only the primary key and each of the alternate keys (that is, no such attribute's value can be determined by another such attribute's value—referred to as the No-Transitive-Dependency Rule or Third Normal Form Rule).
Foreign Keys Attributes are attributes in entities that designate instances of related entities. If a specific connection or categorization relationship exists between two entities, then the primary key attribute(s) of the parent or generic entity migrate to the child or category entity. These inherited attributes are called "foreign keys."
If an entire foreign key is used for all or part of an entity's primary key, then the entity is "identifier-dependent." Conversely, if only a portion of a foreign key or no foreign key attribute is used for an entity's primary key, then the entity is "identifier-independent."
If all the parent entity's primary key attributes are migrated as part of the child entity's primary key, then the relationship through which the attributes were migrated is an "identifying relationship." If any of the migrated attributes are not part of the child entity's primary key, then the relationship is a "non-identifying relationship."
Foreign Key Attribute rules:
- Every migrated attribute of a child or category entity must represent an attribute in the primary key of a related parent or generic entity. Conversely, every primary key attribute of a parent or generic entity must be a migrated attribute in a related child or category entity.
- A child entity must not contain two entire foreign keys that identify the same instance of the same ancestor (a parent or generic entity) for every instance of the child, unless these foreign keys are linked via separate relationship paths.
- A migrated attribute may be part of more than one foreign key, provided that the attribute always has the same value for these foreign keys in any given instance of the entity. A role name may be assigned for this migrated attribute. Each role name assigned to a migrated attribute must be unique, and the same meaning must always apply to the same name. Furthermore, the same meaning cannot apply to different names unless the names are aliases.
- Every foreign key attribute must reference one and only one of the primary key attributes of the parent.
|
|
C.
|
Relationship—A relationship represents an association between two entities. It answers the following question: "If I know something about Entity 1, then do I want to know something about Entity 2?" If the answer is yes, then there is a relationship.
|
A relationship is given a name, expressed as a verb phrase (a verb with optional adverbs and prepositions) placed on or beside the relationship line. The name of each relationship between the same two entities must be unique, but the relationship names need not be unique within the model. The relationship name is always expressed in the parent-to-child direction, such that a sentence can be formed by combining the parent entity name, relationship name, cardinality expression and child entity name.
There are four types of relationships within a model. They are specific, non-specific, categorization and recursive. A specific relationship is usually referred to a one-to-many relationship. A non-specific relationship is usually referred to as a many-to-many relationship. A categorization relationship identifies an entity that is a type or category of the parent entity. A recursive relationship is where an entity has a relationship to itself.
A Specific Relationship, or a "parent-child relationship" is an association or connection between entities in which each instance of one entity—the parent entity—is associated with zero, one or more instances of a second entity—the child entity. Also, each instance of the child entity is associated with zero or one instance of the parent entity.
Specific Relationship rules:
- A specific-connection relationship always connects exactly two entities: a parent entity and a child entity.
- In an identifying relationship and in a mandatory non-identifying relationship, each instance of a child entity must always be associated with exactly one instance of its parent entity.
- In an optional non-identifying relationship, each instance of a child entity must always be associated with zero or one instance of its parent entity.
- An instance of a parent entity may be associated with zero, one or more instances of the child entity, depending on the specified cardinality.
- The child entity in an identifying relationship is always an identifier-dependent entity.
- The child entity in a non-identifying relationship will be an identifier-independent entity, unless the entity also is a child entity in some identifying relationship.
A Non-Specific Relationship, also referred to as a "many-to-many relationship," is an association between two entities in which each instance of the first entity is associated with zero, one or many instances of the second entity. Also, each instance of the second entity is associated with zero, one or many instances of the first entity. In a key-based or fully attributed model, all non-specific relationships must be replaced by specific relationships.
Non-Specific Relationship rules:
- A non-specific relationship always connects exactly two entities.
- An instance of either entity may be associated with zero, one or more instances of the other entity, depending on the specified cardinality.
A Categorization Relationship is used to represent structures in which an entity is a "type" (category) of another entity. A categorization relationship is a relationship between one entity, referred to as the "generic entity," and another entity, referred to as a "category entity." Each instance of the category entity represents the same real-world thing as its associated instance in the generic entity.
Categorization Relationship rules:
- A category entity can have only one generic entity. That is, it can only be a member of the set of categories for one category cluster.
- A category entity in one categorization relationship may be a generic entity in another categorization relationship. In other words, a categorization relationship may be part of a hierarchical entity structure.
- An entity may have any number of category clusters in which it is the generic entity.
- The primary key attribute(s) of a category entity must be the same as the primary key attribute(s) of the generic entity. Therefore, you can never have an identifying relationship to a category or sub-type entity.
- All instances of a category entity have the same discriminator value, and all instances of different categories must have different discriminator values.
- No entity can be its own generic ancestor. That is, an entity cannot have itself as a parent in a categorization relationship, nor may it participate in any series of categorization relationships that specifies a cycle.
- No two category clusters of a generic entity may have the same discriminator.
- The discriminator of a complete category cluster must not be an optional attribute.
A Recursive Relationship is where an entity has a relationship to itself.
Recursive Relationship rule:
- Only non-identifying relationships may be recursive (i.e. they may relate an instance of an entity to another instance of the same entity).
|
|
D.
|
Cardinality—the specification of how many child entity instances may exist for each parent instance is its cardinality. Within an LDM, the following relationship cardinalities can be expressed from the perspective of the parent entity:
- Each parent entity instance may have zero or more associated child entity instances;
- Each parent entity instance must have at least one associated child entity instance;
- Each parent entity instance can have zero or one associated child instance;
- Each parent entity instance is associated with some exact number of child entity instances; and
- Each parent entity instance is associated with a specified range of child entity instances.
|
Following these rules for a normalized LDM will ensure proper construction of a data model to manage all your data assets and provide a data-processing environment where basic data is captured once, stored once and shared as "information" throughout the organization. T
James D. Savage is a senior consultant in Teradata's Professional Services organization. Jim has 25 plus years experience focusing on defining system requirements using process and logical data modeling techniques. Jim also has extensive knowledge of business process re-engineering and has developed shared data systems for major corporations and the U.S. Department of Defense. He can be reached at james.savage@teradata-ncr.com.
© Teradata Magazine-March 2005
back to top
|
|
|