Data Modeling in the Healthcare Industry

George Wright

This whitepaper presents the rationale for acquiring the Teradata Logical Data Model (LDM) designed specifically for the Healthcare industry. It also gives an overview of the processes involved and explains why an LDM should be developed and used in healthcare settings.

For readers unfamiliar with logical data modeling or with the use of an LDM in healthcare settings, this whitepaper offers specific insights into the discipline of data analysis, the composition of the resulting logical data model, and some of its uses and benefits.

The organization and delivery of healthcare services is an information-intensive effort. Generally, the effectiveness of healthcare operations depends heavily on how well information is integrated across all sectors. Healthcare organizations that have not integrated their data and related information in an effective and efficient manner will find it difficult to compete in the hypercompetitive healthcare market space, now or in the future. To overcome this deficiency, at a minimum, healthcare organizations will need an active enterprise intelligence environment predicated on an enterprise-wide data model and implemented as a comprehensive, highly scalable data warehouse capable of housing healthcare and corporate business data and related information in a single, fully rationalized form.

Healthcare organizations face numerous challenges in today's competitive environment, among them the need to provide better patient care; use best practices; comply with industry and government standards; develop new means of delivering comprehensive healthcare; improve internal and external communications; and do all of this at a reduced cost to patients and healthcare organizations alike. Increasingly, healthcare organizations are finding that their inability to leverage their data and information resources is what keeps them from surmounting these challenges, and that it lies at the heart of the healthcare dilemma.

If your organization is among those grappling with these challenges and you are hoping for a vendor solution that saves you time, money, and effort, Teradata Corporation may have just the solution you've been looking for: a comprehensive, enterprise-wide, healthcare-specific LDM. This LDM from Teradata is capable of capturing every piece of data in your organization in a standardized, fully documented manner, enabling your organization to store one fact in one place without ambiguity or confusion and thereby enhancing its ability to leverage this invaluable resource.

The need for extensive data analysis and interface work could be greatly reduced with a standard data architecture that forms the cornerstone for application development, data warehousing, performance measures, budget planning, and network interfaces for the healthcare industry.

As is evident in most of today's environments, business users and vendors find it difficult to support an incompatible, transaction-oriented applications environment. What is needed instead is a framework that minimizes incompatibility and maximizes the exchange of information between systems. The Teradata® Healthcare Logical Data Model is designed to provide such a superstructure for healthcare environments, facilitating a common specification and representation of data and information. It is both practical and economical to obtain and commit to a standard data architecture with compatible interfaces for computer applications and information interchange in healthcare organizations.

The Teradata Healthcare LDM has been designed in four levels of decomposition to accommodate the various levels of views, from the business conceptual view through the database (data warehouse) implementation configuration, without losing context or meaning at any level. Each successive level of decomposition adds specificity. The final level (i.e., the database/data warehouse implementation model) is usually tuned for performance (see Figure 1).


The four levels are: (1) a conceptual, high-level business subject area data model; (2) a business LDM, fully attributed and in 3rd Normal Form; (3) a preliminary physical database design, populated with technical names and abbreviations; and (4) a database, data warehouse construction, or implementation data model configured to maximize throughput. We explore each level more fully below.

The term data model means a representation of the data structure illustrating data entities and their relationships in a manner that is independent of hardware, software, functions, and organizations. This is important because hardware, software, functions, and organizations change. Data are the most stable resource of the organization.

Level One – Business or Conceptual Data Model

The business data model, sometimes referred to as a conceptual data model or subject area model, is the overall logical structure of the high-level data categories. It presents a formal representation of the data classes needed to run the enterprise. The business model is the product of an enterprise-wide planning effort and reflects the target data architecture in broad terms. The level-one model often contains anchor or core data entities, and does not contain data attributes. An entity is a classification or category of data addressing such things as a Person, Place, Thing, Event, or Concept of interest to the enterprise. Data attributes are atomic data items that describe the characteristics of data entities. For example, the core entity of the Patients subject area is Patient, and a data attribute of the Patient entity could be Patient Identification Number; the core entity of the Providers subject area is Provider, and a data attribute of the Provider entity could be Provider Identification Number. Note that subject areas are stated in the plural form, while entities and attributes are stated in the singular form.
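As a minimal sketch of the level-one idea, assuming a Python representation chosen purely for illustration, the conceptual model below maps each subject area to its core entities and deliberately carries no attributes. The Patients/Patient and Providers/Provider names come from the examples above; the `CONCEPTUAL_MODEL` dictionary and the `core_entities` helper are hypothetical and are not part of the Teradata LDM.

```python
# Level one (hypothetical representation): subject areas, stated in the
# plural, map to core entities, stated in the singular. No attributes appear
# here; Patient Identification Number and the like arrive at level two.
CONCEPTUAL_MODEL: dict[str, list[str]] = {
    "Patients": ["Patient"],
    "Providers": ["Provider"],
}


def core_entities(model: dict[str, list[str]]) -> list[str]:
    """List every core entity named in the conceptual model, sorted."""
    return sorted(entity for entities in model.values() for entity in entities)


print(core_entities(CONCEPTUAL_MODEL))  # ['Patient', 'Provider']
```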

Level Two – Logical Data Model

The LDM is a detailed and structured representation of the business view of data resulting from comprehensive, broad-based data analysis. It contains the entities, their identifying attributes (primary keys), the relationships between them, an Entity-Relationship Diagram, and supporting textual documentation. The level-two model is normalized to 3rd Normal Form, i.e., each entity contains a set of elemental, atomic data attributes that fully describe, and are specific to, that entity and that entity alone. It includes the definitions for each object in the model, and is independent of any specific database management system.
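The 3rd Normal Form requirement can be illustrated with a small relational sketch. The hypothetical flat encounter records below store a provider's name alongside every encounter, so the name depends on the provider identifier rather than on the encounter key (a transitive dependency). The sketch checks that functional dependency against the sample rows and then decomposes the records so that each fact is stored once. All column names and the helpers `fd_holds` and `project` are illustrative assumptions. Note also that sample data can only disprove a dependency, never prove it; real dependencies come from business rules.

```python
# Hypothetical flat encounter records before normalization. provider_name
# depends on provider_id, not on the encounter key: a transitive dependency.
flat_rows = [
    {"encounter_id": "E1", "patient_id": "P1", "provider_id": "D1", "provider_name": "Dr. Lee"},
    {"encounter_id": "E2", "patient_id": "P2", "provider_id": "D1", "provider_name": "Dr. Lee"},
    {"encounter_id": "E3", "patient_id": "P1", "provider_id": "D2", "provider_name": "Dr. Ortiz"},
]


def fd_holds(rows, lhs, rhs):
    """True if, in these rows, each lhs value maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, columns):
    """Keep only the given columns and drop duplicate rows, preserving order."""
    result = []
    for row in rows:
        slim = {c: row[c] for c in columns}
        if slim not in result:
            result.append(slim)
    return result


# provider_id -> provider_name holds, so the name belongs in a Provider entity.
if fd_holds(flat_rows, ["provider_id"], ["provider_name"]):
    encounters = project(flat_rows, ["encounter_id", "patient_id", "provider_id"])
    providers = project(flat_rows, ["provider_id", "provider_name"])
    print(len(encounters), len(providers))  # 3 2: each provider name is now stored once
```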

Level Three – Preliminary Physical Database Model

Level three is the first transformation of the logical model into a physical database design. It includes the primary keys for each entity. It identifies relationships between related entities in the form of migrated foreign keys. It defines data formats and functional dependencies. It includes some de-normalization and some over-normalization, and is the first step in supporting business user views or query profiles. In the case of the healthcare LDM, it also supports the reporting requirements of most industry standards.
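A minimal sketch of a level-three design follows, using Python's built-in `sqlite3` module purely as a stand-in DBMS. Business names become abbreviated technical names, and the relationships become migrated foreign keys. The table and column abbreviations (`PTNT`, `PRVDR`, `ENCNTR`, and so on) are hypothetical and are not taken from the Teradata Healthcare LDM.

```python
import sqlite3

# Hypothetical level-three names: business names are abbreviated, and each
# relationship becomes a migrated foreign key in the dependent table.
DDL = """
CREATE TABLE PRVDR (
    PRVDR_ID  TEXT PRIMARY KEY,   -- Provider Identification Number
    PRVDR_NM  TEXT NOT NULL       -- Provider Name
);
CREATE TABLE PTNT (
    PTNT_ID   TEXT PRIMARY KEY,   -- Patient Identification Number
    GNDR_CD   CHAR(1),            -- Gender Code
    BRTH_DT   DATE                -- Date of Birth
);
CREATE TABLE ENCNTR (
    ENCNTR_ID TEXT PRIMARY KEY,
    PTNT_ID   TEXT NOT NULL REFERENCES PTNT (PTNT_ID),    -- migrated foreign key
    PRVDR_ID  TEXT NOT NULL REFERENCES PRVDR (PRVDR_ID),  -- migrated foreign key
    ENCNTR_DT DATE NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(DDL)

conn.execute("INSERT INTO PRVDR VALUES ('D1', 'Dr. Lee')")
conn.execute("INSERT INTO PTNT VALUES ('P1', 'F', '1980-04-02')")
conn.execute("INSERT INTO ENCNTR VALUES ('E1', 'P1', 'D1', '2024-01-15')")

# An encounter that points at an unknown provider violates the foreign key.
rejected = False
try:
    conn.execute("INSERT INTO ENCNTR VALUES ('E2', 'P1', 'D9', '2024-01-16')")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```

SQLite enforces declared foreign keys only after `PRAGMA foreign_keys = ON`. Other database management systems have their own rules, which is one reason the level-four model is specific to a given DBMS.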

Level Four – Physical Database/Data Warehouse Model

Level four represents the form in which the data are actually stored in a specific database management system, including any adjustments that may be required to address performance-related issues.


Data analysis coupled with logical data modeling is a synergistic approach to identifying and defining the data environment. It uses data analysis techniques to separate the logical design of data from the system development process and from physical database design. Data modeling establishes a detailed inventory of the data as viewed by end users, and establishes logical data structures. End users are defined as individuals, groups, and others, who design, have stewardship over, consume, and/or utilize data in the performance of their job duties and related activities.

These logical data structures reflect a standard representation of the data: what the data are, not how they are used, when they are processed, or by whom. Once documented in the model, these logical data structures are available to multiple users, developers, or systems, and they can be implemented in any micro, mini, or mainframe database management system.

While there is no single official best methodology, some data modeling methods and techniques may work better than others. Such differences, however, are more likely to stem from the people involved, their data modeling philosophy, and their practical experience. The Teradata methodology has stood the test of time: it has been refined and proven through years of experience in hundreds of engagements, and has become the preferred choice.

Some methodologies seem good in theory, but may not be successful in practice. Our approach is to separate pragmatism from theory by starting with simple solutions and working towards the more complex as necessary.

A recurring theme in the healthcare LDM is a pragmatic approach to solving the problems facing the business, healthcare, and IT communities in healthcare settings. 

Data analysis and modeling rules and guidelines lead to a clean definition, resulting in logical structures in which each structure contains information about one data entity only. The clean data representation, as documented in the model, is easier to understand, easier to use, and easier to extend if new information is to be added later on.

In a quality logical data model, the data will be stable and will serve as a good foundation for future growth. Another way of expressing the clean-definition objective is "one thing in one place": each business fact appears in exactly one place in the model.

To acquire a proper understanding of the data, you need a discipline, a set of rules that can help you think about data in a logical manner. Data analysis and modeling provides just such a discipline. The more complicated the data, and the more data sharing occurs, the more necessary the discipline becomes.

The objective of analyzing any data structure is to express that structure in the form of a data model, in which the information is represented by a small number of different constructs. This model uses the Entity-Relationship Diagram (ERD) technique.

Data Model Constructs

Data analysis and modeling provides a means of understanding and documenting a complex environment in terms of its data entities, their relationships, and their descriptive data attributes.

Many different modeling schemes have been proposed, offering different sets of constructs from which the models can be built. For the purpose of this model, the constructs used are ENTITIES, RELATIONSHIPS between data entities predicated on business rules, and DATA ATTRIBUTES that describe the data entities.

Each construct is named and precisely defined. A graphic representation of the model is included, showing the data entities as boxes and the relationships as lines connecting the boxes. In addition, the Healthcare LDM 0 Data Model depicts the primary and foreign keys. A diagram gives a useful overview, but cannot replace the definitions.

Data Definitions

Data Model

A model consisting of data entities, relationships, and supporting policy-based definitions of each, representing some business area of the enterprise. The data model is a synthesized set of data entities, fully described by their data attributes, with identifying keys.

Data Entity

Person, place, thing, concept, or event, about which an enterprise gathers data. Data entity names are usually nouns and singular; uniquely identifiable by means of an identifier, referred to as the "primary key", that is known and managed by the business; mutually exclusive of other data entities; and relatively equal in importance to other data entities in the enterprise.

Relationship

A business association between occurrences of one or more data entities, which embodies some relevant informational value. Although the basic component of the data model is the data entity, the data model is not merely a collection of data entities. The model must be given structure, which we define in terms of the relationships between data entities. Any two data entities may have a relationship defined between them. The degree of the relationship is based on the number of occurrences of the data entities that can be related to one another. A relationship is an expression of an applied business rule, i.e., the business rationale expressing the relationship between entities. For example, the relationship between the two entities PROVIDER and HEALTHCARE SERVICE is PERFORMS, and is represented by a line connecting the two entities.
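The PROVIDER PERFORMS HEALTHCARE SERVICE example can be sketched as a named relationship with a cardinality check over its occurrences. The rule that a provider may perform many services while each healthcare service is performed by at most one provider is an assumption made for illustration. The `Relationship` class and the `cardinality_violations` helper are likewise hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relationship:
    """A named business rule linking two entities (hypothetical structure)."""
    parent: str
    verb: str
    child: str
    max_per_parent: Optional[int]  # children allowed per parent occurrence; None = many
    max_per_child: Optional[int]   # parents allowed per child occurrence; None = many

    def describe(self) -> str:
        return f"{self.parent} {self.verb} {self.child}"


def cardinality_violations(rel, pairs):
    """Return occurrence ids that take part in more pairs than the rule allows."""
    per_parent = Counter(parent for parent, _ in pairs)
    per_child = Counter(child for _, child in pairs)
    bad = []
    if rel.max_per_parent is not None:
        bad += [p for p, n in per_parent.items() if n > rel.max_per_parent]
    if rel.max_per_child is not None:
        bad += [c for c, n in per_child.items() if n > rel.max_per_child]
    return bad


# Assumed rule: a provider performs many services; a service has at most one provider.
performs = Relationship("PROVIDER", "PERFORMS", "HEALTHCARE SERVICE", None, 1)
occurrences = [("D1", "S1"), ("D1", "S2"), ("D2", "S3"), ("D3", "S3")]

print(performs.describe())                            # PROVIDER PERFORMS HEALTHCARE SERVICE
print(cardinality_violations(performs, occurrences))  # ['S3']
```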

Occurrence

A single, particular instance of a data entity or relationship type.

Entity-Relationship Diagram

The Entity-Relationship Diagram is a diagrammatic representation of the entities and their relationships. It only includes those data entities required to support business information requirements.

Data Attribute

A unit business fact that is atomic, i.e., it cannot be broken into parts that have meaning of their own. Attributes are descriptive properties of a data entity. For example, a primary data attribute of the entity Provider would be Provider Identification Number.

Data Modeling Rules and Guidelines

To keep the structure of the data model simple and easy to understand, the division among data entities, relationships, and data attributes must be kept quite clear.

It is important to state the basic rules of data modeling. They are simple, but very important. Data modeling depends on these rules.

Rule – Each object in the environment must be classified as one data entity, one data attribute, or one relationship. You must classify the objects you uncover in business area analysis in only one way. Something cannot be a data attribute and a data entity, or a data attribute and a relationship, or any combination of these objects.

Rule – A data entity must have data attributes. A data entity cannot exist unless it has data attributes.

Rule – Each data entity must have a unique identifier. If there is no identifier, there is no data entity.

Rule – If a data entity contains only an identifier and is related to two other data entities, then it is not needed. A relationship that connects the two entities should replace such an entity.

Rule – Every occurrence of a data entity subtype is also an occurrence of its supertype. Subtypes are mutually exclusive groups of entity occurrences within the main entity class. Subtypes must have an identifying data attribute. Subtypes must have relationships with other data entities that are not applicable to all occurrences of the supertype. For an example, see Figure 2.


Rule – Relationships may only exist between data entities.

Rule – Only data entities have data attributes. Relationships cannot have data attributes.

Rule – If relationship information can be derived from other relationship information along a definite path through the model, there is no need to keep the redundant relationship. Remove the redundant relationship from the model.

Rule – A data attribute must describe the data entity in which it resides. A data attribute is a descriptive property of only one data entity.

Rule – A data attribute must be dependent on the identifier of the data entity.
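Several of these rules are mechanical enough to check automatically. The sketch below assumes a simple in-memory model representation; the `Entity` and `Model` classes and the `check_rules` function are hypothetical. It flags entities without attributes or identifiers, names classified as both an entity and an attribute, attributes that appear in more than one entity, relationships that reference something other than an entity, and identifier-only entities that merely link two other entities. Relationships are bare `(entity, verb, entity)` tuples, so by construction they carry no attributes. The subtype and redundant-relationship rules need business judgment and are not attempted.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    identifier: list                                # primary-key attribute names
    attributes: list = field(default_factory=list)  # non-key attribute names


@dataclass
class Model:
    entities: dict       # entity name -> Entity
    relationships: list  # (entity, verb, entity) tuples; no attributes by design


def check_rules(model):
    """Return a list of rule violations found in the model."""
    problems = []
    names = set(model.entities)
    owner = {}  # attribute name -> first entity it was seen in
    for ent in model.entities.values():
        all_attrs = ent.identifier + ent.attributes
        if not all_attrs:
            problems.append(f"{ent.name}: entity has no attributes")
        if not ent.identifier:
            problems.append(f"{ent.name}: entity has no unique identifier")
        for attr in all_attrs:
            if attr in names:
                problems.append(f"{attr}: classified as both entity and attribute")
            if attr in owner and owner[attr] != ent.name:
                problems.append(f"{attr}: describes both {owner[attr]} and {ent.name}")
            owner.setdefault(attr, ent.name)
    for a, verb, b in model.relationships:
        for end in (a, b):
            if end not in names:
                problems.append(f"{a} {verb} {b}: {end} is not an entity")
    for ent in model.entities.values():
        if ent.identifier and not ent.attributes:
            partners = {b if a == ent.name else a
                        for a, _, b in model.relationships if ent.name in (a, b)}
            if len(partners) == 2:
                problems.append(f"{ent.name}: identifier only and links two entities; "
                                "replace it with a relationship")
    return problems


model = Model(
    entities={
        "Patient": Entity("Patient", ["patient_id"], ["gender_code", "birth_date"]),
        "Provider": Entity("Provider", ["provider_id"], ["provider_name"]),
        "Care Link": Entity("Care Link", ["care_link_id"]),  # hypothetical, identifier only
    },
    relationships=[
        ("Provider", "ADMINISTERS CARE TO", "Patient"),
        ("Care Link", "JOINS", "Provider"),
        ("Care Link", "JOINS", "Patient"),
        ("Provider", "BILLS", "Payer"),  # Payer is never defined as an entity
    ],
)

for problem in check_rules(model):
    print(problem)
```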


Is There a Logical Data Model Suitable for the Healthcare Industry?

Teradata has developed the most comprehensive 3rd Normal Form LDM in the Healthcare industry. At the core of Teradata's Healthcare LDM is a set of comprehensive healthcare data constructs capable of capturing all of the basic healthcare data and information needed to provide comprehensive healthcare services, and beyond, to any given healthcare population. Looking ahead, Teradata has also included other data components to help healthcare organizations address broad enterprise needs. For example, the financial accounting data constructs have been expanded to support standard financial accounting and related reporting for various aspects of healthcare finance.

The Teradata Healthcare LDM contains 11 business subject areas, i.e., high-level data categories, cited by the industry as vital to support the data and information needs of the healthcare environment. 

The Teradata LDM subject areas are:

Agreement – contains details of the arrangement(s) between Parties (e.g., customers) regarding a specific product or service, such as the contract between healthcare payers and providers.

Campaign – contains details required for communication plans to deliver a message for a desired action by a customer, prospective customer, or member, for the purpose of obtaining new clients and/or strengthening relationships with existing clients, etc.

Channel – specifies the data relative to any means by which a party, i.e., customer, prospect, or others, interacts with or communicates with the healthcare enterprise.

Claim – contains the details for requests for payments for reimbursable services that were performed, such as dental services or medical procedures under either a group or individual policy.

Clinical – represents patient encounters with healthcare service providers. This includes details regarding administration of the appointment, services rendered, establishment of an episode of care, and clinical studies.

Event – records any contact or transaction that involves a customer or prospect or a customer's account. It may or may not involve money. 

Feature – a concept that is central to the Healthcare LDM. It captures general aspects of healthcare insurance that apply to many different business entities in the model. In this capacity, features are used extensively throughout the model to define products, coverages, services, amounts, rates, terms, and quantities.

Financial Management – provides data to support the general ledger and internal financial aspects of the enterprise.

Geography – includes information about any geographical area on earth and locators (addresses), and may include such things as city, state, province, country, village, region, or census block.

Party – defines the people and organizations of interest to the enterprise. The Party concept is a level of abstraction that offers much flexibility and provides a single view of an individual or organization regardless of how many roles one might play.

Product – contains information about the healthcare enterprise's products and services along with those offered by competitors. Products (including services) are the things that are marketed to parties for the purpose of generating revenue.
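The Party abstraction described above can be sketched as one party record plus any number of role records. In this form, a single individual who is a member, a patient, and an employee is still one party. The `Party` and `PartyRole` classes, the role names, and the sample person are hypothetical and do not reproduce the LDM's actual Party entities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    party_id: str
    party_type: str  # "Individual" or "Organization"
    name: str


@dataclass(frozen=True)
class PartyRole:
    party_id: str    # refers to the single Party record
    role: str        # e.g. "Member", "Patient", "Employee", "Provider"


parties = {"X1": Party("X1", "Individual", "Ana Ruiz")}
party_roles = [
    PartyRole("X1", "Member"),
    PartyRole("X1", "Patient"),
    PartyRole("X1", "Employee"),
]


def roles_of(party_id):
    """All roles one party plays, from a single view of that party."""
    return sorted(r.role for r in party_roles if r.party_id == party_id)


print(parties["X1"].name, roles_of("X1"))  # Ana Ruiz ['Employee', 'Member', 'Patient']
```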

Each subject area, in turn, is decomposed into internal classes of data called entities, which represent the people, places, things, events, and/or concepts of interest to the healthcare vertical. Each entity contains logical business data items, or data fields, that describe the properties or makeup of that entity. These properties take the form of logical data fields, or data attributes. For example, the Patient entity contains such data attributes as Patient Identification Number, Gender Code, and Date of Birth. Only data attributes that are specific characteristics of a given entity may be associated with that entity. Every data attribute carries a full business description or definition, and is given a business name based on that description or definition. Each data attribute is also assigned a specific data format that specifies the type of data it represents, such as numeric for numbers or character for non-numeric data.

Each entity is associated, via a business rule, with at least one other entity in a binary fashion. This association is referred to as a relationship, and it can exist only if there is a true business need for it. For example, a provider (entity) administers care (relationship) to a patient (entity). This construct provides the means to capture which provider administers care to which patient. Additional related constructs capture what services were provided and why, as well as other information related to the care of that patient.
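The attribute metadata described above (business name, definition, and data format) can be sketched as follows. The three formats, the Patient attribute definitions, and the `validate` helper are illustrative assumptions. Identification numbers are modeled here as digit strings so that leading zeros survive; this is one possible design choice, not necessarily the LDM's.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AttributeDef:
    """Metadata kept for each logical attribute (hypothetical structure)."""
    business_name: str
    definition: str
    data_format: str  # "numeric", "character", or "date" in this sketch


# Format checks for the three formats used in this sketch.
FORMAT_CHECKS = {
    "numeric": lambda v: isinstance(v, str) and v.isascii() and v.isdigit(),
    "character": lambda v: isinstance(v, str) and v != "",
    "date": lambda v: isinstance(v, date),
}

# Hypothetical attribute definitions for the Patient entity.
PATIENT_ATTRIBUTES = {
    "patient_identification_number": AttributeDef(
        "Patient Identification Number",
        "The number the enterprise assigns to uniquely identify a patient.",
        "numeric"),
    "gender_code": AttributeDef(
        "Gender Code", "A code classifying the patient's gender.", "character"),
    "birth_date": AttributeDef(
        "Date of Birth", "The date on which the patient was born.", "date"),
}


def validate(record: dict, attributes: dict) -> list:
    """Report fields that do not belong to the entity, are missing, or are mis-formatted."""
    problems = [f"{k}: not an attribute of this entity" for k in record if k not in attributes]
    for key, spec in attributes.items():
        if key not in record:
            problems.append(f"{key}: missing")
        elif not FORMAT_CHECKS[spec.data_format](record[key]):
            problems.append(f"{key}: expected {spec.data_format}")
    return problems


ok = {"patient_identification_number": "100234", "gender_code": "F",
      "birth_date": date(1980, 4, 2)}
bad = {"patient_identification_number": "10-0234", "gender_code": "F",
       "birth_date": date(1980, 4, 2), "provider_name": "Dr. Lee"}
print(validate(ok, PATIENT_ATTRIBUTES))   # []
print(validate(bad, PATIENT_ATTRIBUTES))
```

The second record is rejected twice: `provider_name` describes a provider rather than a patient, and the identification number contains a non-digit character.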


Appendix A – Teradata Healthcare LDM Questions and Answers

Q. Is there a potential Return on Investment?

A. There is a definite ROI: short-term, in labor and development costs, and long-term, by providing the underpinning infrastructure to eliminate data anomalies. Consider this: on average, companies whose databases are constructed without such an architecture will usually have anywhere from five to ten iterations of each unique data item. In other words, each data item is stored five to ten times, which is 500% to 1,000% of the storage a single copy would require. Moreover, the redundantly stored data items are usually named and/or formatted differently in each iteration. That is not only wasteful; it creates data chaos that undermines comprehensive data collection and assembly, making information incomplete at best, if not outright incorrect. This same information is used to make business decisions affecting every aspect of business and patient care. Independent of the associated IT cost, which is significant in and of itself, the savings in patient care and from better business decisions based on this information can be astronomical. Even the best minds and processes cannot negate the effects of poor data or faulty information. Decisions based on such information may result in missed opportunities, erroneous conclusions, poor business decisions, misdirection, penalties for missing government mandates, improper patient care, and, in extreme cases, patient fatalities, each potentially creating a window of vulnerability or a costly penalty.
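The iteration count behind these redundancy figures can be made concrete with a small inventory. In this hypothetical sketch, five systems each store a patient identifier under a different name and format, which corresponds to the low end (500%) of the range cited above. The systems, column names, and formats are invented for illustration.

```python
from collections import defaultdict

# Hypothetical inventory of where each business fact is stored:
# (source system, local column name, local format, canonical data item)
inventory = [
    ("ADT",      "PAT_NO",      "CHAR(10)", "Patient Identification Number"),
    ("Billing",  "PATIENT_ID",  "NUM(9)",   "Patient Identification Number"),
    ("Lab",      "PATIENT_NBR", "CHAR(12)", "Patient Identification Number"),
    ("Pharmacy", "PT_ID",       "CHAR(8)",  "Patient Identification Number"),
    ("Claims",   "MBR_PAT_NBR", "NUM(10)",  "Patient Identification Number"),
    ("ADT",      "SEX",         "CHAR(1)",  "Gender Code"),
    ("Lab",      "GENDER",      "CHAR(6)",  "Gender Code"),
]

# Group every stored copy under the canonical item it represents.
copies = defaultdict(list)
for system, column, fmt, item in inventory:
    copies[item].append((system, column, fmt))

for item, stored in copies.items():
    names = {column for _, column, _ in stored}
    formats = {fmt for _, _, fmt in stored}
    # n copies means the fact occupies n * 100% of the space one copy needs.
    print(f"{item}: {len(stored)} copies ({len(stored) * 100}% of one copy), "
          f"{len(names)} names, {len(formats)} formats")
```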

Q. How would this data model help eliminate this potential vulnerability?

A. The data model can serve as the corporate data underpinning and the single source of truth. Once the model is installed and customized, each iteration and/or permutation of the data items in the operational databases and files will need to be identified, reviewed, assessed, and mapped to its standardized representation in the data model. This exercise is referred to as data rationalization. Segments of the data model can then be used to build newly constituted databases. The data maps act as road maps for extracting, transforming, and loading disparate data from current operational databases into the newly constituted databases and/or data warehouse, eliminating unplanned data redundancy. The data in the new databases and/or data warehouse will have standardized names, formats, definitions, and, where applicable, standardized permitted values. These standardized data elements and permitted values further constitute a data lexicon that fosters a common understanding of data among the IT, healthcare, and business communities, while eliminating data ambiguity, data conflicts, and skewed information used for business decisions and/or patient care.
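A data map of the kind described in this answer can be sketched as a lookup from each source column to its standard name and a transformation into the standard format and permitted values. The source systems (`ADT`, `Lab`), their column names, the ten-digit identifier format, and the `F`/`M`/`U` permitted values are hypothetical assumptions for illustration. After rationalization, records describing the same patient from two systems become directly comparable.

```python
from datetime import datetime


def to_gender_code(raw):
    """Map a source value onto the assumed permitted values F, M, U (unknown)."""
    return {"F": "F", "FEMALE": "F", "M": "M", "MALE": "M"}.get(str(raw).strip().upper(), "U")


def to_patient_number(raw):
    """Assumed standard format: ten digits, zero-padded."""
    return str(raw).strip().zfill(10)


# Data map: (source system, source column) -> (standard attribute name, transform)
DATA_MAP = {
    ("ADT", "PAT_NO"): ("patient_identification_number", to_patient_number),
    ("ADT", "SEX"): ("gender_code", to_gender_code),
    ("ADT", "DOB"): ("birth_date",
                     lambda v: datetime.strptime(v, "%m/%d/%Y").date().isoformat()),
    ("Lab", "PATIENT_NBR"): ("patient_identification_number", to_patient_number),
    ("Lab", "GENDER"): ("gender_code", to_gender_code),
    ("Lab", "BIRTH_DT"): ("birth_date", lambda v: v),  # already ISO 8601 in this source
}


def rationalize(system, record):
    """Rename and reformat one source record into the standard representation."""
    standard = {}
    for column, value in record.items():
        name, transform = DATA_MAP[(system, column)]  # unmapped columns raise KeyError
        standard[name] = transform(value)
    return standard


adt = rationalize("ADT", {"PAT_NO": "4471", "SEX": "female", "DOB": "04/02/1980"})
lab = rationalize("Lab", {"PATIENT_NBR": 4471, "GENDER": "F", "BIRTH_DT": "1980-04-02"})
print(adt == lab)  # True
```

Unmapped source columns raise a `KeyError`. This surfaces them for review rather than silently dropping data.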

Q. To transform the data as described can be a long and expensive undertaking. By the time it is finished, things may have changed drastically, and we would have wasted time, money, and effort with little or no ROI.

A. The transformation is a time-boxed build-out that delivers incremental value over time, starting with the most important segments of data, such as customer, member, patient, or provider. These segments are referred to as subject areas. Once transformed, each subject area can equate to a single database master file or data warehouse data increment. Subject areas can be addressed singly or in multiples, and the data model can be adjusted in flight if the need arises. Once the model is customized and adjusted for your environment, further adjustments should be minimal, and unless you change businesses, your data will remain stable, with very few changes over time. The point is that every organization needs a data framework to build into. The healthcare LDM provides that framework and more.

Q. When can we expect the first deliverable, and what form will it take?

A. Depending on the availability of appropriate personnel, related skill sets, and tools, you can expect your initial deliverable in less than eight weeks. That deliverable will be a customized enterprise-wide data model complete with subject areas, related entities, business rules, and a standardized set of data attributes with definitions, standard names, standard formats, and, where applicable, permitted values as recognized by the community of end users. It is delivered in the Erwin data modeling tool, with supporting documentation.

Q. And then what?

A. You are now ready to begin the data rationalization process and to construct/produce consolidated subject area databases and enterprise data warehouse data increments.

Q. How long would this effort take without this data model in place?

A. First, you would need to construct an equivalent data model. Built serially, one subject area at a time, it may take up to 42 months. Built in sets of three subject areas at a time, it may take as few as 14 months at the same cost (i.e., less time, more staff). Cost is based on time and materials and can vary depending on staff, skill sets, expertise, wall-clock time, and access to relevant personnel. In any case, your in-house cost would far exceed the cost of licensing the Teradata Healthcare LDM, which can be obtained in a matter of days. Once a subject area is completed, you can begin the data rationalization process for that subject area.

Benefits of Implementing the Teradata Healthcare Logical Data Model:

  • Improved data consistency and data quality.
  • A single, consistent view of data across the enterprise.
  • Improved communications with internal and external clients/customers, agencies, and vendors.
  • Having in place the underpinning infrastructure required to consolidate data from disparate databases; construct/develop the data portion of Electronic Medical Records (EMRs); and facilitate internal and external data/information interchange, such as with Regional Health Information Organizations.
  • Enhanced ability to better leverage existing data into usable information in business analytics, outcome studies, Business Intelligence, Customer Relationship Management, and other key data intensive business initiatives (e.g., data warehousing and decision support).
  • Lower implementation cost through the reuse of existing data/information.
  • Simplified and rapid development of new application systems.
  • Substantial cost reduction associated with delivering business capabilities tied to enterprise data.
  • More effective use of information technologies, such as database management systems, portals, internet, metadata repositories, and change control mechanisms.
  • Elimination of, or significant reduction in, cost associated with redundantly stored data, related processes, personnel, and CPU usage.
  • Improved healthcare delivery while reducing information related cost.
  • When implemented in an active enterprise intelligence environment, such as Teradata's AREA, access to relevant information at the point of need.

Q. Are there specific environments this healthcare data model has been designed to support?

A. The Teradata Healthcare Logical Data Model has been designed to support the following environments:

  • Payers and Insurers
  • Health Maintenance Organizations (HMOs)
  • Preferred Provider Organizations (PPOs)
  • Government-Run Healthcare Institutions: Federal (e.g., Veterans), State, and County
  • Integrated Healthcare Organizations (HCOs)


Information is the lifeblood of any viable organization, and that could not be more true for the Healthcare industry and related communities in today's overburdened healthcare environments. It has therefore become imperative that the data and information needed to conduct patient care, as well as business operations, be made available in an accurate, timely, and efficient manner, and be understood by all affected parties without ambiguity. The foundation for such data and information is an industry-specific, enterprise-wide logical data model, such as the Teradata Healthcare LDM. Don't build a healthcare data warehouse without it.

For more information, contact your Teradata representative or visit

About the Author

George Wright is President, founder, and principal consultant of the American Information Management Group, a Southern California-based corporation specializing in Information Resource Management consulting for the Healthcare industry. He has more than 24 years of experience in various facets of Information Resource Management, including Data Architecture, Data Warehousing, Strategic Information Planning, Metadata Management, and Technology Architecture, with fourteen years in Healthcare Information Technology in several large environments, including Kaiser Permanente, Blue Cross of California, and WellPoint. He has instituted and/or revitalized Data/Information Management in multiple large, complex environments. He and members of his organization have developed a comprehensive enterprise-wide Healthcare Data Architecture suitable for identifying, standardizing, and normalizing data in healthcare settings, as well as an Information Management Guide, a "how-to manual" for developing and implementing Information Management Centers of Excellence. He has written several articles about data warehousing, metadata management, and corporate information management.

He recently joined Teradata as a member of the Financial, Insurance, and Healthcare Global Services organization, where he currently serves as the Healthcare Global Practice Partner.