from_the_ground_Insiders1

 

Tech2Tech

Insider's Warehouse

From the Ground Up

Business Intelligence Roadmap offers project life cycle methodology for decision-support applications.

A consultant in the enterprise architecture industry once compared building systems to building airplanes. While a paper or model airplane can be constructed with little forethought, a jet airplane cannot.

Similarly, a small stand-alone application can be built without careful planning, but a data warehouse with its many dependent business intelligence (BI) applications cannot. Yet many project teams try to develop their data warehouse and BI applications using no methodology at all. As a result, they quickly become overwhelmed with complexity, unanticipated problems, added tasks, too much rework and wasted time, and a project that seems out of control.

6 stages with 16 steps

A methodology I co-developed, the Business Intelligence Roadmap, is designed for data warehouse and BI applications. The 16 development steps of the roadmap are grouped into the following engineering stages of justification, planning, analysis, design, construction and deployment. (See table, below)

Justification

An assessment, which includes cost justification, is made of a business problem or opportunity. This gives rise to a data warehouse initiative or BI project.

1. Business case assessment

First, define the business drivers and propose a data warehouse or BI solution. Each BI application should clearly define the benefits of either solving a business problem or taking advantage of an opportunity. You may also want to find out what types of BI initiatives your competitors have and what the risks may be for not implementing a data warehouse or BI solution at your organization. Interview people at all levels in the organization in order to find out their understanding and expectations of a data warehouse and BI project. This will dictate whether an enterprise solution should be architected or whether building departmental or stand-alone BI applications is more appropriate.

Discussing the initiative with others will also help decide the executive sponsor for the project and whether collective sponsorship from all business executives will be achieved for one cohesive enterprise solution.

Planning

Strategic and tactical plans lay out how the initiative or project will be accomplished.

2. Enterprise infrastructure evaluation

Equally important to supporting an evolving data warehouse are:

  • Technical infrastructure
    • Hardware, software, middleware
    • Database management systems, operating systems
    • Network components
    • Developer tools, end-user BI tools
  • Non-technical infrastructure
    • Metadata standards, data naming standards, enterprise data model
    • Data warehouse methodology, guidelines
    • Testing procedures, change control process, issues management procedures, dispute resolution procedures

Identify the technical and non-technical infrastructure components you already have in place and which ones you still need. Evaluate the completeness and effectiveness of those components and identify activities for enhancing or upgrading them.

3. Project planning

The difference between data warehouse/BI projects and traditional IT projects is the emphasis on data as opposed to processes. Before committing to a project deadline, know the condition of the source files and the effort it will take to sanitize the data. Be sure the project plan considers that data-related activities can take four to five times the effort of process-related activities. These activities consist of data modeling, profiling, standardization and integration, and data cleansing. Most project deadlines are missed because the project team underestimated the amount of dirty data and its impact on the extract, transform and load (ETL) process.

Analysis

Detailed analysis of the problem or opportunity is performed to identify business and data requirements for a solution.

image

Click to enlarge

4. Requirements definition

Stable requirements rarely exist on data warehouse and BI projects because users often demand too much data in an impractical time frame—they want everything, and they want it yesterday. Break the requirements into multiple packages that you can deploy in iterations. Start with requirements that have the highest business value, are the most stable (least likely to change), and are sourced from relatively clean and easily integrated data. Never try to freeze requirements, but allow users to change them when they see BI capabilities unfold.

5. Data analysis

An in-depth understanding of the data requirements is produced in this important development step, including standardized data names, definitions, domains, business rules and data quality rules.

The two major activities involved are top-down data modeling and bottom-up source data analysis. Top-down data modeling is the creation of a business data model with the participation of the business users. Bottom-up source data analysis refers to the data archaeology performed on the source data to confirm or refute the business rules identified during the business data modeling process.

Only trained data analysts or data administrators should perform this step, not developers.

6. Prototyping

Performing system analysis, design, coding and testing allows users to assess and change their requirements with less impact early in the project schedule.

The costs of experimenting with different database designs, visualization methods, development tools and programming languages are much lower during prototyping than during development. Some prototypes are nothing more than smoke and mirrors and are used to obtain user buy-in, validate functional requirements or prove a concept. Others with longer life spans produce partially functioning pieces of the application.

7. Metadata repository analysis

The four types of metadata and where they are captured are:

  • Business—in a data modeling tool
  • Technical—in the relational database management system (RDBMS) and online analytical processing (OLAP) cubes
  • Process—in ETL, OLAP and other developer tools
  • Usage—in the RDBMS and manually by the DBA monitoring the data warehouse environment

These different types of metadata must be mapped to one another and stored in a repository. Metadata repositories can be purchased or built. In either case, document in a meta model the type of metadata to capture, how to map it and where to store it.

Design

A product is conceived that will help solve the business problem or enable the business opportunity.

8. Database design

A database can be either relational or multi-dimensional, depending on the underlying RDBMS, the purpose of the database, and its access paths and BI tools. A relational design offers maximum flexibility for an operational data store (ODS) or an enterprise data warehouse (EDW). A multi-dimensional design offers maximum performance and is more common for data marts that are tailored toward specific reporting patterns or BI applications.

The project’s specific performance considerations should influence the database schema you choose as well as the physical components, such as indexing strategy, data set placement, partitioning and clustering.

9. ETL design

ETL functions are designed for different processes:

  • Initial population is very similar to a system conversion. The main task is to map the selected data elements from the current source files to the target columns in the data warehouse databases.
  • Historical population is a process that loads historical data from old files. This set of specifications will be slightly different because historical data has usually been archived to off-line storage devices. Since the record layouts of the files and databases change over time, the programs for the historical population have to recognize those changes.
  • Incremental update refers to the ongoing updates (i.e., inserts) to the data warehouse databases. Two things must be considered for the incremental update: whether to extract all records from the source files or only the changed records (deltas), and whether to pull data from the source files or push out the data from the source systems. In many data warehouse environments, the optimum choice is to have push-out deltas only.

10. Metadata repository design

If built in-house, several metadata repository structures must be designed: the database, the migration programs that load and link the metadata from the tools where it is captured, the interface programs that communicate with the tools and users, and the online help function. If purchased, the metadata repository must be installed and tested.

Construction

The conceived product is built, and the design elements are incorporated.

11. ETL development

Implementing the ETL process is almost always done through an ETL tool. Some ETL tools are more sophisticated than others. Thus, in some cases, additional code must be written to supplement their functions. Because the ETL process is the most complicated process in the data warehouse environment, it must be thoroughly and formally tested. This is done with a formal test plan, which contains test cases, test sequence, expected test results, actual test results and a test log. Recommended types of tests are unit, end-to-end integration and regression, performance or stress, quality assurance (pre-production), and user acceptance.

12. Application development

The various prototyping results are used to construct a production-worthy front-end BI application. This can be a state-of-the-practice OLAP cube with reports and canned or parameterized queries. It could also be a state-of-the-art dashboard or scorecard. In either case, the BI application is subjected to the same rigorous testing activities as the ETL process.

13. Data mining

A specialized data mining application uses a tool to execute data mining operations against a data pool. This pool is based on an analytical data model, which is usually developed by a statistician.

Data mining activities include identifying the business problem domain; collecting, consolidating, cleansing and preparing the data for the analytical data model; and populating the database. The results are often displayed visually in charts (bar, pie and lift), decision trees, scatter plots and histograms.

14. Metadata repository development

Creating a database, writing metadata-specific migration programs to extract the metadata, coding interface programs to deliver metadata to the users, and providing an online help function are some activities to developing a metadata repository.

Another development step is writing ETL-specific metadata programs. These programs capture load statistics, reconciliation totals, data quality statistics and error statistics produced during the ETL runs. After each run, these statistics are loaded into the metadata repository.

Deployment

The finished product is implemented and its effectiveness measured to determine whether the solution meets, exceeds or fails end-user expectations. Additionally, a pre-defined time frame has been established to determine return on investment (ROI).

15. Implementation

After the programs are tested, the data warehouse, BI application and metadata repository are ready for production. The databases and program libraries that will house the ETL and BI programs must be created. Once the data has been loaded into the databases, all security measures should be tested again. The users are then trained, and the support functions—such as a help desk, maintenance of the data warehouse databases, scheduling and running ETL jobs, performance monitoring and tuning, and monitoring usage—are initiated.

16. Release evaluation

A data warehouse is not a system but an evolving and expanding environment. It is beneficial, therefore, to review lessons learned at the end of every project. Not only does this enable the team to anticipate and prevent problems in future projects but it also helps to streamline the development process.

Any tools, techniques, guidelines and processes that were not useful should be re-evaluated and adjusted or, possibly, discarded. Any missed deadlines, cost overruns, disputes and resolutions should be examined, and adjustments to the processes should be made before the next project begins.

The release evaluation review is also the perfect forum for repackaging dropped data or functionality and reprioritizing the next projects.

Data warehouse/BI development tracks

Although the development steps are presented sequentially and mapped into the engineering stages, these types of projects are rarely executed in this order. Some steps could be combined or bypassed. At a minimum, most projects will divide the development effort into multiple tracks. (See figure)

image

Click to enlarge

The two most common tracks are ETL and BI application. A third track may also be needed: metadata repository. If data mining is a full-time activity at your company, the data mining step can be separated into its own track. The table above shows which steps apply to each track.

While each track can have its own team and set of activities, they are interdependent. All tracks start and end their activities at the same time, but the activities during the analysis, design and construction stages may diverge in terms of duration and work organization.

The back-end ETL track may spend more time in analysis before designing and developing the ETL process, and the front-end BI application track may prototype extensively. The metadata repository track may be organized as a separate but parallel development project, or it may simply involve buying and installing a metadata repository product. Since discoveries made on one track often affect the others, it is important to manage all resources as one project team.

Supportive methodology

Successful data warehouse and BI initiatives evolve into complicated enterprise-wide decision-support environments. A strong foundation is required to support various BI applications for different types of users. To build such a foundation, the question is not whether a methodology is needed, but what type is the best option.

The data warehouse is built in increments to accommodate requirements that are inevitably evolving and morphing. Its methodology, therefore, must support periodic, spiral architectural redesigns and have a strong focus on data and powerful enterprise infrastructure activities.

Data warehouse projects are full of challenges, but having to remember hundreds of tasks should not be one of them. That’s why a good methodology specifically designed for data warehouse and BI projects is necessary.

Planning of this sort will help strengthen any company’s BI investment.


twitter
StumbleUpon
Delicious
Facebook
Google
Technorati
Reddit
Your Comment:
  
Your Rating:

Comments