He who takes the wrong road makes the journey twice. —Proverb
by Rob Armstrong, Director of data warehouse support at Teradata
It has often been said that data warehousing is a journey. When you're on that journey, you will pass through Report-ville, Cube-city,
Analytics-burg and Predictive-place and head to Triggered-town on your way to the bustling metropolis of Active Data Warehouse.
Like any journey, when developing an active data warehouse, it helps to chart your path; you'll want a general idea of the route you are going
to take through the hills of data consolidation to activation.
So just what are the hills, peaks and valleys on this journey? The peaks are the major towns mentioned above where you'll stop, with the
uphill climbs being the various reinvestments in your data warehouse environment as you move forward. You want to avoid the downhill jaunts,
side roads and dead ends—these are the pitfalls when you stop evolving in the data warehouse process and start losing ground.
With that in mind, let's start traveling along the road to Active Data Warehouse.
Data mart consolidation
"We're all entitled to our own opinions. But none of us can afford to be wrong in our facts."
—Mort Crim
Many companies start their journeys with data spread across the organization, redundant or inconsistent data—even data that is unaccounted for.
Instead of having a warehouse, they have a "where-house"! The general consensus is that consolidating this data would allow users to access it with
greater ease.
While true to a point, the act of bringing the data together actually begins the process of identifying what problems must be overcome. Data
consolidation will give the users a quick boost. No longer will they need to search multiple places or move data around for joining. Whatever
potholes the consolidation effort uncovers, it is an important first stop on our journey.
Of course, the real questions are which data should be consolidated and what value does consolidation bring to the corporation.
For the consolidation effort to be successful, it must be based on business needs. You must first determine which data should be brought
together, then find the current locations of that data. Ultimately, you will understand the extent of your data redundancy or inconsistency
and begin the process needed to ensure data quality.
The danger is thinking the journey is over once you reach this point. Simply consolidating the data will highlight the data inconsistency in
definition and timing. As the users start to join data together more efficiently, they might question the accuracy of the data, or they could
have more problems running analytics because of the inconsistency in data types, definition or content and, therefore, might question why they
even started the trip.
This is good if it is seen for what it is: identifying the problems that must be overcome to establish a sound data and information management
strategy. However, oftentimes the inconsistency is not addressed, leading users to mistrust the data warehouse. Consequently, users are prone
to return to their silo systems, thereby diminishing the data warehouse value.
Now is not the time to end your journey. Regardless of the impending hills, you must now gas up the car and drive to the next stop.
Data integration
"A man with one watch knows what time it is. A man with two watches is never sure."
—Segal's Law
As mentioned earlier, the problem with (or benefit of) consolidation is that it highlights the real data management and usability issues. The
point to take away is that consolidation is not the same as integration. Treating the two as the same is a misconception I have heard frequently from
customers over the past year.
Once the data is consolidated, the next step, which must be committed to from the start, is integration of the data model. Integration will
provide analytical consistency, ease of extract, transform and load (ETL) processes, and time-to-market reduction. The foundation is now laid
for sustainable data warehousing—including going active. Data consolidation is about technical return on investment (ROI); data integration is
about business ROI.
You're now at the long haul of your journey where work needs to be done to set a direction and roadmap on data management, quality and
security, and bring a consistent set of data to drive the total organization. Unfortunately, many people are afraid to take this leg of the
trip: The drive is long, and often navigation must be made around or through roadblocks. Governance and leadership are critical here to give
direction and make people more comfortable with the concept and process. The goal of data integration is one that cannot be missed, and there
is no shortcut to this destination.
Just as you do not automatically jump to a destination, you cannot achieve data integration in a single bound. It is a cyclical journey,
driven by business need, data availability and ROI. A thoughtfully planned course is necessary to keep short-term benefits in line with
long-term objectives.
Once at the integration oasis, you will start to take some day trips. As you get comfortable with the integration of data elements and
understand the impact to cross-functionality and processes and actions that cut across the company, you will discover what data is missing or
which data could be used to complement the existing data warehouse.
Data expansion
"The road to success is always under construction."
—Lily Tomlin
Now, one of the tricky parts of the journey is deciding which day trip to take next. Like a family road trip where everyone has a desired
attraction to visit, the business and IT communities will certainly have preferences as to which data subject area should be next in line.
Of course, not everything can be first. Just like our family road trip, priorities must be established and decisions must be made.
The real question of what data comes next is based on two key factors: cost to acquire the new data elements and their ensuing benefits. Does
the data readily exist, and is it in a fairly "clean" state? Will there be many transformation issues to get the current as well as historical
data? Does the current system have capacity for this subject area? These are all questions that must be asked from a cost perspective.
From the benefit side of the equation, you want to understand the value of incorporating the new data. Will the new subject area align well
with our current data areas? What current analytics will be enhanced by the new data? What new capability will be enabled, and what key
performance indicators does it tie back to in our company objectives?
The results of this analysis can help you prioritize the next subject area and phase. The added bonus is that by understanding the benefit
part of the equation you can start to set business milestones and measurements into the evolution of the data warehouse implementation.
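The cost and benefit questions above can be turned into a rough ranking. The following is only a minimal sketch, assuming each candidate subject area has already been given a single cost estimate and a single benefit estimate; the names, numbers and the benefit-to-cost ratio itself are hypothetical illustrations, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class SubjectArea:
    name: str
    cost: float     # hypothetical estimate: effort to acquire, clean and load the data
    benefit: float  # hypothetical estimate: business value, in the same units as cost


def prioritize(candidates):
    """Sort candidates by benefit-to-cost ratio, highest first (assumes cost > 0)."""
    return sorted(candidates, key=lambda area: area.benefit / area.cost, reverse=True)


# Made-up candidates, for illustration only.
candidates = [
    SubjectArea("web_clicks", cost=8.0, benefit=12.0),
    SubjectArea("inventory", cost=4.0, benefit=10.0),
    SubjectArea("call_center", cost=2.0, benefit=4.0),
]

ranked = [area.name for area in prioritize(candidates)]
print(ranked)
```

In practice the estimates would come from the questions listed above (data cleanliness, transformation effort, system capacity, affected key performance indicators), and a single ratio is a starting point for discussion rather than a final answer.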
While I have presented this as a linear journey, you will actually end up making a few side trips between integration and expansion in order to
extend the data warehouse and its capabilities throughout the organization. One point to make is that, like many families taking road trips,
not all business units will have to "travel as one." Some may choose to have more expansion than other business units before heading onto our
next big stop.
Data acquisition
"It is not so much what you know anymore that counts, it is how fast you learn."
—Robert Kiyosaki
Much like data expansion, improving the frequency and timeliness of data is a day trip of its own. When you look at what new data can be of value, one
dimension to consider is the timeliness of that data. Can the benefit increase as the data acquisition becomes more real time? Which data is
most beneficial is based on a correlation between the data's timeliness and level of detail, and its cost and value to the organization.
Timeliness is often thought of as having the data in real time. This is not normally the case. Timeliness has two aspects: frequency and
granularity. The data's business usage and value justify how frequently the same level of data is collected and at what scope of granularity.
The question is whether having the same data sooner would enable you to make the same decision sooner and, ultimately, reap a greater or
quicker benefit. This depends on your ability to respond to information. If you can take inventory action only once a day, then whether you
know about an outage at 8 a.m. or 8 p.m. is irrelevant. However, if you can address inventory requirements throughout the day, then receiving
the information earlier makes a great difference.
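The inventory example can be sketched in a few lines. This is an illustration under stated assumptions: hours are whole numbers from 0 to 23 within one day, and `next_action_hour` and the action schedules are hypothetical names and values chosen for the example.

```python
def next_action_hour(known_at, action_hours):
    """Return the earliest hour at which action can be taken once the
    information is known, or None if no action slot remains that day."""
    remaining = [hour for hour in sorted(action_hours) if hour >= known_at]
    return remaining[0] if remaining else None


once_a_day = [21]                      # hypothetical: one inventory action at 9 p.m.
through_the_day = [9, 12, 15, 18, 21]  # hypothetical: action possible every three hours

# Acting once a day: knowing at 8 a.m. or 8 p.m. leads to the same action time.
print(next_action_hour(8, once_a_day), next_action_hour(20, once_a_day))

# Acting throughout the day: knowing earlier means acting earlier.
print(next_action_hour(8, through_the_day), next_action_hour(20, through_the_day))
```

The value of fresher data shows up only in the second schedule, where the organization is able to respond more than once a day.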
Regarding the data's granularity, the level of data you acquire helps to determine the degree of analysis and subsequent decision making. You
may load the data only once a day, but rather than a daily aggregation you provide hourly transactions. With more granular time divisions, you
can readily spot trends and see whether incorporating changes in your processes and responses makes better business sense.
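To see why granularity matters even when the load frequency stays at once a day, consider a small sketch with made-up numbers. Both days below are hypothetical and have the same daily total, so a daily aggregate cannot tell them apart, while the hourly detail can.

```python
# Hypothetical hourly transaction counts for two days (24 values each).
steady_day = [10] * 24
afternoon_rush = [0] * 12 + [20] * 12

# A daily aggregate reports the same figure for both days.
daily_totals = (sum(steady_day), sum(afternoon_rush))

# Hourly detail exposes the difference in peak load.
peak_hours = (max(steady_day), max(afternoon_rush))

print(daily_totals)
print(peak_hours)
```

With only the daily figure, a staffing or replenishment decision would treat the two days identically; the hourly rows are what make the trend visible.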
Welcome to activation!
Now for a quote of my own: "If you are not taking action, then stop making decisions."
You've arrived at the metropolis of Active Data Warehouse. Having traveled through the integration and timeliness points in our journey, you reach
the ultimate outcome of data warehousing: actions are triggered by the data itself without requiring constant intervention. It is only
the exceptions that need attention, and the data will alert us to those situations.
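The idea that only the exceptions need attention can be sketched as a simple rule over incoming data. This is a minimal illustration, not a description of any product feature: the function name, the stock levels and the reorder point are all hypothetical.

```python
def triggered_actions(stock_levels, reorder_point):
    """Return an action for each item whose quantity has fallen below the
    reorder point; items at or above it need no attention."""
    return [
        f"reorder {item}"
        for item, quantity in stock_levels.items()
        if quantity < reorder_point
    ]


# Made-up stock levels, for illustration only.
stock_levels = {"widgets": 120, "gadgets": 8, "gizmos": 45}

actions = triggered_actions(stock_levels, reorder_point=20)
print(actions)
```

Everything that stays within bounds passes silently; the data itself raises the one case that calls for action.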
The road to active is long and shortcuts must not be taken. You can successfully navigate your way by maneuvering through all of the other
phases. As you understand what data is meaningful to the processes and the timeliness those processes require, only then can you move forward.
The important message is that data-driven actions will also drive new data points. And remember, this is an ongoing adventure—the Active Data
Warehouse metropolis is just a stone's throw from active enterprise intelligence.
Teradata Magazine-June 2007