Server Management - Windows computing for IT professionals & strategists

Split ends

Story by Mark Whitehorn, 01-06-2008

Teradata epitomises the top end of the data warehouse market, so is its shock announcement of a data warehouse appliance a sign it is changing course or is it intelligently exploiting a market opportunity?

I've just come back from the Teradata Universe conference in Lisbon where Teradata announced that it had extended its family of products. Of particular note are the Teradata 550 and Teradata 2500. The 550 is a symmetric multiprocessing (SMP) box with a price in the order of $67,000 (£33,000) per terabyte. Teradata describes it as a departmental data warehouse. "The Teradata 550 was developed to run a single application or support test and development workloads. It is simple to set up and can use the Novell SUSE Linux 64-bit operating system or Windows. Customers can license and run the Teradata 12 database on their choice of Intel-based platforms, starting at $40,000." The Teradata 2500 is described as an "entry-level data warehouse".

Despite the terminology Teradata uses to describe them – "analytical platforms" – these are what you and I would recognise as data warehouse appliances.

This announcement is significant because Teradata doesn't just lean towards the high-volume, complex analytics end of the data warehouse spectrum, it is that end of the spectrum. And DWAs are from the other end. So finding Teradata with a DWA is like finding a Wagon Wheel in Fortnum & Mason's food hall: not impossible but certainly unexpected. This is about a totally different philosophical approach to data warehouse architecture.

What's this spectrum?
We can conveniently (and somewhat simplistically) identify three main data warehouse architectures. All of them start with the premise that the operational data of the enterprise sits in a set of transactional databases (HR, finance, sales, etc). These systems are great for performing transactions but bad at reporting and analytics.

However, after that, the three architectures differ significantly.

Isolated data mart
Each department looks after its own reporting needs. The data from its own transactional system(s) is copied and placed in a reporting server, often called a data mart. This data mart is used for reporting (Figure 1).

Figure 1: The isolated data mart architecture
This architecture is very cost effective and quick to implement. The disadvantage is that there is no attempt to unify the meaning of data across the enterprise.

In practice this is a serious issue. Many of the entities that feature in reports (Customers, Products, Employees, etc) are of interest to many departments. If the data marts are created and managed independently, it is likely that the data collected (and the definitions applied) will differ between them. So when two departments calculate a value for, say, the average sale per customer, they will get different answers. As soon as that discrepancy is spotted (at, say, a board meeting) the limitations of this approach are thrown into sharp relief.  
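
To make the discrepancy concrete, here is a minimal Python sketch. The sales records, the two marts and their rules for who counts as a customer are all invented for illustration. The point is only that identical transactions produce two different "average sale per customer" figures once each mart applies its own definition.

```python
# Hypothetical sales records: (customer_id, amount, is_trade_account).
SALES = [
    ("c1", 100.0, False),
    ("c1", 50.0, False),
    ("c2", 300.0, True),
    ("c3", 150.0, False),
]


def average_sale_per_customer(rows):
    """Total sales divided by the number of distinct customers in rows."""
    customers = {customer_id for customer_id, _, _ in rows}
    total = sum(amount for _, amount, _ in rows)
    return total / len(customers)


# Assumed rule in the sales mart: every account is a customer.
sales_mart = SALES

# Assumed rule in the finance mart: trade accounts are not customers.
finance_mart = [row for row in SALES if not row[2]]

sales_figure = average_sale_per_customer(sales_mart)      # 600 / 3 customers
finance_figure = average_sale_per_customer(finance_mart)  # 300 / 2 customers

print(f"Sales mart:   {sales_figure:.2f}")
print(f"Finance mart: {finance_figure:.2f}")
```

Neither figure is wrong on its own terms, which is exactly why the mismatch is hard to resolve after the fact.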

Enterprise data warehouse
At the opposite end of the spectrum, the enterprise data warehouse (EDW) neatly answers this particular criticism. Instead of using isolated data marts, the data from the operational systems is pulled into one central repository (the EDW). There it is massaged into a relational set of data and reports are generated directly from that data set (Figure 2).

Figure 2: The EDW architecture means everyone has the same definitions of entities

The major advantage of this approach is that it more or less forces people from the different departments to cooperate and agree definitions of entities like customer. (The EDW can, of course, support multiple definitions – New Customers, Lapsed Customers, All Customers, and so on.) So the EDW should end up with a set of agreed definitions and everyone within the organisation should see the same version of the truth. This is a huge advantage; the disadvantages are that a full-blown EDW is more difficult to design and implement, and that it requires a seriously powerful machine at its heart.
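
As a rough illustration of agreed definitions held in one place, the sketch below uses Python's built-in sqlite3 module, with an in-memory database standing in for the central repository. The table layout, the sample rows and the date cut-offs behind "new" and "lapsed" are assumptions made up for this example; no real EDW is this small, and this is not how Teradata itself is configured.

```python
import sqlite3

# An in-memory SQLite database stands in for the central repository.
edw = sqlite3.connect(":memory:")
edw.executescript("""
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY,
                           first_order TEXT, last_order TEXT);
    CREATE TABLE sale (customer_id TEXT, amount REAL);

    INSERT INTO customer VALUES
        ('c1', '2007-02-10', '2008-04-01'),
        ('c2', '2008-03-15', '2008-05-20'),
        ('c3', '2005-06-01', '2006-01-30');
    INSERT INTO sale VALUES ('c1', 100), ('c1', 50), ('c2', 300), ('c3', 150);

    -- The agreed definitions are stored once, as named views.
    CREATE VIEW all_customers AS SELECT customer_id FROM customer;
    CREATE VIEW new_customers AS
        SELECT customer_id FROM customer WHERE first_order >= '2008-01-01';
    CREATE VIEW lapsed_customers AS
        SELECT customer_id FROM customer WHERE last_order < '2007-01-01';
""")

AGREED_DEFINITIONS = {"all_customers", "new_customers", "lapsed_customers"}


def average_sale_per_customer(view_name):
    """Average sale per customer under one of the agreed definitions."""
    if view_name not in AGREED_DEFINITIONS:
        raise ValueError(f"unknown customer definition: {view_name}")
    query = f"""
        SELECT SUM(s.amount) / COUNT(DISTINCT s.customer_id)
        FROM sale AS s
        JOIN {view_name} AS v ON v.customer_id = s.customer_id
    """
    return edw.execute(query).fetchone()[0]


# Two departments asking the same question now get the same answer.
sales_dept = average_sale_per_customer("all_customers")
finance_dept = average_sale_per_customer("all_customers")
new_only = average_sale_per_customer("new_customers")

print(sales_dept, finance_dept, new_only)
```

Different definitions still coexist, but each has a name and a single implementation, so a report has to say which one it used.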

Hybrid architecture
Between these two extremes we find, not unreasonably, the hybrid model, which attempts to combine the strengths of both. Data is stored in a central repository called a data warehouse. Unlike the EDW, the data in the data warehouse is often structured as dimensional, rather than relational, tables. However, the major difference from the EDW is that the data warehouse itself is not usually used directly for reporting. Instead, subsets of data from the warehouse are copied to data marts, each data mart servicing the reporting needs of a department or group of users.
 
This solution is relatively quick to set up and can be developed over time. However, data is often duplicated multiple times (because data from the financial system, say, may be required in many of the data marts). This can lead to problems with auditing, network loading and the sheer time spent moving data around.
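
The duplication cost can be sketched in a few lines of Python. The warehouse rows, the mart names and the subject areas each mart needs are hypothetical, and the sketch ignores the dimensional structure entirely. It only counts how often the same warehouse row is copied when several marts need the same financial data.

```python
from collections import Counter

# Hypothetical central warehouse: each row is tagged with its subject area.
WAREHOUSE = [
    {"id": 1, "subject": "finance", "amount": 120.0},
    {"id": 2, "subject": "finance", "amount": 80.0},
    {"id": 3, "subject": "sales", "amount": 300.0},
    {"id": 4, "subject": "hr", "amount": 45.0},
]

# Assumed reporting needs: which subject areas each data mart requires.
MART_SUBJECTS = {
    "sales_mart": {"sales", "finance"},
    "hr_mart": {"hr", "finance"},
    "board_mart": {"sales", "finance", "hr"},
}


def build_mart(subjects):
    """Copy the warehouse rows belonging to the requested subject areas."""
    return [dict(row) for row in WAREHOUSE if row["subject"] in subjects]


marts = {name: build_mart(subjects) for name, subjects in MART_SUBJECTS.items()}

# How many times was each warehouse row copied, and how many rows moved in total?
copies = Counter(row["id"] for mart in marts.values() for row in mart)
rows_moved = sum(len(mart) for mart in marts.values())

print(f"Warehouse rows: {len(WAREHOUSE)}, rows copied to marts: {rows_moved}")
print(f"Copies of finance row 1: {copies[1]}")
```

Here four warehouse rows become ten mart rows, and every financial row exists in three places that must be kept in step and audited.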

The major players
These descriptions form a somewhat simplified view of data warehouse architecture, compartmentalising what is in fact a spectrum. Many enterprises implement some aspects of two or all three of the models. However, the descriptions do provide a framework for putting the major players into the picture.

The low end
The isolated data mart architecture is the habitat of data warehouse appliance suppliers, including DATAllegro, GreenPlum, Kognitio and Netezza. Each vendor has its own ideas and definitions of a DWA, so this description is of necessity a generalisation. That said, a DWA is often pre-configured and built to service a particular sector, such as a finance department. Templates to simplify common tasks (such as defining dimensions and measures) are frequently provided. Typically they are very quick to set up but have limited scope, both in terms of the data they'll accept and, therefore, of the data they can display.

Teradata territory
The EDW architecture is Teradata territory. The company has an immensely powerful database engine, which stores data as relational tables and yet can run very large and very complex queries against it with impressive speed. (Relational tables are notoriously slow to query.)

Hybrid producers
The hybrid territory is exemplified by Microsoft, with SQL Server as the database engine and Analysis Services used for the data marts. Other players include Oracle, IBM and Business Objects.

Hopefully, the interest in Teradata's announcement is now clear. Here is a company that has, right from its inception in 1979, stressed the need for a centralised, unified, core EDW. That same company has just announced a DWA. Not unreasonably, I asked Randy Lea, vice president of Product and Services Marketing for Teradata, about it.

He was at pains to point out that this did not represent any change whatsoever in Teradata's thinking or direction. The company still sees the EDW as vital for the enterprise; it is not promoting the data mart model as a viable alternative; it will continue to put its efforts into the EDW... End of story.

OK, that's clear enough, but then why introduce the 550 and 2500? Teradata has found that a number of its potential customers are considering the use of departmental data warehouses. Having these new products allows Teradata to start a dialogue with those customers. That dialogue may eventually convince the customer to pursue the EDW route; if not, then a 550 or 2500 can be installed. But, crucially, both of these run the Teradata 12 database engine that is common to the entire Teradata range. So, Lea argues, the customer then has a very simple and elegant upgrade path to a more enterprise-wide solution if and when that is considered appropriate.
 
What do I think? I am aware that as a cynical journalist I should take the contrary view and say that Teradata is frightened of the emerging DWA vendors and that it is changing horses in midstream and now promoting the data mart approach. The only problem with this is that I don't believe it. I am (rightly or wrongly) convinced that Teradata is telling the truth, that this announcement merely demonstrates that it has seen an opportunity in the market and is intelligently exploiting it. This may be very good news for many of us.
 

