Teradata Magazine Online

The impact of RFID on data warehousing

By Dan Linstedt

The amount of information that will be generated by radio frequency identification (RFID) tags and other micro-devices is on the verge of exploding. That leaves us with questions like "What happens to data quality? What data should we capture, and how often should we capture it? What about 'white noise'?"

While we can't address every issue regarding the coming data avalanche, we can highlight some of the more "front of mind" concerns surrounding RFID.

All about RFID
While a lot has been written about RFID, not a lot of thought has been given to the information architecture necessary to support the data streaming in from this technology. RFID tags are just another data source, right? What does this mean?

The data from RFID tags can easily overwhelm any interface in use today. Teradata, while the RDBMS most capable of handling this influx, currently doesn't have remote filtration capabilities. Thus, today we can't code a business rule in Teradata and then mark it for distribution to the RFID listener. The ability to maintain a single point of business rules, and then allow distributed processing of these business rules, may soon become a necessity in order to support the data generated by RFID interfaces. However, Teradata, as I note later in this article, is best positioned to take on the challenge.

Let's step back for a minute. How does RFID function?
Put simply, an RFID tag consists of a transponder and an embedded silicon chip with encoded data. The tag is placed on an object, and when the object passes within range of an antenna broadcasting radio waves on a specific frequency, the transponder "wakes up" and sends the chip's data to a transceiver, sometimes over distances of up to 20 feet.

What does the transceiver do?
The transceiver collects the data from each RFID tag, decodes it and transmits it to a data store or central processing computer. From there, the data can be analyzed and used according to specific requirements.

What does this have to do with active data warehousing?
Good question, because it is the crux of this article. There are several areas to explore when discussing active data warehousing and RFID technology. From an architectural perspective, we are now faced with the following challenges:

  • Data streaming constantly from massively parallel transponders
  • Astronomical amounts of information if every single product or item is tagged
  • The ability or lack thereof to tag at a granular level
  • The need to filter incoming data
  • Dynamic maintenance of the business rules, which must be capable of being distributed to the transponders for action at the point of collection
  • Melding of hardware and software systems
  • White noise and radio frequency interference

This is only the tip of the iceberg. There are many other concerns to consider, including privacy, GPS locations, bad data, compromised or damaged RFID tags and the distribution of rules and filters. How do we decipher what data is meaningful and what is meaningless?

"Distributed" active data warehouse
The result of all this RFID data activity is something I call a "distributed" active data warehouse. There are massive sets of data arriving in parallel streaming modes to the transponders, and the data is distributed because the rules for filtration must be running on the transponders themselves; otherwise, the active warehouse would be overwhelmed.

Today, there may not be enough channels or fast enough networks to connect the transponders directly to the data warehouse itself. It's similar to the problem that disk-drive manufacturers faced years ago. (That problem was partially solved through fiber-optic connectivity.)

Two things that force changes to our architectures and designs are latency and volume. RFIDs are active on both fronts. Let's examine a hypothetical example to explore latency and volume.

Suppose we have a carton of candy bars, and each candy bar wrapper is tagged with an RFID tag. Now assume that the manufacturer has transponders at the plant, and the data from the transponders begins streaming into a centralized data warehouse the minute the candy bar is wrapped. Through the packaging process the candy bars are put in boxes (20 at a time). The boxes are then shrink-wrapped and put on a pallet for distribution. Let's say 500 boxes fit on a pallet. Now from one pallet alone, the transponders are receiving and transmitting data from 10,000 tags.

The questions that arise at this point might be: How frequently do we "access" the RFIDs through the transponders? Do we want 10,000 signals every second, every minute or less frequently? If we have more than one transponder in the plant, how do we eliminate duplicate signals?
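One way to think about the duplicate-signal question is a time-window filter keyed on the tag ID. The Python sketch below is only an illustration under my own assumptions: the `TagRead` record, its field names and the one-second window are hypothetical, not part of any RFID product. It keeps the first read of each tag in a window, no matter which transponder reported it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRead:
    """One signal from one tag (hypothetical record layout)."""
    tag_id: str
    reader_id: str
    timestamp: float  # seconds since an arbitrary start


def deduplicate(reads, window_seconds=1.0):
    """Keep the first read of each tag per window, whichever reader saw it."""
    last_kept = {}  # tag_id -> timestamp of the last read we passed through
    kept = []
    for read in sorted(reads, key=lambda r: r.timestamp):
        previous = last_kept.get(read.tag_id)
        if previous is None or read.timestamp - previous >= window_seconds:
            last_kept[read.tag_id] = read.timestamp
            kept.append(read)
    return kept
```

Widening `window_seconds` is the same decision as asking for signals every minute instead of every second: fewer rows reach the warehouse, at the cost of coarser tracking.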

Lest we forget, it's all about business driving technology, not the other way around. We have this hypothetical example because the business feels the need to track all products from inception to consumer, and possibly back again. So what's important to the business user here, and how do we get the desired answers?

From an active data warehouse perspective, 10,000 transactions per transponder every second is not too bad, considering that most Enterprise Application Integration (EAI) tools run from 10,000 to 25,000 transactions per second (depending on the technology and the performance and tuning done in the environment). But that's just from one pallet. What happens when we have 100 pallets in the room?
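To put rough numbers on the question, here is a back-of-the-envelope sketch in Python. The box and pallet counts come from the candy bar example; the function names and the assumption that every tag answers every poll are mine, for illustration only.

```python
# Figures taken from the candy bar example in the text.
BARS_PER_BOX = 20
BOXES_PER_PALLET = 500


def tags_per_pallet(bars_per_box=BARS_PER_BOX, boxes_per_pallet=BOXES_PER_PALLET):
    """Number of tagged items on one pallet."""
    return bars_per_box * boxes_per_pallet


def reads_per_second(pallets, polls_per_second=1.0):
    """Tag reads per second, assuming every tag answers every poll."""
    return tags_per_pallet() * pallets * polls_per_second


print(reads_per_second(1))            # one pallet, polled every second
print(reads_per_second(100))          # 100 pallets, polled every second
print(reads_per_second(100, 1 / 60))  # 100 pallets, polled once a minute
```

Under these assumptions, 100 pallets polled every second means a million reads per second, well beyond the 10,000 to 25,000 transactions per second quoted above for EAI tools. Polling once a minute brings the same room back down to roughly 16,700 reads per second.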

This is just the active feed side of the data warehouse. What if we stored all the incoming feeds (such as location from global positioning devices, for instance)? In that case, we would have a massive set of derived or computed items that can be produced from each product, such as: time on a pallet, time to ship, time in a stock room and time on the shelf.
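As a rough illustration of how such derived items could be computed, the Python sketch below takes the time-stamped stage changes for one tagged item and returns how long it spent in each stage. The stage names and the event layout are hypothetical; a real feed would look different.

```python
from datetime import datetime, timedelta


def time_in_stages(events):
    """Return {stage: timedelta} for one tagged item.

    events is a list of (stage, entered_at) pairs in any order. A stage's
    duration runs until the next event, so the final stage has no duration.
    """
    ordered = sorted(events, key=lambda event: event[1])
    durations = {}
    for (stage, entered), (_next_stage, left) in zip(ordered, ordered[1:]):
        durations[stage] = durations.get(stage, timedelta()) + (left - entered)
    return durations


# Hypothetical history for one candy bar.
history = [
    ("pallet", datetime(2007, 1, 1, 8)),
    ("in_transit", datetime(2007, 1, 1, 20)),
    ("stock_room", datetime(2007, 1, 2, 20)),
    ("shelf", datetime(2007, 1, 3, 8)),
    ("sold", datetime(2007, 1, 5, 8)),
]
for stage, spent in time_in_stages(history).items():
    print(stage, spent)
```

Every tagged item produces its own set of these values, which is why the derived data can quickly outgrow the raw feed.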

What else can we derive from this information?
We can determine when the product might spoil, whether the vendor is selling enough of a particular product and when product is sold out. We might even predict when the product will sell out, and ship more just in time to restock the shelves. I could go on and on.
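The sell-out prediction can be sketched with nothing more than a constant sales rate. The Python below is a deliberately naive illustration; the function names, the linear-rate assumption and the lead-time parameter are all mine, and a real forecast would account for seasonality and promotions.

```python
def hours_until_sold_out(units_on_shelf, units_sold, hours_observed):
    """Estimate hours left on the shelf, assuming a constant sales rate.

    Returns None when there is no sales history to extrapolate from.
    """
    if units_sold <= 0 or hours_observed <= 0:
        return None
    rate_per_hour = units_sold / hours_observed
    return units_on_shelf / rate_per_hour


def should_reorder(units_on_shelf, units_sold, hours_observed, restock_lead_hours):
    """True if the shelf is expected to empty before a shipment could arrive."""
    remaining = hours_until_sold_out(units_on_shelf, units_sold, hours_observed)
    return remaining is not None and remaining <= restock_lead_hours
```

For example, 40 bars on the shelf with 20 sold over the last 10 hours gives an estimate of 20 hours, so a 24-hour restock lead time would trigger a shipment and a 12-hour lead time would not.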

If we look at the technical aspects of implementing the system, what should we consider?
From a Teradata standpoint, the pipes into the database must be wide open and capable of handling massive parallelism. Alternatively, we could change the rate and volume of information coming in through the use of business rules.

Teradata might need an extra management component to handle the registration of transponders as a data source. It might also require a partnership with a business-rules vendor for dynamic, data-driven business rules that can be deployed to these transponders.
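To make the idea concrete, here is a minimal Python sketch of a central rule registry that pushes data-driven filter rules out to registered transponders. None of this is a Teradata feature or API; the class names, the rule keys (`min_signal_strength`, `allowed_tag_prefixes`) and the read layout are hypothetical and exist only to show the shape of the design.

```python
class Reader:
    """Stand-in for a transponder that filters at the point of collection."""

    def __init__(self, reader_id):
        self.reader_id = reader_id
        self.rules_version = 0
        self.rules = {}

    def receive_rules(self, version, rules):
        """Replace the local rule copy with the one sent by the registry."""
        self.rules_version = version
        self.rules = dict(rules)

    def accept(self, read):
        """Return True if the read should be forwarded to the warehouse."""
        if read["signal_strength"] < self.rules.get("min_signal_strength", 0):
            return False  # too weak: treat as white noise
        prefixes = self.rules.get("allowed_tag_prefixes")
        if prefixes and not read["tag_id"].startswith(tuple(prefixes)):
            return False  # not a product this site tracks
        return True


class RuleRegistry:
    """Single point of maintenance for the rules (hypothetical component)."""

    def __init__(self):
        self.version = 0
        self.rules = {}
        self.readers = []

    def register(self, reader):
        """Register a reader as a data source and send it the current rules."""
        self.readers.append(reader)
        reader.receive_rules(self.version, self.rules)

    def publish(self, **changes):
        """Change rules once, centrally, then distribute them to every reader."""
        self.rules.update(changes)
        self.version += 1
        for reader in self.readers:
            reader.receive_rules(self.version, self.rules)
```

The rules are plain data rather than code, so they can be edited in one place and shipped to any number of readers, and the version number lets the warehouse tell which readers are still filtering with stale rules.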

One thing is certain: not everyone will want all the data all the time. In this light, a business interface will be necessary to customize the amount and type of data revealed to an end user. Another technical component that database vendors often overlook is change data capture. In 2002, I wrote an article for Teradata Magazine about EAI vs. ETL vs. RDBMS (http://www.teradata.com/t/go.aspx/?id=115378). I suggested that these technologies would begin to merge and push on each other for functionality.

RFIDs drive this case home. It will require the use of EAI rules to move the data through the transponders and networks, the use of "transformation" (the T in ETL) in the database and the power of the RDBMS to be massively parallel and capable of handling high-speed, high-volume data feeds.

What happened to ETL?
Extracting from transponders will be a moot point unless storage devices are actually built into the transponders themselves; perhaps they become distributed operational data stores instead of dynamic feeds. This might be required for fault tolerance and failover. However, devices like network routers and hubs will need to become "smart" and run filters to rid the network of bad feeds and undesirable information. Data quality and change data capture will have to operate on these distributed nodes.

Transformation will occur in two stages: on the transponders themselves and on the warehouse receiving the RFID transmission information. Time and date stamping will move to the forefront of database processing necessity. The ability to be time/date and geo-location aware will become a competitive advantage.

What happens if a transponder receives bad data?
Bad data can be generated (theoretically) by a defective RFID tag or an RFID virus (let's hope not). Transponders must have change data capture logic programmed in, along with parallel authentication devices to confirm that the data from the RFID is indeed bad. We may want to capture this information and record the fact that the RFID is bad so it can be replaced. We may even want to know how to replace it and how to keep it from "infecting" other nearby RFIDs. We can take a lesson from the credit card processing companies here. In an active data warehouse, they have flags that signal possible fraudulent activities. A similar rating system might be employed to detect bad data from the RFIDs and to either reprogram them remotely or shut them down. Either way, the transponders must be connected to an active data warehouse in order for these decisions to be made.
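A rating system of this kind could be as simple as a running count of confirmed bad reads per tag. The Python sketch below is an assumption-laden illustration: the class name, the three status labels and the threshold are hypothetical. It counts a read against a tag only when two independent checks both fail, which stands in for the parallel authentication step.

```python
from collections import defaultdict


class TagHealthMonitor:
    """Keeps a suspicion score per tag, much like a fraud flag."""

    def __init__(self, flag_threshold=3):
        self.flag_threshold = flag_threshold
        self.scores = defaultdict(int)

    def record(self, tag_id, primary_ok, secondary_ok):
        """Score one read that was checked by two independent devices.

        The read counts against the tag only when both checks fail, so a
        single misbehaving reader cannot condemn a healthy tag.
        """
        if not primary_ok and not secondary_ok:
            self.scores[tag_id] += 1
        return self.status(tag_id)

    def status(self, tag_id):
        """Return 'ok', 'watch' or 'replace' for the given tag."""
        score = self.scores.get(tag_id, 0)
        if score >= self.flag_threshold:
            return "replace"
        if score > 0:
            return "watch"
        return "ok"
```

A tag that reaches "replace" is the candidate for remote reprogramming or shutdown; keeping the scores in the warehouse rather than on each reader is what allows that decision to be made centrally.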

What happens to Teradata?
Maybe a better question is what should happen to Teradata? When we peek into the future, no one can say for sure, but here's what I think may come about as only one result of RFID technology:

  • Teradata will manage dynamic business rules engines and contain a warehouse of information about the transponders, their locations, their filtration devices, their throughput rates, their success/failure points and a prediction of potential future failure.
  • Teradata will contain additional transformation rules that include powerful time-, date- and geo-location-based comparisons.
  • Teradata will generate cubes of space and time information to resolve complex geo-spatial formulas. The answers will be fueled by business questions like tracking, production speed, supplier management, manufacturing time, etc.
  • Teradata will disperse the business rules pertaining to white noise and filtration to all the registered transponders. These will be maintained in the single warehouse mentioned previously.
  • Teradata will change the core engine technology, allowing parts of the engine to run directly on the transponders themselves. They may begin to operate in a grid-fashion, with ODSs on each of the transponders to manage the flow of information. Queries will be executed against distributed transponders in parallel, as well as at the data warehouse level.
  • The lines of the ODS and the data warehouse will blur. Users will no longer "think" about the question as being strategic or tactical; instead, all the data will be available all the time.
  • Teradata's core engine will be the EAI, ETL and ELT engine of the future. These technologies will still be utilized, but only to source data outside the transponder world.
  • Teradata will extend its networking capabilities to allow transponders to be parallel, fault-tolerant, and redundant.

These are just a few ideas that come to mind for the Teradata engine when we consider the implementation and application of RFIDs to data warehousing. Teradata may or may not implement these suggestions; they are just logical extensions to the technology at hand that Teradata is so well equipped to handle. The other RDBMS engines have a long way to go to play catch-up in this space, particularly when it comes to distributed computing power, parallel everything, redundancy and fault tolerance.

In summary
Teradata is sitting smack-dab in the middle of the future. It has the capacity to handle RFID technology and beyond. A few modifications may be needed to accommodate a single version of the business rules, but that work will have to be done as we move forward.

RFIDs, like any other technology, will bring change: to our lives, to our data architectures, to our designs and our implementations. We should not sit on the sidelines and watch the technology go by.

Dan Linstedt, of Myers-Holum, Inc., can be reached at daniel.linstedt@MyersHolum.com.




Copyright by Teradata Corporation 2001-2007.