|
|
|
Add capabilities to the data warehouse dynamically with Teradata's extensibility features.
|
An active user community
Empowering users of business intelligence adds ROI for your enterprise and confidence for your team.
|
Enterprise BPM
Integration of technologies from Hyperion and Teradata result in an enterprise architecture for Business Performance Management.
|
Purpose-built platform
Enterprise data warehousing requirements drive the design and architecture decisions of the Teradata platform.
|
Tech support
Need technical help? A Teradata Certified Master answers your queries.
|
|
|

|
|
Ask the expert: Extensibility
Find new ways to drill into your data.
by Todd Walter
My power drill is my favorite tool. Besides just being fun, it is also extremely versatile. Its primary purpose is to drill holes, with bits inserted in the chuck. However, the chuck allows many tools to be inserted. I have used mine as a screwdriver, a hole saw, a lathe, a grinder, a paint stirrer and a couple of things that probably invalidate the warranty.
This power drill is an example of extensible design. Anyone may extend the capabilities of the drill by building tools that fit into the chuck—the manufacturer, other tool vendors and even consumers. The drill itself needs no modifications to perform any of these new functions.
Extensibility occurs in software too; it refers to the ability of a software product to be enhanced or customized. Approaches to extensibility include facilities for allowing users to insert their own program routines, the ability to define new data types and the ability to add new features to an existing program. The key point is to provide for change while minimizing impact to existing system functions.
What extensibility capabilities does Teradata have?
Views, macros and triggers have
always provided basic extensibility in Teradata. More recent additions include User Defined Functions (UDFs), Stored Procedures (SPs), External Stored Procedures (xSPs) and Table
Functions (TFs). User Defined Types (UDTs) will be introduced in Teradata Warehouse 8.1.
How do views, macros and
triggers provide extensibility?
A view provides the ability to package expressions and SQL access to tables. The view extends the database by structuring access and by providing new metrics
computed from underlying columns. This extension appears as a single new table, hiding the complexity from the user.
A macro contains one or more SQL statements, which can be executed by invoking the macro name. A macro may be parameterized, allowing constraint values to be supplied to its execution. It extends the database by encapsulating complex SQL, multiple task steps and report formatting into a simple-to-execute object.
Upon inserting, updating or deleting rows, a trigger can conditionally execute SQL statements or call an SP. Triggers can be used for logging, security, data conditioning, event detection or even sending e-mail. They extend and automate, replacing separate application logic.
What is a UDF?
A UDF is a function that is coded and installed independently of the installation of the DBMS (database management system) software. Once installed, the function is referenced just as any other SQL function would be. The UDF extends the capability of the DBMS dynamically with the logic implemented by its creator.
What are the benefits of
creating a UDF?
The SQL language has many pre-defined functions. Each DBMS implements the functions defined by ANSI SQL and adds specific proprietary ones as well.
But what if you want to extract the substring that's found after the third comma and before the second semicolon unless there is a dash? Compute a complex mathematical function and have everyone guaranteed to use the same computation? Provide an equivalent to a nonstandard function from another DBMS? Deploy an analytic model calculated from a data mining tool? Analyze the content of non-traditional data—perhaps extract attributes from a document? Each of these may be addressed with a UDF and the use of it in SQL statements.
What kinds of UDFs
can be created?
There are two primary types of SQL functions: scalar and aggregate. A scalar function operates on the contents of the current row within a SQL statement. It has input parameters that accept a combination of columns and expressions, and it returns a function result. String operations and mathematical functions are the focus of the predefined set of scalar functions.
An aggregate function operates over groups of rows as defined by the SQL GROUP BY clause. Parameters from each row in the group are provided to the function that accumulates results across the set. Sum, count, average, max, min and statistical functions are the focus of the predefined aggregate functions.
A UDF may be a scalar or an aggregate function. Scalar UDFs accept parameters as input, perform as much computation and logic as desired and return a result for each row in the SQL statement. Aggregate UDFs accept input parameters, perform accumulation and computation and return a result for each group of rows defined by the GROUP BY clause.
How do UDFs work?
A UDF is called for each row processed by a SQL statement. The parameters
are supplied from the current row. A scalar UDF performs all of its computation immediately and returns its result to the row being processed by the statement. The UDF is executed on all units of parallelism simultaneously.
An aggregate UDF has storage for
context variables that keep their values across many calls. It is called for each
row in the statement. If the row represents the first time a new group value has been seen, then the function initializes its context and copies values into the variables.
On subsequent calls with the same group value, the function accumulates values into the context variables.
When the final row in the group is encountered, the function utilizes the accumulated values, computes the final function result and returns it to the row for the group. Teradata provides all of
the services necessary to deal with groups existing on multiple units of parallelism, caching and combining so the function creator does not have to be aware of its parallelism aspects.
What is an SP?
An SP extends the DBMS with an entire program or program fragment. It can contain SQL and procedural control statements in any combination. The
program is encapsulated so that a user need only invoke the procedure by name and supply the required parameters. An SP is coded and installed dynamically and independently of the DBMS.
In what ways does an xSP differ from an SP?
An SP is coded using the ANSI stored procedure language, making it easy to manage the flow of SQL statements and logic. An xSP is coded using C or C++, which allows access to functions and processes outside of the DBMS domain.
What are the benefits of creating an SP?
The alternative to an SP is code in
an application running outside the DBMS domain. An SP executes inside
the DBMS closer to the data, making
it higher-performance.
There is a single copy of the code,
meaning reuse and consistency are guaranteed. The code can be protected, and the execution of the SP is auditable for better security.
Multi-step application logic, data transformation and loading, packaged tactical data access, DBA task automation—the uses are constrained only by the imagination. An xSP can reach outside Teradata to read from a queue, send a message or get data from a file. When invoked by a trigger, an SP can analyze, filter and propagate events.
What is a TF?
A TF is a C or C++ function that is
referenced in the FROM clause of a SQL data manipulation language (DML) statement. A TF produces a set (one or more rows) that comes from external data sources or input parameters or is generated internally and formatted by the TF. Output rows contain columns comprised of any of the supported Teradata types.
The returned rows appear to the SQL statement just as if they came from a table or view reference. A TF is created and installed independently from the DBMS, allowing adjustment to new sources and source format changes.
What are the benefits
of creating a TF?
Data from outside the data warehouse may be required for analysis. If it is a relatively well-known source and format, and if it is inconvenient to load the data into the warehouse ahead of time, then a TF is a way to bring that data into the analysis. The data may reside in an external file, a spreadsheet or even another database. A TF may generate test data inside the engine rather than creating and loading it from outside. A TF may convert a single input into a set of rows—for example, converting an XML document, a text document, an electronic data interchange (EDI) data structure or a time series into many rows.
What is a UDT and why
would I want to create one?
The next installment of Teradata's
extensibility story is UDTs, which
will be available with Teradata Warehouse 8.1. A UDT extends
the DBMS with a new data type
and the associated methods that operate on it. New types may be
created directly from types previously
provided. New types may also be
complex structures made up of
several different fields of different
types. The methods defined on
the new types extend the query
language and are invoked as if they
were SQL operators.
UDTs allow more clarity, fewer
errors and greater access to more
types of data. For example, a set of
currency UDTs (euro, dollar, etc.)
keeps different currencies from being used in the same expression. A UDT
for spatial data with latitude, longitude and altitude could have methods that compute distance between locations.
A UDT for unstructured data could define the header of a picture file, making all the attributes of the
picture available for query (time,
f-stop, resolution, etc.).
Can I share my extensions with others?
There are no limits (except your company's policies) to sharing the extensions you build. There are UDFs available for download on Teradata.com; use them to get started. Contributions to the library are welcomed under "open source" rules.
Teradata, the extensible power tool
Like the power drill, Teradata's extensibility features allow capabilities to be added to the data warehouse dynamically. All the uses of the drill take advantage of the basic drive mechanism.
Similarly, all of the Teradata extensibility features take advantage of Teradata's built-in parallelism and scalability, putting the full power of the Teradata MPP platform to work on the task and opening that power up to anyone with a new problem to solve. T
Todd Walter, CTO, Teradata Development Division, oversees R&D efforts for Teradata Database software and systems. He also is responsible for the future vision and development of the active data warehouse. You can send your data warehousing questions to the expert at
todd.walter@teradata-ncr.com.
© Teradata Magazine-March 2006
back to top
|