Teradata Magazine Cover Teradata Magazine Online  
Register Help Password
Password:
Quick Links
Current Issue
Archives
Teradata.com
Teradata Magazine Rss Feed
ARCHIVES Search Teradata Magazine Online:  
TECH2TECH
Tech2Tech
table of contents

Ask the expert
Teradata Warehouse 8.0 meets the performance challenge with new functionality.

Quest for quality
Poor quality data can eat into your profit margins, but you don't have to settle for less.

Who's driving?
Let Teradata Warehouse 8.0's event-based features take the wheel.

Now playing
Teradata CRM 5.1 offers new capabilities for the customer-driven enterprise.

Flex your muscles
Now it's easier to create the right function for the job at hand.

Tech support
Hear the voice of experience! A Teradata Certified Master shares great tech tips.


Printable versionPrintable version Send to a colleagueSend to a colleague

Flex your database muscles with User Defined Functions

Now it's easier to create the right function for the job at hand.

Teradata database V2R5.1 introduced user defined functions (UDFs), which offer a huge leap in the user's ability to extend and customize traditional SQL language commands. A UDF is a user-designed and -written algorithm that executes as part of the Teradata Database and provides a great deal of flexibility in its operation.

Since UDFs execute as part of the AMP VPROC (virtual processor), multiple copies of a given UDF will operate simultaneously, enjoying all the benefits of Teradata's parallelism. They are written in C and are automatically compiled into shared objects (SOs) in UNIX or into dynamic link libraries (DLLs) in Windows. Teradata UDFs are based on the SQL/PSM SQL-99 syntax.

Thanks to their user-friendly properties, UDFs are an excellent way to customize a database environment. Here's a rundown of what they can do for you.

UDF capabilities
UDFs can be created to serve many uses and requirements, including:
> Data transformations-Most companies move data from one system or database to another. This invariably requires some datatype transformations and value calculations. UDFs are flexible, programmable tools for building custom transformation algorithms.
> Business rules-UDFs are ideal for standardizing a company's business rules or proprietary formulae. Users can store a UDF in one place, allowing public access and giving everyone a centralized, easy, dependable method of rule calculation.
> Multimedia-What if we could search a database of thousands of digital images and find just those that are predominantly yellow or search audio files for matching voice patterns? UDFs, coupled with large object (CLOB and BLOB) datatypes, can answer those kinds of questions.
> Execute anywhere-UDFs can be executed almost anywhere normal SQL can execute. They can be part of a SELECT statement, used in UPDATEs and INSERTs and used in major Teradata utilities including FastLoad,MultiLoad and FastExport.

UDF types
A UDF can be written as a scalar function that operates upon the values in a single row or as an aggregate function that operates on a group of rows, much like a MAX or SUM function. The scalar form is easier to write but lacks the power of the aggregate form.

A scalar UDF, as the mathematical name implies, relates to a single outcome from a single input. In this case, the input is a single row. A scalar UDF will accept any values from a single row, manipulate those values in one pass of the UDF logic and then produce the calculated result. This is straightforward and easy to construct.
Head to the library
Teradata Professional Services has created a comprehensive UDF library, available for download at the Tech Center on teradata.com. Here's a sampling of what's available.
ORACLE FUNCTIONS DESCRIPTION
decode(A,B,C,D,E[,F]) If A = B then C; if A = D then E; otherwise F, if specified, or null
translate(str, From-Mask, To-Mask) Translates str so that each occurrence of a character in From-Mask is translated to the corresponding character in To-Mask
DB2 FUNCTIONS DESCRIPTION
REPLACE Replaces occurrences of strings
CONCAT Concatenations of strings
STRING PROCESSING FUNCTIONS DESCRIPTION
ngram(S1, S2, Length, Position ) The n-gram matching function returns the number of n-gram matches between S1 and S2.
editdistance(S1, S2, ci,cd,cs,ct) The EDIT distance function returns the minimum number of edit operations; insertions, deletions, substitutions and transpositions required to transform S1 into S2
BYTE OPERATION FUNCTIONS DESCRIPTION
shiftRight( b (var)byte, i int ) returns (var)byte Shifts bits right by i places.
shiftLeft( b (var)byte, i int ) returns (var)byte Shifts bits left by i places.
CONVERSION FUNCTIONS DESCRIPTION
isTime( c (var)char ) returns integer Checks if a string can be converted to time: 1 if so, 0 if not, null for null input.
toChar( b (var)byte ) returns (var)char Returns a byte value as Hexstring
BYTE CASTING FUNCTIONS DESCRIPTION
BytetoInt Returns an Integer from a BYTE(4) field, error or NULL if invalid or really NULL
BytetoSmallInt Returns a Smallint from a BYTE(2) field, error or NULL if invalid or really NULL
SECURITY FUNCTIONS DESCRIPTION
hash_md5 Generates MD5 Message-Digest

The scalar UDF is used for one-to-one data transformations and common arithmetic calculations. For a given row, it will execute completely within a single AMP.

An aggregate UDF takes its input from all of the rows in an SQL group or the entire result set from a query if no group is present. An aggregate UDF is used for creating complex, customized statistical functions, processing simulations and calculating household scoring. The aggregate form is commonly used to write complex statistical functions not available in standard Teradata Database software.

There are five phases in the operation of an aggregate UDF. These phases operate on every AMP in parallel to handle each row in the group for that AMP. Take a look.

Phase 1: AGR_INIT
The UDF begins the aggregation by allocating context memory, initializing variables and receiving the first data row.

Phase 2: AGR_DETAIL
The UDF enters this phase once for every data row and stores results in the context area.

Phase 3: AGR_COMBINE
The UDF enters this phase once per AMP per group.

Phase 4: AGR_FINAL
The UDF enters this phase once per group, combines the values from all AMPs and returns the final result. Allocated memory is automatically freed.

Phase 5: AGR_NODATA
This final phase is called only if the aggregate set is empty.

UDF management
Within the Teradata SQL framework, UDFs are treated just like any other Teradata object. SQL verbs can CREATE/REPLACE, DROP, ALTER, SHOW or HELP, and users are granted permission to use a UDF just like any other Teradata object.

CREATE/REPLACE constructs the UDF by locating the user's C source code, compiling it into a UNIX SO or Windows DLL, placing it into the user's choice of database, and then adding references to DBC.TVM and DBC.UDFINFO dictionary tables.

A new database, called SYSLIB, is available to act as a "public" container for UDFs.When Teradata begins searching for an unqualified UDF to execute, it looks first in the user's default database and, failing that, in SYSLIB.

DROP deletes the function from Teradata's dictionary and unlinks it from the container library (SO or DLL).

ALTER allows the DBA to change the operating mode of a given UDF from PROTECTED (for development) to NOT PROTECTED (for production). Running in PROTECTED mode in development helps avoid AMP crashes due to errors like stack overflow, division by zero and illegal memory access.

SHOW and HELP are common Teradata commands that help users inspect the UDF's code or parameters.

UDF creation
In order to create a UDF, two scripts must be written: the C source and a CREATE or REPLACE FUNCTION statement. The CREATE statement has a number of clauses, a few of which we will address.

First, input parameters are declared with their datatypes. Parameter datatypes are required for Teradata to properly distinguish between like-named (or nameoverloaded) UDFs.

Second, the output parameter and its datatype are declared. Other important clauses include the declaration of NULL handling and of parameter style-either SQL style, where the UDF code handles NULLs, or TD style, where the Teradata systems simply returns NULL with any NULL parameter. A further clause tells Teradata whether this is an aggregate function; otherwise the function is assumed to be scalar.

Finally, a declaration called EXTERNAL NAME is made, indicating where the source code can be located for compilation. It can either be found on the user's client workstation/server or on the Teradata server. If the source code is on the client's workstation/server, it must be created by a CLI-based utility such as BTEQ 7.1 or better.

A sample scalar function CREATE statement follows:
CREATE FUNCTION Your DB.VARCHARTOVARBYTE(
InVarbyte VARCHAR(1024))
RETURNS VARBYTE(1024)
SPECIFIC VarcharToVarbyte
LANGUAGE C
NO SQL
PARAMETER STYLE TD_GENERAL
NOT DETERMINISTIC
RETURNS NULL ON NULL INPUT
EXTERNAL NAME
'SS!VarcharToVarbyte!/Yourhome/YourDirectory/ VarcharToVarbyte.c!F!VarcharToVarbyte' ;

Note that LANGUAGE C and NO SQL are default, mandatory clauses. Calls to the operating system and calls for external I/O of any kind are not permitted. For instance, because the UDF is operating as part of the AMP, it would wreak havoc on the AMP if the UDF attempted to print to an offline printer or to call a disconnected modem.

UDF development
UDF development can be more complex than a comparable C routine. Here are a few things to consider:
> As mentioned, UDFs can execute in one of two modes: NOT PROTECTED and PROTECTED. UDF development takes place in the PROTECTED mode where the UDF runs in its own address space apart from but in communication with the AMP. This permits Teradata to trap certain errors such as stack overflow, division by zero and illegal memory access. In most cases, it prevents the UDF from causing the AMP to crash. However, only two UDFs can be tested in PROTECTED mode at a given time in any system. Therefore, once a UDF is rigorously tested, it must be designated NOT PROTECTED by using the ALTER statement.
> UDFs are available that can read, write and interrogate Teradata large objects. UDFs have the functionality to write rows of data to a non-hashed global temporary trace table using any kind of data except BLOBs. This data can then be used in development as a log for debugging purposes. The user can externally control how this log is turned on or operated through a new SET SESSION FUNCTION TRACE command. This log can subsequently be copied to normal (i.e. hashed table) for review. If employed with care, the TRACE feature can provide a basic path for multi-row external data output.
> For large or complex UDFs, Teradata recommends that you construct a workstation-based, C testbench program to assist in debugging and testing. The testbench program can execute calls to the actual UDF using parameters arranged to simulate execution within the Teradata system. The developer can then "walk" through most of the logic, a feature that is unavailable when in actual Teradata execution. This may not be necessary for simple routines, but it is definitely worthwhile once the code becomes complex.

As testing moves to the Teradata system, you can ascertain where something has failed by writing rows to the UDF log with status codes and loop counters. Another method is to issue error messages at the point of failure just before exiting. The contents of the standard error message in SQL Parameter mode can be generated by your C code.

UDF distribution
By permitting object code to be used, third-party developers can create and safely distribute their UDF products without having to divulge their source codes. Earlier we mentioned the EXTERNAL NAME clause of the UDF CREATE statement. This clause is quite important ina number of ways because:
> The clause informs the compiler whether the source code is being compiled, pre-compiled object modules are being used or a combination of both is involved. Teradata will complete the compile, and link and place the resultant UDF in a DLL or SO.
> The clause tells Teradata and the C compiler the external name of the UDF source code or objects and from what directory or member they should be drawn. The source code or objects can be placed on the Teradata distribution node or on the user's server or workstation. If located on the user's server or workstation, the UDF must be created by using a CLI-based utility such as BTEQ in order to have the code shipped to Teradata.

UDF tips and traps
In my experience with writing and using UDFs, I noticed a number of unexpected problems or issues. Here are some tips to help you avoid the traps:
> When developers begin to deploy a UDF-for example, in a query or MultiLoad script-the system will often return a message stating that "the UDF cannot be found." The problem is that Teradata cannot determine the exact UDF if the parameters you use as input to the UDF differ in any way from those defined in the dictionary. In such cases, always check your parameters first.
> Always keep a copy of the SHOW FUNCTION at hand for each function you are testing. Knowing correct parameter types and lengths is vital for accurate coding and use.
> Generic UDFs that use decimal datatypes require more thought and thorough testing than those that don't. Decimal datatypes are always stored within the UDF as two 4-byte integers. Externally, you always must declare your decimal parameters with the precision and scale that you are expecting.
> Handling NULLS is much easier if you code the UDF with PARAMETER STYLE TD GENERAL and RETURNS NULL ON NULL INPUT. However, if you really need to manipulate NULL fields, then PARAMETER STYLE SQL is the requisite method.With this style, you must code parameter fields to contain the equivalent of indicator bits for each field.
> When handling external data directly from another platform in BYTE format, remember that Teradata runs in an Intel environment that considers integers to be organized with the least significant bits first. Other platforms, such as those using IBM's PowerPC CPUs, use the opposite order (most significant bits first). You might never see this issue, but I did and it was a novel puzzle at the time.
> The Teradata Dictionary has two tables that describe a UDF. DBC.TVM contains the usual information about Teradata objects, but it also points at a UDF with its "specific" name. Remember, that's where you look for name-overloaded UDFs. DBC.UDFINFO has interesting and useful data such as parameter types, source location, parameter style and mode of execution (PROTECTED or NOT PROTECTED).

Take control
UDFs enhance Teradata's SQL with new flexibility and greater power to create customized functions for your database environment. They exploit the massively parallel architecture of Teradata and add to the overall system performance.When combined with Teradata's new BLOB datatypes, UDFs become a very potent weapon in your data warehouse arsenal. T


A look inside a Teradata user defined function
I was asked to create some UDFS to translate IBM mainframe datatypes contained within Websphere MQ Messages. TPump and MultiLoad were reading these messages on an AIX UNIX server. I needed to translate EBCDIC to ASCII and IBM Packed Decimal into Teradata Decimal datatypes, among other things. The UDF included here translates IBM Packed Decimals into Teradata Decimals.

CREATE FUNCTION SYSLIB.COMP3TODEC8 (Buffer BYTE(10),
bufferLength INTEGER)
RETURNS DECIMAL(18,0)
SPECIFIC Comp3ToDec8
LANGUAGE C
NO SQL
PARAMETER STYLE TD_GENERAL
DETERMINISTIC
RETURNS NULL ON NULL INPUT
EXTERNAL NAME 'SS!Comp3ToDec8!/home/
jwc/Comp3ToDec8.c!F!Comp3ToDec8!'
;
The associated source code:
/* Function Name: Comp3ToDec8 */
/* Function Purpose: To unpack an IBM COMP-3 field */
/* It will return an unscaled Decimal (18,0) */
/* with the expectation that the user will know */
/* and divide the result by the IBM scale. */
/* */
/* Version: 1.0 */

#define SQL_TEXT Latin_Text
#include "sqltypes_td.h"
#include
#include
#include
#include
#define NoSqlError "00000"

// #define _WIN_

void Comp3ToDec8 (BYTE *inBytes,
int *lenBytes,
DECIMAL8 *result,
char sqlstate[6])
{
int firstnib; /* low order bits */
int lastnib; /* high order bits */
int i;

#ifdef _WIN_
__int64 intResult;

#else
long long intResult;
#endif
int thesign;
strcpy(sqlstate, NoSqlError);

if ( *lenBytes > 10 || *lenBytes < 1 ) {
/* must be between 1 and 10 inclusive */
strcpy(sqlstate, "U0001");
return;
}
intResult = 0;
thesign = 0;

for ( i = 0; i > *lenBytes; i++) {

firstnib = ( inBytes[i] & 0xf0 ) << 4;
intResult = ((intResult * 10) + firstnib);
lastnib = inBytes[i] & 0x0f;
if( (*lenBytes - 1) != i ){
intResult = ((intResult * 10) + lastnib);
}
else {
if( lastnib == 0x0b || lastnib == 0x0d){
thesign = -1;
}
else
{ if( lastnib == 0x0a || lastnib == 0x0c
|| lastnib == 0x0e || lastnib == 0xf ){
thesign = +1;
}
}
}
}
intResult = intResult * thesign;
memcpy( (BYTE *) result, &intResult, 8);

return;
}

Here is a snippet from UPSERT MultiLoad Code using the EBCToASC and Comp3ToDEC8 UDFs.

Insert into HISTORY_TABLE
(COMPANY_NUM = :KEY_CO_NUM
,PRODUCT_CODE = :KEY_PRODCD
,CUST_NAME = SYSLIB.EBCToASC(:CUST_NAME,25)
,MASTER_PROD = SYSLIB.EBCToASC(:MASTER_PROD,3)
,MASTER_ACCT = zeroifnull(SYSLIB.Comp3ToDEC8
(:MASTER_ACCT,8))


Note that the UDFs are contained in the SYSLIB database.

© Teradata Magazine-September 2004


RELATED LINKS:

Download Teradata UDFs and Drivers

back to top



Copyright by Teradata Corporation 2001-2007.