Converging separate products into Teradata Parallel Transporter offers better productivity with greater options.
by Elizabeth Lipp
An enterprise with multiple business units works better when the units function as a whole. Likewise, separate data warehouse tools are more productive when they
are combined within the same infrastructure; then they speak the same language and communicate as one. Teradata Parallel Transporter is one example of products
working together within an active data warehouse.
Teradata Parallel Transporter comes from a long line of data loading products, including FastLoad, MultiLoad, FastExport and TPump, each of which communicated
with the Teradata Database in its own language. A new-generation product, Teradata Parallel Transporter simplifies the data loading process by running the
protocols used by each of these products as modules or operators: load, update, export and stream.
Better together
Teradata Parallel Transporter is made up of a range of operators that have their own purpose in the data warehouse system. Load/unload operators use the FastLoad,
MultiLoad, FastExport and TPump protocols. (For consistency, these protocols have been dubbed with the same name as when they were individual tools functioning
separately.)
Load operator, using the FastLoad protocol, is a parallel load utility designed to move large volumes of data collected from
data sources on channel- and network-attached clients into empty tables in the Teradata Database.
Update operator optimizes batch operations that rapidly acquire, process and apply data to tables in a Teradata Database by
using the MultiLoad protocol. For data maintenance, MultiLoad updates, inserts and upserts large volumes of data into empty or populated tables, and/or deletes
data from the tables. MultiLoad works at the data block level, providing an alternative to insert/select operations that touch a significant portion of the target
table. A single MultiLoad job can maintain up to five Teradata tables by extracting large volumes of data, locking the destination tables, then loading data using
block level updates.
Export operator exports large data sets from Teradata tables or views, imports the data to a client's system for processing or
generating large reports, or for loading data into a smaller database. It uses the FastExport protocol. Like all Teradata load utilities, FastExport can export
data to channel- or network-attached client systems. FastExport also provides session control and data handling specifications for the data transfer operations.
Stream uses the SQL-based TPump protocol, not block-based, for continuous loading that is designed to move data from data sources
into Teradata tables without locking the affected table. TPump loads near real-time data into the data warehouse, allowing users to maintain fresh, accurate data
for up-to-the-minute decision making. TPump can be used to insert, update, upsert and delete data in the Teradata Database, particularly for environments where
batch windows are shrinking and data warehouse maintenance overlaps normal working hours. Because the TPump protocol uses row hash locks, users can run queries
even while they are updating the Teradata Warehouse.
More opportunities
Combining these four protocols—FastLoad, MultiLoad, FastExport and TPump—in Teradata Parallel Transporter and running them as load, update, export and stream
operators provides users with a flexible tool to handle various load scenarios. With Teradata Parallel Transporter, it is possible to concurrently run multiple
instances of the operators against sources and targets to shorten the load process. (See figure 1.)
Previously, when extracting data from one Teradata system and loading it into another, the jobs needed to be done independently with two separate scripts: one to
extract the data and another to load the data. Teradata Parallel Transporter simplifies this process by providing flexible, scalable access to data sources from a
single interface using one SQL-like scripting language or an open application programming interface (API). This allows for more automatic and simultaneous loading
of dozens of files, easing script development and maintenance. (See figure 2.)
With the introduction of an API in this process, Teradata Database load protocols become open, enabling third-party partners to integrate faster, easier and deeper
without the need to run a separate script job. The API also enables the full job flow from extracting through transformation and loading, which can be run,
controlled and monitored by extract, transform and load (ETL) products. This can be done using either ETL or extract, load and transform (ELT) techniques to load
the Teradata Database. In addition, customer-written programs can also make use of the Teradata Parallel Transporter API to directly take advantage of the load
protocols.
Since the four utilities have converged into Teradata Parallel Transporter, only one scripting language is used. This simplicity creates new opportunities for the
user, such as:
|
Solving more complex load scenarios
|
|
Switching between load protocols more easily
|
|
Unlimited symbol substitution
|
|
Fewer required scripts
|
|
Simultaneous Teradata-to-Teradata export and load scenario
|
Another feature provided through the Teradata Parallel Transporter is the Directory Scan, which allows data files in a directory to be processed in parallel.
(See figure 3.) The data connector operator, which is used as the operator to read the data, provides scalable and parallel access to multiple files,
and distributes the files as evenly as possible among operator instances.
Similar to the Directory Scan is the Active Directory Scan. While both process data files in parallel, the Active Directory Scan also allows data files to
continuously arrive to the scanned directory while the job is still running. This type of processing can provide 24x7 scanning of data files, which could represent
daily transactions that occur in different timeframes from multiple sources or locations.
All for one
There are many ways to import and export data with specific foci on overnight batching, continuous loading, streaming data and mini-batching. Teradata Parallel
Transporter is a fully scalable solution on the client machine where it runs, giving it the capability to overcome bottlenecks that restrict throughput. This
allows the Teradata Parallel Transporter to take advantage of additional memory and multiple CPUs with the purpose of increasing the throughput of the load job.
The protocols that make up Teradata Parallel Transporter have long been supported by the Teradata Database for a variety of volume load activities and requirements.
Teradata Parallel Transporter, a new-generation product, allows for more productive and high-performing functions with the Teradata load utilities compiled into
one solution. T
| Merging capabilities |
|
Teradata Parallel Transporter was designed for increased functionality and customer ease of use for faster, easier
and deeper integration. The capabilities include:
| > |
Simplified data transfer between one Teradata Database and another; only one script is required to export from the production-and-load test system
|
|
| > |
Ability to load dozens of files using a single script makes development and maintenance of the data warehouse easier
|
|
| > |
Distribution of workloads across CPUs on the load server eliminates bottlenecks in the data load process
|
|
| > |
Data flows through multiple instances of UPDATE OPERATOR and in-memory data streams to update tables
|
|
| > |
Option to export data to in-memory data stream instead of landing data
|
|
| > |
The open database connectivity (ODBC) operator reads from the ODBC driver, which could pull data from any database; for
example, DB2 or Oracle
|
|
| > |
Accessibility to myriad data sources via open standards
|
|
| > |
Ability of multiple operators to scan directories for files to load, and to combine the data in the in-memory data stream
with UNION ALL operation and stream operator loads
|
|
| > |
Script-building wizard to aid first time users
|
|
—E.L.
|
|
Below are some sample scripts to get you started with the Teradata Parallel Transporter product:
GT11
/**************************************************************/
/* */
/* Script Name: GT11.txt */
/* Description: This Teradata Parallel Transporter sample */
/* script exports 10 rows from a Teradata table */
/* and loads the 10 rows into a different, empty */
/* Teradata table. */
/* */
/**************************************************************/
/* Explanation: */
/* */
/* This script uses two job steps. */
/* */
/* The first job step called "setup" uses the DDL Operator to */
/* setup the source and target tables. */
/* */
/* The second job step called "load_data" uses the Export */
/* Operator to export the rows from a Teradata source table */
/* and uses the Load Operator to load the rows into an empty */
/* Teradata target table. */
/* */
/* This script shows how to use the following Teradata */
/* Parallel Transporter features: */
/* * Unconditional job steps. */
/* * Export rows from a Teradata table and load the rows into */
/* a different, empty Teradata table without landing the */
/* rows to a data file. */
/* */
/**************************************************************/
/* This script also demonstrates using "job variables", */
/* which are defined in the external file "jobvars.txt". */
/* These variables can be used anywhere inside the job script */
/* by prepending the @ symbol to the variable name. */
/* */
/**************************************************************/
/* Required customizations before running this script: */
/* */
/* Inside the file "jobvars.txt", modify the values for */
/* the following job variables: */
/* */
/* MyTdpId, MyUserName, MyPassword */
/* */
/* MyTdpId is the name of the Teradata Database system. */
/* MyUserName is the username on the Teradata Database system.*/
/* MyPassword is the password associated with the username. */
/* */
/**************************************************************/
/* Execution: */
/* */
/* Here is the command to execute this script: */
/* */
/* tbuild -f GT11.txt -v jobvars.txt */
/* */
/**************************************************************/
DEFINE JOB BACKUP_EMPLOYEE_TABLE
DESCRIPTION 'BACKUP SAMPLE EMPLOYEE TABLE'
(
DEFINE SCHEMA EMPLOYEE_SCHEMA
DESCRIPTION 'SAMPLE EMPLOYEE SCHEMA'
(
EMP_ID INTEGER,
EMP_NAME CHAR(10)
);
DEFINE OPERATOR DDL_OPERATOR()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER DDL OPERATOR'
TYPE DDL
ATTRIBUTES
(
VARCHAR PrivateLogName = 'GT11_ddloper_log',
VARCHAR TdpId = @MyTdpId,
VARCHAR UserName = @MyUserName,
VARCHAR UserPassword = @MyPassword,
VARCHAR AccountID,
VARCHAR ErrorList = '3807'
);
DEFINE OPERATOR LOAD_OPERATOR()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER LOAD OPERATOR'
TYPE LOAD
SCHEMA EMPLOYEE_SCHEMA
ATTRIBUTES
(
VARCHAR PrivateLogName = 'GT11_loadoper_privatelog',
INTEGER MaxSessions = 32,
INTEGER MinSessions = 1,
VARCHAR TargetTable = 'TARGET_EMP_TABLE',
VARCHAR TdpId = @MyTdpId,
VARCHAR UserName = @MyUserName,
VARCHAR UserPassword = @MyPassword,
VARCHAR AccountId,
VARCHAR ErrorTable1 = 'GT11_LOADOPER_ERRTABLE1',
VARCHAR ErrorTable2 = 'GT11_LOADOPER_ERRTABLE2',
VARCHAR LogTable = 'GT11_LOADOPER_LOGTABLE'
);
DEFINE OPERATOR EXPORT_OPERATOR()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER EXPORT OPERATOR'
TYPE EXPORT
SCHEMA EMPLOYEE_SCHEMA
ATTRIBUTES
(
VARCHAR PrivateLogName = 'GT11_exportoper_privatelog',
INTEGER MaxSessions = 32,
INTEGER MinSessions = 1,
VARCHAR TdpId = @MyTdpId,
VARCHAR UserName = @MyUserName,
VARCHAR UserPassword = @MyPassword,
VARCHAR AccountId,
VARCHAR SelectStmt = 'SELECT * FROM SOURCE_EMP_TABLE;'
);
STEP setup
(
APPLY
('DROP TABLE SOURCE_EMP_TABLE;'),
('DROP TABLE TARGET_EMP_TABLE;'),
('DROP TABLE GT11_LOADOPER_ERRTABLE1;'),
('DROP TABLE GT11_LOADOPER_ERRTABLE2;'),
('DROP TABLE GT11_LOADOPER_LOGTABLE;'),
('CREATE TABLE SOURCE_EMP_TABLE(EMP_ID INTEGER, EMP_NAME CHAR(10));'),
('CREATE TABLE TARGET_EMP_TABLE(EMP_ID INTEGER, EMP_NAME CHAR(10));'),
('INSERT INTO SOURCE_EMP_TABLE(1,''JOHN'');'),
('INSERT INTO SOURCE_EMP_TABLE(2,''PETER'');'),
('INSERT INTO SOURCE_EMP_TABLE(3,''FRANK'');'),
('INSERT INTO SOURCE_EMP_TABLE(4,''MARY'');'),
('INSERT INTO SOURCE_EMP_TABLE(5,''ELLEN'');'),
('INSERT INTO SOURCE_EMP_TABLE(6,''MICHAEL'');'),
('INSERT INTO SOURCE_EMP_TABLE(7,''SAM'');'),
('INSERT INTO SOURCE_EMP_TABLE(8,''JONATHAN'');'),
('INSERT INTO SOURCE_EMP_TABLE(9,''MICHELLE'');'),
('INSERT INTO SOURCE_EMP_TABLE(10,''ALICE'');')
TO OPERATOR (DDL_OPERATOR () );
);
STEP load_data
(
APPLY
('INSERT INTO TARGET_EMP_TABLE (:EMP_ID, :EMP_NAME);')
TO OPERATOR (LOAD_OPERATOR () [1] )
SELECT * FROM OPERATOR (EXPORT_OPERATOR() [1] );
);
);
GT44
/**************************************************************/
/* */
/* Script Name: GT44.txt */
/* Description: This Teradata Parallel Transporter sample */
/* script exports 10 rows from a Teradata table */
/* to a file. */
/* */
/**************************************************************/
/* Explanation: */
/* */
/* This script uses two job steps. */
/* */
/* The first job step called "setup" uses the DDL Operator to */
/* setup the source table. */
/* */
/* The second job step called "export_to_file" uses the */
/* Export Operator to export the rows from a Teradata table */
/* and uses the Data Connector Operator to write the rows to */
/* a file. */
/* */
/* This script shows how to use the following Teradata */
/* Parallel Transporter features: */
/* * Unconditional job steps. */
/* * Export rows from a Teradata table and write the rows to */
/* a file. */
/* */
/**************************************************************/
/* This script also demonstrates using "job variables", */
/* which are defined in the external file "jobvars.txt". */
/* These variables can be used anywhere inside the job script */
/* by prepending the @ symbol to the variable name. */
/* */
/**************************************************************/
/* Required customizations before running this script: */
/* */
/* Inside the file "jobvars.txt", modify the values for */
/* the following job variables: */
/* */
/* MyTdpId, MyUserName, MyPassword */
/* */
/* MyTdpId is the name of the Teradata Database system. */
/* MyUserName is the username on the Teradata Database system.*/
/* MyPassword is the password associated with the username. */
/* */
/**************************************************************/
/* Execution: */
/* */
/* Here is the command to execute this script: */
/* */
/* tbuild -f GT44.txt -v jobvars.txt */
/* */
/**************************************************************/
DEFINE JOB EXPORT_EMPLOYEE_TABLE_TO_FILE
DESCRIPTION 'EXPORT SAMPLE EMPLOYEE TABLE TO A FILE'
(
DEFINE SCHEMA EMPLOYEE_SCHEMA
DESCRIPTION 'SAMPLE EMPLOYEE SCHEMA'
(
EMP_ID INTEGER,
EMP_NAME CHAR(10)
);
DEFINE OPERATOR DDL_OPERATOR()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER DDL OPERATOR'
TYPE DDL
ATTRIBUTES
(
VARCHAR PrivateLogName = 'GT44_ddloper_log',
VARCHAR TdpId = @MyTdpId,
VARCHAR UserName = @MyUserName,
VARCHAR UserPassword = @MyPassword,
VARCHAR AccountID,
VARCHAR ErrorList = '3807'
);
DEFINE OPERATOR FILE_WRITER()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER DATA CONNECTOR OPERATOR'
TYPE DATACONNECTOR CONSUMER
SCHEMA EMPLOYEE_SCHEMA
ATTRIBUTES
(
VARCHAR PrivateLogName = 'GT44_dataconnoper_privatelog',
VARCHAR FileName = 'GT44_employee.dat',
VARCHAR IndicatorMode = 'N',
VARCHAR OpenMode = 'Write',
VARCHAR Format = 'Unformatted'
);
DEFINE OPERATOR EXPORT_OPERATOR()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER EXPORT OPERATOR'
TYPE EXPORT
SCHEMA EMPLOYEE_SCHEMA
ATTRIBUTES
(
VARCHAR PrivateLogName = 'GT44_exportoper_privatelog',
INTEGER MaxSessions = 32,
INTEGER MinSessions = 1,
VARCHAR TdpId = @MyTdpId,
VARCHAR UserName = @MyUserName,
VARCHAR UserPassword = @MyPassword,
VARCHAR AccountId,
VARCHAR SelectStmt = 'SELECT * FROM SOURCE_EMP_TABLE;'
);
STEP setup
(
APPLY
('DROP TABLE SOURCE_EMP_TABLE;'),
('CREATE TABLE SOURCE_EMP_TABLE(EMP_ID INTEGER, EMP_NAME CHAR(10));'),
('INSERT INTO SOURCE_EMP_TABLE(1,''JOHN'');'),
('INSERT INTO SOURCE_EMP_TABLE(2,''PETER'');'),
('INSERT INTO SOURCE_EMP_TABLE(3,''FRANK'');'),
('INSERT INTO SOURCE_EMP_TABLE(4,''MARY'');'),
('INSERT INTO SOURCE_EMP_TABLE(5,''ELLEN'');'),
('INSERT INTO SOURCE_EMP_TABLE(6,''MICHAEL'');'),
('INSERT INTO SOURCE_EMP_TABLE(7,''SAM'');'),
('INSERT INTO SOURCE_EMP_TABLE(8,''JONATHAN'');'),
('INSERT INTO SOURCE_EMP_TABLE(9,''MICHELLE'');'),
('INSERT INTO SOURCE_EMP_TABLE(10,''ALICE'');')
TO OPERATOR (DDL_OPERATOR () );
);
STEP export_to_file
(
APPLY TO OPERATOR (FILE_WRITER() )
SELECT * FROM OPERATOR (EXPORT_OPERATOR() [1] );
);
);
GT62
/**************************************************************/
/* */
/* Script Name: GT62.txt */
/* Description: This Teradata Parallel Transporter sample */
/* script loads 10 records from a file into an */
/* empty Teradata table. */
/* */
/**************************************************************/
/* Explanation: */
/* */
/* This script uses three job steps. */
/* */
/* The first job step called "setup_tables" uses the DDL */
/* Operator to setup the source and target tables. */
/* */
/* The second job step called "export_to_file" uses the */
/* Export Operator to export the rows from a Teradata table */
/* and uses the Data Connector Operator to write the rows to */
/* a file. */
/* */
/* The third job step called "load_data_from_file" uses the */
/* Data Connector Operator to read records from a file and */
/* uses the Load Operator to load the records into an empty */
/* Teradata table. */
/* */
/* This script shows how to use the following Teradata */
/* Parallel Transporter features: */
/* * Unconditional job steps. */
/* * Export rows from a Teradata table and write the rows to */
/* a file. */
/* * Read records from a file and load the records into an */
/* empty Teradata table. */
/* */
/**************************************************************/
/* This script also demonstrates using "job variables", */
/* which are defined in the external file "jobvars.txt". */
/* These variables can be used anywhere inside the job script */
/* by prepending the @ symbol to the variable name. */
/* */
/**************************************************************/
/* Required customizations before running this script: */
/* */
/* Inside the file "jobvars.txt", modify the values for */
/* the following job variables: */
/* */
/* MyTdpId, MyUserName, MyPassword */
/* */
/* MyTdpId is the name of the Teradata Database system. */
/* MyUserName is the username on the Teradata Database system.*/
/* MyPassword is the password associated with the username. */
/* */
/**************************************************************/
/* Execution: */
/* */
/* Here is the command to execute this script: */
/* */
/* tbuild -f GT62.txt -v jobvars.txt */
/* */
/**************************************************************/
DEFINE JOB LOAD_EMPLOYEE_TABLE_FROM_FILE
DESCRIPTION 'LOAD SAMPLE EMPLOYEE TABLE FROM A FILE'
(
DEFINE SCHEMA EMPLOYEE_SCHEMA
DESCRIPTION 'SAMPLE EMPLOYEE SCHEMA'
(
EMP_ID INTEGER,
EMP_NAME CHAR(10)
);
DEFINE OPERATOR DDL_OPERATOR()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER DDL OPERATOR'
TYPE DDL
ATTRIBUTES
(
VARCHAR PrivateLogName = 'GT62_ddloper_log',
VARCHAR TdpId = @MyTdpId,
VARCHAR UserName = @MyUserName,
VARCHAR UserPassword = @MyPassword,
VARCHAR AccountID,
VARCHAR ErrorList = '3807'
);
DEFINE OPERATOR FILE_WRITER()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER DATA CONNECTOR OPERATOR'
TYPE DATACONNECTOR CONSUMER
SCHEMA EMPLOYEE_SCHEMA
ATTRIBUTES
(
VARCHAR PrivateLogName = 'GT62_dataconnoper_writer_privatelog',
VARCHAR FileName = 'GT62_employee.dat',
VARCHAR IndicatorMode = 'N',
VARCHAR OpenMode = 'Write',
VARCHAR Format = 'Formatted'
);
DEFINE OPERATOR EXPORT_OPERATOR()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER EXPORT OPERATOR'
TYPE EXPORT
SCHEMA EMPLOYEE_SCHEMA
ATTRIBUTES
(
VARCHAR PrivateLogName = 'GT62_exportoper_privatelog',
INTEGER MaxSessions = 32,
INTEGER MinSessions = 1,
VARCHAR TdpId = @MyTdpId,
VARCHAR UserName = @MyUserName,
VARCHAR UserPassword = @MyPassword,
VARCHAR AccountId,
VARCHAR SelectStmt = 'SELECT * FROM SOURCE_EMP_TABLE;'
);
DEFINE OPERATOR LOAD_OPERATOR()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER LOAD OPERATOR'
TYPE LOAD
SCHEMA EMPLOYEE_SCHEMA
ATTRIBUTES
(
VARCHAR PrivateLogName = 'GT62_loadoper_privatelog',
INTEGER MaxSessions = 32,
INTEGER MinSessions = 1,
VARCHAR TargetTable = 'TARGET_EMP_TABLE',
VARCHAR TdpId = @MyTdpId,
VARCHAR UserName = @MyUserName,
VARCHAR UserPassword = @MyPassword,
VARCHAR AccountId,
VARCHAR ErrorTable1 = 'GT62_LOADOPER_ERRTABLE1',
VARCHAR ErrorTable2 = 'GT62_LOADOPER_ERRTABLE2',
VARCHAR LogTable = 'GT62_LOADOPER_LOGTABLE'
);
DEFINE OPERATOR FILE_READER()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER DATA CONNECTOR OPERATOR'
TYPE DATACONNECTOR PRODUCER
SCHEMA EMPLOYEE_SCHEMA
ATTRIBUTES
(
VARCHAR PrivateLogName = 'GT62_dataconnoper_reader_privatelog',
VARCHAR FileName = 'GT62_employee.dat',
VARCHAR IndicatorMode = 'N',
VARCHAR OpenMode = 'Read',
VARCHAR Format = 'Formatted'
);
STEP setup_tables
(
APPLY
('DROP TABLE SOURCE_EMP_TABLE;'),
('DROP TABLE TARGET_EMP_TABLE;'),
('DROP TABLE GT62_LOADOPER_ERRTABLE1;'),
('DROP TABLE GT62_LOADOPER_ERRTABLE2;'),
('DROP TABLE GT62_LOADOPER_LOGTABLE;'),
('CREATE TABLE SOURCE_EMP_TABLE(EMP_ID INTEGER, EMP_NAME CHAR(10));'),
('CREATE TABLE TARGET_EMP_TABLE(EMP_ID INTEGER, EMP_NAME CHAR(10));'),
('INSERT INTO SOURCE_EMP_TABLE(1,''JOHN'');'),
('INSERT INTO SOURCE_EMP_TABLE(2,''PETER'');'),
('INSERT INTO SOURCE_EMP_TABLE(3,''FRANK'');'),
('INSERT INTO SOURCE_EMP_TABLE(4,''MARY'');'),
('INSERT INTO SOURCE_EMP_TABLE(5,''ELLEN'');'),
('INSERT INTO SOURCE_EMP_TABLE(6,''MICHAEL'');'),
('INSERT INTO SOURCE_EMP_TABLE(7,''SAM'');'),
('INSERT INTO SOURCE_EMP_TABLE(8,''JONATHAN'');'),
('INSERT INTO SOURCE_EMP_TABLE(9,''MICHELLE'');'),
('INSERT INTO SOURCE_EMP_TABLE(10,''ALICE'');')
TO OPERATOR (DDL_OPERATOR () );
);
STEP setup_export_to_file
(
APPLY TO OPERATOR (FILE_WRITER() [1] )
SELECT * FROM OPERATOR (EXPORT_OPERATOR() [1] );
);
STEP load_data_from_file
(
APPLY
('INSERT INTO TARGET_EMP_TABLE (:EMP_ID, :EMP_NAME);')
TO OPERATOR (LOAD_OPERATOR () [1] )
SELECT * FROM OPERATOR (FILE_READER() [1] );
);
);
Elizabeth Lipp is a freelance IT writer based in Dublin, Pa.
Teradata Magazine-March 2007
|