What is new in the Teradata WorldMark platform?
Teradata Warehouse 6.1 (TW6.1) continues the commitment to deliver the latest technology in the platform portion of the solution. Intels 900 MHz PIII processors anchor the latest update, increasing the CPU power per node by at least 15%. Teradata integrated the latest in disk-array technology from our storage partners (LSI and EMC), including 18 GB, 15 K rpm and 36 GB, 10 K rpm drives. Components of the newest technology can coexist with components of the previous generation in the same system.
On Microsoft Windows 2000 we added Fibre Channel I/O, 8-way SMP servers from commodity server vendors and Advanced Server Logo Certification from Microsoft. Teradata delivers all features and functions of TW6.1 simultaneously on both Unix and Windows 2000 and on servers from 2 CPUs to 2,000 CPUs due to the common code base implementation for all target platforms.
Will TW6.1 run on our current Teradata platform?
Current customers can upgrade to TW6.1 on their existing platforms. All Unix- and Windows 2000-based platforms previously supported by Teradata can run TW6.1. Applications and tools that work on Teradata are compatible with TW6.1 without changes.
Client interfaces and utilities will continue to operate on previously supported platforms. HP-UX and AIX have been added as supported platforms, allowing more choices of application platforms from which Teradata may be accessed.
What is new for operational queries?
Teradata Stored Procedures, introduced in V2R4.0, gain several enhancements in V2R4.1. CASE and REPEAT UNTIL statements have been added along with exception handler facilities. Recursive stored procedure calls are now allowed. Performance of stored procedures is enhanced by caching the object code of frequently used procedures and speeding up expression evaluation. These enhancements make it easier to move the logic for operational decision-making applications into the database engine, where it can execute near the data.
Are there any other updates to stored procedures?
Data Definition Language (DDL) and Data Control Language (DCL) statements are now allowed in stored procedures. These statements can be created dynamically by logic in the stored procedure. Many customers have asked for the ability to create tables, users and other schema structures using parameterized code stored in Teradata. That is now possible with a stored procedure that has parameters for the name (or part of the name) of a table, parameters for table attributes and some logic to assemble the CREATE TABLE statement. This allows DBAs and system administrators to package frequent tasks into parameter-driven procedures that can be executed simply.
What improvements help performance of high-volume operational queries?
The Join Index (JI) and Aggregate Join Index(AJI) features have been extended to cover more of the complex operational queries that require short response times. Previous versions of the Join Index required a querys needs to be fully covered by the definition of the JI. In V2R4.1, the JI can truly be used as an index to the base table as well. This allows alternative distributions of the data in a JI to qualify rows quickly and then be used to retrieve data from the base table.
Single-table JIs will no longer require their own statistics; the optimizer will look through the index to the base table to understand the data demographics. Many additional opportunities for optimizing complex queries to use JI and AJI in the plans have been included, all automatically identified by the Teradata query optimizer. And the locks placed by the CREATE JOIN INDEX statement have been relaxed to allow full access to all involved tables while the index is created.
What if we want to make decisions as events happen?
Parallel triggers can be used to detect events as data changes in the active data warehouse. In V2R4.1, triggers are further optimized to ensure that they can handle a high volume of events. The optimizer has been enhanced to generate more efficient plans for complex trigger cases. Many more triggers will result in plans that use one or two units of parallelism, delivering scalable performance.
How do we keep the data flowing into the warehouse?
The TPump continuous data load utility was updated in TW6.1. The UPSERT operator, which allows an update to become an insert if the row does not exist, used to be a performance challenge for large volumes of data. The UPSERT operation has been moved into the Teradata DBMS so that it can be done as a native, high-performance SQL function. An ELSE INSERT clause was added to the UPDATE SQL statement to implement the UPSERT capability. This change in update ability provides up to 50% improvement (highly dependent upon the percentage of inserts and errors) in UPSERT performance for TPump.
Another method of keeping data more current is via SQL update statements. Many times the data for these requests comes from other tables in the database. For UPDATE and DELETE operations that join on a primary index, a new plan will be generated that confines the work to a single unit of parallelism. A similar plan is generated for an INSERT SELECT operation that retrieves records from a primary index on the source table. These plans also will lock at the row level. The result is significantly improved scalable performance for even high volumes of these update operations.
What is new for analytic and data mining work in the warehouse?
V2R4.1 continues the strategic direction of making Teradata into a scalable analytic engine as well as a data warehouse database. Linear regression has been extended to allow sum of squares, x and y average and several other regression analysis functions. Trigonometric and hyperbolic functions have been added for data-mining analysis. Percent rank and width bucket have been added for segmenting data in the warehouse. The new soundex function allows matching character strings when precise spelling is not available.
Teradata Warehouse Miner gets a significant upgrade to take advantage of all the analytics in the database. Logistic and linear regression plus factor analysis were added to the large suite of statistical analysis functions in the previous release. Machine learning techniques, including clustering and decision trees, are executed inside the database without data extraction, delivering scalable performance over even the largest volumes of detailed data. SQL-based model scoring and evaluation produces results from the calculated model within the database, scalable to any data size. Visualization displays the mining results in a variety of ways to deliver a fully scalable data-mining offer, deeply integrated into the database engine.
Are there any limits to active data warehousing with Teradata?
Every system has limits, but we are always working to make sure that we remove any that might interfere with delivering active data warehousing at any level. The maximum size of a data block has been increased to 128 KB to take advantage of large disks and deliver high performance to scans. A single unit of parallelism has been expanded from 112 GB to allow access to 1.3 TB of data. Limits on the number of temporary tables that a single session can have at the same time have been lifted: Global Temporary from 32 to 1,000 and Volatile Temporary from 64 to 2,000. Together, the platform and database are now capable of delivering a high-performance 1 PB data warehouse to a customer ready to solve a problem of that magnitude.
For systems of this size, how do we keep track of the metadata?
The Metadata Services (MDS) component of Teradata, first released in V2R4.0, manages the warehouse metadata. MDS received a facelift in TW6.1 with a browser-based user interface. MDS makes visible the metadata from the database catalog, annotation from the owner of the model and information from the ETL processes. It shows the links, associations and dependencies between the objects in the metadata. It allows users at several levels (end users, power users and administrators) to understand the impact of change to objects in the schema. The repository is updated automatically as any schema changes are made in the database so the information is always current, and the changes are logged with the user ID of the initiating user for an audit trail. If interchange of metadata with other tools or repositories is required, MDS provides standard export/import of metadata using XML and the CWM meta-model.
With all of those things going on concurrently, how can we manage the workload?
Teradatas Database Query Manager has been re-architected; the functionality has been moved from the ODBC (open database connectivity) driver into the DBMS. Now, requests from all client platforms can be managed according to parameters set up by the administrator. All queries from all interfaces, tools, applications and platforms pass through the control of the query management facility. We have renamed the facility Teradata Dynamic Query Manager (TDQM), reflecting its expanded role in managing the workflow through the active data warehouse.
Teradatas Priority Scheduler Facility has been implemented on Microsoft Windows 2000, allowing mixed workloads to be managed on Teradata regardless of the platform operating system. Teradata Manager now has an interface for controlling the priority scheduler without a separate operating system logon.
How can we see what is going on inside Teradata?
Teradata Manager has functions for viewing resource utilization at both the system and session level in real time. TW6.1 adds the capability to view the work in progress within each session. Teradata Manager will display the currently running SQL, plan explain, currently executing plan step and resource utilization in real time.
The Teradata Visual Explain facility has been enhanced to be more accessible to all users of the system and to add more detail to the captured information. New information includes capturing the indexes and statistics, base-table definition and cardinalities, and spool-table column and sort-key information.
Teradata Visual Explain is made even more useful with the integration of the Teradata System Emulation Tool (TSET) utility. This tool allows the system and optimization parameters to be extracted from one system and loaded on another. For instance, all the parameters from a production system can be loaded onto a development system. Then Teradata Visual Explain can be used on the development system to see how queries will be planned for the production system.
Teradata Warehouse 6.1 contains feature, function, performance and manageability for all aspects of an active data warehouse. The Worldmark platform is updated with the latest technology. Teradata Database V2R4.1 offers new levels of performance for decisions of all sizes and shapes. Teradata utilities and interfaces take advantage of the new features to deliver value to the users and managers of the warehouse. This latest episode of Teradata again sets the standard for the technology required to deliver data warehousing the way todays customers requirean active data warehouse. T
E-MAIL ME
Looking for answers to lifes mysteries? Or would you just like to know more about the Teradata Warehouse and related applications?
Ask the Expert!
E-mail questions and comments to Todd at: todd.walter@ncr.com