FAQ
Q1: During spool redistributions to different AMPs (merge joins where primary indices are different), some records come to the same AMP after rehashing on the basis of join columns. Do they really go to the BYNET or message-passing layer for rehashing, or are these records directly spooled locally?
A1: When rows are redistributed during join processing, there is a same-node optimization that is done by the message system. The redistribution message, when it is sent, uses VPROC and node addressing rather than hash addressing (it could be a different node, different AMP, same node, different AMP, or same node, same AMP). The optimization consists of recognizing when the destination is on the same node as the sender and routing that message to the correct AMP, without leaving the node. Only rows being redistributed off the node from which they were sent will be handed off to the BYNET. A row being sent out and returned to the same AMP leaves the AMP within a sender task and returns to the same AMP within a receiver task.
Q2: Does Teradata support API/CLI for pre-defined, vendor supplied, third-party supplied and user defined complex data types on the database? To clarify, I am looking to understand if things such as composite field types, create sub-tables, arrays of like types or free form arrays are supported.
A2: The upcoming Teradata Warehouse 8.1 (Teradata Database V2R6.1) will support user defined types (UDTs), including complex data/structured types. It does not provide direct array support and array operations yet. Teradata Database V2R5.1 included BLOBs and CLOBs for storing large unstructured data and user defined functions (UDFs) for processing that data. While the typing is not explicit, any array or structured type can be stored using these types and can be processed in the UDFs. The data structure must be represented in C programming language within the UDFs rather than the database knowing the type structure. This ability continues and is extended by UDT implementation in Teradata Database V2R6.1.
Q3: When insert selecting, I assume the system is doing something to the data on the way, such as creating a spool file. Further, because maybe only one field is being changed, I assume that the spool file is uncompressed then recompressed on the other end. I am insert selecting a 600GB table to a new data structure with new columns, a couple of transformations and both partitioned primary index (PPI) and multi-value compression. Am I right to assume that I will need 2 x 600GB plus the size of the uncompressed data in spool (40%-50% of 600GB)? If so, I will need 1.5TB of space and the spool file will need complete cylinders. The system currently is 67% full CurrentPerm/MaxPerm.
A3: You are correct that the spool is not compressed. How much space it requires depends on the plan. Does the plan need to spool it only once or does it have to do joins, for example, to get the additional columns and transforms? All of this should be taken into account. You will need to target the necessary source space as well as the space required by the uncompressed copy in spool.
Q4: When performing INSERT SELECT within target tables and concurrently generating reports with access locking, we have found that large amounts of data disappear from the target tables. We know that other factors may influence this observation, such as a faulty description. We are running Teradata Database V2R4.1 and moving to Teradata Database V2R5.1.1 and want to know if this has been resolved in the new release.
A4: This should not be an issue with INSERT SELECT. Based on your description, there is only one case where this is an issue. That is when UPDATE WHERE is being used with a join. This can result in an access lock reader seeing a lot of missing rows. You can always review the explain plan to see if it shows a delete step followed by a merge insert. If the select sees the table during the time that the delete has operated, then those rows will be missing until the merge step completes. In Teradata Database V2R5, merge update, in most cases, has been changed to perform a direct merge into the table; again this can be seen in the explain plan. Prior to Teradata Database V2R5, any Update that uses the delete/merge plan should be executed with a LOCKING EXCLUSIVE clause if access lock selects might possibly be performed concurrently.
Here are some additional questions answered by Carrie Ballinger, a Teradata Certified Master and a senior technical consultant in Teradata's Active Data Warehouse Center of Expertise. If you would like to get some additional tips and insight from Carrie, be sure to check out her column titled "Learning from each other."
Q5: Can partitioned primary indexes (PPIs) replace value-ordered indexes (VOIs)?
A5: In many cases, yes. However, it depends on the application. A VOI is a special type of index whose sub-table rows are stored in order by index value. With VOIs you have flexibility over the range of access. The same VOI structure will provide efficient access for ranges of two days, two weeks or two months. With PPIs, your partition boundaries are fixed. If you partition by month, then partition elimination cannot be less than one month, even if you only require data from one day. If your access range size varies, consider finer partitions in your PPI table—a day or a week instead. Often, this is preferable to supporting a VOI on the same column as the partitioning column.
Q6: Are AMP worker tasks (AWTs) held by queries when they are returning answer sets? During a quiet time on my system, I noticed 10 sessions, all in RESPONSE mode; however, when I issued a puma -c command, there were no AWTs in use. What is happening here?
A6: AWTs working on behalf of a query are released at the end of the final query step before the BYNET merge process begins. The last AMP to complete its work in building a query's final spool collects row counts from all AMPs, responds to the dispatcher that the step is complete and gives the final row count. The dispatcher then tells the BYNET to start a merge process.
The BYNET driver has special data buffers where rows that will feed into the merge are to be placed. The BYNET driver also has a special place in memory to track details about a query's response processing, such as its merge-ID; the spool-ID where the rows to be returned are located; and row-ID, which tells how far that spool has been read. AWTs are acquired as needed and only for the brief period of time it takes to move data from the final spool file into the BYNET buffers.
After all rows in the first BYNET buffer have been returned to the client, and if there is more data in the spool, a new AWT will get involved briefly to move data from the spool into the BYNET buffer. Once the merge is complete, the BYNET driver tells the dispatcher it is finished, completing the response cycle.
So, except for very brief periods of time when AMP activity is required to fill the BYBNET buffers, there are no AWTs held during response processing. T