|
|  | |
|
FAQ
|
| Q1: | I am loading data into Teradata via TPump. There seems to be an issue inserting a row into one table from two sources at the same time. When I look at the performance monitor, I can see that the data is being blocked. Logically, I find it difficult to believe that multiple sources could be loading the same table simultaneously without locking issues. I guess it could work, depending on blocks, cylinders, etc., but it sounds like at some point (statistically) you'd run into a blocking scenario. Are there any best practices around using TPump and multiple sources loading the same table?
|
| A1: | There are two parameters to control locking in TPump: PACK and SESSIONS. With a large PACK value, TPump will run faster, but blocking will happen more frequently. SESSIONS just indicates the number of users concurrently inserting data via TPump.
Start by lowering PACK and SESSIONS on the TPump jobs you are running. TPump has a SERIALIZE function that avoids cross-session collisions by hashing identified key values to particular sessions; this also preserves order. Ultimately, you can run with PACK=1, SESSIONS=1, SERIALIZE ON. This way you may end up with no blocking, but TPump may run slow.
|
| Q2: | I need help regarding Teradata's file writing utilities. In my application, I have a procedure that retrieves data from more than one table and stores data in variables after manipulation. I need to write the data to a file, which in turn can be loaded into a target table using MultiLoad. (This is because direct writing to the table takes a lot of time for large data volumes.) Which Teradata file writing utility can be used in this case?
|
| A2: | In Teradata we always try to get things done using set logic. You do not state what kind of data manipulation you need to do. If there is any way to do it in SQL statements, then the best solution for this type of application is to use INSERT SELECT without the data ever having to leave the database.
If it is absolutely necessary to retrieve and process a large amount of data outside Teradata, then it sounds like a job for the FastExport utility. FastExport accepts any SQL statement and retrieves the result using an interface optimized for high-volume export. It also has an interface for OUTMOD or Access Module that allows you to process the data on the way out of FastExport and lets you to write it anywhere you desire.
|
| Q3: |
Test (1) Environment:
database: dev1
Tables: tab1, tab2
Test (2) Environment:
Database: dev2
Tables: tab1, tab2
Both databases run on the same Teradata system and have identical table structures, names, access rights, etc.
I have a query that looks like this:
select col1, col2, col3...
from dev1.tab1, tab2
where inner-join conditions ...
When I try to run this query, one of several things happens:
- When the default database is 'dev2,' I get a Table/View/Macro name 'tab2' ambiguous error.
- When the default database is 'dev1,' the query runs fine.
- When the default database is either 'dev2' or 'dev1' without the 'dev1' qualifier on 'tab1,' the query runs fine.
- When the default database is either 'dev1' or 'dev2' with both the tables properly qualified as dev1.tab1 and dev2.tab2...5.
Is this a bug or am I missing a link?
|
| A3: | Think of the default database as one entry in a search list of databases, not as a text substitution for a missing database specifier. All other databases named explicitly in the request are also added to the search list. Then each object is looked up in the databases in the search list. An object that appears in multiple databases in the search list is ambiguous.
|
| Q4: | I've looked as some test results showing that single-AMP (PK access) queries are not impacted by VLC to any notable degree. If compression is defined but not used (because the particular rows' columns were not compressible), then there was less than 1% overhead detected (-0.14% throughput change) when performing five sessions doing PK selects. This is equivalent to nothing, and it is a heavy load.
Another test with 100% compression (all the columns had to be uncompressed) showed single-AMP selects under the same conditions also having less than 1% overhead (-0.53% throughput change). That is also too small a factor to care about.
With short all-AMP tactical queries you may be able to benefit from VLC to some degree, or at least not experience degradation. In addition, because VLC has the potential of reducing overall I/O demand on the platform, it can provide a secondhand benefit to short tactical queries because it can reduce contention for I/O. Tactical queries always do better in environments that are not maxed out on a particular resource. VLC can be helpful in restoring a healthier balance between CPU and I/O demand, if the platform tends to be I/O-bound.
What do you think?
|
| A4: | Actually, Teradata VLC compression has shown in testing as well as customer sites to offer a benefit in the load utilities. There are two reasons for this:
1.) When rows are compressed prior to loading, more of the rows can be placed into a single data block. Therefore, the load utility is doing less physical I/O to store more rows.
2.) With PPI, if you have partitioned on day, for example, and are loading rows that all go into the same partition, often those data blocks are so few and so concentrated that they stay in the FSG cache. While this benefit will depend on many factors (your mileage will vary), and you might not see this at all, let me give you an illustration. During one test we loaded a PPI table via TPump where the rows targeted one partition, and we experienced almost zero I/O load. That significantly reduces time to load, as you can imagine. ARCMAIN will benefit as well, as there will be fewer blocks to back up.
In terms of the test results, I've seen 20-25% improvement using FastLoad. But what you will experience will depend very much on how many columns and rows can be compressed, and how large they were to start with. However, you should always see some improvement.
|
© Teradata Magazine-June 2004
Archived FAQs |
|
|
|