Design Issue
Teradata Teradata Discussion Forums Teradata.com Discussion Forum
Visit Teradata.com
Home       Guidelines    Member List
Welcome Guest ( Login | Register )
        


This online forum is for user-to-user discussions of Teradata products, and is not an official customer support channel for Teradata. If you require direct assistance, please contact Teradata support.


Design Issue Expand / Collapse
Author
Message
Posted 9/2/2009 11:47:33 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: Forum Members
Last Login: 9/29/2009 12:25:15 PM
Posts: 2, Visits: 11
I have surrogate key column and non unique key column.I have to assign primary index. Which column i have to choose (either surrogate key column or non-unique column) in order to make effcient design?
Post #16726
Posted 9/7/2009 4:54:24 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: Forum Members
Last Login: 10/6/2009 7:12:18 AM
Posts: 5, Visits: 43
The thumb rule is PI: It is used for data distribution.

PI can be decided based on Frequency of usage in Where Clause columns (either it could be surrogate key / natural key)

If the non-unique column is having low cardinality (say M/F) then your NUPI can cause skewness in AMPS.

Thanks
Toad
Post #16775
Posted 9/8/2009 4:46:49 PM
Supreme Being

Supreme BeingSupreme BeingSupreme BeingSupreme BeingSupreme BeingSupreme BeingSupreme BeingSupreme Being

Group: Forum Members
Last Login: 10/8/2009 1:16:14 PM
Posts: 79, Visits: 105
Hi,

To add more on choosing PI first do the data profiling:
Consider both the column as candidate columns for PI selection and identify the following stats for each of them:
1. Number of rows
2. Number of distinct values
3. Typical rows per value
4. Number of NULL values
5. Value Access Frequency - Frequency of usage of this column in Where Clause
6. Join Access Frequency - Frequency of usage of this column in Join

Based on the above stats give priority to the 6 and 7 items, and you can have the data skewness upto 20% if that column is going to be mostly used in Join/Where.


Regards,
Balamurugan
Post #16784
Posted 9/25/2009 1:29:34 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: Forum Members
Last Login: 10/21/2009 4:47:49 PM
Posts: 7, Visits: 29
Hi

Its a trade off between data distribution and retrival. Surrogate key can be a good PI candidate for data distribution but wont be effective in queries on the table. Thus if the table PI doesnt exist in queries, then the retrival would be slow because of higher number of AMPs involvement,data redistribution or dublication.
Post #16930
Posted 9/28/2009 11:28:01 AM
Junior Member

Junior MemberJunior MemberJunior MemberJunior MemberJunior MemberJunior MemberJunior MemberJunior Member

Group: Forum Members
Last Login: 10/5/2009 12:34:37 PM
Posts: 10, Visits: 26
I believe you are in physical database design stage. Now in order to choose a good PI here you should have complete ELDM ( extended logical datamodel design ) The ELDM components are "join clause and Data demographics"

Join Clause inculdes :
1)join access frequency
2)Value access frequency
3)Value access rows

Data Demographics include:
1)Distinct rows
2)Max rows per value
3)max rows null
4)typical rows per value
5)change rating

Note: A great PI will have : high value access frequency ,join access frequency, reasonable distribution and a change rating below 2 (out of 10).



Cheers,
Samp
Post #16958
« Prev Topic | Next Topic »


Reading This Topic Expand / Collapse
Active Users: 0 ( 0 guests, 0 members, 0 anonymous members )
No members currently viewing this topic.


All times are GMT -5:00, Time now is 6:46pm

Powered By InstantForum.NET v4.1.4 © 2009
Execution: 0.078. 6 queries. Compression Disabled.