|
|
|
Forum Newbie
      
Group: Forum Members
Last Login: 10/29/2009 3:38:06 AM
Posts: 5,
Visits: 18
|
|
Hi,
I'm trying to validate that each character in a varchar(100) row is between 0 and 9 on a 250M row table. Any rows that are not valid will be ignored in an insert statement into a base table. However, this is giving me a hard time. I tried using a tally table with substring comparison in an exists clause, but the data are just too huge. Is there an efficient way to validate my data?
Thanks,
David
|
|
|
|
|
Supreme Being
      
Group: Forum Members
Last Login: Yesterday @ 3:55:54 AM
Posts: 111,
Visits: 263
|
|
Check if you have the Oracle UDF's installed - you can get them from the Teradata Download centre.
There is an otranslate finction - Oracle's Translate - which can check them for you.
See:
http://teradataquestions.stackexchange.com/questions/16/the-format-or-data-contains-a-bad-character
|
|
|
|
|
Forum Newbie
      
Group: Forum Members
Last Login: 10/29/2009 3:38:06 AM
Posts: 5,
Visits: 18
|
|
|
I'm a downstream user with few permissions and I see no UDFs in the meta-data. My work is pretty much limited to pure SQL.
|
|
|
|
|
Forum Newbie
      
Group: Forum Members
Last Login: 2 days ago @ 9:26:30 PM
Posts: 6,
Visits: 11
|
|
I have not understood your doubt?
What you want to do , either take the valid records only or you want to see the invalid records?
If you want to see the invalid records(records with character from Ato Z).
Login through bteq
use .Set Session transaction ANSI ; before login
create volatile table test_acct
(acct_num varchar(20) ,
acct_num2 varchar(20) )
on commit preserve rows ;
insert into test_acct
sel upper(acct_num) , lower(acct_num)
from accounts ;
Now you can take out the records which has characters by
sel * from test12
where acct_num <> acct_num2 ;
Hope this works
:)
Rupesh Baheti
Datawarehouse Analyst
OCBC Bank
Singapore
|
|
|
|
|
Forum Newbie
      
Group: Forum Members
Last Login: 10/29/2009 3:38:06 AM
Posts: 5,
Visits: 18
|
|
That's interesting as far as alpha-numeric data, but these strings also include various non-alpha characters. However, I solved the problem through direct analysis of each character column by grouping:
SELECT SUBSTRING( order_number FROM n FOR 1 ), COUNT(*) AS Total
FROM Orders
GROUP BY SUBSTRING( order_number FROM n FOR 1 )
ORDER BY SUBSTRING( order_number FROM n FOR 1 )
;
In the end I found there were repeated patterns which could be filtered easily. If new patterns appear in the future, then the process will break and the script will have to be modified. The data are pretty regular, so not too bad really.
Thanks for the suggestions!
|
|
|
|