Is anyone performing statistical analysis on data using Teradata SQL?
Posted 3/28/2008 1:10:31 PM
We have an application written in R and were wondering whether we could incorporate the calculations directly in Teradata SQL.

We do linear algebra, matrix/vector calculations, multidimensional statistical distributions, random number generation, and numerical optimization, and we use algorithms for correlation estimation, simulation, aggregation, and disaggregation.

Is anyone doing this in the Teradata environment?
Post #11085
Posted 3/30/2008 8:41:29 PM
SAS guys are working with Teradata guys to add statistical functions to Teradata.

I do a lot of simple CASE expressions nested inside SUM, combined with GROUP BY on a key column, as a form of transposing and summarisation. The CASE expressions create many dummy variables, and the SUM and GROUP BY aggregate them to a single row of data per customer ID. This is common data preparation prior to running a linear regression or similar predictive modelling algorithm. I use QUARTILE to squish my data into percentiles (or even deciles if I can get away with it) as a lazy way of reducing the effects of outliers.
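The CASE-inside-SUM transposition described above can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a Teradata session, and the table `txn`, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical transaction table: one row per purchase, many rows per customer.
rows = [
    (1, "food", 10.0),
    (1, "fuel", 25.0),
    (1, "food", 5.0),
    (2, "fuel", 40.0),
    (3, "food", 7.5),
    (3, "travel", 120.0),
]

conn = sqlite3.connect(":memory:")  # SQLite stands in for the real database
conn.execute("CREATE TABLE txn (customer_id INTEGER, category TEXT, amount REAL)")
conn.executemany("INSERT INTO txn VALUES (?, ?, ?)", rows)

# Each CASE expression acts as a dummy variable for one category;
# SUM plus GROUP BY collapses the rows to a single row per customer.
pivot_sql = """
SELECT customer_id,
       SUM(CASE WHEN category = 'food'   THEN amount ELSE 0.0 END) AS food_amt,
       SUM(CASE WHEN category = 'fuel'   THEN amount ELSE 0.0 END) AS fuel_amt,
       SUM(CASE WHEN category = 'travel' THEN amount ELSE 0.0 END) AS travel_amt,
       SUM(CASE WHEN category = 'fuel'   THEN 1 ELSE 0 END)        AS fuel_cnt
FROM txn
GROUP BY customer_id
ORDER BY customer_id
"""
pivoted = conn.execute(pivot_sql).fetchall()
for row in pivoted:
    print(row)
```

The SELECT itself is plain CASE, SUM and GROUP BY, so the same shape should carry over to other SQL dialects, though that has not been checked against Teradata here.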

I don't build the models in Teradata (obviously), but I do score linear regression, C5.0 and CART decision trees, and also back-propagation neural networks as SQL on our Teradata box. Once the models are built, all data preparation, transformation and predictive scoring happens on the Teradata box. The SQL is massively verbose and not very efficiently written (much of it is autogenerated and looks hideous), but it is optimised well and runs amazingly fast.
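As a rough illustration of what scoring a model as SQL means, the sketch below turns the coefficients of an already-fitted linear regression into a SELECT statement and runs it in the database. It is not the SQL that any commercial tool generates; the helper `linear_score_sql`, the `features` table, and the coefficient values are all made up for the example, and sqlite3 again stands in for Teradata.

```python
import sqlite3

# Hypothetical coefficients, as if exported from a model fitted outside the database.
intercept = 1.5
coefficients = {"food_amt": 0.25, "fuel_amt": -0.5}


def linear_score_sql(table, key, intercept, coefficients):
    """Build a SELECT computing intercept + sum(coef * column) for every row.

    Table and column names are interpolated directly into the SQL text,
    so they must come from a trusted source (here, the model definition).
    """
    terms = [repr(float(intercept))]
    terms += [f"({float(coef)!r} * {col})" for col, coef in coefficients.items()]
    return f"SELECT {key}, {' + '.join(terms)} AS score FROM {table} ORDER BY {key}"


conn = sqlite3.connect(":memory:")  # SQLite stands in for the real database
conn.execute("CREATE TABLE features (customer_id INTEGER, food_amt REAL, fuel_amt REAL)")
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?)",
    [(1, 15.0, 25.0), (2, 0.0, 40.0)],
)

sql = linear_score_sql("features", "customer_id", intercept, coefficients)
print(sql)

# The arithmetic runs inside the database; only the scores come back.
scores = conn.execute(sql).fetchall()
for customer_id, score in scores:
    print(customer_id, score)
```

Generating the statement from the model definition rather than writing it by hand is one reason such SQL tends to be verbose: a model with hundreds of inputs simply produces hundreds of terms.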

Since R is a freeware tool, I wouldn't expect it or the related tools ('Yale', I think it's called) to be able to autogenerate SQL and do much of the data preparation in-database. The main commercial data mining tools have this functionality. I'm biased because I'm ex-SPSS and use Clementine heavily, but SAS apparently has similar basic features. Maybe the freeware tools will be developed in a similar way over the next few years.

Cheers

Tim

Post #11093