Tech2Tech
Applied Solutions 1
DAOs enable active integration
Data access objects meet the active data warehouse.
by Dan Graham
An active data warehouse delivers fresh up-to-the-minute data to front-line operational users such as sales agents, call center representatives or consumers searching a Web site. Since these end users do not use reports, it follows that integrating the active data warehouse with the front line is not done with traditional business intelligence (BI) tools. Instead, this typically means using Java or .NET programming to deliver analytic insights to a portal, batch program or Web site. For most SQL programmers and DBAs, this is an exciting journey into new architectures and cutting-edge software.
Java and .NET programmers use object oriented (OO) languages to develop applications, processes and Web services. To them, the active data warehouse is "just another database." They quickly discover, however, that the active data warehouse contains integrated, pristine data—a rarity for an operational database. Its quality, accuracy and completeness greatly simplify the coding required to assemble the data on a customer, an order, a sale, etc. Furthermore, with the active data warehouse, the OO programmer can avoid piecing together data from multiple repositories and coping with incomplete, dirty data.
Layer upon layer
Best practices in OO programming organize software development into layers of responsibility: presentation, business services, business objects, data access objects (DAOs) and persistent data. The presentation layer interacts with users, typically through a Web browser.
The business services layer is where all of the business logic for an application resides, such as "post an order" or "close an account." Web services—a hot topic in the IT industry—are usually found in the business services layer. The business objects layer is where OO data is stored in memory to be used by business services. These objects are defined as Java or .NET object classes, which contain data wrapped in OO methods that perform functions on the data. (See figure 1.)
DAOs are the interface between busi-ness objects and the relational database management systems (RDBMSs). As intermediaries between the application logic and the underlying data, the DAOs can be enhanced or rewritten by IT so one database product (e.g., MySQL) can be swapped out for another (e.g., Teradata Database). This isolation of function provides flexibility and reduces maintenance costs since only one module in the application is responsible for building customer data objects.
Finally, persistent storage is where flat files, databases, images and documents are stored on disk.
The OO-SQL gap
Java and .NET programs pass data to and from the JDBC and ODBC drivers via DAOs. In this regard, DAOs are considered the link between OO pro-gramming and relational semantics. OO languages rely on encapsulation, modularity, polymorphism and inherit-ance. In contrast, relational databases and SQL rely on tables, joins and views. Since these languages are so dissimilar, the DAO acts like a translator, converting objects into tables and vice versa.
Unfortunately, most programmers are skilled in either OO or SQL, but not both. This, coupled with the complexity of bridging the gap between the two languages, means not enough people can write good DAOs. Fortunately, the Java community took care of this years ago by improving upon Microsoft's first DAO, designed in 1992.
A generally accepted best practice is to use object relational mapping (ORM) tools when building DAOs. For .NET developers, Microsoft offers ActiveX Data Objects and, recently, LINQ for this purpose. In the Java community, popular ORM open-source tools include Hibernate, iBATIS, Apache Cayenne, Oracle Toplink and EclipseLink.
Three popular ways to get data to and from a Java program are:
NATIVE SQL
Programmers who are skilled at Java and want to avoid DAOs will use SQL inside a plain old Java object (POJO). This approach is useful in small applications that will not need a lot of maintenance. While it shortcuts much of the programming needed with ORMs, its limitations are sometimes prohibitive.
Coding Java directly to the JDBC interface saddles programmers with handling the OO interfaces and possibly session pools in their handwritten code. Programmers will also have to code the logic to convert rows and columns into business objects in memory, handling conversions and formatting manually. Since handling all of the object-to-relational interfaces is usually more trouble than it's worth, most programmers resort to ORM tools or other techniques instead.
HIBERNATE DAOS
Hibernate open source began in 1999 to solve limitations in the Enterprise Java Bean specifications and has achieved broad adoption. For purposes of generating DAOs that access a relational database, Hibernate provides a full complement of libraries, tools and services. Most often used with Spring Framework (another popular open-source product), Hibernate maps Java classes to relational tables. It also provides Hibernate Query Language (HQL), Hibernate SQL (HSQL) and a criteria-and-example API.
Using HQL, Hibernate generates the necessary SQL statements and maps the result sets into Java object classes. It is typically employed in online transactions to access a few rows in operational databases. Since Teradata Relationship Manager utilizes Hibernate, a Hibernate-certified Teradata dialect was added to the Hibernate Web site.
Hibernate supports a Java programmer's view of databases, and works well if the data model being accessed can be changed at any time to fit the object model. For the developer, Hibernate makes this easy, simple and transparent. Since it is not a data modeler's view of the database, the result is generally a plethora of tables that map directly to Java objects.
This open-source system works well when the database is modeled from the Java business object structure and when most accesses are based on discrete retrievals of primary or foreign keys. For example, when using HQL, Hibernate will often issue a separate request to each table, sometimes doing a join in the database, sometimes doing a join inside the Java Virtual Machine. Thus, Hibernate makes many calls to the RDBMS when looking up relationships, iterating through objects, etc. In effect, Hibernate is a "chatty" language, because it often has several conversations with the RDBMS in order to get all of the data elements of a customer_object from underlying relational tables.
While it works well with surrogate keys, Hibernate struggles with compound keys. Consequently, for performance reasons, a skilled DAO programmer will use relational views to hide underlying tables from Hibernate. Using views, Hibernate only "sees" one table, issues only one request and lets the RDBMS do all the joins and transformations inside the database. Hibernate also provides the HSQL language, which offers more control. However, this is a tool programmers rarely use until the default ORM mapping doesn't work.
IBATIS
On the other hand, iBATIS is useful when the database schema cannot be frequently changed to a reflection of new OO objects. iBATIS, which arrived in 2002, supports a data modeler's view of databases. Although it is less common in Java programming departments, iBATIS is gaining traction fast—especially now that it is a top-level project at Apache.org.
Like Hibernate, iBATIS is typically used with the open-source Spring Framework. Both Spring Framework and iBATIS are used inside Teradata Viewpoint.
The following iBATIS sample code shows how SQL is stored in an XML iBATIS map, which is invoked by Java code at runtime:
Place the SQL into an iBATIS XML object
<select id="getProduct"
parameterClass="com.example.Product">
resultClass="com.example.Product">
select
PRD_ID as id,
PRD_DESCRIPTION as description
from
PRODUCT
where
PRD_ID = #id#
</select>
Java code to run the iBATIS SQL object
Product paramProduct = new Product();
paramProduct.setId(123);
Product resultProduct = IBATIS.getObject
("getProduct", paramProduct);
iBATIS is not an ORM tool. Instead, it maps SQL statements to and from database tables. This allows complex statements to be mapped to Java classes. Teradata programmers will quickly zoom in on the iBATIS Data Mapper library, which uses XML to loosely couple Java objects and SQL statement "maps." At design time, iBATIS stores the actual SQL in an XML document library, which Teradata programmers can easily edit. Then at runtime, Spring Framework selects a map based on the iBATIS configuration file, passing them to an iBATIS DAO for execution. As a result, any POJO can easily invoke the SQL map at runtime.
Unlike Hibernate, a one-to-one relationship of objects to tables generally does not occur with iBATIS. Best of all, iBATIS cleanly divides coding responsibilities: Data experts can work with SQL without being concerned about OO constructs, and Java developers can write business service objects without worrying about RDBMS semantics.
While iBATIS is superior for accessing the active data warehouse, many organi-zations have standardized on Hibernate or other ORM tools such as Toplink, Java Data Objects or Apache ObJectRelationalBridge. Each has strengths and weaknesses that require adaptation to third normal form data warehouse designs.
Hibernate and iBATIS experts may be inclined to use Oracle OLTP programming techniques with the Teradata Database. This doesn't work and produces poor performance. For example, special indexes in one RDBMS may not exist in the other. Thus, Hibernate and iBATIS experts need to apply Teradata programming and design techniques to get the best performance.
Session Pools
The DAO programming layer often operates inside a Web application server such as IBM WebSphere, BEA WebLogic, SAP NetWeaver, JBoss or Apache Tomcat. The Teradata Type 4 JDBC driver supports all of these. (See figure 2) The Web application server provides session pooling and transaction management capabilities that link to the JDBC driver.
Once the business services object establishes a database login session, the DAO can pass its SQL through JDBC to the database. It is within this layer that the following steps occur:
- Get a JDBC connection
- Begin a session
- Begin transaction
- Commit or rollback
- End session
This is where the DAO interface should use Teradata querybanding to identify the real end user as well as other workload management controls.
DAO benefits
DAOs are a best-practice design pattern sanctioned by all major database vendors. By moving all database accesses to DAOs, the application logic and data access logic can evolve independently without affecting each other. Additionally, the DAO design pattern:
- Is easily understood by most developers
- Fits well into distributed or Web services architectures
- Can take advantage of database vendor-specific capabilities (e.g. macros, stored procedures)
Building DAOs using ORM tools or SQL mapping will reduce coding efforts by handling the conversion of tables to and from business objects. These tools also assist in managing connections, sessions and interactions with the JDBC and ODBC drivers so Java and .NET programmers can easily integrate into the active data warehouse.
While programming DAOs isn't too difficult, a culture and skills gap exists between the OO programmer and the data warehouse SQL programmer. Fortunately, most IT organizations employ both skill sets so that when collaborating on active data warehouse projects, each team contributes its expertise and learns from the other team.
Dan Graham has more than 35 years in IT and leads Teradata's Active Data Warehouse Technical Marketing team.