Overview
The ability to efficiently process, collate, and analyze data has only grown more valuable in recent years. Cloud data warehousing allows large organizations to meet this need with considerable agility, effectiveness, and speed. Gartner predicted that by 2022, 75% of all databases would be deployed in, or migrated to, the cloud.
Current trends suggest that enterprise data use will continue to increase, and that the types and sources of data in circulation will become more varied. This growing diversity of data requires a flexible approach that may be beyond what traditional data warehousing methods can handle, and that's where the cloud comes in.
On-premises infrastructure still offers value for data warehousing, but the cloud-first data warehouse, including hybrid environments, has rapidly become the norm. Trends indicate this shift will continue. Let's take a look at the fundamentals of this technology and examine how best to evaluate and implement a cloud data warehousing solution for your enterprise.
What is cloud data warehousing and how does it work?
A cloud data warehouse performs all of the functions you would expect of a traditional data warehouse (data processing, collation, integration, cleansing, loading, reporting, and so on) but does so within a public cloud environment. Major examples include Microsoft Azure SQL Data Warehouse (now part of Azure Synapse Analytics), Amazon Redshift, Teradata Vantage, Google Cloud's BigQuery, and Snowflake Cloud Data Platform.
Like its on-premises counterpart, an enterprise data warehouse deployed in the cloud is typically a relational database, focusing on structured and semi-structured data. This is the kind you'd see in various customer relationship management (CRM), enterprise resource planning (ERP), and point-of-sale applications, to name just a few. Unstructured data, meanwhile, is typically aggregated using a data lake framework, which can also be cloud-based.
At the most granular level, most data stored in a warehouse can be characterized as facts, measures, or dimensions:
- Facts: Data points connected to specific events or transactions, e.g., "Paid employee John Smith $4,000 for the month, based on a gross salary of $48,000 per year."
- Measures: Numeric values attached to facts. In the example above, one measure is "Monthly pay: $4,000."
- Dimensions: These categorize facts and measures with more structured contextual information, such as "Employee name: John Smith" and "Direct deposit dates: Nov. 15 and Nov. 30."
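In a relational warehouse, these three categories usually map onto a star schema: a fact table records events and holds the measures, while dimension tables supply the context. The minimal sketch below uses Python's built-in `sqlite3` as a stand-in for a warehouse engine; the table and column names are hypothetical, and it assumes the $4,000 monthly pay is split evenly across the two deposit dates from the example.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: descriptive context about each employee.
    CREATE TABLE dim_employee (
        employee_id INTEGER PRIMARY KEY,
        employee_name TEXT NOT NULL,
        annual_gross_salary INTEGER NOT NULL
    );
    -- Dimension: calendar context for each direct deposit.
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY,
        deposit_date TEXT NOT NULL,
        month TEXT NOT NULL
    );
    -- Fact table: one row per payroll event; pay_amount is the measure.
    CREATE TABLE fact_payroll (
        employee_id INTEGER REFERENCES dim_employee(employee_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        pay_amount INTEGER NOT NULL
    );
""")

conn.execute("INSERT INTO dim_employee VALUES (1, 'John Smith', 48000)")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, "Nov. 15", "Nov"), (2, "Nov. 30", "Nov")])
# Assumption: the monthly pay is deposited in two equal halves.
conn.executemany("INSERT INTO fact_payroll VALUES (?, ?, ?)",
                 [(1, 1, 2000), (1, 2, 2000)])

# Join facts to dimensions and aggregate the measure by month.
rows = conn.execute("""
    SELECT e.employee_name, d.month, SUM(f.pay_amount) AS monthly_pay
    FROM fact_payroll AS f
    JOIN dim_employee AS e ON e.employee_id = f.employee_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY e.employee_name, d.month
""").fetchall()
print(rows)  # [('John Smith', 'Nov', 4000)]
```

Keeping the measure in the fact table and the descriptive attributes in dimensions lets the same facts be sliced by any dimension without duplicating context on every row.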
A cloud database is distinguished by its versatility, and as such it can readily support multi-dimensional analysis. In addition to managing many dimensions of both current and historical big data in a single place, a modern cloud database can run on a serverless architecture, which can help minimize an enterprise's data management responsibilities. Alternatively, cloud databases may use the cluster-and-node approach, in which a cluster of two or more compute nodes shares the processing workload.
Traditional data warehousing vs. cloud data warehousing
Traditional enterprise data warehouses are typically deployed on-premises and built around a structured, multi-tier architecture. In these environments, organizations manage the infrastructure themselves, including hardware provisioning, storage, and compute resources. Data flows through several layers before it becomes available for reporting and analytics.
Most traditional warehouse architectures follow a three-tier structure. At the base is the database layer, which stores data collected from operational systems and other sources. The middle tier contains online analytical processing (OLAP) services that transform and organize data for analysis. At the top tier, reporting and analytics tools provide dashboards, visualizations, and query interfaces for business users.
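To make the division of labor between the tiers concrete, here is a deliberately small, in-memory sketch. It is not how a production OLAP engine works; the sales rows and the `roll_up` helper are hypothetical, chosen only to show a bottom tier holding data, a middle tier aggregating a measure over chosen dimensions, and a top tier rendering a report.

```python
from collections import defaultdict

# Bottom tier (hypothetical data): rows already loaded from operational systems.
sales = [
    {"region": "East", "product": "A", "quarter": "Q1", "revenue": 100},
    {"region": "East", "product": "B", "quarter": "Q1", "revenue": 50},
    {"region": "West", "product": "A", "quarter": "Q1", "revenue": 70},
    {"region": "West", "product": "A", "quarter": "Q2", "revenue": 30},
]

# Middle tier: an OLAP-style roll-up that sums one measure over the chosen dimensions.
def roll_up(rows, dimensions, measure):
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

# Top tier: a simple text report standing in for a dashboard or query tool.
by_region = roll_up(sales, ["region"], "revenue")
for (region,), total in sorted(by_region.items()):
    print(f"{region}: {total}")
```

Passing a different list of dimensions, such as `["region", "quarter"]`, drills down to a finer grain without changing the underlying data.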
Traditional data warehouses may also be organized in different structural models depending on how data is centralized and accessed. Some organizations maintain enterprise warehouses that aggregate data from across the business, while others build data marts that focus on specific departments or subject areas. Virtual warehouse approaches may also allow multiple databases to be queried together as though they were a single environment.
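The three structural models can be illustrated with SQLite, again only as a stand-in: one table plays the enterprise warehouse, a view plays a department-level data mart, and `ATTACH DATABASE` joins a second, separate database into the same query, which is the essence of the virtual warehouse idea. All table, view, and schema names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second, independent in-memory database to stand in for another source system.
conn.execute("ATTACH DATABASE ':memory:' AS crm")

# Enterprise warehouse: one table aggregating transactions from every department.
conn.execute("CREATE TABLE transactions (customer_id INTEGER, department TEXT, amount INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                 [(1, "Sales", 500), (2, "Sales", 300), (1, "Support", 120)])

# Data mart: a subject-specific slice of the warehouse for a single department.
conn.execute("""
    CREATE VIEW sales_mart AS
    SELECT customer_id, amount FROM transactions WHERE department = 'Sales'
""")

# Separate database (the attached "crm" schema) holding customer records.
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Virtual-warehouse-style query: join across both databases as if they were one.
rows = conn.execute("""
    SELECT c.name, SUM(m.amount) AS sales_total
    FROM main.sales_mart AS m
    JOIN crm.customers AS c ON c.customer_id = m.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 500), ('Globex', 300)]
```

Production virtual warehouses federate queries across separate servers and engines rather than files attached to one connection, but the query-time integration shown here is the same basic idea.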
Cloud data warehouses take a different approach. Instead of relying on fixed infrastructure, they run on cloud platforms that provide elastic compute and storage resources. This allows organizations to scale capacity as data volumes and analytics workloads grow. Cloud deployments typically fall into two broad architectural patterns: cluster-based systems, which use multiple nodes to process queries, and serverless systems, where the cloud provider dynamically manages compute resources.
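Elastic scaling ultimately comes down to a rule that maps current demand to a capacity target. The function below is a hypothetical, simplified policy, not any provider's actual algorithm: it assumes each node can serve a fixed number of concurrent queries (`queries_per_node`) and clamps the result to a configured minimum and maximum. In a serverless system, the provider runs logic of this kind for you; in a cluster-based system, you typically configure and monitor it yourself.

```python
import math

def target_node_count(queued_queries, queries_per_node, min_nodes=1, max_nodes=16):
    """Return how many nodes a simple, hypothetical elastic policy would provision."""
    # Nodes needed to serve the current queue at the assumed per-node concurrency.
    needed = math.ceil(queued_queries / queries_per_node)
    # Clamp to the floor and ceiling so capacity never drops to zero or grows unbounded.
    return max(min_nodes, min(max_nodes, needed))

for load in (0, 7, 40, 500):
    print(load, "queued ->", target_node_count(load, queries_per_node=8), "nodes")
```

The ceiling (`max_nodes`) is what keeps elastic scaling from turning into unbounded spend, a trade-off that resurfaces in the pricing discussion further on.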
Another key difference between traditional and cloud data warehousing lies in how infrastructure is shared or isolated across customers. Cloud providers commonly offer several tenancy models that balance performance, cost efficiency, and operational control.
| Deployment Model | Architecture Description | Performance Characteristics | Cost Profile | Level of Control | Typical Use Cases |
|---|---|---|---|---|---|
| Single-tenant cloud data warehouse | Dedicated infrastructure for a single organization. Compute and storage resources are not shared with other customers. | High and consistent performance due to isolated resources and predictable workloads. | Higher cost because infrastructure is reserved for one tenant. | Maximum control over configuration, security policies, and resource allocation. | Highly regulated industries, sensitive workloads, and enterprises requiring strict performance isolation. |
| Multi-tenant cloud data warehouse | Multiple organizations share the same underlying infrastructure managed by the cloud provider. | Performance can vary depending on resource allocation and workload sharing across tenants. | Lower cost due to shared infrastructure and economies of scale. | Limited control since infrastructure management is handled primarily by the provider. | Organizations prioritizing cost efficiency, rapid deployment, and simplified operations. |
| Hybrid cloud data warehouse | Combines dedicated resources with shared cloud services or integrates on-premises infrastructure with cloud environments. | Balanced performance depending on which workloads run on dedicated versus shared infrastructure. | Flexible cost structure that allows organizations to optimize spending across environments. | Moderate to high control depending on deployment design and integration with existing systems. | Enterprises migrating from on-premises systems or running mixed workloads requiring both control and scalability. |
Benefits of using cloud data warehousing for analytics
Amassing and collating all of those gigabytes upon gigabytes (and, eventually, terabytes) of data isn't an end in itself. The insights that data can reveal may become the foundation for strategic decisions that drive growth and the bottom line, but they must first be unlocked with analytics tools.
Running data analytics and reporting on a data warehouse hosted via a cloud solution is quite different from completing the same tasks for an on-premises warehouse. In fact, it's arguably one of the most exciting cloud computing trends in the enterprise world right now.
Cloud and on-premises data warehouses can both support enterprise analytics, but they differ in cost model, scalability, and operational responsibility. Here’s a quick comparison across the factors teams most often evaluate.
| Decision factor | Cloud data warehouse | On-premises data warehouse |
|---|---|---|
| Costs | Operating expense model; pay for capacity/usage; watch variable spend (e.g., scaling and data movement) | Capital expense + ongoing maintenance; more fixed costs, but spend can be easier to predict |
| Scalability | Elastic scaling up/down for analytics demand; faster to adjust capacity | Scaling typically requires procurement and deployment cycles |
| Maintenance | Cloud provider manages much of the infrastructure; teams focus more on data and analytics | You manage hardware, upgrades, patching, and capacity planning |
| Security | Shared responsibility model; strong cloud controls available, but governance must be configured and monitored | Direct physical and network control; governance is fully your responsibility |
| Control | Less direct control of underlying infrastructure; more reliance on provider architecture and service limits | Highest level of infrastructure control and customization |
| Deployment speed | Faster provisioning and faster experimentation; easier to start small and expand | Longer lead time to procure, install, and scale |
With the right data analytics engine for the cloud, you can give your organization the flexibility to craft and implement algorithms as sophisticated as your circumstances require, using the programming languages you're familiar with, such as SQL, Python, SAS, and R. Scalable analytics in this context can include machine learning, clustering and segmentation, sentiment analysis, text extraction, graph analysis, and geospatial or time series analysis.
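As one small example of time series analysis expressed in a familiar language, the sketch below computes a trailing three-day moving average with a SQL window function, driven from Python. SQLite (version 3.25 or later, which supports window functions) stands in for a cloud warehouse engine here, and the table name and revenue figures are hypothetical; most cloud warehouses support similar `AVG(...) OVER (...)` syntax, though exact dialects differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_revenue (day TEXT PRIMARY KEY, revenue REAL)")
conn.executemany("INSERT INTO daily_revenue VALUES (?, ?)", [
    ("2024-01-01", 100.0),
    ("2024-01-02", 130.0),
    ("2024-01-03", 160.0),
    ("2024-01-04", 70.0),
])

# Trailing three-day moving average: each row averages itself and up to two prior rows.
rows = conn.execute("""
    SELECT day,
           AVG(revenue) OVER (ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
    FROM daily_revenue
    ORDER BY day
""").fetchall()

for day, moving_avg in rows:
    print(day, round(moving_avg, 2))
```

Pushing the calculation into SQL keeps it next to the data, so only the smoothed results, not every raw row, travel back to the client.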
Additionally, running data warehouse analytics in the cloud allows you to integrate with numerous cloud data and analytics services: Amazon EBS, S3, SageMaker, Glue, and Lambda, as well as Azure Blob Storage, Azure Data Factory, Azure Machine Learning Studio, and Power BI, are just a few examples.
How to select and deploy a cloud-based data warehouse
First, you must consider whether a cluster-based or serverless warehouse architecture will be right for your organization's cloud deployment.
Clustered warehouses have more predictable pricing and allow more direct oversight, but that oversight comes at the cost of devoting more time and resources to managing elasticity, capacity, and cluster health. By contrast, serverless models are managed entirely by your cloud service provider (CSP) and capacity scales automatically, but you pay either per query or based on utilization, which can be difficult to predict.
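A rough way to compare the two pricing models is to find the monthly workload at which they cost the same. The sketch below is a first-order estimate built entirely on hypothetical prices (not any vendor's actual rates): a provisioned cluster billed per node-hour, and a serverless service billed per terabyte scanned. It models only per-query pricing; utilization-based serverless billing would need a different cost function.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)

def cluster_monthly_cost(nodes, hourly_rate_per_node, hours=HOURS_PER_MONTH):
    # Provisioned cluster: every node-hour is billed, whether or not queries are running.
    return nodes * hourly_rate_per_node * hours

def serverless_monthly_cost(tb_scanned, rate_per_tb):
    # Per-query serverless pricing: cost tracks the volume of data the queries process.
    return tb_scanned * rate_per_tb

def break_even_tb(cluster_cost, rate_per_tb):
    # Monthly scan volume at which both models cost the same.
    return cluster_cost / rate_per_tb

# Hypothetical prices, for illustration only.
cluster = cluster_monthly_cost(nodes=4, hourly_rate_per_node=2.0)
for tb in (200, 2000):
    print(f"{tb} TB scanned: serverless {serverless_monthly_cost(tb, rate_per_tb=5.0)} vs cluster {cluster}")
print("break-even:", break_even_tb(cluster, rate_per_tb=5.0), "TB/month")
```

Below the break-even volume the serverless model is cheaper; above it, the provisioned cluster wins. Real bills also include storage, data movement, and minimum billing increments, so a calculation like this is a starting point rather than a forecast.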
Pricing, in fact, may be the most complicated aspect of choosing a cloud data warehouse, regardless of model. Elasticity is one of the chief advantages of a strong cloud platform, but when data workloads are steady, paying for on-demand capacity can cost more than fixed or reserved capacity would. It's also critical to monitor costs associated with workflows that move data out of the cloud, and to put budgets and cost controls in place, because usage-based spending can quickly spin out of control.
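Cost monitoring can start as simply as grouping billed line items by category and alerting when month-to-date spend crosses a share of the budget. The sketch below is hypothetical throughout: the `CostItem` type, the category names, the figures, and the 80% alert ratio are illustrative assumptions, not the output of any provider's billing API. Breaking out data egress as its own category makes the cost of moving data out of the cloud visible rather than buried in the total.

```python
from dataclasses import dataclass

@dataclass
class CostItem:
    category: str  # e.g., "compute", "storage", "egress" (hypothetical categories)
    amount: float  # cost in the billing currency

def check_budget(items, monthly_budget, alert_ratio=0.8):
    """Summarize spend by category and flag when the total crosses the alert threshold."""
    by_category = {}
    for item in items:
        by_category[item.category] = by_category.get(item.category, 0.0) + item.amount
    total = sum(by_category.values())
    return by_category, total, total >= alert_ratio * monthly_budget

# Hypothetical month-to-date line items.
items = [CostItem("compute", 3000.0), CostItem("storage", 500.0), CostItem("egress", 900.0)]
by_category, total, over_threshold = check_budget(items, monthly_budget=5000.0)
print(by_category, total, over_threshold)
print(f"egress share of spend: {by_category.get('egress', 0.0) / total:.1%}")
```

Alerting at a fraction of the budget, rather than at 100%, leaves time to throttle or reschedule workloads before the month closes.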
Last but not least, initial implementation of a cloud-based warehouse may come with slower-than-expected performance and require users to change their practices to accommodate this early hiccup.
The key to making the most of a cloud data warehouse is using it alongside an agile, scalable, and flexibly priced connected multi-cloud data platform like Teradata Vantage. Vantage is compatible with complementary data tools from major cloud providers, and pricing is based solely on use. Also, the platform works seamlessly in any cloud environment or on-premises, and allows for fluid movement of data and applications back and forth from physical data infrastructure to the cloud — and even between cloud providers in a multi-cloud model — as need be.