Businesses have built a disconnected web of data and analytics technologies. At the outset, these technologies gave users and teams more control over their own compute and storage, but each addition piled on tech debt and management problems for the business. Companies are now looking to modernize their environments and untangle their data and compute mess.
Necessary Solutions Create a Mess
The problem originated with the on-prem Data Warehouse. The Data Warehouse design pattern tightly links storage and compute in a single appliance, with various data sources feeding the Data Warehouse for computing needs. Data sources are integrated into the Data Warehouse so that once data is stored it can be used many times by different departments and organizations. This created a single source of truth, which made management easy, but it also created limitations for the business, like users waiting in line to have IT load their data or get work done in their analytic environment.
Data Marts were introduced to give the business flexibility, with each team able to have its own independent environment. While this design pattern alleviated resource contention between business units, it created bigger problems. Multiple data pipelines and data duplication were introduced, with one pipeline going to the Data Warehouse and another going to the Data Mart. Furthermore, multiple technologies might be used, which made managing SLAs, cost, administration, and security exceedingly difficult and time-consuming. The data and compute mess was growing.
Then Data Lakes were introduced to address the exponential increase in data generated during the Big Data movement. The cloud enabled Data Lakes to store massive volumes of data at a low cost, so businesses could continue to invest in Big Data. This added another layer of complexity to the mess, with multiple data formats being supported in larger volumes. Data was duplicated again, and this time without the compute performance needed to run workloads in a timely manner. Next thing you know, there was a plethora of cloud-enabled technologies in addition to Data Lakes, like ETL, data mining, and data science tools. Businesses were investing in these new cloud-enabled technologies and stacking them on top of the Data Warehouse, with no effective way to manage their growing data and compute mess.
As you can imagine, the data and compute mess became unmanageable. Each limb of the business had its own data scattered across various design patterns, technologies, and pipelines. And although these decisions may have been right at the time, they had huge negative impacts on the business, such as limited access, bottlenecks, poor utilization, difficult management, and, worst of all, climbing costs. At a certain point, the business value of an analytics and data platform could plateau. So how does the business untangle its data and compute mess?
Taking a Lake-Centric Modernization Approach
The answer is to take a lake-centric modernization approach that consolidates the data, compute, and technologies into one holistic cloud environment. Teradata enables this with VantageCloud Lake, which is the Vantage analytics and data platform on a fully cloud-native architecture. A cloud-native architecture provides the critical ability to separate compute and storage. By replacing the physical hardware of an on-prem environment with cloud-provided compute and storage, the business can grow and shrink each resource independently and in an optimized fashion. Two key components enable the business to untangle its data and compute mess.
On the data storage side, the first component is that every design pattern is consolidated into a centralized, low-cost Object Store. VantageCloud Lake supports the querying of open data formats in the cloud-provided Object Store, as well as database file formats stored in the Lake Object File System (OFS), which provides optimized performance for the Teradata database. On the compute side, the second component is that multiple Compute Clusters can be used to access and query data inside the centralized Object Store. This eliminates the need for physical versions of different data marts, as the business can simply use multiple Compute Clusters running on the same instance.
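The two components above can be pictured with a small toy model. This is only an illustrative sketch in Python, not Teradata code or any VantageCloud Lake API: the `ObjectStore` and `ComputeCluster` classes, their methods, and the sample `sales` data are all hypothetical names invented here. It shows one shared copy of data being read by several independently sized clusters.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Hypothetical stand-in for a centralized object store (one copy per dataset)."""
    datasets: dict = field(default_factory=dict)

    def put(self, name, rows):
        # Store a single copy of the dataset under its name.
        self.datasets[name] = rows

    def scan(self, name):
        # Every cluster reads the same stored copy; nothing is duplicated.
        return self.datasets[name]


@dataclass
class ComputeCluster:
    """Hypothetical compute cluster: sized on its own, holds no data of its own."""
    name: str
    store: ObjectStore
    nodes: int = 1

    def resize(self, nodes):
        # Scaling compute touches only this cluster, never the store.
        if nodes < 0:
            raise ValueError("nodes must be non-negative")
        self.nodes = nodes

    def total(self, dataset, column):
        # A trivial "query": sum one column from the shared dataset.
        return sum(row[column] for row in self.store.scan(dataset))


# One shared store, two independent clusters (all values are made up).
store = ObjectStore()
store.put("sales", [{"region": "east", "amount": 120},
                    {"region": "west", "amount": 80}])

marketing = ComputeCluster("marketing", store, nodes=2)
etl = ComputeCluster("etl", store, nodes=4)

etl.resize(8)  # growing the ETL cluster leaves marketing untouched
marketing_total = marketing.total("sales", "amount")
etl_total = etl.total("sales", "amount")
```

In this sketch, resizing `etl` changes neither the `marketing` cluster nor the stored data, and both clusters compute the same total from the same copy, which is the property that removes the need for a separate mart per team.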
So how does this untangle your data and compute mess? Well, you now have a low-cost, centralized Object Store for data storage, and multiple, independent clusters for compute. And because you are running on a cloud-native architecture, you are no longer restricted to the on-prem design paradigm. This allows you to migrate data warehouses, marts, and lakes into one consolidated cloud environment to clean up the design pattern clutter. This allows you to work off a single copy of data in the Object Store to clean up the data pipelines, duplication, and movement. This allows you to provision and manage compute resources through Compute Clusters to clean up unnecessary mart spends and performance inefficiencies. And this is how your business can have an autonomous and agile environment, with VantageCloud Lake’s self-service model that eliminates bottlenecks created by the need for IT.
VantageCloud Lake Does More than Clean up the Mess
VantageCloud Lake enables your business to take a lake-centric modernization path. Its Console is a self-service browser interface with centralized navigation, containing the features and telemetry that users and administrators most commonly interact with. The Primary Cluster leverages high-performance block storage and is where query requests are processed, parsed, optimized, and distributed to Compute Clusters. The Compute Clusters leverage the Object Store and are used to execute queries from different teams or use cases, such as Marketing, ETL, or Business Intelligence. QueryGrid is Teradata’s Query Fabric, which allows VantageCloud Lake to connect to data stores and processing engines for a hybrid, multi-cloud ecosystem. Finally, open data lives in the Object Store, while higher-performance database files live in Lake’s OFS.
With VantageCloud Lake you can innovate faster, scale smarter, and govern better. You can innovate faster by empowering users with a multi-cluster architecture that gives them autonomy to innovate. And with ClearScape Analytics, Teradata’s data analytics engine included with Lake, you can capture greater ROI and faster time-to-value with industry-leading AI/ML capabilities. Lake also lets you scale smarter. Our “smart” scaling technology offers dynamic resource allocation with extreme elasticity, making for a more cost-effective and environmentally sustainable platform that scales along with your business needs. Finally, Lake helps you govern better. Your organization can maintain oversight through centralized management of data and effective financial governance of cloud resources. This helps manage sprawl, eliminate overruns, and give users flexibility without waiting on, or impacting, core operations.