Data Lake Solutions from Teradata
Data lakes meet the need to economically harness and derive value from exploding data volumes. Much of this is "dark" data from new sources, such as web, mobile, and connected devices, that was often discarded in the past, yet it contains valuable insight. Massive volumes, combined with new forms of analytics, demand a new approach to managing data and deriving value from it.
A data lake is a collection of long-term data containers used to capture, refine, and explore any form of raw data at scale. It is enabled by low-cost technologies that multiple downstream facilities can draw upon, including data marts, data warehouses, and recommendation engines.
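As a loose illustration of this capture-then-refine flow, here is a minimal Python sketch, not a definitive implementation. It assumes a local directory stands in for cloud object storage and that pandas with pyarrow is installed; all paths, file names, and fields are hypothetical.

```python
# Minimal sketch of the capture-then-refine pattern: raw events land in the
# lake untouched, and a refined extract feeds a downstream consumer.
import json
import pathlib

import pandas as pd

LAKE_RAW = pathlib.Path("lake/raw/clickstream")        # hypothetical raw zone
LAKE_REFINED = pathlib.Path("lake/refined/clickstream")  # hypothetical refined zone
LAKE_RAW.mkdir(parents=True, exist_ok=True)
LAKE_REFINED.mkdir(parents=True, exist_ok=True)

# 1. Capture: write raw events exactly as they arrive, schema unknown up front.
raw_events = [
    {"ts": "2024-05-01T10:00:00Z", "user": "a17", "action": "view", "page": "/pricing"},
    {"ts": "2024-05-01T10:00:04Z", "user": "a17", "action": "click", "target": "signup"},
    {"ts": "2024-05-01T10:00:09Z", "user": "b42", "action": "view", "page": "/docs"},
]
(LAKE_RAW / "events-2024-05-01.jsonl").write_text(
    "\n".join(json.dumps(e) for e in raw_events)
)

# 2. Refine: impose just enough structure for one downstream use (say, a data
#    mart that only needs page views) and store it in a columnar format.
raw = pd.read_json(LAKE_RAW / "events-2024-05-01.jsonl", lines=True)
views = raw[raw["action"] == "view"][["ts", "user", "page"]]
views.to_parquet(LAKE_REFINED / "page_views.parquet", index=False)

# 3. Downstream facilities (marts, warehouses, recommenders) read the refined copy.
print(pd.read_parquet(LAKE_REFINED / "page_views.parquet"))
```

The point of the sketch is the separation of concerns: the raw zone keeps everything in its original form, while each downstream consumer gets only the slice of structure it actually needs.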
Before the big data trend, data integration created value by normalizing information into some form of persistence, such as a database. That alone is no longer enough to manage all of an enterprise's data, and attempting to structure everything undermines its value. This is why dark data is rarely captured in a database; instead, data scientists dig through it to find the few facts worth repeating.
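To make "digging through dark data" concrete, here is a small, hypothetical Python sketch: it scans loosely structured log lines for a couple of facts without first forcing them into a database schema. The log format and field names are invented for illustration.

```python
# Sketch of digging through dark data: scan loosely structured log lines
# for a handful of facts, tolerating records that a strict loader would reject.
import re
from collections import Counter

log_lines = [
    "2024-05-01 10:00:01 device=thermostat-9 temp=21.5 status=OK",
    "2024-05-01 10:00:07 device=thermostat-9 temp=38.2 status=ALERT",
    "2024-05-01 10:00:12 device=lock-3 battery=12 status=ALERT",
    "garbled ####### record that would break a strict loader",
]

alerts = Counter()
pattern = re.compile(r"device=(\S+).*status=ALERT")
for line in log_lines:
    match = pattern.search(line)
    if match:                      # skip records that don't fit the pattern
        alerts[match.group(1)] += 1

# The few facts worth repeating: which devices are raising alerts.
print(alerts.most_common())
```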
On the surface, data lakes appear straightforward: a way to manage and exploit massive volumes of structured and unstructured data. But they are not as simple as they seem, and failed data lake projects are not uncommon across industries and organizations. Early projects struggled because best practices had yet to emerge; today, a lack of solid design is the primary reason data lakes fail to deliver their full value. Several pitfalls recur:
Data silo and cluster proliferation: The perception that data lakes have a low barrier to entry encourages makeshift deployments in the cloud. The result is redundant, inconsistent data, lakes that never reconcile with one another, and ongoing synchronization problems.
Conflicting objectives for data access: Organizations must balance how strict security measures should be against the need for agile access. Plans and procedures need to be in place that align all stakeholders.
Limited commercial off-the-shelf tools: Many vendors claim to connect to Hadoop or cloud object stores, but the offerings lack deep integration, and most of these products were built for data warehouses, not data lakes.
Lack of end-user adoption: Users have the perception, right or wrong, that getting answers from a data lake is too complicated because it demands advanced coding skills, or that they cannot find the needles they need within the data haystacks.