Changing Data Access Paradigms for IoT, Edge Computing, and Data Lakes
The enormous amount of data generated by IoT devices and edge computing is driving a growing recognition that data has become a valuable commodity. That recognition has led to investment in private and cloud-based data lake implementations, in the hope of making that data available at scale for real-time processing, analysis, and automated, intelligent decision-making using machine learning, AI, and other methods.
In the data lake model, data is gathered from the edge, consolidated into centralized data stores, and then accessed for analytics and decision-making. Such monolithic, centralized data lake architectures, and even collections of connected data lakes, face challenges in scalability, performance, complexity, and the cost of implementation and operations. As data volumes increase, those challenges compound.
What if the paradigm did not rely on a centralized data store but instead left data at the edge end-points? Rather than moving, aggregating, and centralizing data for analysis, the data stays in place at the edge. In this model, data requests are routed to the relevant end-points, and the responses from those disparate sources are integrated into a unified reply, as in the sketch below.
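As a minimal illustration of that scatter-gather flow, assuming hypothetical edge end-points that answer a JSON query over HTTP (the URLs, query shape, and response format here are invented for the example, not any specific product's API), a routing layer might fan a request out and merge the partial results using only the Python standard library:

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib import request

# Hypothetical edge data nodes; in practice these would come from a registry.
EDGE_ENDPOINTS = [
    "http://edge-1.example.com/query",
    "http://edge-2.example.com/query",
    "http://edge-3.example.com/query",
]

def query_endpoint(url: str, predicate: dict, timeout: float = 2.0) -> list:
    """POST the query to one edge end-point and return its rows."""
    body = json.dumps(predicate).encode("utf-8")
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

def federated_query(predicate: dict) -> list:
    """Fan the same query out to every end-point and merge the answers."""
    rows: list = []
    with ThreadPoolExecutor(max_workers=len(EDGE_ENDPOINTS)) as pool:
        futures = [pool.submit(query_endpoint, url, predicate)
                   for url in EDGE_ENDPOINTS]
        for future in as_completed(futures):
            try:
                rows.extend(future.result())
            except OSError:
                # A slow or offline end-point degrades the answer,
                # not the whole system.
                pass
    return rows

# e.g. federated_query({"sensor": "temperature", "since": "2023-01-01T00:00:00Z"})
```

Note that the routing layer never stores the data; it only moves the question to the data and carries the answer back.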
Adopting such a model, in which the data remains in place at the edge and a data virtualization layer provides a unified interface for accessing each end-point independently, yields better performance and lower cost, and scales with the number of end-points rather than against a central bottleneck. This paradigm turns the edge itself into a distributed data lake, or data mesh, with many benefits and possibilities within organizations, between companies, and across industries.
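One design choice such a virtualization layer can make is to push aggregation down to the end-points, so that only constant-size summaries cross the network instead of raw data. A sketch of that idea, with made-up sensor readings and a simple global average as the query, follows:

```python
from dataclasses import dataclass

@dataclass
class PartialAggregate:
    """A constant-size summary an end-point can compute locally."""
    total: float
    count: int

def local_aggregate(readings: list[float]) -> PartialAggregate:
    """Runs at the edge: reduce raw readings to (sum, count)."""
    return PartialAggregate(sum(readings), len(readings))

def merge(partials: list[PartialAggregate]) -> float:
    """Runs at the virtualization layer: combine summaries into one answer."""
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    return total / count if count else float("nan")

# Three end-points answer the same "average temperature" query locally;
# only their tiny summaries are integrated into the unified reply.
edges = [local_aggregate(r)
         for r in ([21.0, 22.5], [19.8], [20.1, 20.4, 20.7])]
print(merge(edges))
```

The raw readings never leave their end-points, which is what lets this architecture scale with the number of edge nodes rather than with the total volume of data.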
Originally published on LinkedIn Pulse.