Textbook in PDF format
What Is Data Mesh?
Data mesh is a concept that ensures data access, governance, federation, and interoperability across distributed teams and systems. It is a new approach for designing modern data architectures, based on four principles.
The software industry has always been susceptible to trends. Some stick, some pass, but all vie for the limelight in one way or another. Agile, SOA, cloud, data lakes, microservices, DevOps, and event streaming have all had a fundamental effect on the software we build today.
Data mesh may well be the next innovation we can add to this list. I see it as a kind of microservices for data architecture. Part technology and part practice, this socio-technical theory aims to let large, interconnected organizations avoid putting all their data into one single place: a pattern that can lead to paralysis. In a data mesh, different applications, pipelines, databases, storage layers, etc., are instead connected through self-service data products, creating a network, or “mesh,” of data that has no central point where teams are forced to wait in line or step on each other’s toes. Such problems plague the monolithic application and the monolithic data warehouse alike.
Data mesh is a set of social and technological principles for designing modern data architectures. Data mesh makes data a first-class citizen by treating data sources as products, a key component of an organization’s success. The data in a data mesh is easily accessible and interconnected across an entire business, and its users have the means to discover, access, and consume it reliably.
While the data mesh concept is relatively new, the problems it proposes to solve are not. This book covers the historical issues of data access, including why these issues remain relevant to this day. It examines how data mesh architecture can solve these historical problems and how event streams play into this modern data stack. In addition, it explores your options for designing and building data products served by event streams, and the decisions you’ll need to make when building your self-service tooling.
Data, as a discipline, is often treated as a separate domain from engineering. A data team composed of data engineers, data scientists, and data analysts extracts data from engineering systems and does “something useful” with it for the business. “Something useful” typically includes answering analytical questions, building reports, and structuring data from disparate systems into queryable form. An example of this might include correlating sales with various patterns of user behavior observed on a website. Making real-time product suggestions based on pages a user recently browsed would be a more contemporary example.
The four principles of data mesh are:
Domain ownership: Responsibility for modeling and providing important data is distributed to the people closest to it, so that consumers get access to the exact data they need, when they need it.
Data as a product: Data is treated as a product like any other, complete with a data product owner, consumer consultations, release cycles, and quality and service-level agreements.
Self-service: Consumers are empowered to independently search for, discover, and consume data products. Data product owners are provided with standardized tools for populating and publishing their data products.
Federated governance: This is embodied by a cross-organization team that provides global standards for the formats, modes, and requirements of publishing and using data products. This team must maintain the delicate balance between centralized standards for compatibility and decentralized autonomy for true domain ownership.
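As a rough illustration of how the four principles might fit together, here is a minimal Python sketch. It is not taken from the book: the DataProduct and Catalog classes, the field names, the ALLOWED_FORMATS set, and the example product are all hypothetical, chosen only to show a domain-owned product with a named owner and a service-level agreement being published to a searchable catalog that enforces one global standard.

```python
from dataclasses import dataclass, field

# Hypothetical governance rule: formats a federated team might approve.
ALLOWED_FORMATS = {"avro", "json", "protobuf"}


@dataclass(frozen=True)
class DataProduct:
    """Illustrative descriptor for one data product (all fields are assumptions)."""
    name: str
    domain: str                 # domain ownership: the team closest to the data
    owner: str                  # data as a product: a named data product owner
    data_format: str            # federated governance: must be an approved format
    freshness_sla_minutes: int  # data as a product: a service-level agreement
    tags: frozenset = field(default_factory=frozenset)


class Catalog:
    """Toy in-memory catalog that consumers can search on their own."""

    def __init__(self):
        self._products = {}

    def publish(self, product):
        # Enforce the global standards before a product joins the mesh.
        if product.data_format not in ALLOWED_FORMATS:
            raise ValueError(f"format {product.data_format!r} is not approved")
        if product.freshness_sla_minutes <= 0:
            raise ValueError("a data product needs a positive freshness SLA")
        self._products[product.name] = product

    def search(self, tag):
        # Self-service discovery: names of all products carrying the tag.
        return sorted(p.name for p in self._products.values() if tag in p.tags)


catalog = Catalog()
catalog.publish(DataProduct(
    name="sales.orders",
    domain="sales",
    owner="sales-data-owner",
    data_format="avro",
    freshness_sla_minutes=5,
    tags=frozenset({"orders", "revenue"}),
))
print(catalog.search("orders"))
```

In this sketch, the publish step stands in for federated governance and the search step for self-service. A real mesh would likely back both with shared tooling such as a schema registry or a data catalog rather than an in-memory dictionary.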