I think that the main point of the article is that a company’s data strategy should result in discrete data products aligned with business domains. A domain-oriented team should be responsible for each data product. Data infrastructure should cover universal data-processing concerns, but should not include business logic. These characteristics contrast with a centralized data lake, where a single organization is responsible for both the infrastructure and content of the data resource.
I can't distinguish between what is described here and a service-oriented architecture (SOA) approach:
Discoverable
Addressable
Trustworthy and truthful
Self-describing semantics and syntax
Inter-operable and governed by global standards
Secure and governed by a global access control
A reminder that Thoughtworks was highly influential in pushing microservices. This may be an elaborate mea culpa ("oops, SOA was actually more sensible") without admitting 'culpa': rehashing SOA with a set of features (above) that look awfully like those highly elaborate SOA proposals, with XML and all sorts of metadata to 'couple' these "data products" (previously called services). At the following link you can learn how Zalando implemented this concept in the real world: https://databricks.com/session_na20/data-mesh-in-practice-ho...
If anyone wants to learn more about data mesh, we have a (vendor-independent) Slack for sharing ideas and insights [0]. I teamed up with Zhamak, the author, to launch it. It's still early days, but it's at 1K+ members after a month, so hopefully it can really help people find the content they need to learn about it all. I also compiled a list of public user stories [1].

[0] https://launchpass.com/data-mesh-learning

[1] https://www.reddit.com/r/datamesh/comments/m6ecuz/data_mesh_...
Just wanna say: "data lakes"? Is that a real term? The buzzwords are so thick, it's hard to see past the gushing propaganda.
deEeEeEeeecent
TL;DR?
We're building a data mesh at Splitgraph [0]. We provide a unified interface to query and discover data products. You can query the data over the Postgres wire protocol at a single endpoint, with any of your existing tools, and you can discover it in the catalog using a familiar GitHub-like interface. You can try this right now on the public website, where we federate access to 40k open datasets. Every dataset is addressable with a `namespace/repository:tag` format. The `tag` can refer either to the live data, in which case we forward the query upstream, or to a versioned snapshot of the data (a "data image") that you build with declarative, Docker-like tooling [1].
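Since the endpoint speaks the Postgres wire protocol, any stock Postgres client should be able to query it. Here's a minimal sketch in Python with psycopg2; the host, credentials, and database name are placeholders I'm assuming rather than taking from their docs, and I'm assuming the `namespace/repository:tag` address maps onto a quoted Postgres schema name:

    # Minimal sketch: query a dataset over the Postgres wire protocol.
    # Host, credentials, dbname, and all identifiers are assumptions.
    import psycopg2

    conn = psycopg2.connect(
        host="data.splitgraph.com",   # assumed single query endpoint
        port=5432,
        user="my_api_key",            # assumption: API key as the username
        password="my_api_secret",     # assumption: API secret as the password
        dbname="ddn",                 # assumed database name
    )
    with conn.cursor() as cur:
        # Assumption: the namespace/repository:tag address is exposed as a
        # quoted schema name, so tables live under it.
        cur.execute(
            'SELECT * FROM "some-namespace/some-repo:latest".some_table LIMIT 5'
        )
        for row in cur.fetchall():
            print(row)
    conn.close()

The nice part of speaking the wire protocol is that nothing here is Splitgraph-specific: the same snippet works against any Postgres server, which is presumably why existing tools work unchanged.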
On the enterprise side, integrating the access and discovery layers gives a lot of advantages, especially around data governance. On the web, we give users tools to connect data sources, document them, and share/audit access to them. When a query comes through the endpoint, since we're implemented as a Postgres proxy, we can rewrite, filter, or drop it according to rules, or forward it to the upstream data source(s) and/or join across them. If you use Splitfiles to generate versioned data, we can also provide data lineage/provenance and full reproducibility.
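To make the join-across-sources point concrete, here's a hedged sketch of a single statement joining a live dataset against a versioned snapshot through the same endpoint; every identifier (repositories, tags, tables, columns) is made up for illustration, and the connection parameters are the same assumptions as in the sketch above:

    # Sketch: join live data with a versioned snapshot ("data image") in
    # one query through the single endpoint. Identifiers are hypothetical.
    import psycopg2

    conn = psycopg2.connect(host="data.splitgraph.com", port=5432,
                            user="my_api_key", password="my_api_secret",
                            dbname="ddn")
    with conn.cursor() as cur:
        cur.execute("""
            SELECT live.order_id, snap.daily_total
            FROM "acme/orders:latest".orders AS live  -- live: forwarded upstream
            JOIN "acme/reports:v1".daily AS snap      -- versioned snapshot
              ON live.order_id = snap.order_id
            LIMIT 10
        """)
        print(cur.fetchall())
    conn.close()

If the proxy rewrites or filters queries for governance, a statement like this is presumably where row- or column-level rules would be applied before anything is forwarded upstream.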
We've been working on this for ~3 years but are still pretty early. If anyone wants to help, we just raised a seed round and are hiring a remote team -- check my comment history for links.
[0] https://www.splitgraph.com
[1] https://www.splitgraph.com/docs/working-with-data/using-spli...