Elena Canorea
Communications Lead
Have you heard about the data mesh paradigm? Here we explore the principles underpinning this approach and how Sidra Data Platform’s architecture leverages the promised benefits that these principles can bring to organizations.
A few days ago, an interesting article was published about Data Mesh principles and logical architecture. The article builds on the idea that, while the technology advances of the past decade have addressed scale in data volume and data processing, they have failed to address scale in other dimensions, such as keeping up with changes in the data landscape or the proliferation of data sources and use cases. It suggests that some principles applied successfully to operational systems, such as domain-driven design and bounded contexts, may have been overlooked so far for big data platforms. As more data becomes available everywhere, it is harder to consume it all in one place under the control of a single platform and owner.
The article describes data lakes and data warehouses as centralized, monolithic and domain agnostic, and therefore unable to scale. It outlines the different failure modes, which include the inability to respond to new data sources and the disconnect between the data engineering team and the source teams. In response, it introduces the idea of a decentralized data mesh built around four key principles: domain-oriented decentralized data ownership and architecture, data as a product, self-serve data infrastructure as a platform, and federated computational governance.
Based on these principles, the objective of the data mesh is to create a foundation for getting value from analytical data at scale. Sidra Data Platform goes well beyond, and differs from, the definition of a monolithic data lake. Sidra enables an end-to-end, governed, modular, and scalable flow of data.
In this post, we explore how Sidra’s key architectural principles compare to the principles set out in the data mesh paradigm. We look at how these principles, or at least their assumed benefits, can be achieved through Sidra Data Platform while keeping the promise of streamlined management and governance.
The first principle, domain-oriented decentralized data ownership, holds that the traditional problem of data silos and unreachable data is not solved by creating a single, centralized team that owns and curates the data from all domains. As in domain-driven design, responsibility for data should be distributed to the teams closest to where the data is actually generated.
Sidra Data Platform is compatible with this principle, in the sense that the accelerators built for discovering and retrieving data have been designed around an agnostic metadata hierarchy (Providers, Entities, Attributes), with the possibility of defining a different owner for each data provider. Sidra also supports several Data Storage Units (DSUs), where each DSU consists of independently deployable components, including the structural elements for storage and processing.
The actual data lake is then a collection of independent DSUs, which can help solve compliance, data location and separation constraints or just organizational requirements. The security and granular authorization model for Sidra allows integrating these capabilities with the organizational structure of the company.
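The metadata hierarchy and per-provider ownership described above can be pictured with a small data model. The following is only an illustrative sketch in Python, not Sidra's actual metadata schema or API: the class names mirror the terms Providers, Entities and Attributes, while the fields `owner` and `dsu` and all sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    data_type: str


@dataclass
class Entity:
    name: str
    attributes: list[Attribute] = field(default_factory=list)


@dataclass
class Provider:
    name: str
    owner: str  # hypothetical field: team accountable for this provider's data
    dsu: str    # hypothetical field: Data Storage Unit that holds the data
    entities: list[Entity] = field(default_factory=list)


def providers_by_owner(providers: list[Provider]) -> dict[str, list[str]]:
    """Group provider names by their owning team."""
    grouped: dict[str, list[str]] = {}
    for provider in providers:
        grouped.setdefault(provider.owner, []).append(provider.name)
    return grouped


# Sample values are invented for illustration only.
catalog = [
    Provider(
        "sales_crm",
        owner="sales-team",
        dsu="dsu-europe",
        entities=[
            Entity("customer", [Attribute("id", "int"), Attribute("email", "string")])
        ],
    ),
    Provider("web_shop", owner="sales-team", dsu="dsu-europe"),
    Provider("plant_sensors", owner="operations-team", dsu="dsu-americas"),
]

ownership = providers_by_owner(catalog)
print(ownership)
```

Keeping the owner on the provider, rather than on the platform as a whole, is what lets responsibility follow the teams closest to the data.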
On the side of business-specific data transformation and exploitation, the architecture of Sidra Client Applications is fully aligned with this principle as well. Sidra Client Applications are independent sets of deployable services that transform and make use of the data stored in the DSU to serve a specific business case. The processing and persistence layer of a Client Application can be tailored to a myriad of scenarios, from classical BI analytics to experimentation sandboxes or knowledge mining tasks. Each Client Application uses its own copy of the data from the DSU (managed and synchronized transparently by the Sidra Core platform), as well as its own code and metadata. All this reinforces the principle of distributed data ownership and evolution.
The second principle, data as a product, stems from the traditionally high cost of discovering, trusting and using quality data. In a decentralized scenario, data as a product is presented as a way of packaging code, infrastructure, data and metadata for each of the data domains. Handling data as a product means that there is a need for domain data product owners, who are responsible for a set of success metrics around the usage of their data, as with any other product. Think of the lead time for using the data, or a data quality index, for example. The same applies to the product teams (engineers, analysts, data scientists) responsible for building and maintaining the data products.
In Sidra, we fully believe in common infrastructure foundations, provided centrally by Sidra Core in the form of shared accelerators and services (preconfigured pipelines, security model, event system, management APIs, centralized monitoring, etc.). This lets dedicated teams work on each domain through its own Client Application (for example, each Client Application implementing the business logic of one data domain and serving one or more other domains) while focusing only on the business specifics. Each Client Application represents a unit of code, infrastructure, and metadata, and at the same time benefits from centralized storage, a common metadata and lineage framework, and common optimization rules (optimized storage).
This modular approach to Client Applications also makes cost attribution per domain easy. The common security authentication and authorization model of Sidra allows a decentralized, yet fully controlled management of data. This approach allows us to adapt fast to new use cases (e.g., from traditional BI to knowledge mining), including pure experimentation use cases, like Data Labs.
Additionally, the Sidra Data Catalogue provides the centralized piece that this approach requires: data discoverability, documentation, and lineage. Users and teams in each domain can be given permissions to access only certain DSUs or provider data if needed, thus respecting organizational boundaries around product data domains.
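Scoping access to certain DSUs or providers can be illustrated with a simple grant check. This is a minimal sketch under stated assumptions, not Sidra's security model: the `Grant` type, the `can_read` function and the DSU and provider names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Grant:
    dsu: str
    # None means "every provider in this DSU" (an assumption of this sketch).
    provider: Optional[str] = None


def can_read(grants: Iterable[Grant], dsu: str, provider: str) -> bool:
    """Return True if any grant covers the requested DSU and provider."""
    return any(g.dsu == dsu and g.provider in (None, provider) for g in grants)


# Hypothetical teams: one limited to a single provider, one covering a whole DSU.
marketing_grants = [Grant("dsu-europe", "web_shop")]
finance_grants = [Grant("dsu-europe")]

print(can_read(marketing_grants, "dsu-europe", "web_shop"))
print(can_read(marketing_grants, "dsu-europe", "sales_crm"))
print(can_read(finance_grants, "dsu-americas", "plant_sensors"))
```

In this sketch a team sees nothing by default, so domain boundaries hold unless a grant explicitly opens them.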
Building, deploying and operating a data product requires specialized skills and infrastructure. The third principle, self-serve data infrastructure as a platform, calls for tooling that helps a domain data product developer create, maintain and run data products as an abstraction, with less specialized knowledge. Self-service thus lowers the barriers to usage and innovation.
One key dimension that has grown with Sidra since its inception is precisely its set of self-service capabilities. The architecture of Sidra has evolved to make it easier to install, set up and create new inbound (data connectors) and outbound (Client Applications) integrations.
Current self-service capabilities of Sidra can be organized in logical sets such as:
On top of this, Sidra’s upcoming releases will represent an even bigger qualitative change, with a more streamlined and user-friendly process for creating new data connectors and Client Applications from the web interface.
Finally, the fourth principle, federated computational governance, refers to a governance model that embraces decentralization and domain self-sovereignty while remaining interoperable through global standardization. This standardization means adhering to a set of global rules that are applied to all data products and their interfaces. This principle, of course, relies not only on architecture but also on a supportive organizational structure.
Drawing the parallel with Sidra would mean repeating some of the points already made above, especially around the Client Applications paradigm and the central, cross-cutting capabilities and services, such as the centralized catalogue and metadata, the security model and the rest of the common services (e.g., ML service platform, Management UI, API management). The Integration Hub deserves a special mention here: it leverages Service Bus to enable messaging between Client Applications and Sidra Core, or among Client Applications. The Integration Hub allows deeper synchronization and enables further composition of business use cases.
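The messaging pattern behind such a hub is publish/subscribe: a sender publishes to a topic without knowing who receives the message. The sketch below shows the pattern with an in-process stand-in written in Python. It does not use Service Bus or any Sidra API, and the class `InMemoryHub`, the topic name `entity.loaded` and the message fields are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryHub:
    """In-process stand-in for a message broker, for illustration only."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to every subscriber and return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


hub = InMemoryHub()
bi_inbox: list[dict] = []
lab_inbox: list[dict] = []

# Two hypothetical client applications react to the same core event.
hub.subscribe("entity.loaded", bi_inbox.append)
hub.subscribe("entity.loaded", lab_inbox.append)

delivered = hub.publish("entity.loaded", {"provider": "sales_crm", "entity": "customer"})
print(delivered, bi_inbox, lab_inbox)
```

Because publishers and subscribers only share a topic name, a new client application can be added without changing the component that emits the event.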
All the covered points demonstrate that Sidra is, beyond a Data Lake, a comprehensive Data Platform that is modular and future-proof.
We hope you enjoyed this article. If you have any questions on the points above, feel free to reach out to us directly at sidra@plainconcepts.com