Elena Canorea
Communications Lead
Introduction
A well-designed data model is key to effective operational systems, as well as BI and analytics applications that deliver business value by transforming enterprise data into a useful information asset.
We look at what data modeling is, its applications and benefits, as well as tips on how to use it in your business.
Data modeling is the process of analyzing and defining the different types of data a company collects and produces, as well as the relationships between them. Using text, symbols, and diagrams, data modeling creates visual representations of data as it is captured, stored, and used in the enterprise.
Data models are built around business needs. Rules and requirements are defined in advance through stakeholder feedback, so that they can be incorporated into the design of a new system or adapted in the iteration of an existing one.
By modeling data, it is possible to document what type of data we have, how we use it, and the requirements for managing it in terms of its use, protection, and governance. Therefore, some of its advantages include:
Ideally, then, data models are living documents that evolve along with changing business needs.
Data modeling has evolved along with database management systems, and the complexity of the types of models has increased as the data storage needs of enterprises have grown.
Let’s review some examples of data modeling:
The conceptual data model defines the overall structure of a company's business and data. It is used to organize business concepts and is defined by stakeholders together with data engineers or architects. Both the entities and their relationships are defined within the conceptual data model.
The logical data model builds on the conceptual data model, adding the specific attributes of the data within each entity and the relationships between those attributes.
It is the technical model of data rules and structures, defined by data engineers, architects, and business analysts, and it helps to make decisions about the physical model required by the data and the business.
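As an illustration of the step from concepts to attributes, here is a minimal sketch of a logical model written as Python dataclasses. The `Customer` and `Order` entities and all of their attributes are hypothetical and chosen only for the example; a real logical model would normally be expressed in a modeling tool rather than in code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Customer:
    # Hypothetical entity: attributes are illustrative only.
    customer_id: int  # identifying attribute
    name: str
    email: str


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: int  # relationship: each order belongs to one customer
    order_date: date
    total_amount: float


alice = Customer(customer_id=1, name="Alice", email="alice@example.com")
order = Order(
    order_id=100,
    customer_id=alice.customer_id,
    order_date=date(2024, 1, 15),
    total_amount=59.90,
)
```

Nothing here says how or where the data is stored; that decision belongs to the physical model.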
The physical data model is the specific implementation of the logical data model, created by database administrators and developers. It is developed for a specific database tool and data warehousing technology, and includes the data connectors that deliver data to users of business systems as needed.
This is the target at which the other models are aimed: the actual implementation of the data estate.
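A physical model is tied to one database engine. The sketch below assumes SQLite (through Python's built-in `sqlite3` module) purely because it needs no setup; the table names, column types, and index are hypothetical and would differ on another platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Engine-specific DDL: types, keys, and an index are physical design decisions.
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        email       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE customer_order (
        order_id     INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
        order_date   TEXT NOT NULL,
        total_amount REAL NOT NULL
    );
    CREATE INDEX idx_order_customer ON customer_order(customer_id);
    """
)

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```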
To better understand the most popular modeling techniques today, it helps to review the ones used in the early days of databases. The first four described below are those early techniques, followed by the three most used today.
In the hierarchical model, data is stored in a tree structure with parent and child records that make up a collection of data fields. A parent can have one or more children, but a child record can only have one parent.
In addition, this model also consists of links, which are the connections between records, and types that specify the type of data contained in the field. It was very popular in the 1960s.
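The one-parent rule of the hierarchical model can be sketched in a few lines of Python. The `Record` class and the record names are hypothetical; the point is only that attaching a child to a second parent is rejected.

```python
class Record:
    """A record in a tree: many children allowed, at most one parent."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        # The hierarchical model forbids a second parent.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)

    def path(self):
        # Walk the links up to the root to build the access path.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


company = Record("company")
sales = Record("sales")
employee = Record("employee-42")
company.add_child(sales)
sales.add_child(employee)
```

Every lookup follows a single path from the root, which is what made this model simple but rigid.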
The network model extended the hierarchical model by allowing a child record to have more than one parent. This technique is a precursor to the graph data structure, in which a data object is represented as a node and the relationship between two nodes is called an edge.
This model had its heyday in the 1970s.
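By contrast with a tree, the network model lets one record sit under several parents. A minimal sketch, with a hypothetical `Network` class and made-up record names:

```python
from collections import defaultdict


class Network:
    """Records linked so that a child may have several parents."""

    def __init__(self):
        self.parents = defaultdict(set)
        self.children = defaultdict(set)

    def link(self, parent, child):
        # Store the relationship in both directions for easy navigation.
        self.parents[child].add(parent)
        self.children[parent].add(child)


net = Network()
net.link("supplier-A", "part-7")
net.link("project-X", "part-7")  # same child, second parent

part_parents = sorted(net.parents["part-7"])
```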
In the relational model, data is stored in tables and columns, and the relationships between their elements are identified. Database management functions, such as constraints and triggers, are also incorporated in this model.
It became the most popular technique in the 1980s, and the entity-relationship and dimensional data models, the most popular today, are variations of the relational model.
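Constraints and triggers are easiest to see in action. The sketch below assumes SQLite via Python's `sqlite3` module; the `account` and `account_audit` tables are hypothetical. A `CHECK` constraint rejects invalid data, and a trigger records every balance change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        owner      TEXT NOT NULL,
        balance    REAL NOT NULL CHECK (balance >= 0)  -- constraint
    );
    CREATE TABLE account_audit (
        account_id  INTEGER NOT NULL,
        old_balance REAL NOT NULL,
        new_balance REAL NOT NULL
    );
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON account
    BEGIN
        INSERT INTO account_audit
        VALUES (OLD.account_id, OLD.balance, NEW.balance);
    END;
    """
)

conn.execute("INSERT INTO account VALUES (1, 'Alice', 100.0)")
conn.execute("UPDATE account SET balance = 70.0 WHERE account_id = 1")
audit_rows = conn.execute("SELECT * FROM account_audit").fetchall()

# A negative balance violates the CHECK constraint and is rejected.
try:
    conn.execute("UPDATE account SET balance = -5 WHERE account_id = 1")
    constraint_rejected = False
except sqlite3.IntegrityError:
    constraint_rejected = True
```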
The object-oriented model combines aspects of object-oriented programming and the relational data model.
An object represents data and its relationships in a single structure, along with attributes that specify its properties and methods that define its behavior. These can have multiple relationships with each other.
The model also consists of classes and inheritance, and emerged in the early 1990s.
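The ideas of attributes, methods, relationships, and inheritance map directly onto classes. The `Person`, `Vehicle`, and `Truck` classes below are hypothetical and serve only to show the shape of such a model.

```python
class Person:
    def __init__(self, name):
        self.name = name


class Vehicle:
    def __init__(self, plate, owner):
        self.plate = plate  # attribute
        self.owner = owner  # relationship to another object

    def describe(self):  # method defining behavior
        return f"{self.plate} owned by {self.owner.name}"


class Truck(Vehicle):  # inheritance: a Truck is a Vehicle with extra data
    def __init__(self, plate, owner, payload_tonnes):
        super().__init__(plate, owner)
        self.payload_tonnes = payload_tonnes

    def describe(self):
        return f"{super().describe()}, payload {self.payload_tonnes} t"


dana = Person("Dana")
truck = Truck("ABC-123", dana, payload_tonnes=12)
description = truck.describe()
```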
The entity-relationship model is one of the most widely adopted relational techniques in enterprise applications, especially for transaction processing. It is very efficient for data capture and update processes.
It consists of entities representing people, places, things, events, or concepts, together with their attributes and relationships. It is also characterized by its degree of normalization, i.e. how much redundancy is allowed in the stored data.
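To make the link between normalization and redundancy concrete, the sketch below splits a flat list of rows, where customer details repeat on every order, into two entities. The data and field names are invented for the example.

```python
# Flat, redundant rows: the customer's name is repeated on every order.
flat_rows = [
    {"order_id": 1, "customer_email": "ana@example.com", "customer_name": "Ana", "amount": 20.0},
    {"order_id": 2, "customer_email": "ana@example.com", "customer_name": "Ana", "amount": 35.0},
    {"order_id": 3, "customer_email": "ben@example.com", "customer_name": "Ben", "amount": 12.5},
]

customers = {}  # one entry per customer entity, keyed by email
orders = []     # one entry per order, pointing at its customer

for row in flat_rows:
    customers[row["customer_email"]] = {"name": row["customer_name"]}
    orders.append(
        {
            "order_id": row["order_id"],
            "customer_email": row["customer_email"],  # reference, not a copy
            "amount": row["amount"],
        }
    )
```

After the split, a customer's name is stored once, so an update touches a single record instead of every order.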
Like the entity-relationship model, the dimensional model consists of attributes and relationships, but also of facts and dimensions.
This model has been widely adopted in business intelligence and analytics applications. It is often known as a star schema because it can be visualized as a fact surrounded by, and connected to, multiple dimensions, although this simplifies the structure of the model.
Most models of this type have numerous fact tables linked to various dimensions, which are called “conformed” when they are shared by more than one fact table.
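A small star schema shows how facts and dimensions work together. The sketch assumes SQLite via `sqlite3`; the `fact_sales`, `dim_product`, and `dim_date` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimensions: descriptive context used to slice the facts.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, iso_date TEXT, year INTEGER);

    -- Fact table: numeric measures plus one key per dimension.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        quantity    INTEGER,
        revenue     REAL
    );

    INSERT INTO dim_product VALUES
        (1, 'Laptop', 'Hardware'), (2, 'Mouse', 'Hardware'), (3, 'IDE licence', 'Software');
    INSERT INTO dim_date VALUES
        (20240101, '2024-01-01', 2024), (20240102, '2024-01-02', 2024);
    INSERT INTO fact_sales VALUES
        (1, 20240101, 2, 2000.0), (2, 20240101, 5, 100.0), (3, 20240102, 1, 300.0);
    """
)

# A typical analytical query: aggregate a measure, grouped by a dimension attribute.
revenue_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
```

If a second fact table (for example, returns) reused `dim_product` and `dim_date`, those two would be conformed dimensions.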
The graph model has its roots in the network modeling technique and is mainly used to model complex relationships in graph databases, although it can also be used for other NoSQL databases, such as key-value and document databases.
In this model there are two fundamental elements: nodes, which represent entities with a unique identity, and edges, which connect nodes and define how they relate to each other.
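Nodes and edges can be sketched with plain Python structures. The node identifiers, labels, and relation names below are hypothetical; a graph database would provide its own query language for the same traversal.

```python
# Nodes: entities with a unique identity and some properties.
nodes = {
    "alice": {"label": "Person"},
    "bob": {"label": "Person"},
    "acme": {"label": "Company"},
}

# Edges: (source, relation, target) triples connecting nodes.
edges = [
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
    ("alice", "KNOWS", "bob"),
]


def neighbours(node_id, relation):
    """Follow outgoing edges of one relation type from a node."""
    return [t for s, rel, t in edges if s == node_id and rel == relation]


def colleagues(person):
    """People who share at least one employer with `person`."""
    employers = set(neighbours(person, "WORKS_AT"))
    return sorted(
        s for s, rel, t in edges
        if rel == "WORKS_AT" and t in employers and s != person
    )


alice_colleagues = colleagues("alice")
```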
The Data Vault is a methodology for structuring and modeling data in a Data Warehouse in an agile and scalable way.
One of the most prominent benefits of this technique is that it allows organizations to maintain a broad view of their data while easily adapting to changes in business and technology. But that’s not all.
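The Data Vault architecture consists of three main components: hubs, which store unique business keys; links, which record the relationships between hubs; and satellites, which hold the descriptive attributes and their history.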
This approach provides a scalable and change-resilient data architecture, allowing the integration of new data sources without having to redo the entire data modelling.
Unlike other data models, Data Vault modelling focuses on data traceability and auditability, allowing full change logging and the ability to track the provenance of each piece of data.
In addition, its process is iterative, so it can be refined and improved as new data is added and new needs are discovered. Therefore, its main advantages lie in its scalability, flexibility, and complete data history.
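The traceability and full history described above can be sketched in plain Python. This is a simplified illustration, not a reference implementation: it uses the standard Data Vault terms hub, link, and satellite, and the table names, the SHA-256 hash choice, the key normalization, and the sample data are all assumptions made for the example.

```python
import hashlib


def digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hash_key(*business_keys):
    # Assumption: keys are trimmed and upper-cased before hashing.
    return digest("||".join(key.strip().upper() for key in business_keys))


hub_customer = {}         # hash key -> business key + load metadata
hub_order = {}
link_customer_order = {}  # relationship between the two hubs
sat_customer = []         # insert-only history of descriptive attributes


def load_hub(hub, business_key, load_ts, source):
    hk = hash_key(business_key)
    # Insert only if unseen: the first load timestamp and source are preserved.
    hub.setdefault(hk, {"business_key": business_key, "load_ts": load_ts, "source": source})
    return hk


def load_link(customer_hk, order_hk, load_ts, source):
    link_hk = digest(customer_hk + "||" + order_hk)
    link_customer_order.setdefault(
        link_hk,
        {"customer_hk": customer_hk, "order_hk": order_hk, "load_ts": load_ts, "source": source},
    )
    return link_hk


def load_customer_satellite(customer_hk, attributes, load_ts, source):
    hash_diff = digest(repr(sorted(attributes.items())))
    history = [row for row in sat_customer if row["customer_hk"] == customer_hk]
    if history and history[-1]["hash_diff"] == hash_diff:
        return  # nothing changed, so no new row
    # Changes are appended, never updated in place: full history is kept.
    sat_customer.append(
        {"customer_hk": customer_hk, "hash_diff": hash_diff,
         "load_ts": load_ts, "source": source, **attributes}
    )


c = load_hub(hub_customer, "C-001", "2024-01-01T00:00:00Z", "crm")
o = load_hub(hub_order, "O-900", "2024-01-01T00:00:00Z", "shop")
load_link(c, o, "2024-01-01T00:00:00Z", "shop")
load_customer_satellite(c, {"city": "Madrid"}, "2024-01-01T00:00:00Z", "crm")
load_customer_satellite(c, {"city": "Madrid"}, "2024-01-02T00:00:00Z", "crm")  # unchanged
load_customer_satellite(c, {"city": "Bilbao"}, "2024-01-03T00:00:00Z", "crm")  # new version
```

Because every row carries a load timestamp and a source, and satellite rows are only ever appended, each value can be traced back to when and where it arrived. Adding a new source means adding new hubs, links, or satellites rather than reshaping existing ones.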
Implementing Data Vault in a data warehouse requires a strategic approach and careful planning. Before starting, we must identify and understand the business objectives and data requirements to ensure that the needs of the organisation are being met.
DBT Core and Data Vault 2.0 are crucial to scaling data warehousing solutions for several key reasons:
In summary, the combination of DBT Core and Data Vault 2.0 provides a robust framework for creating scalable, flexible, and efficient data storage solutions that can adapt to evolving business requirements and ensure data integrity and regulatory compliance.
By maintaining an up-to-date data platform, you ensure that your organisation remains competitive, secure, and able to take advantage of the latest technological advances to drive business success.
Data modelling is a major business challenge. At Plain Concepts, we have extensive experience in combining Data Vault 2.0 methodology with tools such as DBT Core on data platforms such as Databricks and Snowflake.
Specifically with Snowflake, we have solutions running on Container Instance orchestrated by Airflow, and solutions with much more demanding scaling needs running on Argo and Kubernetes. We are also exploring the capabilities of Snowpark Containers, which allows us to simplify the final architecture, and which we will cover soon in another article. Do not wait any longer and start making the most of your data!