
Data Modeling: Key Concepts and Techniques You Need to Learn
A well-designed data model is key to effective operational systems, as well as BI and analytics applications that deliver business value by transforming enterprise data into a useful information asset.
We look at what data modeling is, its applications and benefits, as well as tips on how to use it in your business.
What is Data Modeling?
Data modeling is the process of analyzing and defining the different types of data a company collects and produces, as well as the relationships between them. Using text, symbols, and diagrams, data models create visual representations of data as it is captured, stored, and used across the enterprise.
Data models are built around business needs: rules and requirements are defined in advance through stakeholder feedback so that they can be incorporated into the design of a new system or adapted when iterating on an existing one.
By modeling data, it is possible to document what type of data we have, how we use it, and the requirements for managing it in terms of its use, protection, and governance. Some of its advantages include:
- Creating a collaborative structure between your IT and business teams.
- Revealing opportunities to improve business processes by defining data needs and uses.
- Saving time and money on IT and process investments through proper planning.
- Reducing errors (and error-prone redundant data entry) while improving data integrity.
- Increasing the speed and performance of data retrieval and analysis through capacity and growth planning.
- Establishing and tracking key performance indicators tailored to business objectives.
Ideally, then, data models are living documents that evolve along with changing business needs.
Data Modeling Examples
Data modeling has evolved along with database management systems, and the complexity of the types of models has increased as the data storage needs of enterprises have grown.
Let’s review some examples of data modeling:
Conceptual Data Modeling
This type defines the overall structure of a company's business and data. It is used to organize business concepts and is defined by business stakeholders together with data engineers or architects. Both the entities and the relationships between them are captured in the conceptual data model.
Logical Data Modeling
It builds on the conceptual data model, adding the specific attributes of the data within each entity and the relationships between those attributes.
It is the technical model of data rules and structures, defined by data engineers, architects, and business analysts, which helps to make decisions about the physical model required by the data and the business.
Physical Data Modeling
This is the specific implementation of the logical data model created by database administrators and developers. It is developed for a specific database tool and data warehousing technology, and has data connectors to distribute data to business system users as needed.
In fact, this is the target at which the other models have been aimed: the actual implementation of the data estate.
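As a minimal sketch, and assuming a relational engine such as PostgreSQL, a logical entity like "Invoice" (the table and column names here are hypothetical) might be implemented physically as follows, where column types, keys, and indexes are decisions specific to the chosen database:

```sql
-- Hypothetical physical implementation of a logical "Invoice" entity
CREATE TABLE invoice (
    invoice_id    BIGINT        NOT NULL PRIMARY KEY,
    customer_id   BIGINT        NOT NULL,
    issued_on     DATE          NOT NULL,
    total_amount  NUMERIC(12,2) NOT NULL
);

-- A physical-level tuning detail with no counterpart in the conceptual or logical model
CREATE INDEX idx_invoice_customer ON invoice (customer_id);
```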
Data modeling techniques
To better understand the most popular modeling techniques used today, it helps to review the earliest ones, which date back to the first database systems. These are the first four described below, followed by the three most widely used today.
Hierarchical data model
In this case, the data is stored in a tree structure with parent and child records that make up a collection of data fields. A parent can have one or more children, but a child record can have only one parent.
This model also includes links, which are the connections between records, and types, which specify the kind of data contained in each field. It was very popular in the 1960s.
Network data model
This model extended the hierarchical model by allowing a child record to have one or more parents. This network technique is a precursor to a graph data structure, with a data object represented within a node and the relationship between two nodes called an edge.
This model had its heyday in the 1970s.
Relational data model
With this model, data is stored in tables and columns, and the relationships between their elements are identified. In addition, database management functions, such as constraints and triggers, are incorporated in this model.
It became the most popular technique in the 1980s, and the entity-relationship and dimensional data models, now among the most widely used, are variations of the relational model.
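As a brief illustration of those database management functions, the sketch below (assuming PostgreSQL syntax and hypothetical table names) defines a constraint that enforces a business rule and a trigger that maintains an audit column:

```sql
-- Constraint: the database itself rejects rows that violate the rule
CREATE TABLE account (
    account_id  BIGINT        NOT NULL PRIMARY KEY,
    balance     NUMERIC(12,2) NOT NULL,
    updated_at  TIMESTAMP     NOT NULL DEFAULT now(),
    CONSTRAINT balance_non_negative CHECK (balance >= 0)
);

-- Trigger: refresh the audit column every time a row is updated
CREATE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER account_touch
BEFORE UPDATE ON account
FOR EACH ROW EXECUTE FUNCTION touch_updated_at();
```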
Object-oriented data model
This model combines aspects of object-oriented programming and the relational data model.
An object represents data and its relationships in a single structure, along with attributes that specify its properties and methods that define its behavior. These can have multiple relationships with each other.
The model also consists of classes and inheritance, and emerged in the early 1990s.
Entity-relationship data model
It is one of the most widely adopted relational data models in enterprise applications, especially for transaction processing, and it is very efficient for data capture and update processes.
It consists of entities, which represent people, places, things, events, or concepts, along with attributes and relationships. It is also characterized by its degree of normalization, i.e. the level of redundancy implemented.
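To illustrate the idea of normalization, the sketch below (with hypothetical table names) shows the same order data at two degrees of normalization: the first variant repeats customer attributes on every order row, while the second factors them into their own entity:

```sql
-- Denormalized: customer attributes repeated on every order (redundant, update-prone)
CREATE TABLE order_denormalized (
    order_id        INTEGER      PRIMARY KEY,
    order_date      DATE         NOT NULL,
    customer_name   VARCHAR(200) NOT NULL,
    customer_email  VARCHAR(320)
);

-- Normalized: one entity per business concept, connected by a relationship
CREATE TABLE customer (
    customer_id     INTEGER      PRIMARY KEY,
    customer_name   VARCHAR(200) NOT NULL,
    customer_email  VARCHAR(320)
);

CREATE TABLE customer_order (
    order_id        INTEGER PRIMARY KEY,
    customer_id     INTEGER NOT NULL REFERENCES customer (customer_id),
    order_date      DATE    NOT NULL
);
```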
Dimensional data model
Like the previous one, it consists of attributes and relationships, but also of facts and dimensions.
This model has been widely adopted in business intelligence and analytics applications. It is known as a star schema because it can be visualized as a fact table surrounded by and connected to multiple dimension tables, although this simplifies the structure of the model.
Most models of this type have numerous fact tables linked to various dimensions, which are called "conformed" dimensions when they are shared by more than one fact table.
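A minimal star-schema sketch, with hypothetical names, could look like this: one fact table connected to its dimension tables, where a dimension such as dim_date becomes a conformed dimension as soon as a second fact table shares it:

```sql
CREATE TABLE dim_date (
    date_key    INTEGER PRIMARY KEY,   -- e.g. 20240131
    full_date   DATE    NOT NULL,
    month_name  VARCHAR(20),
    year_number SMALLINT
);

CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name VARCHAR(200),
    category     VARCHAR(100)
);

-- Fact table: measures at the centre of the star, linked to each dimension
CREATE TABLE fact_sales (
    date_key    INTEGER       NOT NULL REFERENCES dim_date (date_key),
    product_key INTEGER       NOT NULL REFERENCES dim_product (product_key),
    quantity    INTEGER       NOT NULL,
    net_amount  DECIMAL(12,2) NOT NULL
);
```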
Graph data model
It has its roots in the network modeling technique and is mainly used to model complex relationships in graph databases, although it can also be applied to other NoSQL databases, such as key-value and document databases.
In this model there are two fundamental elements: nodes, which represent entities with a unique identity, and edges, which connect nodes and define how they relate to each other.
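Graph databases have their own native storage and query languages, but the two elements can be sketched relationally to make the idea concrete (a rough illustration with hypothetical names, not how a graph engine actually stores data):

```sql
-- Node: an entity with a unique identity and its properties
CREATE TABLE node (
    node_id    INTEGER      PRIMARY KEY,
    label      VARCHAR(50)  NOT NULL,    -- e.g. 'Person', 'Company'
    properties VARCHAR(4000)             -- entity attributes, e.g. serialized as JSON text
);

-- Edge: connects two nodes and defines how they relate to each other
CREATE TABLE edge (
    source_id  INTEGER     NOT NULL REFERENCES node (node_id),
    target_id  INTEGER     NOT NULL REFERENCES node (node_id),
    relation   VARCHAR(50) NOT NULL,     -- e.g. 'WORKS_FOR'
    PRIMARY KEY (source_id, target_id, relation)
);
```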
Data Vault Modeling
The Data Vault is a methodology for structuring and modeling data in a Data Warehouse in an agile and scalable way.
One of the most prominent benefits of this technique is that it allows organizations to maintain a broad view of their data while easily adapting to changes in business and technology. But that’s not all.
The Data Vault architecture consists of 3 main components:
- Hub: a central table containing a unique list of the business keys that identify a business object.
- Link: a table used to relate two or more hubs.
- Satellite: a table containing detailed, descriptive information about a particular business object at a specific point in time.
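As a minimal sketch (table and column names are hypothetical), the three components could look like this for a "customer" business object and its relationship to "order":

```sql
-- Hub: unique list of business keys identifying the business object
CREATE TABLE hub_customer (
    customer_hk   CHAR(32)     NOT NULL PRIMARY KEY,  -- hash of the business key
    customer_bk   VARCHAR(100) NOT NULL,              -- business key from the source system
    load_date     TIMESTAMP    NOT NULL,
    record_source VARCHAR(100) NOT NULL
);

CREATE TABLE hub_order (
    order_hk      CHAR(32)     NOT NULL PRIMARY KEY,
    order_bk      VARCHAR(100) NOT NULL,
    load_date     TIMESTAMP    NOT NULL,
    record_source VARCHAR(100) NOT NULL
);

-- Link: relates two (or more) hubs
CREATE TABLE link_customer_order (
    customer_order_hk CHAR(32)     NOT NULL PRIMARY KEY,
    customer_hk       CHAR(32)     NOT NULL REFERENCES hub_customer (customer_hk),
    order_hk          CHAR(32)     NOT NULL REFERENCES hub_order (order_hk),
    load_date         TIMESTAMP    NOT NULL,
    record_source     VARCHAR(100) NOT NULL
);

-- Satellite: descriptive attributes of the business object at a point in time
CREATE TABLE sat_customer_details (
    customer_hk    CHAR(32)     NOT NULL REFERENCES hub_customer (customer_hk),
    load_date      TIMESTAMP    NOT NULL,
    customer_name  VARCHAR(200),
    customer_email VARCHAR(320),
    record_source  VARCHAR(100) NOT NULL,
    PRIMARY KEY (customer_hk, load_date)
);
```

New sources or attributes are added as new satellites (or new hubs and links), rather than by reworking existing tables.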
This approach provides a scalable and change-resilient data architecture, allowing the integration of new data sources without having to redo the entire data modelling.
Unlike other data models, Data Vault modelling focuses on data traceability and auditability, allowing full change logging and the ability to track the provenance of each piece of data.
In addition, its process is iterative, so it can be refined and improved as new data is added and new needs are discovered. Therefore, its main advantages lie in its scalability, flexibility, and complete data history.
Implementing Data Vault in a data warehouse requires a strategic approach and careful planning. Before starting, we must identify the business objectives and data requirements; understanding them clearly is essential to ensure that the needs of the organisation are met.
How do you scale large Data Warehousing projects?
DBT Core and Data Vault 2.0 are crucial to scaling data warehousing solutions for several key reasons:
- Scalability and flexibility: Data Vault 2.0 is designed to handle large-scale data warehousing projects, integrating data from multiple sources and ensuring long-term data storage. Its modular structure, consisting of hubs, links, and satellites, allows for easy expansion and adaptation to changing business needs.
- Audit and compliance: Data Vault 2.0 ensures data integrity and traceability, essential for regulatory compliance. It maintains historical data, allowing you to track changes over time and audit data transformations.
- Agile development: DBT Core improves the development process by enabling data transformations directly within the data warehouse. It supports version control, automated testing, and continuous integration, which streamlines the development workflow and improves collaboration between data teams.
- Performance optimisation: DBT Core enables efficient data processing by harnessing the power of modern cloud data warehouses. It uses SQL and reusable macros to optimise data transformations, reducing the time and resources required for data processing.
- Cost-effectiveness: By using DBT Core, organisations can minimise the costs associated with data warehousing. It provides fine-grained control over data transformations, enabling cost-effective management of data storage and processing.
In summary, the combination of DBT Core and Data Vault 2.0 provides a robust framework for creating scalable, flexible, and efficient data storage solutions that can adapt to evolving business requirements and ensure data integrity and regulatory compliance.
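As an illustration of how DBT Core expresses these transformations, the sketch below shows a single dbt model (the model and source names, such as stg_crm_customers, are hypothetical) that loads a Data Vault hub incrementally; ref(), is_incremental(), and this are standard dbt constructs, while md5() is used here to build the hash key:

```sql
-- models/raw_vault/hub_customer.sql  (hypothetical dbt model)
{{ config(materialized='incremental', unique_key='customer_hk') }}

select
    md5(cast(customer_id as varchar)) as customer_hk,   -- hashed business key
    customer_id                       as customer_bk,
    current_timestamp                 as load_date,
    'crm'                             as record_source
from {{ ref('stg_crm_customers') }}

{% if is_incremental() %}
  -- only insert business keys that are not already in the hub
  where md5(cast(customer_id as varchar)) not in (select customer_hk from {{ this }})
{% endif %}
```

Because each model is just a versioned SELECT statement, it can be tested and promoted through environments like any other piece of code, which is where the agility and auditability described above come from.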
How to modernise your enterprise data platform?
By maintaining an up-to-date data platform, you ensure that your organisation remains competitive, secure, and able to take advantage of the latest technological advances to drive business success.
- Optimised data processing and performance: Modern data platforms leverage the latest technologies to improve data processing speed and overall performance. This ensures that large data sets can be processed efficiently, reducing latency and improving the responsiveness of data-driven applications.
- Seamless data integration and interoperability: Modern platforms are designed to integrate seamlessly with a wide range of data sources and systems. This interoperability enables a unified view of data across the organisation, facilitating better decision-making and more comprehensive analysis.
- Advanced analytics capabilities: By modernising your data platform, you can take advantage of advanced analytics tools, including machine learning and artificial intelligence. These capabilities enable deeper insights, predictive analytics, and more sophisticated data models, driving innovation and competitive advantage.
- Enhanced security and compliance: Modern data platforms come equipped with robust security features and compliance frameworks. This ensures that your data is protected from breaches and that your organisation meets regulatory requirements, reducing risk and increasing confidence.
- Cost efficiency and resource optimisation: Modernising your data platform can lead to significant cost savings. Cloud-based solutions, for example, offer flexible pricing models and the ability to scale resources up or down based on demand, optimising costs and improving resource utilisation.
Data modelling is a major business challenge. At Plain Concepts, we have extensive experience in combining Data Vault 2.0 methodology with tools such as DBT Core on data platforms such as Databricks and Snowflake.
Specifically with Snowflake, we have solutions running on Container Instances orchestrated by Airflow, solutions with much more demanding scaling needs built on Argo and Kubernetes, and we are also exploring the capabilities of Snowpark Containers, which allow us to simplify the final architecture; we will tell you more about this soon in another article. Don't wait any longer and start making the most of your data!