Manage the Lifecycle of Your Models with MLOps

[vc_row][vc_column][vc_column_text]We have already talked about the characteristics of Machine Learning Operations, or MLOps. Today we turn to its technical side so that you can take full advantage of its possibilities. Specifically, we will explain the MLOps lifecycle and its usefulness in an industry increasingly interested in Machine Learning models.

From the features of MLOps platforms to tasks such as model registration and implementation, we explain everything you need to know.

MLOps Evaluation: Platforms

MLOps uses three main artifact types: data, model, and code. Each has different development cycles and challenges; for example, the data cycle is typically faster than the code cycle. These differences and combinations explain the complexity of MLOps and the size of the ecosystem of tools to work with it.

In this sense, an MLOps platform abstracts the infrastructure layer running underneath. Its purpose is to reduce the operational time spent building and deploying models and to maintain the stability and reproducibility of predictions.

So far, no MLOps platform both supports end-to-end product development and has been widely accepted by the community. This leaves two options, which can be combined: using the useful components of a single platform (at the risk of vendor lock-in) or integrating an ML workflow from independent, more specialized frameworks (at the cost of higher expense and delivery time).

Features of an MLOps platform

Automated, easier, and faster deployments.
Experiment tracking. The platform keeps a record of the experiments, and the successive results of model development can be compared in its interface.
Reproducibility. Team members can reproduce what others have done. This is especially challenging if you want another data scientist to use your code or if you want to run the same code at scale on another platform (e.g., in the cloud), so there are different tools to cover this point.
A standard way of packaging and deploying models. Each data science team agrees on its own approach for each machine learning library it uses, and the link between a model and the code and parameters that produced it is often lost. This is why a standard is required, and all team members must follow the same practices.
A central repository for managing models, their versions, and stage transitions. A data science team creates many models. Without a common place to collaborate and manage the model lifecycle, teams struggle to manage the stages of their models, from development to production, with their respective versions, annotations, and history. The platform automates this process to make it easier for teams and to isolate errors.
Monitoring. After deploying models in production, it is necessary to evaluate their status and how they evolve over time, so the platform needs a way to monitor this behavior.

In the following sections, we will review all the parts involved in using an MLOps platform to properly track, evaluate, implement, and monitor the models.

Tracking Server

Training tracking servers can record any parameter associated with modeling, for example the hyperparameters used to train the model.

They can also store the current metrics of the model. For example, for a regression model we can calculate the RMSE (Root Mean Square Error), and for a classification model we can calculate the F1-score, and store it together with the model. In this way, we can retrieve this information as metadata associated with the model.
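To make the two metrics concrete, here is a minimal sketch in plain Python that computes them and groups them with the hyperparameters as a metadata record. The record layout and the parameter names (`learning_rate`, `max_depth`) are hypothetical; a real tracking server would store these values through its own API (in MLflow, for instance, through `mlflow.log_param` and `mlflow.log_metric`).

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error for a regression model."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def f1_score(y_true, y_pred, positive=1):
    """F1-score for a binary classifier: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical metadata record kept next to the trained model.
model_metadata = {
    "params": {"learning_rate": 0.1, "max_depth": 3},
    "metrics": {
        "rmse": rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]),
        "f1": f1_score([1, 0, 1, 1], [1, 0, 0, 1]),
    },
}
```

Keeping parameters and metrics in the same record is what later allows experiments to be filtered and compared.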

Azure ML and MLflow

Azure ML allows you to browse artefacts stored in the Model Store through the web application and programmatically with the Python API.

[vc_column][vc_single_image image=”110149″ img_size=”large” alignment=”center” onclick=”link_image”][/vc_column][/vc_row]

Tracking experiments with the Azure ML API and UI.

On the other hand, MLflow allows access to stored artefacts, parameters, and metrics using the web portal and the Python API. In both cases, filters can be set for experiments, labels, metrics, and model parameters.

[vc_row][vc_column][vc_single_image image=”109083″ img_size=”large” alignment=”center” onclick=”link_image”][/vc_column][/vc_row]

Tracking MLflow experiments

Depending on the framework used to track your experiments, there is a simple way to do it: automatic logging (autolog). It consists of automatically logging the parameters, configuration, environment, and so on of a running experiment. For example, you could use Azure ML and the MLflow tracking API to automatically track model parameters and artefacts while training the model.

[vc_row][vc_column][vc_single_image image=”110159″ img_size=”large” alignment=”center” onclick=”link_image”][/vc_column][/vc_row]

Automatic logging with MLflow
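The idea behind automatic logging can be sketched with a small decorator in plain Python. This is only an illustration of the concept, not how MLflow implements it: `RUN_LOG`, `autolog`, and `train` are hypothetical names, and the list stands in for a tracking server. In MLflow itself, the feature is switched on by calling `mlflow.autolog()` before training.

```python
import functools
import inspect

RUN_LOG = []  # hypothetical in-memory store standing in for a tracking server


def autolog(train_fn):
    """Record the arguments and the returned metrics of every call to train_fn."""
    signature = inspect.signature(train_fn)

    @functools.wraps(train_fn)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()  # also capture defaults the caller did not pass
        metrics = train_fn(*args, **kwargs)
        RUN_LOG.append({
            "function": train_fn.__name__,
            "params": dict(bound.arguments),
            "metrics": dict(metrics),
        })
        return metrics

    return wrapper


@autolog
def train(learning_rate=0.1, epochs=5):
    # Toy "training": the loss shrinks as the number of epochs grows.
    return {"loss": 1.0 / (1 + learning_rate * epochs)}


train(epochs=10)
```

The training code never calls the logger explicitly, which is the point of autologging: the person writing `train` gets tracking without changing the function body.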

MLflow defines its own interface called pyfunc for custom models outside common libraries, which establishes a common way to save, restore and interact with a machine learning model (along with the rest of its dependencies: code, context, data).
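As a rough sketch of what such a common interface looks like, the class below exposes a `predict(context, model_input)` method and a uniform way to save and restore itself. It is written in plain Python so that it runs anywhere: `ThresholdModel`, its JSON file format, and the `save`/`load` helpers are hypothetical. In a real project the class would subclass `mlflow.pyfunc.PythonModel`, and MLflow would handle persistence.

```python
import json
import os
import tempfile


class ThresholdModel:
    """Hypothetical custom model following the predict(context, model_input) shape."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, context, model_input):
        # `context` would carry artefacts and dependencies; this toy model ignores it.
        return [1 if value >= self.threshold else 0 for value in model_input]

    def save(self, path):
        """Persist everything needed to rebuild the model."""
        with open(path, "w") as handle:
            json.dump({"threshold": self.threshold}, handle)

    @classmethod
    def load(cls, path):
        """Restore a model previously written by save()."""
        with open(path) as handle:
            return cls(**json.load(handle))


with tempfile.TemporaryDirectory() as folder:
    model_path = os.path.join(folder, "model.json")
    ThresholdModel(threshold=0.5).save(model_path)
    restored = ThresholdModel.load(model_path)

predictions = restored.predict(None, [0.2, 0.5, 0.9])
```

Because every model answers to the same `predict` call, the serving code does not need to know which library produced the model.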

Finally, MLflow also offers automatic support for the metrics most commonly used by data scientists, and these can be extended with model-specific metrics. In all cases, these automatic metrics are treated like any other metric, so it is possible to search and analyze experiments using them.

[vc_row][vc_column][vc_single_image image=”109089″ img_size=”large” alignment=”center” onclick=”link_image”][/vc_column][/vc_row]

Example of metrics collected

Model registration and reproducibility

The development of machine learning models can be considered an iterative process, where keeping track of the work as development progresses is challenging. Among the possible changes that are introduced, we find:

Datasets and their preparation change constantly as the model is developed: we check which features are the most important, add composite features, remove correlated ones, etc.
Models can change. For example, model hyperparameters can be tuned using different tools such as Hyperopt.
The source code evolves over time due to refactorings, bug fixes, etc.

This is the context in which the Model Store is built: a component that manages and stores the different elements associated with model training and maintains the traceability of the model. The stored information can be used for purposes such as:

Recreate runtime environments, such as the one initially used to train the model (in case business requirements make it necessary to validate the training procedure).
Analyze the training data and calculate the best model parameters.

A convention used in several machine learning platforms is to have a container called an experiment, which is the main unit of organization for the information stored in the tracking server. Each experiment contains several training runs, and every run belongs to a single experiment. Each run records the data sources, code, and metadata used. The most commonly supported metadata are:

Source code.
Start date/end date of the training process.
Parameters.
Metrics. Metrics are commonly shown in visualizations.
Tags. Unlike other metadata, this information can be modified after the run.
Artefacts. Runs can also store files of any type, so the trained model is included among the artefacts (serialized as pkl or any compatible format). It is also common to find data files (e.g., in Parquet format), images (e.g., a visualization of feature importance using SHAP), or source code snapshots.
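The experiment and run convention described above can be sketched as a small data structure. This is an illustrative model only: the class names, the `finish` and `best_run` helpers, the experiment name, and the `abc123` commit identifier are all hypothetical, and real tracking servers persist this information rather than keeping it in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Optional


@dataclass
class Run:
    source_version: str  # e.g. a commit identifier of the training code
    params: dict
    metrics: dict = field(default_factory=dict)
    tags: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)  # paths of stored files
    start_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    end_time: Optional[datetime] = None

    def finish(self):
        """Close the run: params and metrics become read-only, tags stay editable."""
        self.end_time = datetime.now(timezone.utc)
        self.params = MappingProxyType(dict(self.params))
        self.metrics = MappingProxyType(dict(self.metrics))


@dataclass
class Experiment:
    name: str
    runs: list = field(default_factory=list)

    def start_run(self, source_version, params):
        run = Run(source_version=source_version, params=params)
        self.runs.append(run)  # every run belongs to exactly one experiment
        return run

    def best_run(self, metric, lower_is_better=True):
        candidates = [run for run in self.runs if metric in run.metrics]
        pick = min if lower_is_better else max
        return pick(candidates, key=lambda run: run.metrics[metric])


experiment = Experiment("churn-model")
for depth, score in [(3, 0.42), (5, 0.37)]:
    run = experiment.start_run("abc123", {"max_depth": depth})
    run.metrics["rmse"] = score
    run.artifacts.append("model.pkl")
    run.finish()

best = experiment.best_run("rmse")
best.tags["stage"] = "candidate"  # tags can still change after the run ends
```

Freezing parameters and metrics when the run ends while leaving tags open mirrors the distinction made in the list above.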

Model deployment

Once the model we are experimenting with is ready and we have finished the exploration stage, it must be taken into production. Remember that a model that is not in production delivers no return on investment.

MLflow and Azure Machine Learning

After registering our model, we have to decide how we are going to deploy it. There are several ways to deploy a model in production, e.g., using MLflow and Azure Machine Learning:

PySpark UDF: MLflow can export the model as a PySpark UDF for direct inference through Databricks.
Azure Machine Learning Workspace (AzML): if you want to expose the model as a web service, the best option is to export the model using AzML functions. In this case, you could:
- Register the model to be deployed in the workspace.
- Generate a Docker image.
- Deploy a web service via Azure Container Instances (to test the model as a web service) or directly to the production AKS cluster, where the necessary deployment configuration of the Kubernetes pod will be specified.
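A web service of this kind is usually driven by a small scoring script with two functions: `init()`, which loads the model once when the container starts, and `run()`, which handles each request. The sketch below follows that shape in plain Python. `MeanModel` is a hypothetical stand-in for the registered model, and the `data` key in the payload is an assumption made for this example; in a real deployment `init()` would load the model from the registry.

```python
import json

model = None  # populated once by init()


class MeanModel:
    """Hypothetical stand-in for a trained model loaded from the registry."""

    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]


def init():
    """Called once when the service starts: load the model into memory."""
    global model
    model = MeanModel()


def run(raw_data):
    """Called for every request: parse the JSON payload and return predictions."""
    try:
        rows = json.loads(raw_data)["data"]
        return {"predictions": model.predict(rows)}
    except Exception as error:
        # Report the problem to the caller instead of crashing the service.
        return {"error": str(error)}


init()
response = run(json.dumps({"data": [[1, 2, 3], [4, 6]]}))
```

Loading the model in `init()` rather than in `run()` keeps the per-request latency low, because deserialization happens only once.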

This would cover the deployment part of the model, but there is still a need to automate this process. With CI/CD for code, it is clear that pushed code must pass testing and validation so that it works correctly in production. However, the cycle of an ML model is more complex.

The lifecycle of a model is composed of the following:

Preparation of the available data to be consumed.
Exploratory data analysis to confirm that the data can answer the business questions.
Once the data has been validated, it is transformed to obtain the necessary features (feature engineering).
With this data, we move on to model training, where different approaches will be followed, and the most appropriate one will be selected.
Once the best model has been chosen, it will be validated before deployment.
The model is deployed in different environments.
Finally, we will monitor the model’s performance so that we do not have future problems and can analyze any degradation.

As can be seen, the model lifecycle is more complex than the code lifecycle, so the DevOps cycle has to be adapted to it. In this case, we can use Azure Pipelines within Azure DevOps, but other tools also solve the problem. An example of a pipeline that could be built is as follows:

A push of the model code is done.
In parallel, the training configuration and the validation sets required to evaluate the model’s performance are added.
Once configured, the model is trained and validated with the provided configuration, with all the necessary validations added.
Once these validations are passed, a Docker image is built.
The Docker image is stored in Azure Container Registry.
Finally, the Docker image is deployed to AKS to start the web service.
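The key step in this pipeline is the validation gate: the image is only built and deployed when the trained model passes its checks. A minimal sketch of that gate in plain Python follows. The threshold format, the metric names, and the step names are hypothetical, and in Azure Pipelines the same logic would be expressed as pipeline stages with conditions rather than as a Python function.

```python
def validate(metrics, thresholds):
    """Return the failed checks; an empty list means the model may be deployed."""
    failures = []
    for name, (kind, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing")
        elif kind == "max" and value > limit:
            failures.append(f"{name}: {value} exceeds {limit}")
        elif kind == "min" and value < limit:
            failures.append(f"{name}: {value} below {limit}")
    return failures


def run_pipeline(metrics, thresholds):
    """Simulate the pipeline: stop before the image build if validation fails."""
    steps = ["train", "validate"]
    failures = validate(metrics, thresholds)
    if failures:
        return {"steps": steps, "deployed": False, "failures": failures}
    steps += ["build_image", "push_image", "deploy"]
    return {"steps": steps, "deployed": True, "failures": []}


# Hypothetical acceptance criteria: RMSE at most 0.5, F1 at least 0.8.
thresholds = {"rmse": ("max", 0.5), "f1": ("min", 0.8)}
accepted = run_pipeline({"rmse": 0.4, "f1": 0.85}, thresholds)
rejected = run_pipeline({"rmse": 0.7, "f1": 0.85}, thresholds)
```

Returning the list of failed checks, rather than a bare boolean, makes it easier to see in the pipeline logs why a model was not promoted.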

All these tasks can be configured, or different pipelines can be set up depending on the piece of code in production or the machine learning model. In addition, this allows for a lot of versatility in automating the specific pipeline that best suits the needs of the service.

Therefore, pipelines could take the trained model, perform validations and implement it afterward. In conclusion, the joint use of MLflow, Azure Machine Learning Workspace, and Azure DevOps is proposed to integrate and continuously deploy different machine learning models.

Model Monitoring

Deploying your model in production is not the end; it is just the beginning of positively impacting your business. From then on, you need to monitor what you have deployed and ensure that your model keeps getting its answers right.

Both input data and model predictions are monitored to analyze statistical properties (data drift, model performance, etc.) and computational performance (errors, throughput, etc.). These metrics can be published in dashboards or arrive via alerts. Specifically, we can divide monitoring into the following parts:

Data ingestion: the data performance is stored in this initial step.
The output accuracy of the model and the data drift are checked. This is done by analyzing the input data and predictions to check the model’s state in the respective environment. In this phase, infrastructure performance information can also be stored to help shorten the model’s response time.
All these data are published in different dashboards for easy access. They can also be used to trigger alarms or other processes, such as model retraining.
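As an illustration of a data drift check, the sketch below measures how far the mean of the live input data has moved from the training data, expressed in standard deviations of the training data. This standardized mean shift is a deliberately simple heuristic chosen for the example, not the method of any particular platform; production systems often rely on statistical tests or on the drift detectors built into their monitoring tools. The sample values and the threshold of 1.0 are hypothetical.

```python
import statistics


def mean_shift(reference, live):
    """Shift of the live mean, measured in reference standard deviations."""
    reference_mean = statistics.fmean(reference)
    reference_std = statistics.stdev(reference)
    return abs(statistics.fmean(live) - reference_mean) / reference_std


def check_drift(reference, live, threshold=1.0):
    """Flag drift when the live mean moves more than `threshold` deviations."""
    shift = mean_shift(reference, live)
    return {"shift": shift, "drift": shift > threshold}


# Hypothetical values of one input feature at training time and in production.
training_values = [10, 12, 11, 13, 9, 11, 10, 12]
stable_report = check_drift(training_values, [11, 10, 12, 11])
drifted_report = check_drift(training_values, [15, 16, 14, 17])
```

A report with `drift` set to true is the kind of signal that could feed a dashboard, raise an alert, or trigger retraining.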

For example, if you are using Azure, you can enable Azure Application Insights to understand the status of your service. With this service enabled, real-time dashboards are defined so that the development team can understand the system’s status and resolve issues that negatively affect application performance.

The traces of the deployed service are also available, and when the service is deployed through AzML, data collection can be enabled for the input data. This makes it possible to gather more input data for the models and to reproduce errors when the model does not respond correctly.

[vc_row][vc_column][vc_single_image image=”110178″ img_size=”large” alignment=”center” onclick=”link_image”][/vc_column][/vc_row]

An example of code tracing using Application Insights

In many cases, the production services run on AKS, and their status can be queried through the dashboard provided by Kubernetes. Knowing the status of the services and the health of the different pods (see figure below) is essential.

[vc_row][vc_column][vc_single_image image=”110184″ img_size=”large” alignment=”center” onclick=”link_image”][/vc_column][/vc_row]

We Help You Develop Your MLOps Cycle Strategy

At Plain Concepts we have teams specialized in the development of machine learning strategies to automate processes or take advantage of the combined potential of data and Artificial Intelligence.

Machine learning allows you to create personalized offers or products for your customers and perform certain mechanical tasks in less time with the same efficiency. In this way, you and your team are more satisfied and can focus on other tasks.

We work with you to give your company a new direction through machine learning. How can we help you?
