
Manage the Lifecycle of Your Models with MLOps

We have already talked about the characteristics of Machine Learning Operations, or MLOps. Today we move on to its technical side and how to take advantage of its possibilities. Specifically, we will explain the MLOps cycles and their usefulness in an industry increasingly interested in Machine Learning models.

From the features of MLOps platforms to tasks such as model registration and deployment, we explain everything you need to know.

MLOps Evaluation: Platforms

MLOps uses three main artifact types: data, model, and code. Each has different development cycles and challenges; for example, the data cycle is typically faster than the code cycle. These differences and combinations explain the complexity of MLOps and the size of the ecosystem of tools to work with it.

In this sense, an MLOps platform abstracts the underlying infrastructure layer; it is designed to reduce the operational time needed to build and deploy models, as well as to maintain the stability and reproducibility of predictions.

To date, no MLOps platform both supports end-to-end product development and has been widely accepted by the community. This leaves us with two options: using the useful components of a single platform (at the cost of vendor lock-in) and/or assembling an ML workflow from independent, more specialized frameworks (at the cost of increased integration effort and delivery time).

Features of an MLOps platform

  • Automated, easier, and faster deployments.
  • It keeps a record of the experiments. Successive results of model development can be compared through the platform's interface.
  • Ability to reproduce what other team members have done. This is incredibly challenging if you want another data scientist to use your code or if you want to run the same code at scale on another platform (e.g., in the cloud). Thus, there are different tools to cover this point.
  • A standard way of packaging and deploying models. Without a standard, each data science team adopts its own approach for each machine learning library it uses, and the link between a model and the code and parameters that produced it is often lost. This is why a standard is required, and all team members must follow the same practices.
  • Central repository for managing models, their versions, and stage transitions. A data science team creates many models. In the absence of a common place to collaborate and manage the model lifecycle, data science teams face challenges in managing the stages of models – from development to production, with their respective versions, annotations, and history. This entire process is automated to make it easier for teams and to isolate errors.
  • Monitoring. After deploying models to production, it is necessary to evaluate their status and how they evolve over time. That is the main reason why we need something to monitor these behaviors.

In the following sections, we will review all the parts involved in using an MLOps platform to properly track, evaluate, deploy, and monitor the models.

Tracking Server

A tracking server records any parameter associated with model development; for example, the hyperparameters used to train the model.

We can also store the model's evaluation metrics. For example, for a regression model we can calculate the RMSE (Root Mean Square Error), or, for a classification model, the F1-score, and store it together with the model. In this way, we can later retrieve this information as metadata associated with the model.

Azure ML and MLflow

Azure ML allows you to browse artefacts stored in the Model Store through the web application and programmatically with the Python API.

Tracking experiments with the Azure ML API and UI.

On the other hand, MLflow allows access to stored artefacts, parameters, and metrics using the web portal and the Python API. In both cases, filters can be set for experiments, labels, metrics, and model parameters.


Tracking MLflow experiments

Depending on the framework used in your experiments, there is a simple way to track them: automatic logging (autolog). It consists of automatically logging all the parameters, configuration, environment, and so on of a running experiment. For example, you could use Azure ML and the MLflow tracking API to automatically track model parameters and artefacts while training the model.

import mlflow

mlflow.autolog()

Automatic MLflow logging

MLflow defines its own interface called pyfunc for custom models outside common libraries, which establishes a common way to save, restore and interact with a machine learning model (along with the rest of its dependencies: code, context, data).

Finally, MLflow also offers automatic support for the metrics most commonly used by data scientists, although these can be extended to include model-specific metrics. In all cases, these automatic metrics have the same consideration as the rest of the metrics, so it is possible to search and analyze experiments using them.


Example of metrics collected

Model registration and reproducibility

The development of machine learning models can be considered an iterative process, where keeping track of the work as development progresses is challenging. Among the possible changes that are introduced, we find:

  • Datasets and their preparation change constantly as the model is developed: checking which features are most important, adding composite features, removing correlated ones, etc.
  • Models can change. For example, fine-tuning of model parameters can be done using different tools such as Hyperopt.
  • The source code evolves over time due to refactorings, bug fixes, etc.

In this context, the Model Store appears: a component that manages and stores the different elements associated with model training and maintains the traceability of the model. The stored information can be used for purposes such as:

  • Recreate runtime environments, such as the one initially used to train the model (in case business requirements make it necessary to validate the training procedure).
  • Analyze the training data and calculate the best model parameters.

A convention used in several machine learning platforms is to have a container called an experiment, which is the main unit of organization of the information stored in the tracking server. Each experiment contains several training runs, and every run belongs to exactly one experiment. Each run records the data sources, code, and metadata used. The most common metadata are:

  • Source code.
  • Start date/end date of the training process.
  • Parameters.
  • Metrics. Metrics are commonly shown in visualizations.
  • Tags. Unlike other metadata, this information can be modified after the run.
  • Artefacts: runs also store files of any type; the trained model itself is included as part of the artefacts (serialized as pkl or any compatible format). It is also common to find data files (e.g., in Parquet format), images (e.g., a visualization of feature importance using SHAP), or source code snapshots.

Model deployment

Once the model we are experimenting with is ready and we have finished the exploration stage, it must be taken into production. Remember that a model that is not in production provides no return on investment.

MLflow and Azure Machine Learning

After registering our model, we have to decide how we are going to deploy it. There are several ways to deploy a model to production, e.g., using MLflow and Azure Machine Learning:

  • PySpark UDF: thanks to MLflow's export functions, the model can be exported as a PySpark UDF for direct inference in Databricks.
  • Azure Machine Learning Workspace (AzML): on the other hand, if you want to produce a web service, the best option is to export the model using AzML functions. In this case, you would:
    • Register the model to be deployed in the workspace.
    • Generate a Docker image.
    • Deploy a web service via Azure Container Instance (to test the model on a web service) or directly to the production AKS, where the necessary deployment configuration of the Kubernetes pod will be specified.

This would cover the deployment part of the model, but there is still a need to automate this process. In the case of CI/CD for code, it is clear that when code is pushed, it must pass testing and validation to reach production error-free and work correctly. However, the cycle of an ML model is more complex.

The lifecycle of a model is composed of the following:

  1. Preparation of the available data to be consumed.
  2. Exploratory data analysis to understand whether the data answers the business questions.
  3. Once the data has been validated, it is transformed during feature engineering to obtain the necessary features.
  4. With this data, we move on to model training, where different approaches will be followed, and the most appropriate one will be selected.
  5. Once the best model has been chosen, it will be validated before deployment.
  6. The model is deployed in different environments.
  7. Finally, we will monitor the model’s performance so that we do not have future problems and can analyze any degradation.

As can be seen, the model lifecycle is more complex than the code lifecycle, so the DevOps cycle has to be adapted to it. In this case, we can use Azure Pipelines within Azure DevOps, but other tools also solve the problem. An example of such a pipeline is as follows:

  • A push of the model code is done.
  • On the other hand, the necessary training configuration and validation sets required to evaluate the model’s performance are added.
  • Once configured, the model is trained and validated with the provided configuration, with all the necessary validations added.
  • Once these validations are passed, a Docker image needs to be built.
  • The Docker image is stored in Azure Container Registry.
  • Finally, the Docker image is deployed to AKS to start the web service.

All these tasks can be configured, or different pipelines can be set up depending on the piece of code in production or the machine learning model. In addition, this allows for a lot of versatility in automating the specific pipeline that best suits the needs of the service.

Therefore, pipelines could take the trained model, perform validations, and deploy it afterward. In conclusion, the joint use of MLflow, Azure Machine Learning Workspace, and Azure DevOps is proposed to continuously integrate and deploy different machine learning models.

Model Monitoring

Deploying your model to production is not the end; it is just the beginning of positively impacting your business. You still need to watch over what you have already developed and ensure that your model keeps getting its answers right. For all these reasons, you need to monitor your model.

Both input data and model predictions are monitored to analyze statistical properties (data drift, model performance, etc.) and computational performance (errors, throughput, etc.). These metrics can be published in dashboards or delivered via alerts. Specifically, we can divide monitoring into the following parts:

  • Data ingestion: information about the incoming data is stored in this initial step.
  • Checking the model's output accuracy and data drift. This is done by analyzing the input data and predictions to check the model's state in the respective environment. In this phase, infrastructure performance information can also be stored to reduce model response time.
  • All these data are published in different dashboards for easy access. However, they can also be used to trigger alarms or other processes, such as, for example, model retraining.
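A minimal sketch of one such drift check, assuming we keep a reference sample of the training data and compare each feature's live distribution against it with a two-sample Kolmogorov–Smirnov test (the data and the alert threshold are illustrative):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=1000)   # training-time sample
production = rng.normal(loc=0.5, scale=1.0, size=1000)  # shifted live data

# A small p-value means the two samples are unlikely to share a distribution
statistic, p_value = ks_2samp(reference, production)
drift_detected = p_value < 0.01  # hypothetical alert threshold
```

In a real pipeline, this result would be published to a dashboard or used to trigger an alert or a retraining job, as described above.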

For example, if you are using Azure, you can enable Azure Application Insights to understand the status of your service. With this service enabled, real-time dashboards are available for the development team to understand the system's status and resolve issues that negatively affect application performance.

Access to the traces of the deployed service is also available, and by deploying the service through AzML, data collection can be enabled for the input data. This makes it possible to collect the input data the model receives and to reproduce errors if the model does not respond correctly.

An example of code tracing using Application Insights

Since in many cases the production services run on AKS, the status of these services can be queried through the dashboard provided by Kubernetes. Knowing the services' status and the health of the different pods is essential.

We Help You Develop Your MLOps Cycle Strategy

At Plain Concepts we have teams specialized in the development of machine learning strategies to automate processes or take advantage of the combined potential of data and Artificial Intelligence.

Machine learning allows you to create personalized offers or products for your customers and perform certain mechanical tasks in less time with the same efficiency. In this way, you and your team are more satisfied and can focus on other tasks.

We work with you to give your company a new direction through machine learning. How can we help you?



Eduardo Matallanas
Senior Director of Engineering, Head of AI