Developers and data scientists can work together using MLOps through a shared methodology to make sure that machine learning initiatives are in line with wider software delivery and, even more broadly, IT-business alignment.

William McKnight, President, McKnight Consulting Group

December 28, 2022


The spectrum of data usage in the enterprise now includes machine learning (ML). ML can evaluate a wide range of “what if” scenarios and can surface variables that people would not think to consider. It does this without requiring intricate rule engines to be defined explicitly, an approach that tends to be brittle. Because it is based on well-established scientific principles, it can deliver deeper insight from data than conventional approaches, directly at the point of need and without manual intervention. The benefits will be greatest for those who have the knowledge and expertise to realize its full potential.

However, ML success depends on the ability to operate in the responsive, agile manner that organizations are seeking today, an approach we can refer to as MLOps. For such an environment and culture to emerge, an organization needs to know how to match the potential of MLOps against its own needs and resources. ML also faces a variety of difficulties:

  • ML is still in its very early stages, and procedures are still being ironed out

  • Many ML projects operate independently of one another and the larger business

  • Massive amounts of data may be needed for ML, and access to that data must be scalable

  • The value of ML projects can be challenging to assess and manage

  • Often, senior management does not yet perceive ML as strategic

  • Work in machine learning and data science involves a lot of trial and error, so it can be challenging to estimate how long a project will take to complete

How Does MLOps Benefit ML?

MLOps applies DevOps practices and principles. DevOps was developed to address the needs of the agile business, or, to put it another way, to deliver innovation at scale. It is based on ideas of work efficiency and of continuous integration, delivery, and deployment. To understand how to deliver MLOps, we must consider both the purposes of DevOps and its evolution.

DevOps represents a cultural transition from slower, linear practices to agile approaches that introduce rapid iteration and parallelism, allowing developers to build and deploy cutting-edge software-based solutions. Its fundamental principles haven't changed much in the 10 years since it was first put into motion.

DevOps must address the following business needs in addition to agile practices:

  • Become more customer focused. Today's business success stories focus on helping customers achieve their objectives rather than on brand, product, selection, or business model.

  • Integrate services and data. The ability to integrate existing and new services while adapting to changing conditions is essential for the success of DevOps.

  • Deploy automatically. To ensure constant, consistent delivery of business value, automation must be applied across the needs described above.

  • Manage and coordinate resources. It is essential to have a commoditized, adaptable platform because DevOps effectiveness grows along with platform efficiency.

MLOps applies these guidelines to ML delivery. The machine learning process centers on model creation, training, and deployment. It's a common misconception that these models are generated automatically. In practice, they are typically created and trained by data scientists who are familiar with the problem domain. Once trained and validated, models are deployed into an architecture that can handle large amounts of (often streamed) data, allowing insights to be gained.

Developing such models benefits from an iterative approach, so that the domain can be better understood and the models improved. That in turn requires a highly automated pipeline of tools; repositories to store and track models, code, and data lineage; and a target environment that can be deployed into quickly. The end result is an application that uses machine learning. MLOps, which extends DevOps to include the data and models used for ML, calls for data scientists to collaborate with developers.

Applying MLOps in Practice

So, how should this manifest in real life? Let's first think about the processes that go into creating an ML-based application. To build one, data scientists must collaborate with application developers and take the following actions:

  • Configure Target – Set up the compute targets on which models will be trained.

  • Prepare Data – Set up how data is ingested, prepared, and used.

  • Train Model – Develop ML training scripts and submit them to the compute target.

  • Register and Containerize the Service – After a satisfactory run is found, register the persisted model in a model registry and package it as a service.

  • Validate Results – Run application integration tests against the service deployed on the dev/test target.

  • Deploy Model – If the model is satisfactory, deploy it into the target environment.

  • Monitor Model – Monitor the deployed model to evaluate its inferencing performance and accuracy.
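The flow of the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the `Registry` class, the function names, the `demo-model` name, and the `max_error` threshold are all hypothetical, the "model" is a one-feature least-squares fit, and configuring a compute target and monitoring in production are left out.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Registry:
    """Hypothetical in-memory model registry; a real one would persist artifacts."""
    models: dict = field(default_factory=dict)

    def register(self, name, model):
        # Versions are simply 1, 2, 3, ... per model name.
        version = len(self.models.get(name, [])) + 1
        self.models.setdefault(name, []).append(model)
        return version


def prepare_data(rows):
    # Drop incomplete records; a real pipeline would also clean and validate.
    return [(x, y) for x, y in rows if x is not None and y is not None]


def train_model(data):
    # Least-squares fit of y = slope * x + intercept on a single feature.
    # Assumes at least two distinct x values.
    xs, ys = zip(*data)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in data)
             / sum((x - x_bar) ** 2 for x in xs))
    return {"slope": slope, "intercept": y_bar - slope * x_bar}


def predict(model, x):
    return model["slope"] * x + model["intercept"]


def validate(model, holdout, max_error):
    # Mean absolute error on held-out data gates the deployment decision.
    error = mean(abs(predict(model, x) - y) for x, y in holdout)
    return error <= max_error


def run_pipeline(rows, holdout, registry, max_error=0.5):
    data = prepare_data(rows)
    model = train_model(data)
    version = registry.register("demo-model", model)
    deployed = validate(model, holdout, max_error)
    return {"version": version, "deployed": deployed, "model": model}
```

Calling `run_pipeline` repeatedly against the same `Registry` yields increasing version numbers, which mirrors the idea that each satisfactory training run is registered before it is validated and deployed.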

As the activities above show, the ML pipeline requires review and iteration: models need to be tuned, results need to be tested, and data sources and models need to be improved. For instance, you might find that the insights you need are associated with only a portion of the data sample; that the results have some inherent bias that needs to be corrected with more data or better algorithms; or that there are discrepancies between the training and inference data sets, which is referred to as data drift.

Consequently, for iterative pipelines to continue to deliver results, they must meet certain criteria:
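As a rough illustration of checking for data drift, the sketch below compares one numeric feature's mean at inference time against its mean in the training set, measured in training standard deviations. This is a simple heuristic chosen for brevity, not a formal statistical test; the function names and the default `threshold` of 2.0 are assumptions to be tuned per feature.

```python
from statistics import mean, stdev


def mean_shift(training, inference):
    """Distance between the inference and training means, in training standard deviations."""
    spread = stdev(training)  # sample standard deviation; needs at least two values
    if spread == 0:
        raise ValueError("training feature has no variance")
    return abs(mean(inference) - mean(training)) / spread


def has_drifted(training, inference, threshold=2.0):
    # threshold is an assumed, tunable cut-off, not an established standard.
    return mean_shift(training, inference) > threshold
```

A check like this could run as part of model monitoring, with a positive result prompting a review of the data sources or a retraining run.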

  • Reproducibility: ML pipelines and steps, along with their data sources and models, libraries, and SDKs, need to be stored and maintained so that they can be repeated exactly as before, just like with software configuration management and continuous integration.

  • Reusability: The pipeline must be able to package and deliver models and code into production, to both training and target environments, in order to adhere to the principles of continuous delivery.

  • Manageability: Teams need the capacity to implement governance, to link changes in models and code to development activities (for instance, through sprints), and to enable managers to monitor development and value delivery.

  • Automation: As in DevOps, continuous integration and delivery depend on automation to keep pipelines quick and repeatable, especially when they are supplemented by governance and testing (which can otherwise create a bottleneck).
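One small building block for the reproducibility criterion is a fingerprint that identifies exactly which data, parameters, and code produced a run. The sketch below is a minimal example using only the Python standard library; `run_fingerprint` and its arguments are hypothetical names, and it assumes the data and parameters are JSON-serializable and small enough to hash in memory.

```python
import hashlib
import json


def run_fingerprint(data_rows, params, code_version):
    """Return a stable SHA-256 hex digest for the inputs of one pipeline run."""
    # sort_keys makes the digest independent of dictionary key order.
    payload = json.dumps(
        {"data": data_rows, "params": params, "code": code_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Storing this digest next to each registered model gives a quick way to tell whether two runs used identical inputs; any change to the data, the parameters, or the code version produces a different digest.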

With these criteria in place, it is possible to deliver on the iterative nature of ML model and application development. As a result, data scientists can enhance CI/CD with the advantages of continuous learning (CL), developing a pipeline for building models, a workspace, and a target architecture.

Developers and data scientists can work together using MLOps through a shared methodology to make sure that ML initiatives are in line with wider software delivery and, even more broadly, IT-business alignment. Participants can adopt a test-and-learn mentality to enhance outcomes while maintaining control and ensuring long-term value delivery.

About the Author(s)

William McKnight

President, McKnight Consulting Group

William McKnight has advised many of the world's best-known organizations. His strategies form the information management plan for leading companies in various industries. He is a prolific author and a popular keynote speaker and trainer. He has performed dozens of benchmarks on leading database, data lake, streaming and data integration products. William is a leading global influencer in data warehousing and master data management and he leads McKnight Consulting Group, which has twice placed on the Inc. 5000 list.
