Machine Learning Operations: You Design It, You Train It, You Run It!
MLOps SIG Specification
ML in Production
Awesome production machine learning: State of MLOps Tools and Frameworks
Udemy “Deployment of ML Models”
Full Stack Deep Learning

MLOps Communities
CDF Special Interest Group - MLOps

MLOps Books
“Machine Learning Engineering” by Andriy Burkov, 2020
“ML Ops: Operationalizing Data Science” by David Sweenor, Steven Hillion, Dan Rope, Dev Kannabiran, Thomas Hill, Michael O’Connell
“Building Machine Learning Powered Applications” by Emmanuel Ameisen
“Building Machine Learning Pipelines” by Hannes Hapke, Catherine Nelson, 2020, O’Reilly
“Managing Data Science” by Kirill Dubovikov
“Accelerated DevOps with AI, ML & RPA: Non-Programmer’s Guide to AIOPS & MLOPS” by Stephen Fleming
“Evaluating Machine Learning Models” by Alice Zheng
Agile AI. 2020. By Carlo Appugliese, Paco Nathan, William S. Roberts. O’Reilly Media, Inc.
“Machine Learning Logistics”. 2017. By T. Dunning et al. O’Reilly Media Inc.
“Machine Learning Design Patterns” by Valliappa Lakshmanan, Sara Robinson, Michael Munn. O’Reilly 2020
“Serving Machine Learning Models: A Guide to Architecture, Stream Processing Engines, and Frameworks” by Boris Lublinsky, O’Reilly Media, Inc. 2017
“Kubeflow for Machine Learning” by Holden Karau, Trevor Grant, Ilan Filonenko, Richard Liu, Boris Lublinsky

MLOps Articles
Continuous Delivery for Machine Learning (by Thoughtworks)
Linux Foundation AI Foundation
MLSpec: A project to standardize the intercomponent schemas for a multi-stage ML Pipeline.
State of Enterprise ML 2019: PDF
State of Enterprise ML 2019: Interactive
Organizing machine learning projects: project management guidelines.
Rules for ML Project (Best practices)
ML Pipeline Template
Data Science Project Structure
ML project template facilitating both research and production phases.
Machine learning requires a fundamentally different deployment approach. As organizations embrace machine learning, the need for new deployment tools and strategies grows.
Efficient ML engineering: Tools and best practices
Why is DevOps for Machine Learning so Different?
Lessons learned turning machine learning models into real products and services – O’Reilly
MLOps: Model management, deployment and monitoring with Azure Machine Learning
Guide to File Formats for Machine Learning: Columnar, Training, Inferencing, and the Feature Store
Architecting a Machine Learning Pipeline: How to build scalable Machine Learning systems
Why Machine Learning Models Degrade In Production
Concept Drift and Model Decay in Machine Learning
Bringing ML to Production
A Tour of End-to-End Machine Learning Platforms
What Does it Mean to Deploy a Machine Learning Model?
Software Interfaces for Machine Learning Deployment
Batch Inference for Machine Learning Deployment
MLOps: Continuous delivery and automation pipelines in machine learning
AI meets operations
What would machine learning look like if you mixed in DevOps? Wonder no more, we lift the lid on MLOps
Forbes: The Emergence Of ML Ops
Cognilytica Report “ML Model Management and Operations 2020 (MLOps)”
Introducing Cloud AI Platform Pipelines
A Guide to Production Level Deep Learning
The 5 Components Towards Building Production-Ready Machine Learning Systems
Deep Learning in Production (references about deploying deep learning-based models in production)
Machine Learning Experiment Tracking
The Team Data Science Process (TDSP)
MLOps Solutions (Azure based)
Monitoring ML pipelines
Deployment & Explainability of Machine Learning COVID-19 Solutions at Scale with Seldon Core and Alibi
Demystifying AI Infrastructure
Monitoring Machine Learning Models in Production
The Checklist for Machine Learning Projects (from Aurélien Géron, “Hands-On Machine Learning with Scikit-Learn and TensorFlow”)
Data Project Checklist by Jeremy Howard
MLOps: not as Boring as it Sounds
10 Steps to Making Machine Learning Operational. Cloudera White Paper
AI Infrastructure for Everyone: DeterminedAI
MLOps is Not Enough. The Need for an End-to-End Data Science Lifecycle Process.
Data Science Lifecycle Repository Template
Template: code and pipeline definition for a machine learning project demonstrating how to automate an end to end ML/AI workflow.
Nitpicking Machine Learning Technical Debt
The Best Tools, Libraries, Frameworks and Methodologies that Machine Learning Teams Actually Use – Things We Learned from 41 ML Startups
Software Engineering for AI/ML - An Annotated Bibliography
Intelligent System. Machine Learning in Practice
CMU 17-445/645: Software Engineering for AI-Enabled Systems (SE4AI)
Machine Learning is Requirements Engineering
Machine Learning Reproducibility Checklist
Why We Need DevOps for ML Data
Machine Learning Ops. A collection of resources on how to facilitate Machine Learning Ops with GitHub.
CI/CD for Machine Learning & AI
Data Preparation for Machine Learning (7-Day Mini-Course)
AWS Cost Optimization for ML Infrastructure - EC2 spend
Task Cheatsheet for Almost Every Machine Learning Project: A checklist of tasks for building End-to-End ML projects
Web services vs. streaming for real-time machine learning endpoints
How PyTorch Lightning became the first ML framework to run continuous integration on TPUs
Best practices in data cleaning: A Complete Guide to Everything You Need to Do Before and After Collecting Your Data.
The ultimate guide to building maintainable Machine Learning pipelines using DVC
Continuous Machine Learning (CML) is CI/CD for Machine Learning Projects (DVC)
What I learned from looking at 200 machine learning tools
Big Data & AI Landscape
Deploying Machine Learning Models as Data, not Code — A better match?
“Thou shalt always scale” — 10 commandments of MLOps
Three Risks in Building Machine Learning Systems
Deploying R Models with MLflow and Docker
Blog about ML in production (by maiot.io)
Back to the Machine Learning fundamentals: How to write code for Model deployment. Part 1, Part 2, Part 3 (coming soon)
MLOps: Machine Learning as an Engineering Discipline
Itaú Unibanco: How we built a CI/CD Pipeline for machine learning with online training in Kubeflow
ML Engineering on Google Cloud Platform (hands-on labs and code samples)
Deep Reinforcement Learning in Production. The use of Reinforcement Learning to Personalize User Experience at Zynga
Feature Stores for ML
What is Data Observability?
A Practical Guide to Maintaining Machine Learning in Production
Building dashboards for operational visibility (AWS)
ML Infrastructure Tools for Production (Part 1) Production ML — The Final Stage of the Model Workflow
Continuous Machine Learning
The Agile approach in data science explained by an ML expert

MLOps Papers
Studer, S., Bui, T.B., Drescher, C., Hanuschkin, A., Winkler, L., Peters, S. and Mueller, K.R., 2020. “Towards CRISP-ML (Q): A Machine Learning Process Model with Quality Assurance Methodology”. arXiv
Building a Reproducible Machine Learning Pipeline
A Systems Perspective to Reproducibility in Production Machine Learning Domain
Hidden Technical Debt in Machine Learning Systems
Scaling Machine Learning as a Service (Uber)
What’s your ML Test Score? A rubric for ML production systems
Adversarial Machine Learning Reading List
From What to How: An Initial Review of Publicly Available AI Ethics Tools, Methods and Research to Translate Principles into Practices
Workshop on MLOps Systems. 2020 Third Conference on Machine Learning and Systems (MLSys)
sensAI: Fast ConvNets Serving on Live Data via Class Parallelism. Guanhua Wang, Zhuang Liu, Siyuan Zhuang, Brandon Hsieh, Joseph Gonzalez and Ion Stoica.
Towards Automated ML Model Monitoring: Measure, Improve and Quantify Data Quality. Tammo Rukat, Dustin Lange, Sebastian Schelter and Felix Biessmann.
Towards Automating the AI Operations Lifecycle. Matthew Arnold, Jeff Boston, Michael Desmond, Evelyn Duesterwald, Benjamin Elder, Anupama Murthi, Jiri Navratil and Darrell Reimer.
Efficient Scheduling of DNN Training on Multitenant Clusters. Deepak Narayanan, Keshav Santhanam, Amar Phanishayee and Matei Zaharia.
Towards Complaint-driven ML Workflow Debugging. Weiyuan Wu, Lampros Flokas, Eugene Wu and Jiannan Wang.
PerfGuard: Deploying ML-for-Systems without Performance Regressions. H M Sajjad Hossain, Lucas Rosenblatt, Gilbert Antonius, Irene Shaffer, Remmelt Ammerlaan, Abhishek Roy, Markus Weimer, Hiren Patel, Marc Friedman, Shi Qiao, Peter Orenberg, Soundarajan Srinivasan and Alekh Jindal.
Implicit Provenance for Machine Learning Artifacts. Alexandru A. Ormenisan, Mahmoud Ismail, Seif Haridi and Jim Dowling.
Addressing the Memory Bottleneck in AI Model-Training. David Ojika, Bhavesh Patel, G Anthony Reina, Trent Boyer, Chad Martin and Prashant Shah.
Simulating Performance of ML Systems with Offline Profiling. Hongming Huang, Peng Cheng, Hong Xu and Yongqiang Xiong.
A Viz Recommendation System: ML Lifecycle at Tableau. Kazem Jahanbakhsh, Eric Borchu, Mya Warren, Xiang-Bo Mao and Yogesh Sood.
CodeReef: an open portal for cross-platform MLOps and reproducible benchmarking. Grigori Fursin, Herve Guillou and Nicolas Essayan.
Towards split learning at scale: System design. Iker Rodríguez, Eduardo Muñagorri, Alberto Roman, Abhishek Singh, Praneeth Vepakomma and Ramesh Raskar.
MLBox: Towards Reproducible ML. Victor Bittorf, Xinyuan Huang, Peter Mattson, Debojyoti Dutta, David Aronchick, Emad Barsoum, Sarah Bird, Sergey Serebryakov, Natalia Vassilieva, Tom St. John, Grigori Fursin, Srini Bala, Sivanagaraju Yarramaneni, Alka Roy, David Kanter and Elvira Dzhuraeva.
Conversational Applications and Natural Language Understanding Services at Scale. Minh Tue Vo Thanh and Vijay Ramakrishnan.
Towards Distribution Transparency for Supervised ML With Oblivious Training Functions. Moritz Meister, Sina Sheikholeslami, Robin Andersson, Alexandru Ormenisan and Jim Dowling.
Tools for machine learning experiment management. Vlad Velici and Adam Prügel-Bennett.
MLPM: Machine Learning Package Manager. Xiaozhe Yao.
Common Problems with Creating Machine Learning Pipelines from Existing Code. Katie O’Leary, Makoto Uchida.
Overton: A Data System for Monitoring and Improving Machine-Learned Products, Apple.
Reliance on Metrics is a Fundamental Challenge for AI
“Assuring the machine learning lifecycle: Desiderata, methods, and challenges.” Ashmore, Rob, Radu Calinescu, and Colin Paterson. (2019)
“Machine learning testing: Survey, landscapes and horizons.” Zhang, Jie M., et al. IEEE Transactions on Software Engineering (2020).
“Teaching Software Engineering for AI-Enabled Systems.” Kästner, Christian, and Eunsuk Kang. arXiv (2020).
“Explainable machine learning in deployment.” Bhatt, Umang, et al. Proceedings of the Conference on Fairness, Accountability, and Transparency. 2020.
“Studying software engineering patterns for designing machine learning systems.” Washizaki, Hironori, Hiromu Uchida, Foutse Khomh, and Yann-Gaël Guéhéneuc. In 2019 10th International Workshop on Empirical Software Engineering in Practice (IWESEP)
Chen, A., Chow, A., Davidson, A., DCunha, A., Ghodsi, A., Hong, S.A., Konwinski, A., Mewald, C., Murching, S., Nykodym, T. and Ogilvie, P., 2020, June. Developments in MLflow: A System to Accelerate the Machine Learning Lifecycle. In Proceedings of the Fourth International Workshop on Data Management for End-to-End Machine Learning
Karlaš, B., Interlandi, M., Renggli, C., Wu, W., Zhang, C., Mukunthu, D., Babu, I., Edwards, J., Lauren, C., Xu, A. and Weimer, M., Building Continuous Integration Services for Machine Learning. KDD 2020
Workshop at ICML 2020: “Challenges in Deploying and Monitoring Machine Learning Systems” (Accepted Papers)
Bosch, J., Crnkovic, I. and Olsson, H.H., 2020. Engineering AI Systems: A Research Agenda. arXiv preprint arXiv. 2020
Ribeiro, M.T., Wu, T., Guestrin, C. and Singh, S., 2020. Beyond Accuracy: Behavioral Testing of NLP Models with CheckList. ACL

Talks About MLOps
DeliveryConf 2020. “Continuous Delivery For Machine Learning: Patterns And Pains” by Emily Gorcenski
MLOps Conference: Talks from 2019
A CI/CD Framework for Production Machine Learning at Massive Scale (using Jenkins X and Seldon Core)
MLOps Virtual Event (Databricks)
MLOps NY conference 2019
MLOps.community YouTube Channel
MLinProduction YouTube Channel
Introducing MLflow for End-to-End Machine Learning on Databricks. Spark+AI Summit 2020. Sean Owen
MLOps Tutorial #1: Intro to Continuous Integration for ML
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams (2019)
Damian Brady - The emerging field of MLops
MLOps - Entwurf, Entwicklung, Betrieb (INNOQ Podcast in German)
Instrumentation, Observability & Monitoring of Machine Learning Models

Existing ML Systems
Introducing FBLearner Flow: Facebook’s AI backbone
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform
Getting started with Kubeflow Pipelines
Meet Michelangelo: Uber’s Machine Learning Platform
Meson: Workflow Orchestration for Netflix Recommendations
What are Azure Machine Learning pipelines?
Uber ATG’s Machine Learning Infrastructure for Self-Driving Vehicles
An overview of ML development platforms
Snorkel AI: Putting Data First in ML Development

Machine Learning
Book, Aurélien Géron, “Hands-On Machine Learning with Scikit-Learn and TensorFlow”
Foundations of Machine Learning
Best Resources to Learn Machine Learning
Curated List of Libraries For a Faster Machine Learning Workflow
“Papers with Code” - Browse the State-of-the-Art in Machine Learning
Zhi-Hua Zhou. 2012. Ensemble Methods: Foundations and Algorithms. Chapman & Hall/CRC.
Feature Engineering for Machine Learning. Principles and Techniques for Data Scientists. By Alice Zheng, Amanda Casari
Google Research: Looking Back at 2019, and Forward to 2020 and Beyond
O’Reilly: The road to Software 2.0
Machine Learning and Data Science Applications in Industry
Curated papers, articles, and blogs on data science & machine learning in production.
Deep Learning for Anomaly Detection
Federated Learning for Mobile Keyboard Prediction
Federated Learning. Building better products with on-device data and privacy by default
Federated Learning: Collaborative Machine Learning without Centralized Training Data
Yang, Q., Liu, Y., Cheng, Y., Kang, Y., Chen, T. and Yu, H., 2019. Federated learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 13(3). Chapters 1 and 2.
Book: Molnar, Christoph. “Interpretable machine learning. A Guide for Making Black Box Models Explainable”, 2019
Book: Hutter, Frank, Lars Kotthoff, and Joaquin Vanschoren. “Automated Machine Learning”. Springer, 2019.
ML resources by topic, curated by the community.
An Introduction to Machine Learning Interpretability, by Patrick Hall, Navdeep Gill, 2nd Edition. O’Reilly 2019
Examples of techniques for training interpretable machine learning (ML) models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.
Paper: “Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence”, by Sebastian Raschka, Joshua Patterson, and Corey Nolet. 2020
Distill: Machine Learning Research
AtHomeWithAI: Curated Resource List by DeepMind
Awesome Data Science
Intro to probabilistic programming. A use case using Tensorflow-Probability (TFP)
Dive into Snorkel: Weak-Supervision on German Texts. inovex Blog

Software Engineering
The Twelve Factors
Book “Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations”, 2018 by Nicole Forsgren et al.
Book “The DevOps Handbook” by Gene Kim, et al. 2016
State of DevOps 2019
Clean Code concepts adapted for machine learning and data science.

Product Management for ML/AI
What you need to know about product management for AI. A product manager for AI does everything a traditional PM does, and much more.
Bringing an AI Product to Market. Previous articles have gone through the basics of AI product management. Here we get to the meat: how do you bring a product to market?
The People + AI Guidebook
User Needs + Defining Success
Building machine learning products: a problem well-defined is a problem half-solved.
Talk: Designing Great ML Experiences (Apple)
Machine Learning for Product Managers

The Economics of ML/AI
Book: “Prediction Machines: The Simple Economics of Artificial Intelligence”
Book: “The AI Organization” by David Carmona
Book: “Succeeding with AI”. 2020. By Veljko Krunic. Manning Publications
A list of articles about AI and the economy
Gartner AI Trends 2019
Global AI Survey: AI proves its worth, but few scale impact
Getting started with AI? Start here! Everything you need to know to dive into your project
11 questions to ask before starting a successful Machine Learning project
What AI still can’t do
Demystifying AI Part 4: What is an AI Canvas and how do you use it?
A Data Science Workflow Canvas to Kickstart Your Projects
Is your AI project a nonstarter? Here’s a reality check(list) to help you avoid the pain of learning the hard way
What is THE main reason most ML projects fail?
Designing great data products. The Drivetrain Approach: A four-step process for building data products.
The New Business of AI (and How It’s Different From Traditional Software)
The idea maze for AI startups
The Enterprise AI Challenge: Common Misconceptions
Misconception 1 (of 5): Enterprise AI Is Primarily About The Technology
Misconception 2 (of 5): Automated Machine Learning Will Unlock Enterprise AI
Three Principles for Designing ML-Powered Products
A Step-by-Step Guide to Machine Learning Problem Framing
AI adoption in the enterprise 2020
How Adopting MLOps can Help Companies With ML Culture?
Weaving AI into Your Organization
What to Do When AI Fails
Introduction to Machine Learning Problem Framing
Structured Approach for Identifying AI Use Cases
Book: “Machine Learning for Business” by Doug Hudgeon, Richard Nichol, O’Reilly
Why Commercial Artificial Intelligence Products Do Not Scale (FemTech)
Google Cloud’s AI Adoption Framework (White Paper)
Data Science Project Management
Book: “Competing in the Age of AI” by Marco Iansiti, Karim R. Lakhani. Harvard Business Review Press. 2020
Laszlo Sragner Newsletter
The Three Questions about AI that Startups Need to Ask. The first is: Are you sure you need AI?
Taming the Tail: Adventures in Improving AI Economics

Model Governance, Ethics, Responsible AI
What are model governance and model operations? A look at the landscape of tools for building and deploying robust, production-ready machine learning models
Specialized tools for machine learning development and model governance are becoming essential. Why companies are turning to specialized machine learning tools like MLflow.
What are model governance and model operations? – O’Reilly
Book: “Practical Fairness”. 2020. By Aileen Nielsen. O’Reilly Media, Inc.
AI Fairness 360, A Step Towards Trusted AI - IBM Research
Learn how to integrate Responsible AI practices into your ML workflow using TensorFlow
ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT)
Programming Fairness in Algorithms. Understanding and combating issues of fairness in supervised learning.
Secure, privacy-preserving and federated machine learning in medical imaging
Artificial intelligence and machine learning security (by Microsoft). The references therein are useful.
Evtimov, Ivan, Weidong Cui, Ece Kamar, Emre Kiciman, Tadayoshi Kohno, and Jerry Li. “Security and Machine Learning in the Real World.” arXiv (2020).
Explainable AI (Gartner Prediction for 2023)
What We’ve Learned to Control. By Ben Recht