Summary
Because machine learning models are constantly interacting with inputs from the real world, they are subject to a wide variety of failures. The most commonly discussed failure mode is concept drift, but there are numerous other ways that things can go wrong. In this episode Wojtek Kuberski explains how NannyML is designed to compare the predicted performance of your model against its actual behavior, identify silent failures, and provide the context you need to determine whether, and how urgently, to address them.
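To make the idea concrete, here is a minimal sketch of estimating post-deployment performance with NannyML's confidence-based performance estimation (CBPE) workflow. It is based on the library's quickstart examples; the synthetic dataset loader, column names, and exact parameter and plotting calls are assumptions and may differ between NannyML releases.

```python
# Minimal sketch: estimate production model performance with NannyML's CBPE.
# Loader, column names, and parameters follow the library's quickstart examples
# and are assumptions here; they may differ between NannyML releases.
import nannyml as nml

# Reference data (with known outcomes) and analysis data (production inputs
# where ground truth may be delayed or missing).
reference_df, analysis_df, _ = nml.load_synthetic_binary_classification_dataset()

# Confidence-Based Performance Estimation: estimates ROC AUC from the model's
# predicted probabilities, without requiring labels for the analysis period.
estimator = nml.CBPE(
    y_pred_proba='y_pred_proba',
    y_pred='y_pred',
    y_true='work_home_actual',
    timestamp_column_name='timestamp',
    metrics=['roc_auc'],
    chunk_size=5000,
)
estimator.fit(reference_df)

# Compare estimated performance on production data against the reference period
# to surface silent failures before delayed ground truth arrives.
# (Plotting API varies across versions.)
results = estimator.estimate(analysis_df)
results.plot().show()
```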
Announcements
- Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
- Data powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix, and track their data across the ML workflow (pre-training, post-training, and post-production) – no more Excel sheets or ad-hoc Python scripts. Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, and see 10x faster ML iterations. Galileo is offering listeners a free 30-day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today!
- Your host is Tobias Macey and today I’m interviewing Wojtek Kuberski about NannyML and the work involved in post-deployment data science.
Interview
- Introduction
- How did you get involved in machine learning?
- Can you describe what NannyML is and the story behind it?
- What is "post-deployment data science"?
- How does it differ from the metrics/monitoring approach to managing the model lifecycle?
- Who is typically responsible for this work? How does NannyML augment their skills?
- What are some of your experiences with model failure that motivated you to spend your time and focus on this problem?
- What are the main contributing factors to alert fatigue for ML systems?
- What are some of the ways that a model can fail silently?
- How does NannyML detect those conditions?
- What are the remediation actions that might be necessary once an issue is detected in a model?
- Can you describe how NannyML is implemented?
- What are some of the technical and UX design problems that you have had to address?
- What are some of the ideas/assumptions that you have had to re-evaluate in the process of building NannyML?
- What additional capabilities are necessary for supporting less structured data?
- Can you describe what is involved in setting up NannyML and how it fits into an ML engineer’s workflow?
- Once a model is deployed, what additional outputs/data can/should be collected to improve the utility of NannyML and feed into analysis of the real-world operation?
- What are the most interesting, innovative, or unexpected ways that you have seen NannyML used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on NannyML?
- When is NannyML the wrong choice?
- What do you have planned for the future of NannyML?
Contact Info
Parting Question
- From your perspective, what is the biggest barrier to adoption of machine learning today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show, then tell us about it! Email [email protected] with your story.
- To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
Links
The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0