Summary
Machine learning is a data-hungry approach to problem solving. Unfortunately, many problems that would benefit from the automation that artificial intelligence provides don’t come with troves of data to build from. Christopher Nguyen and his team at Aitomatic are working to address the "cold start" problem for ML by letting humans generate models by sharing their expertise through natural language. In this episode he explains how that works, the various ways that we can start to layer machine learning capabilities on top of each other, and the risks involved in doing so without incorporating lessons learned in the growth of the software industry.
Announcements
- Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
- Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!
- Your host is Tobias Macey, and today I’m interviewing Christopher Nguyen about how to address the cold start problem for ML/AI projects.
Interview
- Introduction
- How did you get involved in machine learning?
- Can you describe what the "cold start" or "small data" problem is and its impact on an organization’s ability to invest in machine learning?
- What are some examples of use cases where ML is a viable solution but there is a corresponding lack of usable data?
- How does the model design influence the data requirements to build it? (e.g. statistical model vs. deep learning, etc.)
- What are the available options for addressing a lack of data for ML?
- What are the characteristics of a given data set that make it suitable for ML use cases?
- Can you describe what you are building at Aitomatic and how it helps to address the cold start problem?
- How have the design and goals of the product changed since you first started working on it?
- What are some of the education challenges that you face when working with organizations to help them understand how to think about ML/AI investment and practical limitations?
- What are the most interesting, innovative, or unexpected ways that you have seen Aitomatic/H1st used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Aitomatic/H1st?
- When is a human/knowledge driven approach to ML development the wrong choice?
- What do you have planned for the future of Aitomatic?
Contact Info
Parting Question
- From your perspective, what is the biggest barrier to adoption of machine learning today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
- To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
Links
The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0