In deep learning and machine learning, having a large enough dataset is key to training a system and getting it to produce results.
So what does an ML researcher do when there just isn’t enough publicly accessible data?
Enter the MLCommons Association, a global engineering consortium with the aim of making ML better for everyone.
To help advance ML research, MLCommons recently announced the general availability of two datasets: the People’s Speech Dataset, a 30,000-hour English-language conversational speech dataset, and the Multilingual Spoken Words Corpus, an audio speech dataset with over 340,000 keywords in 50 languages.
On this episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with David Kanter, founder and executive director of MLCommons, and Daniel Galvez, senior AI developer technology engineer at NVIDIA, about democratizing access to speech technology and how MLCommons is helping advance the research and development of machine learning for everyone.
https://blogs.nvidia.com/blog/2022/04/13/mlcommons/