IT Visionaries

Breaking the Data Bottleneck

42 min • September 7, 2021

Each day, we’re coming into more and more contact with artificial intelligence and machine learning meant to make our lives better. We’ve all had some A.I. experiences that have gone really well. Perhaps we didn’t even realize A.I. was helping us at first. On the other hand, getting help from A.I. doesn’t always work out perfectly, at least not right away. So why the inconsistency? If the human mind can take in so much complex information and make sense of it, why can’t our computers? Or can they, if they have good data to learn from? Brad Porter, CTO of Scale AI, believes the key to A.I. learning efficiently is the right labeling:

“What you need is those samples to be labeled perfectly because if they're labeled ambiguously, then the model can't actually decide what exactly is signal versus noise. So one way to solve that is to throw more and more data at it. Eventually you have enough data that the algorithms learn, okay, this is the signal and all these other pieces are the noise. If you get [a] really high quality signal, though, you can learn that signal very quickly if there's not a lot of noise in it.”

Computers need lots of data to learn. More accurately, they need lots of quality data that is labeled properly. Fundamentally, this just makes sense. The best way to learn something is through repeated exposure and practice. This is just as true for people as it is for computers. That’s where Brad comes in. On this episode of IT Visionaries, Brad explains how his diverse work experience, particularly his work in robotics, ultimately led him to focus on solving the problem of data labeling for A.I., work that is setting us up for an exciting future. After all, if proper labeling is the key, and the key is becoming more readily available, then we can expect great things in the A.I. space. Brad discusses some of those great things, including how the tech will help us understand medical histories and how it is used in autonomous vehicles. Enjoy the episode!

Main Takeaways

  • Breaking the Data Bottleneck: There is a lot of data in the world for A.I. to access. The primary challenge in machine learning is for the computer to distinguish which information matters most so it can learn. In this way, people and computers are similar. But computers need our help to know what data is essential.
  • Labeling Data Is Key: It’s easy to get caught up in the glamorous possibilities of A.I. and how it can help us. Computers need data to learn, but they need the right data to learn effectively and efficiently. Labeling data is essential to speeding up machine learning.
  • What Is Signal vs. What Is Noise: Proper labeling helps A.I. distinguish signal from noise. A.I. doesn’t necessarily need massive amounts of data to learn if it is given the right, properly labeled data.
  • Quantity vs. Quality: Without proper labeling, there has been a tendency to simply inundate A.I. with data so that learning happens eventually. This is inefficient and costly. Proper labeling streamlines the process. In an ideal situation for learning, there’s a tremendous amount of data that’s also all properly labeled. With large amounts of properly labeled, automated data, A.I. has a real chance to take off.

---

IT Visionaries is brought to you by the Salesforce Platform - the #1 cloud platform for digital transformation of every experience. Build connected experiences, empower every employee, and deliver continuous innovation - with the customer at the center of everything you do. Learn more at salesforce.com/platform
