Data Driven

Matteo Interlandi on Project Hummingbird

44 min • 27 september 2021

Hello and Welcome to Data Driven.

In this episode, Frank and Andy speak with researcher Matteo Interlandi about project Hummingbird.  

Audio file 

matteo-mixdown.mp3 

Transcript 

00:00:00 BAILey 

Hello and welcome to Data Driven.

00:00:02 BAILey 

In this episode, Frank and Andy speak with researcher Matteo Interlandi about project Hummingbird. 

00:00:09 BAILey 

Now on with the show. 

00:00:10 Frank 

Hello and welcome to Data Driven.

00:00:21 Frank 

The podcast where we explore the emerging fields of data science, machine learning and artificial intelligence. 

00:00:27 Frank 

If you like to think of data as the new oil, then you can consider us

00:00:30 Frank

Car Talk, because we focus on where the rubber meets the virtual road. And with me on this epic road

00:00:36 Frank

trip down the information superhighway, as always, is Andy Leonard.

00:00:39 Frank 

How are you doing, Andy?

00:00:40 Andy

I'm well, Frank. How are you?

00:00:41 Frank

I'm doing alright. We're recording this on Wednesday, September 1st, 2021, and the

00:00:51 Frank

remnants of Hurricane Ida are ripping through the DC area.

00:00:57 Frank 

So if I suddenly get dropped, that's probably because we lost power.

00:01:03 Frank

But I do have the backup generator, the one that the professionals installed, and my

00:01:10 Frank

duct-taped-together solar generator, so

00:01:15 Frank

I will be offline

00:01:17 Frank

for a short

00:01:18 Frank

bit and hopefully come back online.

00:01:20 Frank 

How are you doing, Andy?

00:01:23 Andy 

I'm doing alright, Frank. Well, you know, I'm about, gosh, 250 miles south of you, and we didn't get anywhere near the effects of Hurricane Ida that you did.

00:01:34 Andy 

We're getting a little bit of rain now. 

00:01:36 Andy 

We've had some wind

00:01:37 Andy

gusts, but it's been really mild. And if you look on the radar,

00:01:41 Andy

I got to watch it and track it, and I do.

00:01:43 Andy

I'm a weather weenie, an amateur, but it just kind of went around us to the west, and it actually turned east when it got a little north of us and aimed right for your house.

00:01:54 Andy 

I was looking at it, like, that's where Frank lives, right?

00:01:56 Andy

And look, the eye is coming right for

00:01:58 Andy

Frank, what's left of it.

00:02:00 Frank 

Well, fortunately we're safe. 

00:02:02 Frank 

There was some kind of flooding in Rockville and the small overnight, and some folks they got up. 

00:02:09 Frank 

Nobody died that I'm

00:02:10 Frank

aware of, so

00:02:11 

It it says. 

00:02:12 Frank 

You know, we're not

00:02:13 Frank

accustomed to floods or hurricanes or tornadoes up here in DC. We're more used to the human threats of, you know, little things like terrorism and things

00:02:25 Frank

like that, but

00:02:26 Andy 

Yeah, you guys have a little bit more of that to worry about than we do here in Farmville, right?

00:02:32 Andy 

But you know these days. 

00:02:33 Andy 

Who knows? 

00:02:35 Andy 

Definitely, our thoughts and prayers are with the folks in Louisiana and Mississippi.

00:02:40 Andy

They were hit very hard.

00:02:42 Andy

I've got friends in Georgia, western Georgia, who were telling me that

00:02:47 Andy

they took a beating as well, and, you know, it just looks horrible.

00:02:53 Andy

You know, I've been in a few of those places after hurricanes have hit, as part of church efforts to help clean up and stabilize and stuff like that.

00:03:04 Andy 

It looks like, I don't know,

00:03:06 Andy

people describe it as like a war.

00:03:09 Andy

I've never been in a war, so I don't know.

00:03:10 Andy

I've seen pictures, and

00:03:13 Andy

there's a lot,

00:03:14 Andy

it looks like a lot of stuff is blown over, and that sort of

00:03:16 Andy

stuff. It's just,

00:03:18 Andy 

So, and they're talking weeks and weeks before power comes back on. 

00:03:22 Frank 

That's horrible, that's. 

00:03:23 Andy 

In some places, yeah.

00:03:25 Frank

That's

00:03:26 Frank

probably going to do more damage for a lot of things.

00:03:30 Andy 

Were you worried? 

00:03:30 

But on a. 

00:03:30 Frank 

More positive note, uh, a positive note. 

00:03:31 Andy 

Yes, on a positive note. 

00:03:35 Frank 

Uh, we are. 

00:03:37 Frank 

I am super excited to have a special guest and I say super excited because he's from Microsoft. 

00:03:42 Frank 

He's a senior scientist in GSL at Microsoft, working on scalable machine learning systems.

00:03:50 Frank

Before he was at Microsoft, he was a postdoctoral scholar in the Computer Science Department at UCLA, and he was doing a lot of interesting stuff there.

00:04:03 Frank 

He was doing research at Qatar or Qatar. 

00:04:05 Frank 

I'm not sure how to say that exactly, but he has a PhD in computer science

00:04:11 Frank

from the University

00:04:12 Frank

of Modena and, oh,

00:04:15 Frank 

I'm going to botch this. 

00:04:15 Frank 

Reggio Emilia. 

00:04:17 Frank 

Welcome to the show, Matteo.

00:04:22 Frank 

Awesome, so we are really excited to have you here. 

00:04:25 Frank 

We actually booked you a whole month in advance. 

00:04:27 Frank 

I've been looking forward to this. 

00:04:29 Frank 

Yeah, because you're coming by way of some of the folks at the MLADS conference.

00:04:35 Frank

And for those who don't know, I've mentioned this,

00:04:37 Frank

MLADS stands for Machine Learning and Data Science summit.

00:04:40 Frank

It used to be in person; I think now it's entirely virtual for the foreseeable future.

00:04:45 Frank

But I attended MLADS in the summer of 2016 and it was life-altering, and I don't say that

00:04:55 Frank

lightly.

00:04:56 Frank 

So Microsoft does amazing work in the machine learning and data science space. 

00:05:02 Frank 

Very much cutting-edge stuff.

00:05:06 Frank

I wouldn't say under the radar, but Microsoft does not do a great job tooting its own horn, so we're very excited for you to come on, Matteo, and talk about this little project that you're working on.

00:05:17 Frank

And what is the, does it have a code name, or what?

00:05:20 Frank 

What is it called? 

00:05:22 Matteo 

Hummingbird. The code name is actually Hummingbird. I mean, we

00:05:26 Matteo

don't have any specific internal name for

00:05:28 Matteo

this.

00:05:28 Frank 

OK, what does GSL stand for?

00:05:32 Frank 

That was my that was my first question. 

00:05:33 Frank 

When I saw your bio. 

00:05:35 Matteo 

It stands for Gray Systems Lab, and it's named after Jim Gray, who

00:05:41

Oh, OK.

00:05:41 Matteo

is a Turing Award winner, yeah.

00:05:45

OK.

00:05:46 Matteo

So this is a research lab named after him, yeah, and it sits within the Azure Data organization.

00:05:49 

Oh, interesting. 

00:05:53 Frank 

And, um, so what,

00:05:56 Frank

what cool stuff does Hummingbird do?

00:06:00 Matteo 

So, Hummingbird

00:06:03 Matteo

is a bit of a weird project, in the sense that when we started this project we didn't know if it was going

00:06:10 Matteo

to be a success or not.

00:06:12 Matteo 

Because what we try to do, basically, is to translate traditional machine learning models into neural networks.

00:06:22 Matteo

Actually, not into a neural network format, but into tensor programs, so that we can then run them over tensor runtimes such as PyTorch

00:06:30 Matteo 

or TensorFlow.

00:06:32 Matteo 

So when we started this project, the idea was actually: hey, there is a lot of investment in general going into these neural network frameworks, and

00:06:45 Matteo

coming from the Azure Data organization, we are instead more interested in traditional machine learning methods such as decision trees,

00:06:52 Matteo

linear models, one-hot encoding, all those boring traditional algorithms.

00:07:00 Matteo 

And so we looked at these

00:07:01 Matteo

neural network systems and said, hey, how can we take advantage of all this technology that is built

00:07:05 Matteo

into this domain? You can run neural

00:07:08 Matteo

networks over CPU,

00:07:10 Matteo

over GPU, and then you can use fancy compilers to generate the tensor programs,

00:07:16 Matteo

all those sorts of techniques. And we were

00:07:19 Matteo

kind of struggling

00:07:20 Matteo

to see what we could do with this stack, and what we came up with is this Hummingbird project.

00:07:27 Matteo 

So we basically take

00:07:32 Matteo

traditional machine learning pipelines, composed of featurizers and machine learning models,

00:07:37 Matteo

after they are trained.

00:07:39 Matteo

So first you need to train it using scikit-learn or ML.NET or

00:07:43 Matteo

one of those traditional machine learning platforms, and then once it is trained we basically convert it into a set of tensor operations.

00:07:54 Matteo

In the current version we basically use PyTorch for doing this conversion, and then you basically have a PyTorch model, so you can do whatever you can do with PyTorch

00:08:03 Matteo

models. So you can deploy it into a PyTorch

00:08:08 Matteo

deployment, you can run it over CPU or over GPU, or you can use TorchScript if you want to get rid of all the Python dependencies and just have a C++ program. You can

00:08:19 Matteo

do all those tricks.
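
As an illustration of what converting a trained model "into a set of tensor operations" can look like, the following is a minimal, dependency-free sketch that evaluates one small decision tree using only matrix multiplications and elementwise comparisons. It is not Hummingbird's code: the tree, the matrix names A to E, and the helper functions are hypothetical, plain Python lists stand in for PyTorch tensors, and the encoding shown is just one possible scheme.

```python
# Hypothetical tree (two features, two internal nodes, three leaves):
#   node0: x[0] < 0.5 ? go to node1 : leaf2
#   node1: x[1] < 2.0 ? leaf0       : leaf1

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# A[f][n] = 1 if internal node n tests feature f.
A = [[1, 0],
     [0, 1]]
# B[n] = threshold of internal node n.
B = [0.5, 2.0]
# C[n][l] = 1 if leaf l is in the left subtree of node n,
#          -1 if it is in the right subtree, 0 if node n is not an ancestor.
C = [[1, 1, -1],
     [1, -1, 0]]
# D[l] = number of "go left" decisions on the path from the root to leaf l.
D = [2, 1, 0]
# E[l] = output value stored in leaf l (one output column).
E = [[10.0], [20.0], [30.0]]

def predict_tensor(X):
    """Evaluate the tree for a batch of rows without any per-row branching."""
    picked = matmul(X, A)                      # feature value seen by each node
    goes_left = [[1 if v < t else 0 for v, t in zip(row, B)] for row in picked]
    path_score = matmul(goes_left, C)          # agreement with each leaf's path
    leaf_hit = [[1 if s == d else 0 for s, d in zip(row, D)]
                for row in path_score]         # one-hot selected leaf
    return [row[0] for row in matmul(leaf_hit, E)]

def predict_branching(x):
    """Reference implementation with ordinary if/else traversal."""
    if x[0] < 0.5:
        return 10.0 if x[1] < 2.0 else 20.0
    return 30.0

X = [[0.2, 1.0], [0.2, 3.0], [0.9, 1.0], [0.9, 5.0]]
tensor_out = predict_tensor(X)
branch_out = [predict_branching(x) for x in X]
print(tensor_out)  # [10.0, 20.0, 30.0, 30.0]
```

Because every step is a dense matrix product or an elementwise comparison, the same computation can be handed to a tensor runtime and executed on CPU or GPU for a whole batch at once.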

00:08:22 Frank 

Interesting. Does it impact accuracy or precision?

00:08:26 Frank

Does it improve it?

00:08:27 Frank

Or keep it the same?

00:08:29 Matteo 

We try to keep it the same, and we are able to keep

00:08:33 Matteo

it the same up to floating-point rounding.

00:08:36 Matteo

Since we use PyTorch to run these programs, and not scikit-learn or ML.NET,

00:08:44 Matteo

there are some differences in how they do, you know, floating-point operations.

00:08:48 Matteo

So the

00:08:49 Matteo

accuracy is the same up to rounding in the floating-point numbers, which sometimes

00:08:54 Matteo

can be quite a bit, but most of the time it is really small, almost not noticeable.
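
To make the rounding point concrete, here is a small sketch, under the assumption that one runtime accumulates scores in 64-bit floats and another in 32-bit floats. The values and function names are made up for illustration. The two sums are not bit-identical, which is why outputs of two runtimes are normally compared with a tolerance rather than with exact equality.

```python
import math
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_float64(values):
    """Accumulate in ordinary 64-bit precision."""
    total = 0.0
    for v in values:
        total += v
    return total

def sum_float32(values):
    """Emulate 32-bit accumulation by rounding after every addition."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total

# Hypothetical per-tree scores of a 1000-tree ensemble for one input row.
scores = [0.1] * 1000

s64 = sum_float64(scores)
s32 = sum_float32(scores)

print(s64, s32)
print("bit-identical:", s64 == s32)
print("close enough: ", math.isclose(s64, s32, rel_tol=1e-4))
```

A check like the last line, with a tolerance chosen for the model at hand, is the kind of comparison a conversion test suite would typically use.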

00:09:00 Frank 

Interesting, interesting, uhm. 

00:09:03 Frank 

Would you know

00:09:05 Frank

if there was, like,

00:09:06 Frank

a discrepancy? Or do you catch that as part of testing?

00:09:09 Matteo 

It's part of testing. 

00:09:10 Frank 

Right, all software is tested, right Andy? 

00:09:11 Matteo 

So we have we have. 

00:09:13 Frank 

Sometimes intentionally. Is that the line?

00:09:15 Andy 

That's right. 

00:09:17 Frank 

Andy has a saying about all software; I forget exactly what it is.

00:09:21 Frank 

But what is it? 

00:09:23 Andy 

Yeah, all software is tested, some intentionally. 

00:09:27 Frank 

There you go. 

00:09:30 Frank 

Uhm, so what's the? 

00:09:33 Frank 

What's the real? 

00:09:34 Frank 

What are? 

00:09:34 Frank 

What are the advantages of converting a traditional model over to a tensor model?

00:09:41 Frank 

Is it? 

00:09:41 Frank 

Is it portability? 

00:09:42 Frank 

Is it speed? 

00:09:43 Frank 

You did mention that you can run it on. 

00:09:45 Frank 

You could take advantage of GPU as well as CPU. 

00:09:51 Matteo 

Yes, exactly. Mostly it is related to speed: you can basically run your scikit-learn model on GPU end to end, and this usually provides quite a bit of speedup. For some of our examples we even saw, like, two orders of magnitude speedups

00:10:11 Matteo

for some of the models.

00:10:13 Matteo

And usually we try to show that

00:10:18 Matteo

if you use GPU it

00:10:19 Matteo

can be much faster, but on CPU we try to be as close as possible to scikit-learn or the baseline model.

00:10:27 Matteo

Sometimes we can; sometimes we are a little bit slower.

00:10:31 Matteo 

Uh, but we. 

00:10:32 Matteo 

we had some really interesting results.

00:10:34 Matteo

For instance, we did some experiments with

00:10:39 Matteo

some folks at TVM, and we took some XGBoost models, and we compiled some trained XGBoost models

00:10:47 Matteo

using Hummingbird and TVM. We basically do code generation, and we showed that the model that was compiled to Python was even faster than the C++ implementation that they have in XGBoost, both on CPU and GPU. Yeah, that was kind of, OK, what's going on?

00:11:06 Matteo 

This is not. 

00:11:08 Matteo 

This was not expected. 

00:11:08 Frank 

Wait, did you say it was faster than a C++ implementation? 

00:11:11 Matteo 

Yes. I mean, XGBoost uses

00:11:13 Matteo

C++ underneath; even scikit-learn,

00:11:15 Matteo

you know, they use like

00:11:16 Matteo

some C++ library. And yeah, using TVM for doing the code generation, they are able to do things like operator fusion, which you don't normally have for these traditional models.

00:11:28 Matteo

So with all these tricks, basically, that are coming from the neural network

00:11:31 Matteo

frameworks, we were able to get

00:11:34 Matteo

these surprising numbers.
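
Operator fusion, mentioned above, means combining several consecutive operations into one pass over the data so that intermediate results never have to be written out and read back. The sketch below shows the idea on plain Python lists. The operations and function names are illustrative assumptions only; a compiler such as TVM applies this kind of transformation to generated low-level code, not to Python functions.

```python
def scale(xs, a):
    """Elementwise multiply; produces a full intermediate list."""
    return [a * x for x in xs]

def shift(xs, b):
    """Elementwise add; produces another full intermediate list."""
    return [x + b for x in xs]

def clamp_at_zero(xs):
    """Elementwise max(0, x); produces the final list."""
    return [max(0.0, x) for x in xs]

def unfused(xs, a, b):
    """Three separate passes over the data, two temporary lists."""
    return clamp_at_zero(shift(scale(xs, a), b))

def fused(xs, a, b):
    """One pass over the data, no temporaries: the three ops are fused."""
    return [max(0.0, a * x + b) for x in xs]

data = [-2.0, -0.5, 0.0, 1.5]
unfused_out = unfused(data, 2.0, 1.0)
fused_out = fused(data, 2.0, 1.0)
print(fused_out)  # [0.0, 0.0, 1.0, 4.0]
```

Both versions compute the same values; the fused one simply touches memory less, which is where much of the speedup tends to come from on real hardware.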

00:11:36 Frank 

Interesting, so that's a real performance boost, and if you scale that up into the cloud, that probably

00:11:44 Frank

means a lot of money saved too, in terms of cloud computing. I imagine a company the size of Microsoft would be very interested in getting better results faster with less cloud compute.

00:11:56 Frank 

You did mention an acronym, I just wanna make sure folks know. 

00:11:59 Frank 

What that is? 

00:12:00 Frank 

TVM, what is that?

00:12:03 Matteo

Uh, I don't know exactly what it stands for. Something tensor, maybe?

00:12:08 Frank 

Andy looks like he knows, but he's on mute. 

00:12:10 Andy 

I don't, yeah I I don't know. 

00:12:13 Frank 

OK, I'm just curious. 

00:12:13 Andy 

I'll go look it up. 

00:12:15 Frank 

There you go. 

00:12:16 Andy 

TVM acronym.

00:12:19 Matteo

I think it stands for Tensor Virtual Machine, but I'm

00:12:21 Matteo

not sure if this is correct.

00:12:22 Frank 

That sounds about right. 

00:12:23 Frank 

Tector,...
