Jonathan: This is Floss Weekly, episode 769, recorded Wednesday, February the 7th.
OpenCost. We spend how much?
Hey, this week Katherine joins me. We talk with Matt Ray, the Community Manager at OpenCost. That project is all about tracking where your dollars and cents go for your cloud compute costs. And it turns out it may just keep you out of trouble if part of your infrastructure gets compromised, too.
You don't want to miss it. So stay tuned.
Hey, welcome. It is time for Floss Weekly. That's the show about free, libre, and open source software. I'm your host, Jonathan Bennett. It's not just me; we have Katherine with us today. Hey, Katherine, welcome. Hey, Jonathan. Thanks. Yeah, it's good to have you again. For those that aren't aware, Katherine has quite the pedigree of Linux and open source geekiness, all the way from the Linux Journal to now, is it open source evangelist at Intel? Is that right?
Katherine: Yes, that is my, my, my official job that I do every day and some nights.
Jonathan: And some nights, yeah, that sounds about right. So along this open source path, have you ever done anything with the cloud?
Katherine: Gee, well, yes, funny you mention it. Haven't we all? I mean, today there are so many things that have become ubiquitous in the last 10 years. Has anybody not had to touch Kubernetes, right? Ten years ago I was like, what is that? Even five years ago it wasn't as widely adopted. But anyway, yes, I have touched some clouds.

Jonathan: The saying that I love best, and it's a little bit cynical, but I still like it: the cloud is just a fancy way to talk about someone else's computers.
Katherine: Yes, and I can't remember which of the the open source people said that. I can't remember who said it Stallman? It might have been Stallman. I don't know.
Jonathan: It may have been. That seems like a sort of cynical Stallman thing to say. There's a bit of truth to it, though.

Katherine: Other people's computers. Yeah, there's a bit of truth to it.

Jonathan: And one of the interesting things about that is, when you use other people's computers, they charge you for it. Imagine that, you have to pay money for using the cloud. Darn it. I know. Well, that's what today's guest is all about. It's KubeCost, OpenCost, I think they go by both terms; we'll ask him here in just a second. So we've got Matt Ray with us talking about this project. Is this something that you're familiar with, Katherine?
Katherine: I am not. I guess fortunately, I have not been the person in the position to be concerned with keeping costs reasonable, but I get the problem they're solving. I'd like to hear more.
Jonathan: Yeah, so let's go ahead and bring him on. Matt, welcome to the show. And I've got to ask first, was there a multi million dollar cloud bill that led to all of this?
Matt: Definitely. There were people around a lot of cloud bills in the early days. So OpenCost is the name of the project, and it came from my employer, a company called KubeCost. The two founders were, I guess, on the Borg team, on the monitoring side of it, and they were watching the large volumes of compute and running internal metrics at Google. So they were definitely around some very, very large numbers, and hopefully the chargeback wasn't too bad. But yeah, we've definitely seen some very, very large numbers.
Jonathan: I was looking at this before the show started: there's a company, the ones behind Basecamp, 37signals, and they came out, I think in either 2022 or 2023, and said, we spent $3.2 million on the cloud this year, we're going to go back to real servers. I don't know if I would call it a wake-up call, but it was sort of a sea change in the way everybody thought about it. Like, you can save money using the cloud, you don't have to pay for your own servers, and, oh wait, that costs money too.
Matt: Yeah. It's always gonna be money somewhere, but you could definitely be more efficient with how you spend it.

Jonathan: So let's talk about that. What's kind of the 30,000-foot view of this project? Is it just a project? Is it a commercial offering? What are the pieces that go together here?
Matt: Sure. So OpenCost is a Cloud Native Computing Foundation sandbox project. In June of 2022, KubeCost worked with a bunch of other companies and volunteers to write what they called the OpenCost specification, which was trying to standardize how to compute costs on Kubernetes. You know, when you get your cloud bill, usually it says, hey, you've got a bunch of EC2, and you're like, well, how much is that? And if you drill down into the numbers, you can say, well, I've got 40 xlarges at $5 an hour, I'm just making up numbers. But it doesn't tell you how much Kubernetes cost. There might be a management fee for Kubernetes, but it doesn't say this namespace within Kubernetes cost $1 an hour, this other one was $3 an hour, and you're wasting $1 an hour. So what the specification did was say, here's how we calculate how those things are split: how you determine shared CPU, shared memory, shared storage, how all that stuff is sorted out. They hammered out a specification for that, and then KubeCost open sourced the implementation, OpenCost. So the OpenCost project is both the specification and the implementation.
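The splitting rule Matt describes can be sketched in a few lines of Python. This is a paraphrase of the idea, not the actual OpenCost specification: it assumes the common convention of billing a container for the larger of its resource request and its actual usage, and all names and numbers are made up for illustration.

```python
# A sketch of the allocation idea Matt describes: bill each container for the
# larger of what it requested and what it actually used, priced at the node's
# hourly resource cost. The max(request, usage) rule and all numbers here are
# an illustration, not the verbatim OpenCost specification.

def container_cpu_cost(cpu_request, cpu_usage, cpu_hourly_cost, hours):
    """Cost of one container's CPU over a time window, in dollars."""
    allocated_cores = max(cpu_request, cpu_usage)
    return allocated_cores * cpu_hourly_cost * hours

def namespace_cpu_cost(containers, cpu_hourly_cost, hours):
    """Sum the per-container costs for every container in a namespace."""
    return sum(
        container_cpu_cost(c["request"], c["usage"], cpu_hourly_cost, hours)
        for c in containers
    )

# One over-provisioned container (requested 2 cores, used 0.5) and one
# bursting past its request (requested 1, used 1.5): 3.5 billable cores total.
containers = [
    {"request": 2.0, "usage": 0.5},
    {"request": 1.0, "usage": 1.5},
]
print(namespace_cpu_cost(containers, cpu_hourly_cost=0.10, hours=24))
```

With per-namespace sums like this, the "you're wasting $1 an hour" comparison falls out by subtracting usage-based cost from request-based cost.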
So it's a monitoring engine that is now a CNCF project. What you can do with that engine is deploy it within your Kubernetes cluster, and it will pay attention to everything Kubernetes is doing. It's recording pods and namespaces and deployments, all those primitives. Then it looks at the on-demand pricing from your cloud provider and says, oh, an xlarge on aarch64 costs $3 an hour. It checks in periodically and records what the price was at that time; it checks spot instances; it works on Azure, GCP, and others. And then it just records out to Prometheus. So later on you can come and say, I want to see how much this namespace cost from Tuesday to Thursday of last week, and you can run those queries against Prometheus. Pretty much any Kubernetes primitive is stored in there, and we have the pricing at that time available to us. So that's what OpenCost does. It's also got a UI, a pretty simple two-page UI for viewing that stuff. You can run it as a Prometheus metric exporter, so you just run it headless and send those metrics to Grafana or your other BI or visualization tool of choice.
And then recently we added what we call cloud cost support, which is where you actually go and read the cost and usage billing reports. If you're using AWS, or Azure, or GCP, they have a list price for how much everything costs, but people who spend a lot of money are going to go in and make deals with their cloud providers. You have what they call reserved instances, or savings plans, or billing where, as you hit certain thresholds, it gets cheaper. None of that's actually caught in OpenCost where we're gathering the on-demand prices; we don't do what they call reconciliation, which is going and fixing those numbers based off your savings. But we recently added support for reading those bills, and an API and reporting over that, so you can go in and dig into them in OpenCost too. So it's a monitoring engine for cloud billing and Kubernetes costs.
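The reconciliation step Matt says OpenCost doesn't do can be illustrated with a small sketch: swap the on-demand rate recorded at runtime for the effective rate found later in the bill. The field names and rates below are hypothetical, not OpenCost's actual data model.

```python
# Reconciliation sketch: usage is recorded at the on-demand rate as it
# happens; the billing report later reveals the effective rate after savings
# plans, reserved instances, or volume discounts. All fields are hypothetical.

def reconcile(records, effective_rates):
    """Re-price recorded usage with the effective rates from the bill."""
    reconciled = []
    for r in records:
        rate = effective_rates.get(r["instance_type"], r["on_demand_rate"])
        reconciled.append({**r, "cost": round(r["hours"] * rate, 6)})
    return reconciled

usage = [{"instance_type": "m5.xlarge", "hours": 100, "on_demand_rate": 0.192}]
# Suppose the bill shows a savings plan brought m5.xlarge down to $0.12/hour.
print(reconcile(usage, {"m5.xlarge": 0.12}))  # cost: 12.0 instead of 19.2
```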
Jonathan: Now, is there any support for, let's say, S3 storage? That's obviously part of some people's solutions for all of their cloud stuff. Is this just Kubernetes, or are there other pieces like that that plug in?
Matt: We are working on adding that. Until about December of last year, it was just Kubernetes, and we were working on adding what we were calling external asset costs. That's still under development. But with the new cloud costs, you can go and see your S3 costs, you just can't attribute them back to Kubernetes. Which is actually really useful, because there's not a lot of open source out there for reading the cost and usage reports. I mean, I've seen customers with gigabytes a day of billing data, because the Amazon one publishes out to an S3 bucket 24 or 48 hours after you consume it. And it's line item: here's how much compute, here's how much of this m3.xlarge you were using each day on each node, how much S3 storage in each bucket, how much network traffic. Every single line item is in that cost and usage report. That thing is big. What we recently added is you can read it, it's maintained, and it has an API so you can run queries against it, which is actually pretty new. There are going to be people who find that exciting. It doesn't sound exciting, maybe, but in the world of open source there hasn't been anything like that. And we recently added Docker support, so you don't have to run inside of Kubernetes; you can run just the cloud cost part by itself. So that's what's on the front burner.
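A toy version of digging through a cost and usage report. Real CURs have hundreds of columns and the column names below are invented stand-ins, not actual CUR headings, but the aggregation has the same shape:

```python
import csv
import io
from collections import defaultdict

# A toy cost and usage report. Real AWS CURs are far bigger, with hundreds of
# columns; these three column names are invented stand-ins, not CUR headings.
cur_csv = """service,resource,cost
EC2,i-abc123,41.50
EC2,i-def456,12.25
S3,my-bucket,3.10
"""

def cost_by_service(report):
    """Aggregate line-item costs per service, the simplest useful rollup."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(report)):
        totals[row["service"]] += float(row["cost"])
    return dict(totals)

print(cost_by_service(cur_csv))  # {'EC2': 53.75, 'S3': 3.1}
```

The hard part at production scale isn't the rollup, it's doing this over gigabytes of line items a day without falling behind, which is why having a maintained reader with an API matters.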
Jonathan: Now, obviously we're kind of focused in on Amazon, because they're the big player here. Not the only one, but the big one. Did you have to get buy-in from Amazon to make these tools work? Did they have to add APIs for you guys to track this the way you are?
Matt: No. I don't recall what year EC2 and S3 launched, but probably within three months of launch, people were saying, I need better billing. I need to see what's inside this black box. And Amazon definitely led with how they published their billing. They said, fine, you can have a firehose. Because your choices used to be: you get a one-page bill that says you owe us $400,000 a month, with one line item saying $200,000 of that is EC2, and you're like, what does this mean? Or you can eat the firehose. The firehose is an S3 bucket where they publish everything, and people have written a lot of scripts and fancy jq queries to read through that stuff. It's documented, but it's just a lot of content, and maintaining that is hard. That's why I tell people about this cloud cost support: people who know, know that that part is tricky. And people who know Kubernetes know that tracking Kubernetes is hard. Tying the billing all together is even more complicated.
And it's not just AWS; we definitely support Azure and GCP. Azure has been very enthusiastic about OpenCost. They have actually taken the OpenCost Kubernetes monitoring component and integrated it into their own billing. So now within Azure, from my understanding, it's the first Azure service where you can drill down and see your own usage. It used to be, if you went to the Azure billing dashboard, you could see, oh, I've got some compute, I've got some networking, I've got some storage, but it didn't tell you how it was being used. Now in Azure Kubernetes Service you can actually break it down by namespace. You can see which pods, which deployments, which containers are costing what inside your bill. So OpenCost has end users, people who just need these metrics, but it also gets embedded into other solutions. Azure uses it, Grafana Cloud uses it, KubeCost uses it.

Katherine: So, okay, a couple things. First, you mentioned Azure has really embraced the project. Are they contributing to it?
Matt: Yes. One of the exciting milestones for us as a Cloud Native Computing Foundation project is we've just got our first non-KubeCost maintainer, a gentleman from Microsoft. If you're familiar with the CNCF, there are different levels of projects: sandbox, incubation, and graduated. You can move into incubation with all the maintainers working for one company, but you'll never, ever graduate if everybody works at one company. So yeah, we're definitely excited to have our first non-KubeCost maintainer, and we're working on getting more of those. We're trying to move into incubation right now; that's our current status.
Katherine: Okay, yeah, worthy goal. So speaking of the CNCF, that's actually something I wanted to talk about a little bit. I think the people who listen to me elsewhere, like my Intel podcast, are probably sick of me harping on the massive scale of the cloud native landscape, but it's massive and overwhelming, and there are so many options at any step of the development process. So I have several questions, actually. First, I'd like to hear about the experience of contributing your project to the organization and becoming a CNCF project, and then finding your way as a project: your identity, figuring out how you fit into that landscape, how to promote yourself, how to attract contributors, all of that stuff. How do you approach all of that?
Matt: Yeah, well, there's a lot to unpack there. When a project wants to enter the sandbox, you write up: hey, we have this project that is open source, or that we're going to be open sourcing; here's what we're going to do with it; here's how we think it fits into your landscape. The CNCF has a board, the Technical Oversight Committee, the TOC, and they review sandbox applications. They will look at it and say, nobody cares, or, this looks compelling. How do you see yourself progressing? Why should we accept this? What are you going to do with it? How does it fit? And OpenCost definitely fit a niche within the CNCF that wasn't filled. There are no other FinOps projects within the CNCF yet, and Kubernetes is near and dear to the CNCF, obviously.
It's the first graduated project. So we got in and immediately had to start doing a lot of cleanup. OpenCost was called the KubeCost Cost Model. It was already open source; the engine of their commercial product had been open source, but it hadn't been actively part of a community, or part of a foundation and a larger open source community. It was kind of the classic: oh yeah, we've got some open source, you're welcome to kick the tires and look around, but very few people used it by itself. I was at KubeCost when it was open sourced, but I wasn't the community manager yet. Shortly after that, I became the community manager and started doing a lot of cleanup. Things like changing the README, so if something's wrong, you go to Slack, you don't go to KubeCost support. Sorry, it's open source now, you own both pieces. A lot of things like internal variable renames, or cleanups in the README and documentation, to point people back to the community, to let people know this is open source, it's not a commercial offering. As an open source community, we're going to support each other.
But it's divorced away from KubeCost a bit. That's not to say KubeCost isn't involved; easily 80 percent of the commits are from them. But they're getting used to being good open source citizens. It used to be: this is our thing, we just commit to it. And now it's: oh, we have maintainers who don't work here, we have other people committing to this project. So now we have fortnightly community meetings, we have a calendar, we have our Slack, we have all the social media for OpenCost. We're forming a kind of fledgling community; we've got about a thousand people in our Slack. And because we're moving into incubation, I've been running a lot of due diligence about how healthy we are as an open source project.
There are the kinds of metrics you look for: where are the contributions coming from, who's actually using it in production, who's using it as an integration, who's using it as an end user. The CNCF wants to know all that stuff before they move a project out of the sandbox. I think there are about 120 sandbox projects, which is why that landscape document is so crazily large, maybe 60 incubation projects, and I think 25 graduated. Incubation means it's a healthy project. It's kind of a green light to other folks: look, we think this thing has legs, it's got a pretty healthy community, we're seeing regular releases, they're responsive to issues and PRs, and it's progressing. And, tooting our own horn, OpenCost is one of the better sandbox projects. For the 2023 wrap-up, the CNCF said we were a top-40 Linux Foundation project. Whatever metrics they're using, it's related to contributions, commits, releases, responsiveness. So that's high praise, at least to me.
Jonathan: Yeah. One of the metrics I like paying attention to, that's just fascinating to me, is your number of open issues. Is it kind of a steady line, or are you on the logarithmic curve of open issues?

Matt: That is a tough one, because the funny thing is, when KubeCost contributed it to the CNCF, they kept the git history. So the project already had a hundred issues that had been open; they'd been using it for their commercial issue tracking. I kind of had to go through and say, this is a commercial issue, this is not an OpenCost issue, pruning the backlog, turning on the stale bot, and generally telling people, don't open your KubeCost issues here, there are repositories for that. We've updated some of the templates: where does this go? But KubeCost is definitely a good open source citizen. They're saying, hey, we're going to fix this issue in OpenCost, because they are downstream of OpenCost. So it's a pretty good relationship. And now we've got Grafana Cloud, we've got Azure as downstream of OpenCost too, and others.
Jonathan: So there's a term you used a minute ago that I want to dig into, because it's not one I'm particularly familiar with, and that's FinOps. I can take a guess at what that means, but let's talk about it. What is FinOps? What all does that include?

Matt: So the Linux Foundation is a very large tent, and one of their sub-foundations is called the FinOps Foundation. It's the intersection of finance and cloud operations: understanding how to track what's going on in your cloud bill versus how you're consuming cloud. As you kind of mentioned in the pre-show, you're lucky enough not to have to worry about the bills, but as you start to get to scale, you're going to run into: wait, why are we spending half a million a month on this?
Could we be doing it better? And it's not just "turn everything off"; it's, how do we tailor our consumption? Are there things we could be doing to improve this? The FinOps Foundation has actually published a lot of guidance. The FinOps framework explains how to start tackling this problem: how to think about gathering your metrics, testing your assumptions about how to fix things and make savings, what you can do to optimize your spending other than just turning everything off. Although sometimes step one actually is, what's running that shouldn't be running? Because you're paying for it anyway. And then repeating that process. Unfortunately, it's not a silver bullet; you're going to iterate over the process. They have this crawl, walk, run methodology across different phases of operations and finance to tackle that. There's an O'Reilly book that's now on its second edition; it's on my desk somewhere. They now have a conference, and a very active community, not the CNCF's, they have their own Slack. It's growing like wildfire, because everybody's on the cloud and everybody's got this experience. And part of what they do is certify different solutions as part of how to solve these issues.
And so OpenCost is a FinOps-certified solution, which means we're a tool you use to solve these problems. Digging into your Kubernetes usage, digging into your cloud costs, that's what OpenCost does. We're not going to make recommendations, but the first step is having good metrics, knowing what's happening. Sometimes it might be as simple as: oh, check it out, you're paying for all this compute that's not even being used. And there are a lot of tools built on top of OpenCost that will make recommendations. Sometimes it's just pattern matching: when you see that 50 percent of your compute is unused, maybe you should resize your cluster. Or maybe you're paying for a dozen medium instances when you'd be able to get away with three xlarges and save money. There are a lot of optimizations you can do there. OpenCost is gathering those numbers; people have built machine learning solutions, AI, on top of that. And that's what KubeCost is doing: they're taking all these metrics that OpenCost gathers and building a whole lot of optimizations on top of them. They've got recommendations, reporting, budgets, machine learning, fancy dashboards, all sorts of great stuff. It's all in there.
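The "dozen mediums versus three xlarges" comparison Matt mentions is simple enough to sketch as arithmetic. The hourly rates here are made-up illustrative numbers, not actual cloud prices:

```python
# Back-of-envelope rightsizing: a dozen medium instances versus three
# xlarges. Hourly rates are made-up illustrative numbers, not real prices.

medium_rate = 0.0464   # $/hour, hypothetical
xlarge_rate = 0.1664   # $/hour, hypothetical
hours = 24 * 30        # one month

dozen_mediums = 12 * medium_rate * hours
three_xlarges = 3 * xlarge_rate * hours

print("12 mediums:", round(dozen_mediums, 2))   # ~400.90 per month
print("3 xlarges: ", round(three_xlarges, 2))   # ~359.42 per month
print("savings:   ", round(dozen_mediums - three_xlarges, 2))
```

Whether consolidation actually works depends on whether the workloads pack onto the bigger nodes, which is exactly why the allocation metrics matter before you act.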
Katherine: So, I feel like you kind of hinted at this in a couple spots: the type of optimization you're talking about, that brings costs down, would also necessarily be related to sustainability efforts, and the kind of sustainability efforts around the CNCF. I know there's a sustainability working group. Are you involved in that? Are you plugged in?
Matt: Yeah, we definitely are paying attention. The CNCF has a technical advisory group, the sustainability TAG. I lurk in their channels, I'm on their mailing list, and I've gone to them and said, hey, I have this great CNCF project that is tracking all of our Kubernetes usage. What can you give me, so I could say: when I use Intel instances, they cost this much in carbon, and when I use Arm, it costs this much less? Can you give me those numbers? And they can't yet. Right now, what you get for most of the carbon costs is: here is your total compute carbon footprint. You could look at that, and if it was all Kubernetes, we could split it up and eventually tell you, here's how much carbon that namespace costs. We can't do that yet, but we're working on it. There's another open source project called Cloud Carbon Footprint that is working on getting finer-grained numbers. Like I said, most of the numbers are at the compute level; they don't tell you by individual machine. We need the carbon numbers for more than just the service; we need them down to the machines, or the S3 bucket, or whatever it might be, because then we can actually correlate them back to your Kubernetes usage.
But we are working on that, and fingers crossed, we'll have some announcements around it soon. That'll unlock all sorts of optimization opportunities. You'll be able to say: okay, I see I could turn off some stuff, that's good for carbon costs. Or, I see I have this one workload that is super intensive; we could move it to some Arm instances, maybe that'll save us some money. Or you see something that is just burning through money and carbon, and that might be where we should be optimizing our performance. So it's an investigative tool: it gives you where you should be looking and what you should be fixing. But we are going to have something on the carbon footprint in 2024.
Katherine: In time for KubeCon coming up, maybe? I don't know.

Matt: I don't want to jinx it, but we're working on it. There are a lot of moving pieces, right?

Katherine: Paris is coming up pretty quickly, but maybe North America.
Matt: I know the cloud providers talk about providing those numbers, but some of them are only available through their billing visualizations; they're not exposed in their public APIs. They have them in internal APIs, but not public-facing ones. So behind the scenes I'm talking to different cloud providers, saying, hey, if you publish these numbers, we can ingest them and turn around and show how much that costs. They're receptive to it, but it's something they're doing slowly, so I can't promise it'll all be there. But there will be one who breaks open the dam for the others. Similarly, the FinOps Foundation has an effort they're calling the FinOps Open Cost and Usage Specification, FOCUS. What FOCUS is trying to do is take all these different cloud bills and standardize and normalize them into one format. Reading your Azure bill, reading your GCP bill, your Oracle bill, each one of those is unique and different, and FOCUS is trying to normalize them and make them all use the same terminology. Then you'd actually be able to compare your apples and oranges. You'd be able to say: my compute on Oracle versus my compute on AWS, this one is cheaper for the same instance types. So we're involved in that standardization process too, and hopefully we'll be bringing FOCUS integration and carbon footprint integration into OpenCost in 2024.
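The normalization FOCUS is after can be sketched as a per-provider field mapping. The AWS-style names below mimic CUR headings and the Azure ones its usage export, but treat every name here, including the common fields, as illustrative rather than the actual FOCUS column set.

```python
# Sketch of bill normalization: each provider's bill uses its own vocabulary,
# so map native column names onto one shared set. All names are illustrative.

FIELD_MAPS = {
    "aws": {
        "lineItem/UsageAmount": "usage",
        "lineItem/UnblendedCost": "cost",
        "product/instanceType": "sku",
    },
    "azure": {
        "quantity": "usage",
        "costInBillingCurrency": "cost",
        "meterName": "sku",
    },
}

def normalize(provider, row):
    """Translate one native billing row into the shared vocabulary."""
    return {common: row[native] for native, common in FIELD_MAPS[provider].items()}

aws_row = {"lineItem/UsageAmount": 100, "lineItem/UnblendedCost": 19.2,
           "product/instanceType": "m5.xlarge"}
azure_row = {"quantity": 100, "costInBillingCurrency": 21.0,
             "meterName": "D4s v5"}

# Once normalized, rows from different clouds are directly comparable.
print(normalize("aws", aws_row))
print(normalize("azure", azure_row))
```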
Jonathan: So I'm thinking about this, and one of the first things to throw out here is, when you're talking about sustainability, there's more to it than just carbon, but carbon is the one easy thing to talk about that kind of stands in for the rest of it. We're not going to dive into all those details here, but one thing that comes to mind is there seems to be almost a one-to-one correlation between the amount of money you're spending on your cloud compute and the amount of carbon pollution that results from it. So this seems like one of those places where the two goals, not having more expenses than you need and not polluting more than you need, really go hand in hand. And I have to say, it's really nice when that works, when being good stewards of the environment is the same thing as trying to run your business well. I wish it worked out that way everywhere.
Matt: Yeah. Well, there's, there's another, I mean, that's definitely true and, and definitely love seeing that. Um, there's another level to the carbon footprint though is, is It actually, you can find out which data centers get their power from which sources. And so you might actually spend the same amount of money and you know, I'm just going to throw names out there like, you know, an Oregon data center versus, um, one in Texas, but one of them may be.
You know, powered by hydro, and one of them may be powered by coal, and they might charge you the same, but they have very different carbon footprints. Or, depending on the time of day, you may have different carbon costs associated with solar versus nuclear, or whatever it might be.
And so there's another level of optimization that happens. Yes, you can save money just by lowering your usage, but you might also start looking at: does this workload actually have to be in this data center? What if we moved it to this one? We're not going to save money, but we'll still lower our carbon footprint.
And so that's kind of a secondary effect that we're going to get out of it.
Jonathan: Oh yeah, that's fascinating. I like that you just present that data and you let the company make the decision: how much do they care about this particular thing versus simply saving costs?
That is pretty nifty. I like that. So, we've talked about this term cloud native, and the Cloud Native Computing Foundation, and how that's part of the Linux Foundation. What exactly, and I've had people ask me this before, when we say cloud native, what actually is the definition of that? What boxes do we have to check for an application or a deployment to be actually cloud native?
Matt: That's a good question. I mean, I've been doing this for a while, and my answer would be: it's cloud native if you don't ever touch a box. And you can still behave in a cloud native fashion within your own data center, right? If you can turn on a data center and start running your workloads without having to go and insert a thumb drive, or log into a box and start typing, that feels more cloud native to me.
To me, it's workflows and processes that are run through automation. Yes, it might be as simple as SSH in a for loop, but we've come a long way since those days. Surprisingly, there are probably a lot of people out there who are still not cloud native, but they're still in the cloud.
And so we're definitely trying to find those users too. We want to give them the tooling they need to easily dig into these numbers, even if they're not on Kubernetes, even if they're not automating everything. We want to provide metrics so they can start tracking these things.
But yeah, cloud native is a curve, for sure, in terms of how far along you are. Of course you're going to have your Netflixes at the far end of it, but I think the vast majority of people are still in the middle of that curve. Sure, they're cloud native, but they've still named their servers.
They're still logging into them and checking on them.
Jonathan: Yeah, well, I think it probably depends on how big of a scale you want to talk about. If you're just serving one website and you have maybe 100 hits on it a week, you don't need Kubernetes, right? Maybe it makes more sense to run that website off of a Raspberry Pi sitting on the bookshelf. It's only when you think, okay, this is going to scale up to a million people hitting it, that you have to do something other than the Raspberry Pi sitting on the bookshelf.
And I'm curious, though, about OpenCost. Does it fit into those less cloud native applications? If someone is self-hosting, if they have their own hardware, if they're using Raspberry Pis, is there a place where OpenCost still makes sense, even for those smaller deployments?
Matt: Absolutely. OpenCost does support on-premises, so you can provide your own pricing and billing. So even without a cloud provider, I could be tracking my internal costs. Say I'm a managed service provider: I could have my own custom billing and say, look, we've got a bunch of racks of Raspberry Pis, and your usage is consuming 60 percent of them. You kind of make up some billing numbers, but you can use it to track usage across those clusters.
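The custom internal billing Matt describes boils down to pricing measured usage with your own rates. A minimal sketch in Python; the rates, field names, and usage figures here are made up for illustration and are not OpenCost's actual configuration format:

```python
# Hypothetical on-prem pricing sketch: the rates below are invented and the
# field names are illustrative, not OpenCost's real configuration keys.

# Custom hourly rates for an internal cluster (e.g. racks of Raspberry Pis).
CUSTOM_RATES = {
    "cpu_core_hour": 0.002,  # dollars per CPU-core-hour
    "ram_gib_hour": 0.001,   # dollars per GiB-hour of memory
}

def internal_cost(cpu_core_hours: float, ram_gib_hours: float) -> float:
    """Price internal usage with the custom rates, no cloud provider needed."""
    return (cpu_core_hours * CUSTOM_RATES["cpu_core_hour"]
            + ram_gib_hours * CUSTOM_RATES["ram_gib_hour"])

# A tenant consuming 60% of a 40-core, 80 GiB cluster over a 720-hour month.
cpu_hours = 0.6 * 40 * 720   # 17,280 core-hours
ram_hours = 0.6 * 80 * 720   # 34,560 GiB-hours
print(f"internal bill: ${internal_cost(cpu_hours, ram_hours):.2f}")
```

The point is only that once usage is metered per namespace or per tenant, any made-up internal rate card turns it into a bill.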
Personally, I've got two internal clusters. There's an x86 one, and, it's not all Raspberry Pi, there are some other devices in there, but I have an Arm cluster running K3s that I run OpenCost on, just tracking internal workloads to exercise that part of the code. But also, OpenCost is not particularly heavyweight; it's not consuming a lot of resources anyway.
I mean, it's writing out every minute, so it's not digging in too much. So there's definitely a use for small deployments if you need to track this kind of split. It's hard to track how much a namespace or a workload costs in a cloud native environment, because a lot of these are ephemeral.
They're coming and going. You have a job that might last 30 minutes; you might have others that last 30 days. And so what OpenCost is going to do is track all those numbers as things come and go. We'll be able to tell you: sure, the containers in this namespace are only living for 10 minutes on average, but over the month you had 10,000 deployments, and it cost this much.
So there's definitely on-prem usage, and there's definitely small deployment usage. We also have some really large deployments, people running literally thousands.
Katherine: So we talked about all this great data that you can gather. What do people do with that data? And in particular, is there anything on your roadmap to make it easier to use that data?
Matt: Yeah. So out of the box, OpenCost depends on kind of the default Prometheus, and the default Prometheus is tuned for 14 days of storage. It's open source; you could change that to whatever you want. And OpenCost has a relatively simple React UI. It's one or two pages that make calls to the API and render:
hey, here's your 14 days of data. Occasionally we get folks who say, hey, this thing doesn't scale. They've got a hundred nodes, they're running some queries over a week, and it's literally going and saying, give me every five minutes of data across a hundred nodes.
Prometheus doesn't like that very much. Usually, what people are going to do is forward that to something bigger: Thanos, Mimir, Cortex. There are a lot of Prometheus-compatible databases that are meant for longer-term storage. As for what people do with that data: you can federate those, bring them all into one visualization, and start to compare all your different clusters.
That's kind of what they do with it. Right now, OpenCost is just Prometheus. We've got a couple of issues open to document how to do that forwarding. The support's there, and people are definitely doing it; we just don't have it well documented. KubeCost does that: they forward to Thanos.
Grafana forwards to Mimir, their AGPL-licensed fork of Cortex. So that's one form of long-term storage. And we're constantly getting Prometheus query updates and fixes. I mean, Azure's contributed a lot, Grafana Labs has contributed a lot.
A lot of people are just saying, oh, this query could be optimized, and so the performance gets better all the time, just because people run this thing in production. We also have CSV export, so if you want daily dumps of your data, we can export them as CSV.
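The CSV export Matt mentions amounts to a daily dump of allocation rows. A minimal sketch of that idea, using only the standard library; the column names here are invented for illustration and are not OpenCost's actual export schema:

```python
import csv
import io

# Hypothetical daily allocation records; the columns are illustrative only.
allocations = [
    {"date": "2024-02-07", "namespace": "checkout", "cpu_cost": 41.20, "ram_cost": 12.75},
    {"date": "2024-02-07", "namespace": "search", "cpu_cost": 18.03, "ram_cost": 6.40},
]

def dump_csv(rows):
    """Serialize allocation rows as a CSV daily dump (here to a string buffer)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["date", "namespace", "cpu_cost", "ram_cost"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(dump_csv(allocations))
```

A real exporter would write to a file or object store on a schedule, but the shape of the data, one row per namespace per day, is the whole idea.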
One of our community members recently contributed a new repository around Parquet export. Parquet is an open source standard for large data exports. I don't remember which Apache project it came out of, but we recently added an exporter so you can get daily dumps of your data to be ingested into some other BI tool, Crystal Reports or what have you.
They'll take Parquet.
Katherine: So, I have to ask. This is both one of my favorite questions and one of Jonathan's favorite questions, but I'll go ahead and steal it. It's a great question; I always ask it. Are users surprising you? In other words, has anybody used it in a way that was unexpected?
It's designed for a certain thing, but has anybody, that you know of, adapted it to do something a little different that was surprising to you?
Matt: Yeah. I mean, one of the great things is that at KubeCon we've had a kiosk in the project pavilion, where all the sandbox projects get to hang out, and people come up and they're always like, oh, I'm doing this with it.
Katherine: I wondered about that exact thing, the autoscalers. They can automate the solution.
Matt: Yes, you can say: hey, when this happens, do that. When you see these metrics, yeah. So people have built autoscalers on top of OpenCost that say, when my billing does this, start doing that. Turn things off. Yeah, please, right? Don't allow it. I mean, it's guardrails.
Or, when the costs drop, maybe make it bigger. If you see the pricing start to drop because it's after hours, start ramping up your usage; or when spot instances are available, start using those instead. So we're definitely part of a lot of auto scaling and machine learning and AI, where essentially you're looking at the patterns of usage and costs and then making decisions based on that.
And so people build their own solutions on top of OpenCost; that's always exciting. OpenCost gets embedded in a lot of those, because we're just providing really useful metrics to build on. That's what KubeCost does on top of OpenCost, and that's what Grafana Labs is doing on top of OpenCost. For the people who embed it, it's the engine for making those sorts of cool integrations.
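The cost-driven autoscalers Matt describes reduce to "when the billing metric crosses a threshold, change the replica count." A toy sketch of that guardrail decision logic; the thresholds and scaling rules are invented for illustration and are not part of OpenCost itself:

```python
def guardrail_replicas(current_replicas: int, hourly_cost: float,
                       budget_per_hour: float, min_replicas: int = 1) -> int:
    """Toy cost guardrail: shed a replica when spend exceeds the budget,
    and allow growth again when spend is comfortably below it."""
    if hourly_cost > budget_per_hour:
        # Over budget: scale down by one, but never below the floor.
        return max(min_replicas, current_replicas - 1)
    if hourly_cost < 0.5 * budget_per_hour:
        # Well under budget (e.g. off-peak or spot pricing): grow.
        return current_replicas + 1
    return current_replicas

print(guardrail_replicas(5, hourly_cost=12.0, budget_per_hour=10.0))  # scales down to 4
```

In practice the `hourly_cost` input would come from cost metrics like OpenCost's, and the action would patch a deployment's replica count rather than return a number.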
Jonathan: So I'm curious, with OpenCost, are you eating your own dog food? How much of this do you use internally to keep track of things?
Matt: You know, the nice thing about being a CNCF project is you don't have to eat the dog food, or you don't have to pay for the dog food. There you go. We are not, but, interestingly, the Linux Foundation is using OpenCost by way of KubeCost. The Linux Foundation is actually using KubeCost to watch their own internal usage of compute.
So a lot of the different cloud providers, AWS, Azure, GCP, Oracle, Scaleway, Equinix, are contributing compute hours to the Linux Foundation, which then turns around and gives them to the Kubernetes project, and to all these build farms, all the different projects that need to run CI/CD and testing.
They have an internal team, the SIG Infra for the Linux Foundation, that serves all these projects, all the incubation projects and all the graduated projects. They're all getting compute; you just kind of get some access to this. They're actually using KubeCost to track their own usage internally, because they literally have thousands of Kubernetes clusters,
which is hard to do with OpenCost. Yeah. I mean, OpenCost is primarily single-cluster based, and then you could federate on top of that, but you're starting to build a lot of your own tooling. KubeCost is doing that for the Linux Foundation.
Jonathan: Okay. I just had a humorous thought come to mind, and I don't know that you have any visibility into this, but it goes like this: there have to be some businesses out there that are taking the OpenCost tools, breaking things down into their different business units, and bringing those costs into their accounting.
And so then they have a spreadsheet: this is how much this particular business unit is making us, this is how much this particular business unit costs, and one of those costs is your cloud stuff. And the humorous thought that came to mind is: how many things have been killed using the data from OpenCost?
Do you have any stories about businesses? And I don't know that you would, like I said, this is kind of internal data, but do you know of anything that OpenCost has led to the demise of?
Matt: In a good way, yes. So what you've just described, in FinOps and finance talk, is chargeback.
Chargeback is when you have an internal bucket of money and you split it among your organizations: look, our cloud bill is a million a month, and 300,000 goes to this team, 300,000 goes to that team, and 400,000 goes to that team. You start tracking those numbers. The company pays that million, but each team doesn't have their own bill, right?
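The chargeback math Matt walks through is just proportional allocation of one shared bill. A minimal sketch, assuming usage is measured in some common unit such as CPU-hours; the team names and numbers are hypothetical:

```python
def chargeback(total_bill: float, usage_by_team: dict) -> dict:
    """Split one shared cloud bill across teams in proportion to usage."""
    total_usage = sum(usage_by_team.values())
    return {team: total_bill * used / total_usage
            for team, used in usage_by_team.items()}

# One company-level bill of $1,000,000/month, split by measured usage.
shares = chargeback(1_000_000, {"team-a": 300, "team-b": 300, "team-c": 400})
print(shares)  # team-a and team-b each owe 300,000; team-c owes 400,000
```

The hard part in practice is not this division but the metering, attributing ephemeral workloads to teams accurately, which is what cost-allocation tooling exists to do.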
So you have chargeback, where you have internal billing. The precursor to that is showback: you're just showing people, here's how much you're spending, maybe you should do something about this. And then, of course, the first version of showback is what we affectionately call shameback.
That's where you're like: look what you're doing; change, stop what you're doing. So shameback is kind of the first stop when you turn this on and start seeing where all the money's going. Anecdotally, we catch a fair number of botnets, people who have things that have been exploited showing up in their bill.
They're getting this bill, and this month it's 500,000, next month it's 550,000, and they're not seeing the individual things that are popping it up. But when you start digging into the numbers, you're like, whoa, it turns out this namespace had been exploited. Or maybe not at the Kubernetes level, but something inside the application had gone wrong.
Someone had gotten credentials, and all of a sudden there's a Bitcoin miner running nonstop. Unfortunately, we find a fair amount of that. So OpenCost, once you start tying it into dashboards and reporting, becomes an early warning system. One of the interesting things about the way we gather data is that we use on-demand pricing, because there's a time delay before your bill gets published.
AWS, Azure, all the cloud providers, they're not saying, oh, we know what all your discounts are, we can tell you within five minutes how much everything costs. It's actually 24 or 48 hours later: oh, you had some discounts, you hit those thresholds, here's your real bill.
But people want to see the cost immediately. So we might not always have the exact cost, and we might not match your final bill, because we don't have those discounts built into OpenCost, but we show you changes in velocity. We show you: oh, you're spending a lot more money now.
Put that on a dashboard and use it as an early warning. So we get used in that function as well.
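The "changes in velocity" early warning Matt describes can be sketched as comparing the latest day's spend against a trailing baseline. A toy detector; the window size and threshold below are arbitrary choices for illustration, not anything OpenCost prescribes:

```python
def spend_alert(daily_spend: list, window: int = 7, threshold: float = 1.3) -> bool:
    """Flag when the latest day's spend exceeds the trailing average
    by more than `threshold`x, a crude stand-in for velocity alerting."""
    if len(daily_spend) <= window:
        return False  # not enough history to form a baseline
    baseline = sum(daily_spend[-window - 1:-1]) / window
    return daily_spend[-1] > threshold * baseline

history = [100, 102, 99, 101, 100, 98, 103, 160]  # a miner spins up on day 8
print(spend_alert(history))  # the jump to 160 trips the alert
```

A dashboard alert rule over a cost metric does the same comparison; the value is that a compromised namespace shows up in spend long before the monthly bill arrives.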
Jonathan: That humors me a lot, that OpenCost is almost accidentally an incident detection tool. I mean, I've been there. I've gotten the email from my provider.
It's like, hey, by the way, this particular server is an open DNS resolver now, or this particular server is sending out a whole lot of spam, and we'd love for you to look into that. It would be great to have an automated tool like this that finds that stuff earlier, so you don't have to get the email of shame from your ISP.
Matt: Yeah. So that's why we're a monitoring tool. In the CNCF landscape, we are in the observability realm. That's where we are.
Jonathan: Makes sense. It's more than just the dollars and cents. I like it. All right, so I'm going to ask you a hard question, because you've got to do set math.
You have to think about all the things that you wanted to talk about and all the things we asked you about. And that is: is there anything that we didn't cover that you really wanted to cover today?
Matt: I mean, I think we've hit all the major topics. We've talked about the roadmap; carbon footprint is coming.
We have some more cloud providers that are going to be supported soon, and we hope to be making some announcements around those. We are always looking for more contributors, so definitely join our Slack and get involved. And, just like every other open source project, documentation is the hard part.
Cloud billing has a lot of knobs, so there are a lot of configuration options, and there are a lot of Prometheus versions out there as well, so we're this hard matrix of configurations. That's what open source is actually really good at: covering off all the edge cases. So I definitely appreciate everyone who's involved, and I look forward to seeing more folks.
Yeah, I think we've touched on most everything. Like I said, the project is going really well. We've got the carbon footprint coming, and more cloud providers are coming. Yep, it's all good.
Jonathan: All right, I've got to ask you two final questions before we let you go.
And that is: what is your favorite text editor and scripting language?
Matt: My favorite text editor is Emacs. I have been an Emacs user for, whoo, a long time. And what's funny is I actually started on vi, proper vi, back on real vi. And I worked at a company that literally was NetBSD on the desktop,
because the architect of the company was a NetBSD core maintainer and made us all go down that path. And he used Emacs, so Emacs was the standard. So I started with Evil, which is the vi bindings for Emacs, and eventually we ran everything through Emacs, and I can't escape its gravitational pull.
I've actually presented at EmacsConf, and I used to maintain the Emacs bindings for IntelliJ, and I've tried to switch to VS Code with Emacs bindings. So I'm an Emacs user through and through.
Jonathan: And then scripting language?
Matt: You know, I used to work in a Ruby shop, so I love me some Ruby, but I still use a lot of Bash too. So I'm going to stick with Bash.
Jonathan: All right, that is a fair answer. Well, it has been great to have you, and we're at the bottom of the hour. We've covered a lot, and hopefully after you make some announcements we can have you back to talk about those. So thank you, sir, for being here.
Katherine: Yeah, I'm intrigued, and I've learned something. I'm going to go research some of this on the side later.
Dig in.
Jonathan: Yes. I think of all the things, the one that fascinated me the most, and of course I'm sort of a security guy, so it's no surprise, is this idea, I liked that too by the way, that when you track all of this stuff, it also lets you know when something has been compromised.
Katherine: A compromise is expensive in more ways than one.
Jonathan: I kind of wish we had touched on that sooner, because we could have talked about it for a bit more. I think that's funny. That's pretty neat.
Katherine: Sure, yeah. You know, I still do the other podcasts: I've got Open at Intel, which is fun, and Doc and I still have Reality 2.0. I also should mention, since we're talking about cloud native stuff, that I will be at KubeCon in Paris, podcasting live from the expo in my cute little fishbowl.
So if anybody's going to be there, I hope you'll come by and wave.
Matt: Cool.
Jonathan: Yeah, very cool. All right, well, next week we have Kumar, and I'm going to slaughter his last name, Singrikanda. He's going to talk about open source DevOps at Toyota, and he's got a book he's written about open source DevOps. I'm hoping we can talk about the DevOps stuff, and I'm hoping we can also talk about open source at Toyota.
I think that would be extremely fascinating. It does sound really cool, and that is what we are looking forward to next week. And then, as far as plugs that I've got: well, of course, there's the Untitled Linux Show over at Twit, and for now that is a Club Twit native, or, oh my, a Club Twit exclusive.
I suppose native works; it's native to Club Twit. And then the other thing is, on Friday mornings you can go take a look at Hackaday. The security column goes live there every Friday morning, and I would love for you to check that out. Thank you so much to those of you that caught us live in the Discord, and thank you to everyone on the download, too.
We sure appreciate it and we will see you next time on Floss Weekly.