Jonathan: Hey folks. This week Rob joins me and we talk with Stéphane Graber about Incus and LXC, a pair of solutions that it seems everyone from home labs to Fortune 500 companies is interested in these days. You don't wanna miss it, so stay tuned. This is Floss Weekly, episode 828, recorded Tuesday, April the eighth: Incus Inception.
It's time for Floss Weekly. That's the show about free, libre, and open source software. I'm your host, Jonathan Bennett. Today we are talking about LXC, LXD, Incus, Linux Containers, and some of the drama that happened around all that in the past few months. First off, I've got a talented and wonderful co-host and it's, it's Rob.
Hey Rob. Hey. That's the best introduction I've ever had. Well, there you go. It's kind of like when universes collide and one of the things that I've, I've enjoyed doing with Floss Weekly is to bring some outside blood in as co-host, and Rob is one of those. So I appreciate you being here.
Rob: Always great to be here.
Jonathan: Yeah. So we're actually, we're talking with Stéphane Graber, and this is, this is sort of, so he is the maintainer for several projects, but one of which you actually brought to the Untitled Linux Show as a story. And that's when Incus was forked from LXD because Ubuntu was sort of asserting their dominance over the project, let's say.
Rob: Yeah, they were making, you know, doing the typical, making a lot of big changes to LXD that you know, people didn't like. So yeah. Had to bring that story.
Jonathan: Yeah. Yeah. So as we, as we said in the pre-show, that's one of those easy stories for the clicks, 'cause people love the drama. Alright, well let's go ahead and bring on Stéphane, our guest for the day.
And so welcome, sir. We appreciate you being here. Hello.
Stéphane: Yeah. Thanks for having me. Thanks for inviting me here.
Jonathan: Yes, I'm, I'm very much looking forward to this. So you've got, you've got kind of your fingers in a lot of pies as it were with the, the entire Linux containers ecosystem. What, what all, what all with that are you maintaining and what, I mean, I'm not even familiar with the entire ecosystem of Linux containers.
I used, I used a little bit of Docker and I used libvirt and that's about it. And you guys are kind of like the other side of the pond, it seems.
Stéphane: Yeah. I mean, so some of it has a fair amount of overlap, especially towards the beginning of the Docker project, for example. Mm-hmm. But yeah, Linux containers have been around for quite a long time, well over a decade, probably around 15 years or something at this point.
Mm-hmm. And it, it started mostly with the LXC project, which was the first userspace implementation of containers on Linux, as container support was being pushed into the Linux kernel. Mm-hmm. So that came around the time where containers were mostly Linux-VServer or OpenVZ, both of which required extensive patches on top of the Linux kernel.
Mm-hmm. And so the idea was to implement containers properly in Linux itself. And so LXC became effectively the reference implementation in userspace. So that grew quite a lot over a number of years. And if we go all the way to when Docker started, the first versions of Docker actually used LXC.
So Docker was effectively a wrapper, a layer on top of LXC, to do all of the layering and image management parts. That was really the innovation. Mm-hmm. But when it came to running the container itself, they were using LXC. And then my, my best understanding, I'm not sure if that's accurate, is that as they were building that into a company, it made a lot more sense from like an IP point of view for them to own the entire thing, not just be a wrapper on someone else's thing.
Imagine that. Yeah. Especially when you're looking for funding and VC and all that kind of stuff. Sure. That's something that, you know, keeps coming up. So I think that's when they decided to start with like the libcontainer aspect and all of that stuff. So effectively running the container themselves instead of, instead of just setting things into place and then asking LXC to do it.
So that's when that changed. Like, as far as the user experience goes, there wasn't really much in the way of changes there. I suspect it was actually more about control than about the technical differences at that stage. Mm-hmm. Because as someone who writes a lot of Go these days, writing a container manager directly in Go is a terrible idea.
Go is really not good at doing low-level, kernel-level interactions. They had to jump through so many hoops to get something like Docker working properly. Whereas when they were using LXC, well, LXC is written in C and so has that very tight integration with the kernel; it's a lot easier that way. So yeah, that's mostly kinda the, you know, how LXC kind of started a long time ago.
Mm-hmm. It was initially created by a, a small French company, then was funded by IBM for a while. Then after that it was mostly Canonical funding it while I and a bunch of others were working there. Quite a while later, we effectively felt the need, well, we felt that LXC as a tool worked fine, but was not user friendly.
Mm-hmm. And also it was not using most of the new features in a good way, because we always wanted LXC to be backward compatible, so we could never make it use all the shiny stuff out of the box. Yep. And so a bit of a reset was, was a good idea. So that's when we came up with the LXD project.
Like kind of, yeah, the creation of LXD was kind of interesting. Like, we wanted to do something on the LXC side, like maybe writing a new tool or something.
Mm-hmm.
And then Canonical management, as in Mark, came down with like, hey, I really want a new thing, more like Docker. So, like, well, that kind of works out.
We, we, we were looking at doing that anyways. Then came a whole bunch of design and bikeshedding on naming and programming language and all that kind of stuff.
Yeah.
After some back and forth and some agreements and disagreements, we found a compromise that we ended up with, which was a project called LXD.
The name was a compromise, not something that we liked. Mm-hmm. Written in Go, that was another compromise; it wasn't gonna be our first pick. But still part of the LXC community, no CLA, and released under the Apache 2.0 license. Those were the bits that we effectively won during the negotiation.
Mm-hmm. And we kinda kept going with that for quite a few years. I mean, at this point we are, we are getting, when was it? Yeah, well, probably past the decade point now. I think I left before the first decade of, of LXD, but at this point they've crossed a decade. And yeah, that's, you know, been, that's been growing quite, quite a bit.
The user experience and everything was good. We did manage to make it as much of a community project as possible, which, right, while working for Canonical can be a bit tricky. Like, some of the views and some of the ways they normally run projects at Canonical are very much company driven, with a, a bunch of obstacles in the way of non-company contributions. Mm-hmm. We thankfully managed to negotiate most of that away. And so I think as far as public facing open source projects at Canonical, LXD was one of the kind of good ones. There was a bunch of other kind of drama internally with Canonical and stuff, which led to me wanting to leave.
And so I announced that I was leaving, well, halfway through 2023. Mm-hmm. And that's when we very quickly noticed how many of those exceptions and things were there literally because of me. 'Cause the moment I announced I was leaving, that's when it's like, okay, let's rebrand the project.
Let's move it away from the Linux Containers community. Let's kick all of the community maintainers out of the project and only keep employees in there. Mm-hmm. All of those changes happened basically within weeks of me announcing I was leaving. I wasn't even out yet; that stuff was all happening effectively.
Interesting. So that's kind of unfortunate. Like, my, my goal at the time was actually to leave the company, but remain a contributor to LXD. I would've liked to remain one of the maintainers. I could have still done a bunch of the release management, run the YouTube channel, all that kind of stuff. I was quite happy to do that outside of the company.
But they've, they made it very clear that they really didn't want that. And, hmm, that then caused our community to get quite frustrated by this. Which then led to Aleksa Sarai at SUSE going and forking LXD into Incus. And well, then we had an LXD-sized hole left in the Linux Containers project at that time.
So we figured, hey, well, that Incus thing would be a good fit to fill in that hole. And we just took it over. And effectively all of the lab resources, all of the build machinery, all of the stuff that we had as part of the open source project just all went straight back into Incus. And that's how we've been running this thing ever since.
It's, it's actually kinda interesting, 'cause I've been running, yeah, I've been running Incus effectively the same way I used to run LXD, but we've seen a lot more interest, a lot more contributions, a lot more activity, other companies getting involved and that kind of stuff with Incus than we ever had with LXD.
Mm-hmm. Despite, like, on the face of it, it's still the same license, it's still on GitHub, it's the same project. Everything is identical. The only thing is that there's no more Canonical being tied to it. Mm-hmm. And that had a lot more impact than I expected. Like, I was actually expecting the opposite, being like, well, you know, companies might like having a bit more backing, full-time paid employees and stuff like that, working on a project before they decide to use it. Right. And no. If anything, we've seen a bunch of, of folks, like large companies and stuff, jumping from LXD over to Incus just being like, hey, that's a way to get rid of that Canonical thing, and have a project that we can use on whatever distro we want that's not gonna cause us issues for support and that kind of stuff, that we can feel like we can contribute to, and is not at risk of being canned at any point.
Mm-hmm. Like, Canonical is not Google, like they don't kill projects every five minutes, but still, like, there's always that risk. That's happened a few times, like a big company pivot type stuff, and like three, four, five projects just immediately become unmaintained.
Mm-hmm.
I think Canonical is not particularly big on killing off projects, but they're very good at moving everyone working on one to do something else, which is kind of the same, just less obvious.
Yep.
Jonathan: Yep. So you mentioned the Linux Containers project. What, what is that, and how is it distinct from, say, Incus?
Stéphane: Yeah, so the Linux Containers project is kinda the, the umbrella project we've put around a bunch of things. Initially, initially we just had LXC, which actually stands for Linux containers, but that's just kinda confusing.
And we then added more things. We created a project called LXCFS, which is a filesystem to expose exact resource usage inside of the containers, so that when you go in a container and you run top, free, or whatever, it shows you the actual amount of memory that's allocated to just your container, and not the global system value.
Mm-hmm.
So we implemented that a long time ago now. So that was our second project, and it was like, okay, what do we put it in? Like, what's gonna be holding all of those projects together? Right. Thankfully we didn't say it's Canonical, and instead went with, there's the Linux Containers project, which then effectively hosts the LXC project, the LXCFS project.
And it also has Incus. For a while we had another project called CGManager, which was like a legacy cgroup manager until systemd kinda took over, mm-hmm, that from us. And it also technically hosts probably another 12 or 15 other projects that are just not significant enough to be listed directly on the website.
But it's like all of the related stuff, like incus-deploy for, like, Incus deployment, the Terraform provider, there's like a CRI, so if you want to use LXC with Kubernetes, there's, there's a thing for that. There's like a whole bunch of other projects and, and repositories that are all within the LXC org on GitHub.
And so therefore part of the Linux Containers project. And the, the main benefit, like, it's, it's not formally registered, like it's not a nonprofit or anything like that. Mm-hmm. But it still benefits from having effectively a strong set of org-, like, organization-wide maintainers that can step in at any point in any of the projects to deal with things like security issues or that kind of stuff.
And then we have per-project specific maintainers. Mm-hmm. So, so that kinda, that makes it easier to handle cases where, like, one of those projects might go unmaintained because the one guy who was working on it is not really focused on that stuff anymore. Now we can go look for someone else, or we can, you know, step in and at least handle critical bug fixes and security on it.
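A quick illustration of the LXCFS behavior Stéphane describes: tools like top and free just read files like /proc/meminfo, and LXCFS mounts a FUSE view over those files so the numbers reflect the container's own limits. A minimal sketch in Go, plain standard library, nothing Incus-specific:

```go
// Minimal sketch: read MemTotal from /proc/meminfo the same way top or
// free does. On a bare host this reports the machine's memory; inside a
// container with LXCFS mounted over /proc/meminfo, the kernel view is
// replaced by a FUSE view, so the number reflects the container's own
// memory limit instead of the global system value.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "MemTotal:") {
			// e.g. "MemTotal: 2097152 kB" under a 2 GiB container limit
			fmt.Println(line)
			break
		}
	}
}
```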
Jonathan: Yeah. So I'm, I'm curious about LXC. You, you said it was kind of the original reference implementation for containers on Linux. Mm-hmm. Is it, is it like, is it still maintained? Do people still use it? Is it still, yes, recommended for new installs? Where, how does that, like,
Stéphane: So yeah. So it really depends. So there's kind of two aspects to LXC.
There is the library itself, so liblxc, mm-hmm, which is very much used by everyone using LXD or Incus, because that's still how we actually run containers behind the scenes. Like, as mentioned, we're also written in Go, but we didn't do the same thing as Docker, which is try to run containers natively from Go.
'Cause that's a bad idea. Instead we, instead we use cgo so that we call into a C library, which is liblxc, and then we run things from there. Mm-hmm. So effectively, if you use Incus, if you use LXD, you are using liblxc; it's just not something you really notice, because it's driven by that upper layer for you.
It's also directly used by some other folks. Especially if you think of, like, the, the guys running OpenWrt, for example.
Jonathan: Mm-hmm.
Stéphane: Those platforms are like so tiny that running a piece of Go that requires 50 megs of memory just to start the daemon is not exactly a good fit. So on those platforms, when you want to do containers, quite often it's using liblxc and the LXC tooling, which is like the, the most, like, interesting kind of old school way of running containers. Yeah. And, and that's still being used there. So like embedded platforms, yeah, that's, that's definitely a thing. Like, OpenWrt I know has it, mm-hmm. We've seen it on some phones as well, where, you know, if you think of something like a Samsung DeX or those kinds of things where, like, you can dock your phone and get a desktop.
Mm-hmm. I think these days most of them use virtual machines, 'cause phones can do that. But for a while, yeah, we had some contributions from Samsung, Huawei, and some others that were suggesting that they were using LXC.
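For a sense of what driving liblxc directly looks like, here's a rough sketch using the go-lxc cgo bindings (gopkg.in/lxc/go-lxc.v2), the same general path Incus takes, calling into the C library rather than reimplementing container setup in pure Go. Names and fields are from memory, so treat them as illustrative rather than verified:

```go
// Rough sketch of driving liblxc from Go through the go-lxc cgo
// bindings. This is a toy example under stated assumptions, not the
// actual Incus code path.
package main

import (
	"fmt"

	lxc "gopkg.in/lxc/go-lxc.v2"
)

func main() {
	c, err := lxc.NewContainer("demo", lxc.DefaultConfigPath())
	if err != nil {
		panic(err)
	}

	// The "download" template fetches a prebuilt rootfs from the
	// Linux Containers image server.
	opts := lxc.TemplateOptions{
		Template: "download",
		Distro:   "alpine",
		Release:  "edge",
		Arch:     "amd64",
	}
	if err := c.Create(opts); err != nil {
		panic(err)
	}

	if err := c.Start(); err != nil {
		panic(err)
	}
	fmt.Println("state:", c.State())

	if err := c.Stop(); err != nil {
		panic(err)
	}
}
```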
Jonathan: We have a listener, mashed potatoes, who says that his Chromebook's Linux VM actually uses LXC.
So apparently,
Stéphane: No, not quite. So on, on Chromebooks, the, the, the Linux development environment on Chromebooks is LXD. So that's, that's using LXD. That's actually the largest, that's actually the largest user of LXD, the Chromebooks. So there's like one point something million active machines that are counted as users of LXD, but they're effectively all on Chromebooks, the vast majority of whom are effectively school kids who want to play video games.
Jonathan: So it uses, ChromeOS uses LXD, not LXC.
Stéphane: Exactly. Yeah. And they do a mix of both, because Google is quite paranoid with security. So they actually run a virtual machine layer using something called crosvm, which is the, the ChromeOS VM layer. And then in that VM they, they have LXD running on a Gentoo kind of base OS thing in the VM.
And then when you get to the terminal, you actually get it inside of an LXD container running on there. Now, this is gonna go away, not clear when, but it's gonna go away for a very simple reason. Google has a company-wide ban on AGPLv3 software, and Canonical relicensed LXD to AGPL. So LXD is no longer allowed within Google development teams, which means it's still fine on Chromebooks, because they're using the older LTS, which is still Apache 2.0 licensed, but they can't upgrade, and they effectively have two ways out of that.
One is they switch to Incus, which they're quite open to. Like, I, I know the guys at Google reasonably well. Mm-hmm. But the other approach, which is a lot more likely at this point, is that they will just offer a straight-up VM from crosvm to the user and not do that container layer, because that would actually line up better with what they've done on other platforms.
Right. Like, last month they rolled out on Pixel phones that you can get a Debian VM, and they want to, like, ChromeOS and Android are, like, effectively on a collision course at this point. Mm-hmm. It looks like they want to save some costs and effectively have the, the Chromebooks slowly move towards Android.
So using the same code base, or a very similar code base, to what's used on the phones makes a lot more sense. Mm-hmm. So I don't expect the Chromebooks to move over to Incus; I expect them to just move over to a pure VM and, and get rid of LXD that way, effectively.
Jonathan: Yeah. So back to LXC, keeping our acronyms straight here.
The, the whole world seems to have gone in the Docker direction, and, you know, I'm, I'm amused that even if you want to use something other than Docker, like I make quite, quite a bit of use of Podman, it has all the integrations so that you can use Docker commands with Podman, and it's pretty much a drop-in replacement.
What, what is, what's the situation like to use LXC instead? Is there kind of a similar compatibility layer? You talked about a way to be able to use LXC with Kubernetes, which is very interesting. What, what's that situation like?
Stéphane: Yeah, so, I mean, LXC can be used as, as a runtime. It doesn't really care what the image is.
Most people, like, when they think LXC containers, they think full distros, because LXC was mostly seen as a replacement for Linux-VServer or OpenVZ, both of which really focused on providing VPSes. So when you wanted to, to have high-density VPS hosting, virtual machines were not really viable.
You could use containers, and a given environment was a lot cheaper to run. And that's, that was the market for VServer, OpenVZ, and then eventually LXC. Mm-hmm. That's still a big thing. There's still a lot of cheap VPS hosting and stuff that still uses containers that way. There's also quite a lot, mostly on the web hosting side, which uses LXC quite extensively, because they want to run, you know, a LAMP stack or something else, just like web server, MySQL, PHP, go.
And they, like, you could run that through Docker, sure. But then you need to run like three containers and interconnect them and deal with restarting them and all that kind of stuff, when you could just install the packages in the container and just move on. So, like, we see quite a few of those guys still going on.
But as I said, like, LXC fundamentally doesn't care. When you create a low-level LXC container, we've got multiple templates you can choose from, and one of the templates is called OCI and literally pulls an image from the Docker Hub and just runs it. So, so we can, we can do that just fine. Again, 'cause it's LXC, which is the low-level kind of simple tool type thing, at this point it's not super nice, end-user friendly. But it's the kind of feature that we've then added into projects like Incus, right? Incus these days lets you do system containers, so the traditional LXC type thing, lets you do virtual machines, and lets you do OCI containers. We can point it at the Docker Hub, point it at another registry, and just run those containers.
Yeah. So then you get kinda all of them covered.
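As a sketch of that OCI support, this is roughly what launching a Docker Hub image through the Incus API might look like using the official Go client. The Protocol "oci" source and the exact field values are assumptions based on my reading of the docs, not verified against a live daemon:

```go
// Speculative sketch: create an Incus instance straight from an OCI
// registry, the API-side equivalent of adding an OCI remote and
// launching from it with the CLI. Field values are assumptions.
package main

import (
	incus "github.com/lxc/incus/v6/client"
	"github.com/lxc/incus/v6/shared/api"
)

func main() {
	// Connect to the local daemon over its Unix socket.
	c, err := incus.ConnectIncusUnix("", nil)
	if err != nil {
		panic(err)
	}

	// Ask Incus to pull nginx from Docker Hub and run it as an
	// application container, alongside system containers and VMs.
	req := api.InstancesPost{
		Name: "nginx1",
		Source: api.InstanceSource{
			Type:     "image",
			Alias:    "nginx:latest",
			Server:   "https://docker.io",
			Protocol: "oci",
		},
	}
	op, err := c.CreateInstance(req)
	if err != nil {
		panic(err)
	}
	if err := op.Wait(); err != nil {
		panic(err)
	}
}
```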
Rob: So with LXD, Incus, and LXC, what is, what is the relationship between those? Like I'm familiar, I'm familiar with LXD and LXC, I mean not functionally, but my understanding is LXC is like the container, and is LXD like a manager?
Is it more than just that?
Stéphane: Yeah, effectively that's what it is. Like, both LXD and Incus are effectively managers on top of other things. So both of them do LXC and QEMU as far as what they, they allow driving. Okay. LXC for containers, QEMU for virtual machines. And then the, the main benefit is that they, they give you a nice REST API, they provide image management.
They provide, like, multi-tenancy, clustering, like, complex storage and network management, all that kind of stuff. Which, if you were using old school LXC, you would have to do yourself over SSH by hand on a machine, which is very appealing to some folks, but also very much not appealing to a lot of other folks.
So yeah, that's, that's really like a, a full management layer on top of it. The idea is to make it feel like a private cloud. Like, it supports object storage, it supports, like, virtual networking, supports all those kinds of things. So that it looks like and acts like a private cloud that you can run on your laptop, or you can grow it to a few machines in your home lab.
Or you can be like one of my customers and run it on 2,500 servers. Like, you, you can do it at different scales and it remains mostly the same. Like, someone who's working for one of those organizations running dozens of large clusters, they can still run it on their laptop, they can do their development on their laptop, they can even run the Terraform provider and all of the deployment tooling on their laptop, make sure everything behaves.
And they basically know this is gonna work the same way in production. Which is not something you, you can do quite as easily with something like OpenStack or even Kubernetes. Like, there are some single-machine Kubernetes distros, like MicroK8s or K3s, that get you most of the way there, but there's still a lot of moving pieces as part of those.
Whereas, yeah, something like LXD or Incus is, is very tightly integrated, very lightweight, and, and simple to run. Like, typically you've got one process running on your machine, that's it. Like, you don't have, I don't know, 50-plus I guess, for Kubernetes these days.
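And to make the "nice REST API" point concrete, you don't even need a client library. A minimal sketch of querying the local daemon with plain HTTP over its Unix socket; the socket path shown is the common default and is an assumption that may differ per distro:

```go
// Minimal sketch: talk HTTP to the local Incus daemon over its Unix
// socket, the same REST API the incus CLI uses.
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
)

func main() {
	client := &http.Client{
		Transport: &http.Transport{
			// Route every request over the daemon's Unix socket.
			DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
				return net.Dial("unix", "/var/lib/incus/unix.socket")
			},
		},
	}

	// The hostname is ignored; the path is the REST endpoint.
	resp, err := client.Get("http://incus/1.0/instances")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // JSON document listing instance URLs
}
```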
Rob: Yeah. So, so when you were correcting that on the Chromebooks they run LXD and not LXC, it's because they're using LXD and, and QEMU for, for the virtual machine part? So they don't actually
Stéphane: They don't use the VM part. So the, their VM layer is crosvm, which is their own virtualization manager. Okay. So they've written, they've written their own equivalent to QEMU, effectively, and then inside that VM they run LXD. And then ChromeOS talks to, like, the LXD REST API to then create containers.
So if you enable all the features on the Chromebook, you can make it so that in your terminal, you can have one terminal that's Debian 12, one terminal that's openSUSE something, one terminal that's, that's CentOS. And those are gonna be three different containers running on, on the LXD instance that they're running in the VM.
Rob: Okay. So let's get on to Incus again. So, is Incus still feature compatible with LXD today? And do you see them, you know, is it gonna stay upstream, or do you see it splitting off, kind of going its own way?
Stéphane: So it's, it's already split off to an extent, 'cause we don't have a choice. For the first six months or so, LXD was still Apache 2.0, so we could exchange code back and forth.
So we would add new stuff, we would fix things, they would look at our code, they would integrate that. We would do the same. And some features were just on one side or the other. Like, if it was a feature we didn't want in Incus, we would just leave it in LXD, and vice versa. But there was good back and forth for the initial six months.
And then around December 2023 is when they relicensed to AGPL. Which now makes things asymmetric, because we are still Apache 2.0, which means they can take our code and put it inside of LXD and release it as a combined work under AGPL. We can't even look at their code without being tainted.
So we on purpose do not look at the repo or issues or anything on the LXD side, because even just seeing patches and stuff on that side could cause us to be tainted by AGPL code, and that could cause issues. So, yeah, so now we've, we've diverged quite a bit. Like I mentioned earlier, we support running OCI containers, so we can run Docker containers. LXD doesn't have that; they just didn't take that feature.
We support some storage options like LINSTOR, or like using an NVMe/TCP or Fibre Channel SAN; they didn't take those changes either. So there's like some fairly big features now that are quite different between the two. And it also limits our ability to handle kind of moving from one to the other, since we can't go look at their code or look at the database schema, look at any of that.
Basically, anyone who's running LXD up until 5.21, which is their last LTS to be released under Apache 2.0, we can migrate from that to Incus. We've got a tool that just does it in place. It takes like 30 seconds and you've converted from LXD to Incus. That's very nice and easy.
Anything after that, we, we can't do anything about. So then you're looking at reinstalling.
Rob: So you're likely to diverge quite a bit as, as time goes on. If anyone copies anyone, though, they'll be copying you. Yeah, exactly.
Stéphane: So my understanding, 'cause I'm still friends with some folks on that team, is that they have a massive spreadsheet where, like, they look at all of our pull requests and they put them on a spreadsheet, and when they've got some time, they have someone go look at it and then include and backport the code.
Yeah, that's kind of, it is, it's kind of funny, 'cause I think at this point we've got a very active community. Like, some, some weeks, I mean, like right now I've got, I think, 45 issues that are actively being worked on by different people.
Jonathan: Mm-hmm.
Stéphane: So some weeks we, we merge like dozens of pull requests.
And I know that on the Canonical side, they only have like one, maybe two full-time people. So if they just try to keep up with us, they're already spending all their time just keeping up with us. They don't actually get to do anything. Which is, which is kind of funny.
Rob: That is a, that's an interesting place for, for a fork to be in, where they're kind of really taking over. And they kind of did it to themselves.
Stéphane: I mean, it's not the first time. I mean, imagine that, you know, OpenOffice, LibreOffice, there are a bunch of other similar things that happened. You know, it's, it's not really the first time this happens. And yeah, we've also kind of changed, like, at the beginning we were always introducing ourselves as like, oh yeah, Incus is a fork of LXD.
Now we've actually gotten enough people that just started with Incus from the beginning that we're not even mentioning the fork part anymore. It's just like, Incus is a private cloud platform that drives LXC and QEMU and stuff, and we don't even mention the LXD thing anymore.
Mm-hmm.
Because everyone who was gonna move away from LXD to Incus will have done so by now, and everyone else has upgraded to a version that can't, that can't move seamlessly to Incus anyways. So they're gonna need to, to reinstall and redo things. So, yeah.
Jonathan: I'm, I'm curious, does Incus have options for high availability?
Stéphane: Yep.
Jonathan: How, how, how sophisticated are those? So I, I've, I've not built it, but I've, I sketched out the plan for, and I actually wrote up the estimate for building a, a very small high-availability cluster for a customer.
And there's a lot that goes into that.
Stéphane: Oh yeah. It's, it's not easy. But yes, so the clustering piece we did in LXD and then improved significantly in Incus handles that pretty well. The main thing is, for all the kind of HA to work properly, you need to have your storage and network be HA, and that's usually the hard part.
So, like, on the storage front we have three options these days. We've got Ceph; if you're using Ceph for storage, then it's already distributed and you've got HA on your storage. Mm-hmm. We've got LVM cluster that we can run on shared block storage that all machines can access. Mm-hmm. So that can be, like, at home you could use like a, a NAS that supports iSCSI and then have your servers use that, that iSCSI volume as their shared storage, and okay, cool.
Now your storage is accessible from all three or all five or whatever number of machines you have; you're good. And the third option, which we merged in a previous Incus release, is LINSTOR, which is by the company that did most of the work on DRBD. It's effectively a management layer on top of DRBD that feels like Ceph, but has a lot lower latency and better throughput than Ceph, because you effectively only ever write to your local disk and it gets replicated to one other machine next to you.
Instead of doing, like, the full distributed writes across all of the machines type stuff that Ceph does. So we've got those three options on the storage front. Once you've got your, your instances, volumes, and everything on, on shared storage, then network is the other one. And for network, then you have two options.
One is you've got a physical network that you just want to use. So you've got VLANs or something like that, with a physical router sitting on that VLAN. Mm-hmm. All machines can see the same VLAN. Cool. Just dump the containers or VMs on that and you're good. Or you need to go with distributed networking.
And for that we support OVN. So that's the distributed Open vSwitch layer, which, yeah, then lets you do virtual L2 networks and that kinda stuff that's also distributed, where, like, if one machine dies, the virtual router just seamlessly moves to the others. And things just, yeah, handle HA. So once you've got storage and network, then you do the easy part, which is Incus itself.
And LXD and Incus both do it the same way, where the database that we use to store most of our state can be natively clustered. Mm-hmm. So once you start adding more servers in the cluster, as soon as you get three, the database switches to clustered mode, and it uses Raft for distributed consensus, elections, all that kinda stuff.
Mm-hmm. To distribute that, that state and make sure that machines, like, effectively vote to achieve quorum on any changes before they actually get committed to the database. Mm-hmm. At which point all of your servers effectively see the exact same thing. One of them gets elected the leader and is the one that handles most of the, most of the write changes within the cluster.
If that machine goes down, typically within five seconds a new leader has been elected within the cluster. Mm-hmm. And then you've got the remaining part of HA, which is, what do you do with the instances that just went boom when the server went down?
Jonathan: STONITH. STONITH?
Stéphane: Well, so we don't need to do that. So STONITH, so you need to do it in some cases, but you don't need to do it from the database standpoint, because since we use Raft, Raft avoids the split-brain issue completely by requiring a quorum vote.
Right, right. You need an uneven number of machines, a majority of whom agree on a change for it to go through, which avoids any kind of split-brain type situation.
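A toy illustration of the majority rule Stéphane is describing: a change commits only once a strict majority of members accept it, and two network partitions can never both hold a strict majority, which is what rules out split-brain at the database layer:

```go
// Toy sketch of the Raft-style quorum arithmetic; nothing here is
// Incus code, just the rule itself made concrete.
package main

import "fmt"

// quorum returns how many votes are needed to commit a change
// in a cluster of n members.
func quorum(n int) int {
	return n/2 + 1
}

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("%d members: %d votes to commit, tolerates %d failures\n",
			n, quorum(n), n-quorum(n))
	}
}
```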
Jonathan: STONITH, by the way, for those listening that are not HA nerds, stands for shoot the other node in the head. And it is one of the ways to solve this problem.
And it, it really comes up if you have two machines and they each have the ability to just turn the power off on the other machine. Mm-hmm. And so, you know, if a problem happens, whichever one wins the race just kills the other one. And then you have one kind of canonical VM or whatever service you're hosting.
Mm-hmm.
Stéphane: Yep. And, and so, like, typically we don't need that at the Incus level. Mm-hmm. But in the storage case you do. So with Ceph or LINSTOR we don't have that problem, because there's like a built-in lock manager and everything that prevents concurrent writes and that kind of stuff. So it's fine.
Also, Ceph is typically networked. So if a machine goes down, it usually means it's not responding on the network; therefore its storage is probably also broken. Mm-hmm. So you don't risk having writes. It's fine. The issue is really with the LVM cluster option, because if you're using Fibre Channel, for example, Fibre Channel is a completely separate network from your normal network.
Yeah. Which means you could get into a case where your network went down, the machine looks like it's offline, but its storage is still functional. At which point, if you start the same VM a second time, it's now running twice. So in that scenario, that's when the documentation is like, yeah, so you need to have a way for, we emit an event that's saying this machine is dead.
You need to ensure it's actually dead, so that you avoid, you avoid that whole issue. Yep. But internally, we have effectively a, a setting you can turn on, which is like self-healing. And you say, you know, after a minute or five minutes or whatever of that machine being seen as dead, as in, it has not been responding to the database pings and is also not responding on the network.
Then treat everything that's running on it as being dead and start them back up on the remaining machines. So it just goes back through. Our scheduler picks different machines and starts them back up on that side. And when the machine that was detected as dead comes back online, it's gonna automatically come back in maintenance mode.
So it doesn't take back anything that's running.
Mm-hmm.
Which again avoids another kind of potential issue with some split-brain type stuff. Yeah. So it never starts anything back up when it comes back online. And then you can go and check, and if it looks like it's safe, as in the hardware is not broken and everything seems well behaved, then you can take it out of maintenance mode, at which point it's gonna go and move back the stuff that was, like, spread across the rest of the cluster.
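A hedged sketch of turning that self-healing setting on from code. The cluster.healing_threshold key is my reading of the documented option (seconds a member must be unreachable before its instances are recovered elsewhere); verify the name against your Incus version, and the client package paths carry the same assumptions as the earlier example:

```go
// Hedged sketch: enable cluster self-healing by setting a server
// config key. Key name and client API are assumptions to verify.
package main

import (
	incus "github.com/lxc/incus/v6/client"
)

func main() {
	c, err := incus.ConnectIncusUnix("", nil)
	if err != nil {
		panic(err)
	}

	// Read-modify-write the server config, passing the ETag back so a
	// concurrent change isn't silently clobbered.
	server, etag, err := c.GetServer()
	if err != nil {
		panic(err)
	}
	server.Config["cluster.healing_threshold"] = "300" // heal after 5 minutes offline

	if err := c.UpdateServer(server.Writable(), etag); err != nil {
		panic(err)
	}
}
```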
Jonathan: Yeah, that's, that's impressive. This is actually being used in production in places I assume.
Stéphane: Oh yeah. So, I mean, currently, the way I'm set up is, after I left Canonical, I created my own company. Mm-hmm. So I'm basically self-employed for a bunch of that. I'm helping out a bunch of folks that do anything from, like, game streaming platforms, web hosting, government stuff, kind of all over the place.
Public clouds, that kind of thing, that directly use Incus and just need the occasional, very technical, low-level help. Mm-hmm. But I'm also the CTO of another company that we created back in July, whose job it is to move Fortune 500s from VMware to not-VMware. And, for anyone following the news,
Broadcom is quite expensive these days.
Jonathan: Yeah.
Stéphane: So a lot, a lot of those guys don't really enjoy getting eight-figure bills. Yeah, so they, they're quite actively looking at moving, and for that, we've effectively written tooling and are moving VMware VMs in the tens of thousands from, from VMware clusters over to Incus.
And that's where, like, a lot of that kind of HA balancing type stuff was needed, because they're running VMware clusters that have self-healing, the DRS features, like the automatic rebalancing and, mm-hmm, load sharing within the cluster, all those kinds of things. So we effectively made sure that we have equivalents for, for all of the useful features.
Mm-hmm. We obviously don't want to be identical one-to-one with VMware, 'cause VMware probably has like 70% of its features that nobody uses, that are, you know, dead weight; we don't want to replicate any of that. Right. But the, the useful bits, especially on the transition side, yeah, we've, we've been working quite actively to support.
Jonathan: Yeah. Interesting. You, you talked about having a, an upgrade path going from LXD to Incus. Is there also an upgrade path going from one version of Incus to the next?
Stéphane: Oh yeah. So both LXD and Incus have always been extremely backward compatible. I mean, for a long time with LXD, you could go at any point from LXD 0.1 to LXD 4.0, so effectively an eight or nine year jump.
Mm-hmm. And it would handle it perfectly fine. Like, the database would just detect what version it was and apply all of the changes and do all of the data migration needed. And it was perfectly fine. With 4.0 we switched the database packages from using a Raft implementation written in Go to one in C, and that caused enough bumpiness
that supporting upgrading from, like, 0.1 all the way to 6.0 would've caused us to keep so much dead code around just for the migration that it didn't make sense. So in LXD, when we did 5.0, we made it so that you needed to do a stop through 4.0. Mm-hmm. So if you were not on 4.0 already, you needed to upgrade to 4.0 first, then you could go all the way to five or whatever.
And now with Incus, we mostly have the same thing, where if you were running LXD 4.0 or newer, we can do that upgrade in place. It's fine. And if you were running LXD prior to that, you need to first do an upgrade to, like, LXD 4.0, and then you can jump onto Incus 6.0, 6.11, whatever. Mm-hmm.
And within Incus itself, yeah, the upgrades are extremely seamless, mostly because we do monthly releases. So people are kind of used to going through updates, and we don't want that to be a bumpy process. So it's basically: new package comes in, Incus restarts, you're done. And usually, even when you do a big jump, like an LTS-to-LTS jump, it still takes less than a second for the database to have done the update.
So that's, that's very, very quick and, and seamless.
Jonathan: You, you kind of, you kind of jumped on a train that I was already thinking about, and that is, you mentioned earlier that you were not super happy with having everything written in Go, and it sounds like you've started moving back to writing some things, rewriting some things in C.
Stéphane: Well, so actually, running containers from Go is not a good idea and we've never done that. So thankfully we've avoided that can of worms. Mm-hmm. When I keep seeing what some of the containerd and runc folks have gone through, as far as, like, the weird bugs and workarounds and stuff they had to contribute to Go proper to make things behave in all cases, I'm very happy that we don't need to deal with any of that.
Right. That being said, even though we were forced into doing Go when we created the LXD project, that was actually a good fit. Like, Go is actually something I enjoy writing these days, especially for those kinds of larger code bases that have a very large REST API and need to, to run a lot of background tasks. Go is actually a pretty darn good fit for that. Go is not a good fit for anything super low level.
So that's why, yes, for example, running the database purely in Go was not a particularly good fit, because the on-disk format is SQLite, and SQLite is C only. And so Go having to constantly kind of, like, context switch between Go and C was just extremely slow and, and problematic. So instead, the way it works now, and, and that's been the case for a while, both in LXD and, and Incus: LXD uses something called Dqlite, Incus uses something called Cowsql.
Same project, but the author of Dqlite also left Canonical, so he also forked this project. But, like, effectively how that works, for anyone who's deep into Go stuff, is we spawn what looks like a goroutine that loads all of the, the Cowsql stuff, which is 99% cgo. And then we lock that goroutine onto a single thread.
And that thread, from that point on, effectively only runs C stuff. And the way we actually access the database from the Go side to that C thread is we go over the network. So we use a Unix socket between the two threads. Yep. Because the Go socket API is very, very fast. No blocking, no cgo context switch, none of that.
Mm-hmm. And the other thread just runs C only so it's, it's nice and fast.
Jonathan: Mm-hmm.
Stéphane: So that's kinda the workaround we took on that one. And that's been working fairly well.
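A rough stand-in for the pattern Stéphane describes, assuming nothing about the real Cowsql wiring: pin one goroutine (and therefore one OS thread) for the C-heavy work, and talk to it over an in-process socket-like pipe instead of crossing the Go/C boundary on every call:

```go
// Toy sketch of the "dedicated cgo thread behind a socket" pattern.
// net.Pipe stands in for the Unix socket used between threads in the
// real implementation; the handler stands in for SQLite-via-cgo.
package main

import (
	"bufio"
	"fmt"
	"net"
	"runtime"
)

func main() {
	// An in-memory, socket-like connection pair.
	goSide, cSide := net.Pipe()

	go func() {
		// Dedicate this OS thread to the "C" side for its lifetime,
		// the way the cgo-heavy database code is locked to one thread.
		runtime.LockOSThread()
		scanner := bufio.NewScanner(cSide)
		for scanner.Scan() {
			// Here the real code would hand the query to SQLite via cgo.
			fmt.Fprintf(cSide, "ok: %s\n", scanner.Text())
		}
	}()

	// The Go side just writes requests and reads replies over the pipe.
	fmt.Fprintln(goSide, "SELECT 1")
	reply, _ := bufio.NewReader(goSide).ReadString('\n')
	fmt.Print(reply)
}
```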
Jonathan: So there's no, no movement to rewrite the whole project in Rust?
Stéphane: No, it wouldn't make too much sense on this one. Like, I, I mean, in general, I'm someone who, who doesn't really get the, why are we rewriting stuff that was written 10 years ago and has been quite well maintained and fixed?
And, you know, if we look at the LXC project, it's been going through security audits by security firms probably a dozen times by now. Mm-hmm. It's gone through God knows how many similar code security research things with Google. We're pretty sure that the quality of the, the, the LXC project is pretty good.
Now, rewriting it in Rust, you're gonna spend six months to a year or something rewriting it, where your best case scenario is nobody notices.
Jonathan: Yeah.
Stéphane: Like, that's literally your best case scenario, is like you didn't introduce any new bug, any regressions, and nobody notices. But you've also not implemented anything people want during that time.
Jonathan: Yep.
Stéphane: Yep. So it, it's quite hard to justify.
Jonathan: Rewrite it, rewrite it in Rust has become quite the meme in, in our, in our circles. I think much, much the same way that, you know, how are you adding cryptocoin to this, was five years ago.
Stéphane: Yeah. So, I mean, like, we, we are actually looking at adding some new side projects to LXC for, for some stuff.
And, like, when we're looking at writing a completely new code base that needs to be low level, talking to the kernel and stuff, that's when it's like, okay, we should probably do this in Rust. Because if we're gonna do something new from scratch now that needs to be low level and memory safe, mm-hmm, that's a good fit.
Wasting resources rewriting something? Not so much. And I mean, we've probably all seen the news, like, you know, Ubuntu switching to more Rust utils and stuff. Mm-hmm. And I, I just came back from a kernel developer conference two weeks ago, and just for fun over lunch, we were looking through that code and we're like, why?
Like, why would you switch away from the good util-linux, where yes, the code is absolutely horrific to read, but from a kernel point of view it's doing things right, to something that is, yes, memory safe, but uses the wrong kernel APIs everywhere? So you've got time-of-check/time-of-use security flaws in every single function we looked at.
Mm-hmm.
Like, that, that's the util-linux rs thing or whatever they call it, the mount implementation and some of the other common helpers in there. We found probably a dozen security flaws in the first function we looked at. Oh, interesting. Because yes, there's no memory safety issue, but that doesn't mean that you've not screwed up the kernel APIs and have race conditions everywhere.
So that's, that's a bit of an issue for sure. And it's also kind of funny, because util-linux, it's not a daemon, it doesn't run with the setuid bit. Mm-hmm. It's not exploitable. Like, even if it's badly written, like even if the C code had a bunch of memory issues all over the place, it only ever runs as the user running it.
So it doesn't matter,
Jonathan: Huh. If, if I remember correctly, is it, is it coreutils?
Stéphane: Yeah, I think coreutils, yeah. Yeah, yeah. Coreutils, yeah.
Jonathan: I, I think we actually had the, the, the Rust coreutils maintainer on the show here, and I don't think he was calling for any distros to use the Rust version.
No. If I remember, and this has been multiple shows back, but if I remember, his, his comment on it was essentially: they were sort of doing it for fun.
Stéphane: Yeah, I mean, I, I can definitely get that. Like, it, it's got some interesting characteristics, like trying to run on multiple operating systems. That's interesting.
Mm-hmm.
It, it's doing some, they've got some interesting concepts in there. But this shows you should not just want to jump on it because this is Rust, this must be better than C. No, because something is memory safe doesn't by default make it better than what's there. Like, you need to actually understand what your attack surface is, what you actually want to guard yourself against.
And yeah, like, I'm, I'm a lot more interested in something like sudo. Like, yeah, sudo, that one is a setuid binary that's used for privilege escalation. Mm-hmm. Yeah, that one, I could see the point of not wanting any potential memory safety issue in that thing. Coreutils? Nah, not really.
So I, I think it's gonna be interesting. Like, you know, I'm surprised that Ubuntu seems to be jumping the gun on this one. I, I think it's the kind of thing that would make a lot more sense for some of the more fringe distros to pick up first, like, you know, do a build of Alpine that's got all the Rust stuff, or do Void or something like that.
Right. Right. Prove that it makes sense, that it works well somewhere else, before you put it into the most used distro. Maybe it's just for the clicks; that's quite plausible. I mean, the, the new VP of Engineering at Ubuntu is a big Rust fan, so I'm not super surprised. Jon has been pushing for Rust stuff for quite a while, and now that he's got the VP title,
I'm not super surprised to have seen that change show up like a month later. But, you know, at the end of the day, I have to, to wonder, like, is that really a good change for the Ubuntu user? Should you have made that more, you know, optional and stuff? Like, do you really want that in the next LTS? I don't know.
Again, I, I would've, I would've picked probably some things before coreutils, like sudo being pretty high on the list there. Like, effectively, take your system, look for all the setuid binaries; those should be the priority for moving away from C. Look at what daemons are running out of the box on the machines.
Those would be good candidates for wanting memory safety. Random utils? Not really.
Jonathan: So we did, we did talk with Sylvestre Ledru back in July of 2024. That's episode 792. It was, it was a great conversation. He was not calling for this. Yep.
Stéphane: Yes. I mean, that's, that's another thing, is like, you know, maybe talk to the upstream project to see.
You know, also on that, on their front, like, what do they think the state of their project is? Are they happy with this being more widely used? Is this production ready? Do you have a story around security updates, security disclosure? Do you have, you know, that kind of stuff in place before a major distro switches over to it?
That's, that's the kind of thing I would've expected to see. And I've not actually checked how far along they got into that with Ubuntu, 'cause I used to be quite involved in Ubuntu. I used to be on the, the Ubuntu Technical Board. I, I'm still an Ubuntu core developer. I used to be on the archive team, release team, a whole bunch of other things.
Mm-hmm. So I used to be someone who was in the way of those kinds of massive changes. And normally, for a change like that to happen, it would have to go through an archive review, which involves not only the legal review, just like copyright, license combination, all that kind of stuff, but also a full code review by the security team, and looking at how does the project handle security disclosure, like, all that kinda stuff, before we change a, a main component over.
Mm-hmm. So I don't know if that already happened, or if it's still yet to happen and maybe there will be some bumpiness at that point. But yeah, like, I think Rust is a good thing in general, but there's definitely the, the usual kind of early adopter thing of people wanting to use it for absolutely everything that exists,
before kind of focusing back to: where does it make sense? Where would it actually be useful?
Jonathan: Yeah. So I, I don't wanna put words in anybody's mouth. I did find a comment, it's from Ubuntu, and I think it, I think it must be from, let's see, jnsgruk. Is that?
Stéphane: Yes, that's Jon Seager, jnsgr.uk, Jon.
Okay. That's the VP of Engineering at Canonical.
Jonathan: Okay. So he did have a conversation with, with Sylvestre, okay. And they, they said that he met with Sylvestre to discuss the proposal to make uutils coreutils the default in Ubuntu 25.10, and he was pleased to hear that Sylvestre feels the project is ready for that level of exposure.
So I don't wanna put words in anybody's mouth. Okay. That is the, that is the conclusion that they came to. All right.
Stéphane: So it looks, looks like, well, Jon had the discussion that, that was needed there, and Sylvestre feels confident that this is fine. So I guess we'll see where that goes.
Yeah, I'm, I'm sure we're gonna see a bunch of bug fixes resulting from that, if nothing else. Like, you know, that's the kind of thing where, at some point, those kinds of projects, if they, if they seriously want to be an alternative to, to the incumbent, they need the exposure at some point so that people flood them with bugs and things get solved.
Like it's. Otherwise, you've got a bit of a chicken and egg thing. Like if you're gonna wait for the project that's just developed by, by one or two people to be perfect before it gets adopted massively. Mm-hmm. It's never gonna get there. So I, I will, I will say, I'll
Jonathan: I will, I will say, I'll say specifically, and, and we, we can leave this tangent, but I will say specifically with coreutils, there is a huge test suite, and all of the different coreutils implementations actually work together on that test suite.
And so they, they already have a pretty good idea of what works and what doesn't. Although there's nothing quite like being thrown into the fire of being the default on a, on a distro like Ubuntu. So I'm sure there will be surprises.
Stéphane: Yeah. We've been starting to see, like, some of the stuff that, that we've noticed. I, I was with filesystem developers who were looking at mostly filesystem APIs.
Mm-hmm. And the Linux kernel has supported something called O_PATH for a while now, which lets you effectively open an internal file descriptor to a path that you can then use everywhere else to, say, open something relative to that, or reopen this path as something else, or pass it to mount, or that kind of stuff, so that you never have the race condition of: I've checked something, now I call mount, or now I call something, and the file has been changed in between.
And from looking at the coreutils, this was not done in there, which means that, like, all of the functions interacting with files, you could race them if you were very quick in parallel at, like, swapping the file. The usual attack would be you swap the file for a symlink pointing to another path, and it just kind of traverses the symlink and, and accesses that.
Mm-hmm. So that's one of the things that, that we briefly noticed, but it was also, like, on someone's random laptop over lunchtime. So, you know, not gonna go into much more detail on that. Like, there's also a good chance it was, it could have been an old version, it could have been, who knows? Sure. But yeah, that's mostly to say, like, there's more to security than memory safety.
Yes. And, and yeah, just because it's Rust doesn't mean it's better. Like, you still need to, to tick all of the other boxes and make sure you are, you're doing all your interactions with the OS and everything else correctly and safely.
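For readers who want to see the O_PATH pattern concretely, a small hedged example using golang.org/x/sys/unix; the file paths are arbitrary, and real hardening would do more than this:

```go
// Sketch of the O_PATH pattern: pin a directory once, then resolve
// everything relative to that fd with *at() calls and O_NOFOLLOW, so a
// path component can't be swapped for a symlink between the check and
// the use (the classic TOCTOU race).
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// Pin the directory itself; O_PATH gives a stable handle without
	// even needing read permission on it.
	dirFd, err := unix.Open("/etc", unix.O_PATH|unix.O_CLOEXEC, 0)
	if err != nil {
		panic(err)
	}
	defer unix.Close(dirFd)

	// Open a file relative to the pinned handle. O_NOFOLLOW rejects a
	// symlink in the final component, so swapping the file for a link
	// to /etc/shadow after our "check" buys an attacker nothing.
	fd, err := unix.Openat(dirFd, "hostname",
		unix.O_RDONLY|unix.O_NOFOLLOW|unix.O_CLOEXEC, 0)
	if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	buf := make([]byte, 256)
	n, err := unix.Read(fd, buf)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s", buf[:n])
}
```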
Jonathan: Mm-hmm. So, if somebody wants to use Incus, back to our main topic.
Stéphane: Yep.
Jonathan: What, how's, what's the way to do it? Can I, you know, can I do a dnf install incus from my Fedora or my Red Hat box? Is it available on Ubuntu?
Stéphane: Yeah. So we've got it basically everywhere at this point. That's another big difference with LXD. LXD was basically only available as a snap, which, okay,
not everyone enjoys, let's put it that way. With, with Incus it's, it's a lot more widely available. So, I mean, if I'm looking right now, we've got native packages in Alpine, Arch, Chimera Linux, Debian, Fedora, Gentoo, NixOS, openSUSE, Rocky Linux, Ubuntu. I think there are a couple more that people have just not contributed instructions for.
Mm-hmm. So yeah, on Fedora you can dnf install incus and that will work perfectly fine. Same thing for, yeah, most distros, mm-hmm. On Ubuntu you can, because it's actively packaged in, in Debian, and Ubuntu still auto-imports from Debian, so they got it that way. The main downside is, don't count on it being updated in Ubuntu after the initial import from Debian, because nobody really seems to be pushing for that, and I don't care enough to do it myself.
What I do, however, for both Debian and Ubuntu, is that I've got my own package repository that has full builds of Incus on Ubuntu 20.04, 22.04, 24.04, Debian 11, 12, 13. And those packages, they're, they're not your standard package where, like, everything is nicely split. They're more of the fat package kind, so they come with the right version of QEMU, with the right version of the firmware for EFI, with the right version of everything.
So that basically it has the optimal combination of everything, and we don't have that kind of friction that you can get in distros where, oh, the QEMU version is a bit old, so some of the features are not gonna work because of that. Or, like, one of the examples would be the, the EDK2 firmware on a bunch of distros doesn't support Secure Boot, because the distro didn't build the Secure Boot keys in there.
Or I think another patch I've got, again for EDK2: by default, it leaves you a second to hit the boot menu, which basically no human being can do. So my package has that bumped to three seconds, and some distros have done similar bumps, some haven't. Mm-hmm. So that's why there's a bunch of differences between distros, but that's what normally happens when you install open source software on, on distros.
So yeah, that's the main, that's the main way. Someone also, and that's mostly for fun, but it actually works surprisingly well, someone also made a Docker package for Incus. So you can run that Docker container, mostly turning off all the security features, uh-huh, and then you would get Incus running that way, which, yeah,
surprisingly enough, works. And then you can, in, in
Jonathan: Docker
Stéphane: basically, and then you can use Incus running Docker images on top of it, you know, just to, to make things confusing. Wow. So, so that was mostly a bit of a workaround. I think it's, it's a snap-style workaround, effectively, for like any other distro
that, that will have Docker but doesn't have a native version of Incus. Hey, you can use that, that to, to make it work. We're also working on, on a couple of other initiatives around Incus to make it easier to install. So we've got incus-deploy, which is a set of Terraform and Ansible files to deploy Incus, typically at scale.
So if you want to deploy a cluster with, with OVN, everything secured properly with a PKI, you know, that kind of stuff, that can do it very easily. It's basically one YAML file. You just give it all of the machines you want, what services you want on them, go, and it deploys it. Mm-hmm. And that works on Ubuntu, Debian, and CentOS.
And the other thing I've been working on, mostly myself, for the past couple of months now is Incus OS. So that's a lot closer to your VMware ESXi or Proxmox type of approach, where we build a full OS image. We're using systemd's mkosi to make an OS image that's based on Debian, that's completely immutable, fully signed, TPM-measured, all of the security features effectively.
Jonathan: Mm-hmm.
Stéphane: So that you can run that on your machine and know that there's effectively no way for it to have been modified in any way whatsoever. And if anyone tries to tamper with your machine in any way, it'll break the encryption, effectively, so that you can't access the disk anymore, and it's gonna be very obvious that something bad happened.
So we've got that, and we see two targets for this project. One is people wanting to run a home lab at home who don't want to run too much Linux. They just want a virtualization platform, mm-hmm, that's convenient. That's kind of the Proxmox use case: deploy that on a few machines, you can access the web UI, you can just click around and get your containers, VMs, network, storage, everything sorted out that way.
The other target is our large customers. When you're running in the thousands of servers, you really don't want them all to run some version of Ubuntu or CentOS or whatever and have to apply updates on thousands of machines. Right. And, as always happens, even if you automate your updates, machines will somehow diverge.
You're gonna end up with slightly different files and stuff over time. It's a mess. So those environments very much like the idea of: this is completely locked down, it's identical bit for bit on all the machines. Mm-hmm. Updates are done with an A/B partition scheme, so you download the update into the other slot, you reboot automatically into the new slot, and if something goes wrong it reverts onto the old slot. That works really well.
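The A/B idea, reduced to a conceptual shell sketch. Incus OS builds on the systemd image and boot machinery; the slot names, entry IDs and the write-image helper below are purely illustrative, not its real commands:

    # Illustrative only: determine the inactive slot.
    ACTIVE=$(cat /run/active-slot)          # e.g. "A"
    if [ "$ACTIVE" = "A" ]; then TARGET=B; else TARGET=A; fi

    # 1. Write the new signed image into the inactive slot (placeholder helper).
    write-image --slot "$TARGET" --verify-signature new-release.img

    # 2. Boot the new slot once on the next reboot (systemd-boot oneshot entry).
    bootctl set-oneshot "slot-$TARGET.conf"
    reboot

    # 3. A successful boot promotes the new slot to default;
    #    a failed boot falls back to the old slot automatically.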
Another thing that's annoying with large companies is compliance. They like running scanning software that in theory tells them what their security issues are. Mm-hmm. Usually that software is absolutely terrible: it slows down the machine and it doesn't know what it's talking about. So our fix for that is that there's literally no way to get a shell, and the entire system is read-only. We don't have SSH on Incus OS. You can't SSH into it.
There is no local shell on the machine itself. So even if you physically go to the machine, you can't log in. There's no shell. Hmm. The only thing it exposes is the REST API to manage it. That's it. And if you want to actually do a full security audit for compliance, what we tell you to do is download the image onto another machine, unpack the image, and run your scanner stuff there.
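Concretely, management-by-API means talking to an HTTPS endpoint with a client certificate, and the compliance audit happens offline against the published image. The /1.0 endpoints shown are Incus's standard REST API (Incus OS's own management endpoints may differ); host, file names and the scanner command are placeholders:

    # Query the server and list instances over the REST API (placeholder host).
    curl -k --cert client.crt --key client.key https://incus-host.example.com:8443/1.0
    curl -k --cert client.crt --key client.key https://incus-host.example.com:8443/1.0/instances

    # Offline audit: fetch the exact release your fleet runs, verify, unpack, scan.
    curl -fsSLO https://example.com/incus-os/IncusOS_202504.img
    sha256sum IncusOS_202504.img
    mkdir rootfs
    sudo systemd-dissect --mount IncusOS_202504.img rootfs
    your-compliance-scanner --offline rootfs/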
Jonathan: Mm-hmm.
Stéphane: And then you'll know that because the image is guaranteed bit for bit identical on all your machines. That's what's running on your machines. And don't run scanners on thousands of machines are gonna slow them down and like every time you open a support. Question or something. We're gonna first have you turn off the scanner and wipe all that stuff out to make sure it's not what's causing issues, because
Jonathan: Yep.
Stéphane: Yeah.
Jonathan: Yep. That's a hard-earned lesson, I can tell.
Stéphane: Yeah. I mean, one of my pastimes is that I'm the VP of Engineering for a cybersecurity conference in Montreal. So I've unfortunately talked to a lot of those vendors dealing with security scanners, security tooling and security reports.
And it's really a mess. Best case scenario, they're harmless and they just keep reporting security flaws that don't exist. But there are also a bunch of them that are very much not harmless, that actually cause either massive CPU usage or massive memory usage, or straight up modify stuff on the system and cause a bunch of damage.
So yeah, if I can keep those far away from the production systems, that
Rob: would be nice.
Jonathan: Understood.
Rob: So you brought up Proxmox as a comparison to Incus. Yep. How does that compare? Like, what kind of person would maybe pick one over the other?
Stéphane: It's surprisingly similar, yet the communities are completely distinct, and I'm not really sure why.
So, I mean, Proxmox uses LXC for containers. We've got Wolfgang Bumiller, one of the developers at Proxmox, as one of the maintainers on the LXC project. So
we
are quite familiar with what they're doing, how they're doing things. And on the VM side, they also use QEMU, same thing as we do.
They have a UI, same thing we do. They have a way to cluster systems, same thing we do. So it's extremely similar. The main difference, I think, at this point is how you install it. Proxmox is taking the Linux distro approach: there's a Proxmox VE image, you download it to install it on a machine, and it's running Proxmox from that point on. Whereas up until now, Incus is for people who already like to run Linux distros and then want to install something on top of it.
That's where Incus OS might make things interesting, because we're effectively stepping into that game now of, well, we also have a pre-installed image thing that you can just dump on a machine and use. I mean, we don't have it yet. We've got builds on GitHub that we can play with. I would not recommend putting that on an actual machine right now, but in a couple of months we definitely will.
At which point it's gonna get a lot closer, and I think it's gonna be mostly a matter of preference on how it behaves, I guess.
Mm-hmm.
Proxmox, its UI and everything, to me feels a lot more like VMware in many ways. It's really kind of modeled after it. Whereas Incus is a lot more modeled after the public cloud.
We use a lot of the same terminology around projects and that kind of stuff. We've got multi-tenancy. All of the virtual networking pieces are there, with ACLs and that kind of stuff, the same as a public cloud. We do object storage. So Incus is a lot closer to the public cloud world.
We also have a full Terraform provider and an Ansible plugin. So for people who are used to driving cloud deployments through Terraform or Ansible, you can do it the exact same way against Incus. So I think that's currently the main kind of split: it's the cloud crowd versus the VMware crowd.
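A minimal sketch of the Terraform side. The provider lives under the lxc organization on the Terraform registry; the resource attributes here are from memory and worth checking against the provider's docs:

    cat > main.tf <<'EOF'
    terraform {
      required_providers {
        incus = {
          source = "lxc/incus"
        }
      }
    }

    # Attribute names are a sketch; confirm against the provider docs.
    resource "incus_instance" "web" {
      name  = "web01"
      image = "images:debian/12"
    }
    EOF

    terraform init
    terraform apply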
The problem is that the VMware crowd is being pushed into the cloud right now 'cause they can't afford VMware anymore. And Proxmox could have been a good alternative to VMware in that situation, but that's not what I've seen at the large end, at least. So I think for the SMBs, the small and medium businesses, they might be going Proxmox.
That would make sense. It's mostly the same way to operate it, and they've got some migration tooling that I think works well. But for larger scale environments, I mean, I'm mostly dealing with Fortune 500, super large companies, and those guys, when they look at Proxmox, they see a company that's really focused towards SMBs and home labbers and doesn't really have a clear vision or focus for how to run a full multi-region private cloud effectively.
Mm-hmm. And so they get discounted pretty quickly based on that. Whereas Incus is already used by some to run public cloud environments. So we know that we can run pretty large individual clusters of anywhere from 50 to 100-plus servers, and you can treat that as an availability zone.
You can have multiple ones of those inside of a region, for a small region. And for environments that need to be much, much larger, like the public clouds with tens of thousands of servers per site, then you just create a lot of clusters, usually based on your hardware generation or that kind of stuff.
And each of those clusters maybe has 50 to 150 servers each. You can make it so that a cluster maps to an aisle in the data center; you can do that kind of mapping easily enough. Mm-hmm. And yeah, we work pretty well at that kind of scale. And then again, you can use things like Terraform to easily blast changes at all of your clusters in one shot.
So you don't need to keep replicating the same changes over and over again. So yeah, the thing we've seen is that currently SMB and home lab tend to go Proxmox, cloud folks tend to go Incus. That's mostly what we've seen. It's gonna be interesting to see what happens with Incus OS, and whether we end up picking up some of the Proxmox folks at that point.
Jonathan: Yeah. I'm super curious, what's the experience right now for trying to use Incus in the SMB and the home lab? Like, is there a GUI to be able to do this? You know, because I'm used to using, like I said, libvirt, so I use virt-manager. Mm-hmm. And you know, that's handy.
You can just click, set up a new VM. Is there something like that with Incus, or is it all command-line driven?
Stéphane: Yeah. So officially, as far as the Incus project is concerned, we don't have an official web UI. We just offer hosting for any web UI that you want, and there are probably a dozen or so on GitHub.
Jonathan: Oh, okay.
Stéphane: Now, in the packages I produce through my company, I package what's called Incus UI, which is actually a rebranded and now pretty heavily patched LXD UI. And that gets you a full UI that you can play with. You can create instances, you can access your VGA console, and Windows and stuff works just fine through it.
You can do all that kind of stuff through the UI. And yeah, that's pretty easy to install. If it's installed on your system, you can literally run incus webui as the command, and then it just opens the web UI in your web browser and you can play with it from there. Mm-hmm. We also have an online demo service for Incus on our website.
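Assuming the Zabbly packages described earlier, the UI route is roughly this; the package name below is how his repo appears to ship it, so verify it for your distro:

    # Install the rebranded UI from the Zabbly repository (name may vary).
    sudo apt install incus-ui-canonical

    # Open the web UI in your default browser, as described above.
    incus webui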
So you can go on our website and just try Incus online, which gets you access to effectively a VM running Incus that you can play with for 30 minutes. There's a terminal walkthrough to discover how Incus works, but there's also a link to access the web UI for that test environment.
So you can play with the UI instead if you want to.
Jonathan: Mm-hmm.
Stéphane: I think the main kind of missing piece, maybe, compared to Proxmox as far as the UI at this stage, is that assembling the cluster itself is not something you can easily do from the UI. Yeah. Once the cluster is assembled you can, but currently you are better off using something like incus-deploy, even for just three machines in your home lab,
to get the cluster correctly deployed with OVN, with Ceph, mm-hmm, all of the security bits, making sure that everything is nice and safe. That's gonna change with Incus OS, because Incus OS is fully read-only. Mm-hmm. So we can't have Ansible deploy random stuff on it. Right.
Which means we're moving to a different model for how to run those kinds of complex services, like Ceph and OVN, instead of running them bare metal on the machine. At which point, if you're running a cluster of 10 machines, you've got three special snowflakes that run the controllers for Ceph and OVN.
Mm-hmm. That's never ideal, because what happens if those three go down, or you're evacuating them and you forgot that those three are the critical ones? Which has happened to me before. Yes. Well, then you've got a bit of a problem. You're like, oh crap, I shouldn't have taken down those machines, 'cause now I don't have a quorum on Ceph or OVN anymore, and now my storage and network are down.
Yeah.
So we're actually changing that with the work on Incus OS, to run those kinds of services, the storage and network control planes, as containers on top of the Incus cluster itself, so that they can then move around. If we evacuate the machine, or if it goes down or something, it'll just be spawned back on any of the others,
keeping things HA. Obviously we need to be careful of the chicken-and-egg problem, mm-hmm, which is that we can't have the Ceph control plane be stored in an instance that's stored on Ceph, obviously. And same thing for the OVN control plane: you can't run that on a virtual network, you need to have it on a real network. So we're effectively making it so that we will have what we call a project internally.
That's kind of how we do our multi-tenancy. Mm-hmm. There will be an internal project that's used to run those kinds of workloads, and that one is gonna be on local storage and local network, so that it doesn't have any of that kind of complexity and problems with the chicken-and-egg issue. That will fix that.
It'll also let us make it easy to deploy additional services, because Incus can do OpenID Connect authentication, can use OpenFGA for fine-grained authorization, and can do all of the metrics stuff with Grafana, Prometheus and Loki. Currently it's up to the user to have deployed all of that somewhere if they want to use it, and then to configure Incus against it.
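Today that wiring is manual, through server configuration; a sketch of the kinds of keys involved (values are placeholders, and the exact option names are worth double-checking against the Incus documentation for your release):

    # OpenID Connect authentication.
    incus config set oidc.issuer=https://auth.example.com/
    incus config set oidc.client.id=incus

    # Fine-grained authorization via an existing OpenFGA server.
    incus config set openfga.api.url=https://openfga.example.com
    incus config set openfga.api.token=REDACTED
    incus config set openfga.store.id=01ABCDEF

    # Expose a metrics endpoint for Prometheus to scrape.
    incus config set core.metrics_address=:8444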
What we wanna do is that with Incus OS, it should be as easy as just enabling some features, and then we will spawn those containers for Grafana, Prometheus, Loki, OpenFGA, Keycloak, all that kind of stuff for you, and integrate with it so that it just works. So the idea is that with Incus OS, once we're basically at the end of our vision, you should be able to put it on a machine,
it'll give you an IP address, you go to the web UI at that IP address, and you can then either configure it for standalone use or as part of a cluster. Mm-hmm. And then you can enable the bits you want. So if you want OVN and Ceph, you enable that and you get the control planes for those deployed.
If you want all of the metrics gathering and everything, you can turn on the observability stack and that's gonna get deployed on top of that Incus cluster too. And then same thing for authentication and authorization: if you want fine-grained control over the cluster, you can do it that way too.
Yeah. So that's basically the goal, which should at that point be very appealing, especially for a crowd that's more used to VMware or Proxmox or something. Mm-hmm. Because then, literally, you'll be able to run Incus from your cell phone if you want to. Right. You start the machine, you just need the IP address.
And then from there on in, you can do everything through the UI if you want to, while we also make sure that everything is still easy to do through the API and the CLI. 'Cause every time we do more things in the UI, we always have a crowd that's like, please don't, I want my CLI. It's like, yes, we know, we always do the CLI first.
Yes. Because that's how we do testing, and then we do the UI afterwards. Anything the UI can do, you can already do through the CLI. Maybe you just haven't found it yet, but believe me, you can do it through the CLI.
Jonathan: Yeah. Is Incus OS going to be fully open source and freely available, or are you guys thinking about a freemium model?
Stéphane: Nope, completely open source. So Incus OS is on GitHub today. I've been mostly developing it through YouTube live streams, actually. I tend to do live streams on Thursdays, and I've been doing a bunch of development on it that way. Mm-hmm. With the other company I've got, the one that focuses on Fortune 500s,
we will effectively have a rebranded version of Incus OS that's called Hypervisor OS. Mm-hmm. And the main reason for the rebranding is simply that we want a build with a different Secure Boot key, so the customer can trust just that build and can't be tricked into loading any random build from GitHub.
But other than the key and the name that shows up on the boot screen being different, that's basically gonna be it. The idea for that company simply being: if you're using Hypervisor OS and the other components coming from that company, we'll know that you will have paid the relevant licenses so that you get access to support.
And that's effectively the model on that front. But sure, all of the software, even for people who do pay for the commercial offering, they get the exact same bits as if they were just consuming it from the open source side. The main difference being that you're using the build
we built, we tested, and we are supporting. Yeah. Awesome. And the reality is that, looking at the market right now and looking at how things work, there's really no need to go through the open core model or any of that kind of stuff. The open core model is extremely good at pissing off your community, because they never know whether a feature is gonna remain open source or is gonna go on the paid one.
Contributors are also never really tempted to contribute, because they don't know if their code is just gonna get moved into the other version, you know. I just don't see it. And also, kind of selfishly, from an open source maintainer point of view, it's painful enough to maintain one open source project.
I don't want to have to deal with the open source and the paid version at the same time. Yeah. That just seems like a lot of work.
Rob: So Incus OS won't have a subscription nag that holds back updates like Proxmox? Nope.
Jonathan: Oh, that's,
Stéphane: yeah. And the way it's all built currently, it's literally built through GitHub Actions. I mean, it's using a self-hosted runner because we need something much larger, so I effectively have integration to use large Incus VMs on some of my clusters as GitHub runners.
And that's what we use to build and test all that kind of stuff. Mm-hmm. But yeah, we'll literally be able to put a schedule on it so that every day a build just happens, and then you apply it or not, depending on whether you want the daily builds; you might want that.
The stable update channel will be updated slightly less often. That being said, we're gonna have to follow at least the kernel releases as far as security updates and stuff, so you can probably expect a weekly update, basically. Mm-hmm. Now, whether you want to install it or not, that's kind of up to you.
I think a lot of folks update more like once a month, but these days there's a new point release of the kernel coming out every week, and it always has a bunch of CVEs listed, so we basically need to spin a new image every week.
Jonathan: If we had more time, I would talk to you about the kernel CVEs, because that is a fascinating subject in and of itself, but we are over time already.
So I wanna get in a couple of last questions for you. The first is: is there anything that we didn't talk about that you wanted to make sure and let folks know about? I know we've covered a lot of things.
Stéphane: I mean, I might just go over briefly what Incus actually does, because we didn't touch too much on that, actually.
That's true, that's true. So yeah, Incus is effectively a private cloud platform. It lets you run system containers, so that's full Linux distros; it lets you run virtual machines; and it lets you run OCI containers, application containers. We've got an image server with prebuilt images for just about every single Linux distro
you can think of, most of which we have available as both VM and container images. Some are just container images; like if you want to run OpenWrt, currently it's container only. Mm-hmm. There are a few others like that. But it means that, unlike what you're used to with libvirt, you don't need to attach an ISO and go through the install process.
You can literally say, I want a 24.04 VM, and we'll just pull a prebuilt image and start it for you, and you're gonna have it up and running in less than 10 seconds. So that's a lot more cloudy in the feeling, again; it's a lot more what you get on the public cloud. Mm-hmm. As mentioned earlier, we do clustering, we've got a lot of storage features, anywhere from attaching extra disks and file systems, sharing file systems, object storage, all that kind of stuff.
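That flow from the CLI, using the community image server's aliases:

    # A full Ubuntu 24.04 system container, up in seconds.
    incus launch images:ubuntu/24.04 c1

    # The same image as a virtual machine instead.
    incus launch images:ubuntu/24.04 v1 --vm

    # Check both instances.
    incus list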
On the network front, we do anything from physical networking with VLANs and BGP, we've got a built-in DNS server, and then we go on to the software-defined stuff with OVN. And we support all kinds of passthrough, so PCI passthrough, GPUs, GPU slicing, all of that stuff is supported.
So if you want to do AI stuff, you can do AI stuff.
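Passthrough is similarly a one-liner per device; the instance and device names here are placeholders:

    # Hand a GPU to an existing instance.
    incus config device add c1 gpu0 gpu

    # Full PCI passthrough of a specific device by address (VMs).
    incus config device add v1 nic1 pci address=0000:03:00.0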
Jonathan: There you go. Fill that box in on your Bingo card. We talked about it.
Stéphane: Yep. There you go. I don't really care much about that myself, but a lot of people do. So yeah, you can do it right now.
Jonathan: Yes. Alright. You mentioned your YouTube channel.
Are you Zabbly on YouTube?
Stéphane: Yes, that's me. The name of my company, my self-employed company, is Zabbly. So yeah, that's the account on YouTube. I usually do at least a video for every Incus release, then I usually live stream the other weeks, mostly on Incus OS lately.
A lot of them are Incus OS related at this point. Mm-hmm. And I do some occasional other things. One thing maybe worth mentioning is that, starting with the next TrueNAS release that's coming up later this month, mm-hmm, Incus will be part of TrueNAS, and virtual machines will be run through Incus on
TrueNAS. Very cool. So I've published a video just kind of talking about that, showing how that works. That's gonna bring us a bunch more new Incus users too, because a NAS is often the only server that people have, yep, at home or in their business. So having the ability to run virtual machines directly on that using Incus would be pretty neat.
Jonathan: Yeah. Very cool. Alright, so I'm required to ask you two final questions before we let you go. And that is: what's your favorite text editor and scripting language?
Stéphane: Text editor is vi, and scripting language, I don't know, it moves a lot. I guess these days it's probably good old-fashioned shell, and not Bash, like actual shell.
Actual shell, yeah, the POSIX type. I also used to do a lot of Python before, but I've not done much Python lately. Our image server logic and whatnot is written in Python, but it's been mostly maintenance for the past year or so. So I think I've been doing more shell scripting than anything else.
Jonathan: Awesome. Alright, Stéphane, thank you so much for being here. It was absolutely a blast to hear about Incus and LXC and the history with LXD. We'll have you back here after a while to talk about how the Incus OS release is going. But for today, man, thank you so much.
Stéphane: Yeah, thanks.
Looking forward to coming back, and thanks for this.
Jonathan: Yes. All right, Rob, what do you think? Have we convinced you to get rid of that old Proxmox install and go over to Incus?
Rob: I'm definitely interested to try it out. You know, obviously I've been following LXD and Incus a little bit for quite some time now, and I've known what they were and what they did, but hearing this talk definitely gave me a lot more insight and desire to at least throw it on one machine, check it out, and give it a good comparison rundown.
Jonathan: You could probably install Incus inside of Proxmox as a VM and play with it that way.
Rob: It really crossed my mind, but I would need to do nested virtualization too, I guess, if I really wanted to get the full test.
Jonathan: Yes, yes. But that's something most hardware can do these days. Goodness, yeah, which is kind of nuts.
But yes, they made it so that actually works. Alright. We do not yet have a guest for next week, so if anyone listening has an open source project and wants to be on, let us know. We've got a couple of openings coming up. And then we are eventually talking with Kuai, which is one of those Linux security projects.
Hopefully they're one of the good guys and not one of the ones that Stéphane was talking about earlier. It'll be super interesting to hear what they're all about. Rob, do you have anything you wanna plug before we let folks go?
Rob: For those who want to know more about me, you can come to my website, robertpcampbell.com, and on there you can find links to my LinkedIn, Bluesky,
Yeah, all those social media sites and you can come connect with me there if you want.
Jonathan: Yep. Awesome. Alright, I appreciate it. Thank you, man, for being here. If you wanna follow my stuff, it's of course over on Hackaday, which is the home of Floss Weekly, and we appreciate that. It's also where you can find my security column, talking about all those Linux CVEs and CVEs all over the place, which goes live every Friday morning.
And then Rob and I do have the Untitled Linux Show with a couple of other guys over at twit.tv. You can get to that at twit.tv/uls, and we'd love to have you there as well. We appreciate everybody that's here, those that get us both live and on the download. We will see you next week on Floss Weekly.