Jonathan: Hey, this week Dan joins me and we talk with Pádraig Brady about Coreutils, that software that pretty much everybody runs, whether you realize it or not. And because of some reasons, it is one of the most conservative development processes we've ever talked about. You don't want to miss it, so stay tuned.
This is FLOSS Weekly, episode 797, recorded August 20th: Don't rm -rf Up the Tree. Hey everybody, it is time for FLOSS Weekly. It's a show about free, libre, and open source software. I'm your host, Jonathan Bennett, and we've got a fun show today. We're gonna be talking about core utils, and no, that's not a typo, not a mistake.
We did just talk about core utils, but about three, four weeks ago, we talked about Rust core utils. And this week, we are talking about the OG, as I like to say, the original core utils. Still a project. It is not just me, though, of course. I've got Dan the man, Method Dan, the original Linux outlaw.
Hello, sir. Welcome.
Dan: Hey, it's good to be here. Thank you, Jonathan.
Jonathan: Yeah, it's always good to have you with us. Dan, I suspect that you sort of have a clue about Core Utils. You've at least used them a few times over the years, right? Right.
Dan: Oh God. Yeah. I was just gonna, I was just thinking earlier that I would, I would bet that anybody who uses Linux as their operating system has, even if they don't know it, interacted with the projects we're going to talk about today.
Jonathan: Yes. Either directly or indirectly. Right. And, you know, there are several different implementations of core utils because, well, you've got the Rust core utils, which is fairly new, but you've also got things like BusyBox that also includes a bunch of the core utils, but with a different code base.
And so they're just ubiquitous. I mean, if you count the other versions of it, there are probably billions of copies of core utils in the world. And that's something not everybody can say, that they've got that many of their binaries running around.
All right. Well, we've got the man. We've got Pádraig, and he is, I believe, the head maintainer. We'll have to ask him exactly what his title is, but he is the man in the core utils project. That's Pádraig Brady, and let's go ahead and bring him on. Pádraig, welcome, sir.
Padraig: Hi everybody.
Jonathan: Hey, it is great to have you.
Padraig: Great, great to be here. Thank you very much.
Jonathan: All right. So what is your official title, as it were? I believe you're the head honcho, but that's probably not what they call you.
Padraig: I wouldn't say that. It's open source, so there's no real official title, but I'd say if you had to give me one, it'd be release manager.
And I'm one of the, one of the maintainers. There's a, there's a few of us.
Jonathan: Okay.
Padraig: Okay. And I've been contributing to core utils for the last 22 years.
Jonathan: 22 years sounds like a very long time, except Core Utils has been around for quite a bit longer than that, hasn't it? How long, how old is the project?
Padraig: Indeed. Well, I guess it originated around '92, I think. Originally it was separated into file utils, text utils, and shell utils, and then a little while after that it was amalgamated into a single core utils project. So that was done by Jim Meyering.
Jonathan: Now, the original, original core utilities, some of these utilities go all the way back to like Ritchie and Kernighan, way back in the day with the original Unix. Is there any shared code base?
Padraig: No. Part of the GNU thing was to be a kind of complete, separate implementation.
So they were implemented separately; the GNU source base is completely separate. But of course there's a huge focus on compatibility with those older utils and with other systems. And I guess at this stage, we've been developing them so long there is an onus to be backwards compatible with ourselves.
So that's a, that's a, that's a huge concern.
Jonathan: Yeah, that's interesting. So I mentioned in the pre-show we had the Rust core utils project on, and that was one of the really interesting things. So, first off, there's not animosity between the two projects. In fact, you guys are cooperating on some things, and one of the really interesting places of cooperation is in writing sort of this shared test suite that determines that all of these tools do the same thing, no matter which implementation you're looking at, which really interested me.
Padraig: Well, absolutely. Look, the implementation is secondary. At the end of the day, it's the interfaces with users that matter, and that's, I guess, encoded in the test suite. So we put a huge emphasis on the test suite. And it's something personally I've been focused on all along. For any kind of changes or patches we do, most of the effort is actually put into testing and writing tests for the patches.
So in regards to the Rust core utils, a few years ago I noticed the Rust core utils project with interest and suggested that they could easily tie in the core utils test suite, because it was kind of written like that.
So the Rust test driver essentially puts their core utils earlier in the PATH and just calls our test suite, and it automatically pulls in the Rust utilities. We write our test suite with portability in mind, because we have to run our test suite in a lot of places. So the Rust core utils gets to take advantage of that.
Yeah. And it's worth mentioning that it's a two-way thing. Sometimes the Rust folks, when they're implementing new utilities, notice bits that we haven't tested, and they'll supply patches to us and we'll implement those. So it just verifies our code is working as expected.
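The PATH trick described above can be sketched in a few lines. This is only an illustration in Python, not the Rust project's actual test driver: the helper names `make_fake_tool` and `resolve_with_override` are hypothetical, and it assumes a POSIX system where command lookup walks the PATH directories in order.

```python
import os
import shutil
import stat
import tempfile

def make_fake_tool(directory, name, message):
    """Create a tiny executable shell script standing in for a utility (hypothetical helper)."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write(f"#!/bin/sh\necho {message}\n")
    # Mark the script executable for its owner so PATH lookup accepts it.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path

def resolve_with_override(tool, override_dir, base_path):
    """Return the file a PATH lookup picks when override_dir is searched first."""
    search_path = override_dir + os.pathsep + base_path
    return shutil.which(tool, path=search_path)

if __name__ == "__main__":
    alt_dir = tempfile.mkdtemp()   # stands in for the alternative implementation
    gnu_dir = tempfile.mkdtemp()   # stands in for the system utilities
    make_fake_tool(alt_dir, "sort", "alternative sort")
    make_fake_tool(gnu_dir, "sort", "system sort")
    # An unmodified test script that runs plain `sort` would now get this file:
    print(resolve_with_override("sort", alt_dir, gnu_dir))
```

Because the tests only ever call `sort`, `cp` and so on by name, nothing in the tests themselves has to change to exercise a different implementation.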
Jonathan: Oh yeah, no, that's neat. And I've found that test suites like that, what they're really, really useful for is when you make changes, they sort of help you guarantee that your changes didn't accidentally break something that you didn't think about when you were making the change.
Padraig: That's, yeah, 100%.
Yeah, it's brilliant. Yeah, we put a huge effort into the performance of the test suite. Like, you can run all the tests, say, on a standard laptop in about 50 seconds, but the nice thing is I've run it on really, really fast machines and it automatically scales up, so you can run all the tests in five seconds.
That's impressive. 100, 000 machines. But it's nice that it scales up, and I guess it shows the advantages of the Unix model. Rather than writing the test suite in something separate, like driving it with Perl or Python or something like that, it's mainly shell scripts. So we're reusing the model and reusing all the tools while writing the tests themselves, and that's one of the nice things as well.
By having separate processes do separate things, they automatically scale up to multiple processors. And so it nicely scales up automatically using the standard Unix model.
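The scaling point can be illustrated with a small sketch. This is not the core utils test harness, which the discussion above says is mainly shell scripts; it is a hedged Python illustration in which `run_one` and `run_all` are hypothetical names. The idea it shows is that when every test is its own process, the operating system can spread them over however many processors are available.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_one(cmd):
    """Run a single test command in its own process; return (cmd, passed)."""
    result = subprocess.run(cmd, capture_output=True)
    return cmd, result.returncode == 0

def run_all(commands, jobs=None):
    """Run independent test commands concurrently, one process each.

    The threads here only wait on child processes; the real work happens in
    the children, which the kernel schedules across the available CPUs.
    """
    jobs = jobs or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, commands))

if __name__ == "__main__":
    # Two stand-in "tests": one exits 0 (pass), one exits 1 (fail).
    tests = [
        [sys.executable, "-c", "import sys; sys.exit(0)"],
        [sys.executable, "-c", "import sys; sys.exit(1)"],
    ]
    for cmd, passed in run_all(tests):
        print("PASS" if passed else "FAIL", cmd[-1])
```

No test needs to know about the others, so adding processors shortens the run without any change to the tests.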
Jonathan: Yeah. So a question that kind of naturally flows out of this conversation is, what about the, like, the original Unix version of these utilities?
Is there any cross-pollination with Unix? I guess we might be in a place where one of the original Unixes, somebody's version of Unix, uses core utils rather than the old Unix code base. In fact, that's probably likely. And then I guess there's also the original Unix version of the core utils.
Do we run those through the test suite to see what they do?
Padraig: Yeah, well, I haven't done it myself. It's an interesting question. I'm sure a lot of them would break. The Rust core utils is trying to be compatible with the latest GNU upstream, so there are new options and behaviors there.
So I guess a lot of the older variants would break in most ways against the existing test suite.
Jonathan: Yeah, yeah, that's interesting. Do you know, does anybody's Unix ship core utils rather than the old code?
Padraig: I don't think so. So the main focus with core utils is Linux. And essentially these days Linux is ubiquitous.
So that's, that's generally what everybody really uses, like in the large tech companies. It's, it's all Linux internally.
Jonathan: Unix itself is kind of dead these days, isn't it?
Padraig: Pretty, pretty much. Yeah. It's, and that kind of gets on to the kind of the portability aspects of it. It's still very important.
Less going forward, as we said, because things are more and more focused and more and more consolidated on Linux. But there's still a lot of portability concerns with compilers, different compiler options, different shells. And we still try to be as portable as possible. And when you keep as portable as possible, keep your code as flexible, I guess, as soft as possible.
And it keeps the interfaces true and separate and good.
Dan: I mean, Pádraig, you mentioned the portability and stuff there, and Jonathan said nobody really uses Unix anymore, which I imagine would probably upset a few people. I'm not much of an expert, but I'm wondering, do people ship core utils with BSD at all?
Padraig: Yes, so that would be our main other Unix portability target. Personally, I have access to FreeBSD systems, and kind of implicitly through macOS, which uses kind of FreeBSD interfaces.
Dan: I was interested in macOS, I was going to say, because you've got Darwin, which is kind of the base under macOS, and they must use core utils, I would imagine.
Padraig: Apple actually tries to steer clear for GPLv3 reasons, but there are a lot of users of core utils. It's in Homebrew, for example, so it's easily installed on macOS. And for a long time, the sort, for example, that was used in FreeBSD and macOS was actually the GNU version, because that's actually quite difficult to implement and stuff like that.
But, yeah, getting back to testing, we do put in effort to make sure that we have full tests passing on macOS and FreeBSD.
Dan: That's awesome. Now you mentioned that you've been working on this for, you've been involved with the project for 22 years, which is impressive. I'm always really intrigued in how people got started in these things and how they got involved.
So how did you come to get involved with the project? How did you get interested in it? Was computing something you were into as a kid, or how did all that come about?
Padraig: Well, not as a kid, I'm not going to say my age now, but I hadn't access to a computer until kind of midway through college.
So I started very late from that point of view. But I also started first with Windows, which is interesting. And I was using that through college. And then I went into industry for a year or so and got a bit frustrated with the whole black box nature. You'd come to a problem and then it was really difficult to actually fully fix things.
So you're working around issues more and more rather than actually fixing core issues. So then I happened upon Linux, which was just starting out at the time. I guess that's a clue to my age. And at that stage, it's something that resonated with me. I tinkered around with it for a couple of years, and the Unix model really resonated with me.
And I noticed that some of the core utilities would benefit from some new options, and so I proposed a few patches around 22 years ago, and they were accepted. And then I became a kind of an official maintainer, maybe about, I suppose, 16 years ago now.
Dan: Wow, that's amazing.
I always love how people get involved in these things because I think one of the great things about the kind of free open source software world is how, without meaning to be cheesy, open it is, you know, in that you can get involved. What were the challenges in kind of getting involved? Was it, what did you, were you nervous about maybe submitting a patch or and getting, you know, rejected in some way?
Padraig: No, I was very excited. I remember my first patch, how naive I was. Well, the code was okay, but I was worried that someone else was going to come up with the idea. It's one of the funny things to look back on. It was such an obvious thing, I thought someone was going to come up with the idea, and I was mad to get the patch in before anybody else did.
But sure, of course, there's an infinite amount of code to write and an infinite amount of ideas, so that's never an issue. So that's one thing I would say to people: even if something is implemented already, there's always a way to do it better. So always err on the side of sending in the patch. Generally, people involved in these projects are more interested in the tech itself and are very interested in incorporating new people and new code into the projects.
Dan: Yeah. And a question I have to ask, it's the kind of dirty question in the room, but I'm really interested is have you managed to get paid for working on core utils? Has that been like part of your job at any of it? Cause I know you've worked in lots of places.
Padraig: Sure. Not directly, but I wouldn't be working where I was without having worked on GNU core utils.
Let's put it like that. So it's a good thing on your CV.
Jonathan: Yeah, yeah, no joke, and they say,
Dan: I know you've contributed to it. Sorry. Go on, Jonathan.
Jonathan: I was just going to say, they say with the kernel itself that landing five pull requests in the kernel is the average that it takes for someone to get a job offer as a result of it.
Like, there are a few places where contributing open source code is just excellent for your career. And I imagine something like core utils is going to be on that list.
Padraig: Yeah. Well, as was said at the start of the show, anybody who has used Linux at all has used them, either directly or indirectly, and everybody knows about them.
So it's good for the CV, as we say. I feel very lucky over the years to have been involved with the project, and it's still very interesting and rewarding going forward, so it's all good.
Dan: Yeah I'm interested in how, I know this is probably an obvious question, but I'm interested in how the project's managed.
You mentioned that you've got, obviously there's a few maintainers involved probably quite a lot of maintainers I would imagine. How does it compare to something like, like the kernel project? Do you, do you manage it all through Git and, and the releases and all the patches and everything else?
Padraig: Yeah, so we moved to Git fairly early on. Patches are still managed with a mailing list. As for the number of maintainers, I guess there are three or four central core maintainers that have worked on it over the years. There's a separate project focusing on the portability aspect, the Gnulib project, which is probably a bit more active, and there could be, say, 50 projects reusing the portability code that is abstracted away or encapsulated in the Gnulib project.
But Gnulib kind of originated from core utils. Put simply, there were lots of #ifdefs in the core utils code, and that was gradually moved across to a separate project. So Gnulib presents a GNU interface everywhere, and that allows the actual projects using Gnulib to be a lot cleaner and just assume that a GNU interface is available.
Jonathan: It seems like code bases just grow #ifdef statements over the years. It's just a natural part of their development.
Padraig: Especially something trying to port to every Unix and every compiler in the world. You have to be especially careful.
Jonathan: Okay, so, with a project that's been around as long as core utils has, and with, in some ways, a frozen specification, what's it like to work on core utils? Is it done?
Is there a place where it's going to get to be done, and what does that look like? I asked that question very tongue in cheek, but what I'm getting at is, what does it look like in core utils to make changes? I assume it's a very conservative process, to very slowly make changes and to do it very intentionally, right?
Padraig: Well, yeah, with something that's used so ubiquitously, you have to be very careful. So there's the whole focus on testing, as I was mentioning earlier. That kind of handles that. But we have to be very cognizant of the interfaces. And we also have to be cognizant of our interfaces with the community.
So there are lots of requests over time to add this option and that option. A lot of them are good suggestions, but they're just not appropriate for adding, because the equivalent functionality might be in a separate tool, or it might be slightly dangerous. But we do take additions, and we do make new additions ourselves over time, to add new functionality, port to new compilers and new architectures, enhance performance and portability, and all that.
So there are changes all the time. But just getting back to engaging with the community, that's one of the things you have to be especially cognizant and careful about with an open source project. And one of the things we've done, for example, is we've maintained a page of rejected requests.
They're very carefully considered and curated. If someone comes in with their suggestion, and we carefully consider it but reject it, we may add it to that page. Often the same suggestion has been made multiple times, so they can also see similar suggestions being rejected, with carefully considered reasons given.
And so they don't feel, I guess, alienated when we give feedback like that. But we're definitely very open to new features if they're appropriate. We just have to be careful about backwards compatibility with ourselves and compatibility going forward, and just be generally cognizant of the Unix model, being true to that and keeping things appropriate.
Jonathan: Yeah. Let's see. Oh, progress bar. What does it look like when someone sends in a request? Let's just take the progress bar as an example. What's the process look like? Like, do the maintainers vote and say, you know, three out of five say this is a bad idea, so we're not going to pull it in?
Padraig: That's essentially it. Yeah. And the important thing is it all happens out in the open, on the mailing list. So we give reasons why it mightn't be a good idea. That's an interesting one. That's one of those 50/50 ones, which is probably a good idea, but it's also implemented already in rsync and stuff like that.
So do you want to complicate the code base just to add that? That is one that we may actually add eventually. It's an interesting one like that. So that's getting back to the Unix model. It might be nice to do that more generally. So looking at something like pv, that's a separate, general progress viewer.
Speaker 4: Mm-hmm.
Padraig: And you can point it at any command and it'll open up the file descriptors. Now, it's not as direct as if you put it in cp itself, but it's more general in that it will work with any command and you can pop it in the middle of any pipeline. But one of the reasons we hadn't done that particular one was that rsync has equivalent functionality and already has a progress bar.
And looking at the Unix model of thinking, of having one tool that does something more general, there's the separate pv tool, which can be pointed at the cp process, and it will inspect all its file descriptors and see how far along they are at reading and writing a file. You can pop it into the middle of any pipeline or direct it at any command.
So it's a more general solution.
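One way an external tool could see how far along another process is through a file is sketched below. This is an assumption-laden illustration, not pv's implementation: it relies on the Linux-specific `/proc/<pid>/fdinfo/<fd>` files, which report the current offset in a `pos:` field, and the function names are hypothetical.

```python
import os

def parse_pos(fdinfo_text):
    """Extract the 'pos:' field (current file offset) from fdinfo text."""
    for line in fdinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "pos":
            return int(value.strip())
    raise ValueError("no pos field found")

def progress_fraction(pos, size):
    """Fraction of the file already consumed, clamped to 1.0."""
    return min(pos / size, 1.0) if size > 0 else 0.0

def fd_progress(pid, fd):
    """Linux-only: how far process `pid` is through the file open on `fd`."""
    with open(f"/proc/{pid}/fdinfo/{fd}") as f:
        pos = parse_pos(f.read())
    # /proc/<pid>/fd/<fd> is a symlink to the open file; stat follows it.
    size = os.stat(f"/proc/{pid}/fd/{fd}").st_size
    return progress_fraction(pos, size)

if __name__ == "__main__":
    sample = "pos:\t4096\nflags:\t0100000\nmnt_id:\t29\n"
    print(progress_fraction(parse_pos(sample), 16384))
```

The copying program needs no progress code of its own, which is the generality being argued for: the observer works on any process that reads or writes regular files.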
Jonathan: Yeah, and it's also interesting. The Rust core utils, for example, have gone ahead and added that in, I think at least in cp, maybe in some of the others, where it now has a built-in progress bar. And that's one of the fascinating things about having multiple implementations that are kind of looking at the same test suite and the same kind of core rules.
But you have a little bit of flexibility as well, to be able to do things just a little bit differently without breaking the rules.
Padraig: Right, right, right. And it's an interesting thing: if there's kind of borderline functionality or new options like that, that are already implemented elsewhere, that's more leeway for us to implement those, for better compatibility.
Jonathan: Now, again talking specifically about progress, the fact that this is still kind of being reconsidered makes me think that that page of things people have suggested that you decided not to do is not necessarily a static list.
And you guys have the freedom to go back into that and sort of mine for ideas again and reconsider: well, maybe this wasn't such a bad idea.
Padraig: Absolutely. Yeah. Though that doesn't go for all of them. There are some special ones in there.
Jonathan: Yeah.
Padraig: Well, this happened long ago, and I'm not singling anybody out here, but there was one suggestion that, since rm -r recurses down the tree, you could have an option to recurse up the tree. Yeah, we rejected that one pretty quickly.
Jonathan: Oh yeah, that must've been like the April 1st RFC for the project.
Padraig: No, no, there were some arguments for it.
Dan: Really? And it also implied the -f as well.
Jonathan: Just don't accidentally use that flag. Oh my goodness. God, yeah. And then what about when you, and I'm sure this has to happen from time to time, but I'm sure it has to be rare, what's the process if you say, okay, we're going to make a change and it's going to break something, but we're going to make it anyways?
What does, what does that look like? Has that happened? Is that going to happen?
Padraig: It has happened. As we said, we're very careful about doing that. Generally, it's only if it's associated with some extra functionality that we're adding. So we break compatibility in some rare edge cases just to allow us to add a lot of extra functionality.
So it's rare we do that, and we err on the side of not doing that because, like, nobody wants to rewrite shell scripts when they upgrade from CentOS 10 to 11 or whatever. So we just have to be very careful about doing that.
Jonathan: Yeah. And is there anything on the radar that is going to be breaking, or maybe otherwise a huge change coming?
Padraig: Nothing breaking interface-wise. Looking back retroactively at some things, we have another page written up of core utils gotchas. And these are things where you would never have done this originally if it was designed as one cohesive set of utilities.
Just one gotcha, for example: with dd, you often want to present hex numbers, because you're dealing with power-of-two blocks for input and output to disk and stuff like that. So it'd be nice to supply a hex number. And you can seemingly supply hex numbers like 0x100 for, say, a 256-byte block, but 0x100 to dd is 0 multiplied by 100, which is 0, which it accepts, and it goes ahead and just doesn't skip anything, for example.
So there are little gotchas like that, that weren't very carefully considered back in the day, but we have to keep compatibility with that going forward. And so we'll break compatibility slightly in that regard. For example, POSIX specifies that if you're not giving an error, you shouldn't give a warning.
Like, if you're not exiting with an error status, you shouldn't give a warning, but in that case we do give a warning, because it is such an edge case that you probably wouldn't be doing a zero multiplied by something. So that's not really breaking compatibility, but we're just very careful about how we approach these sort of things.
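The dd gotcha is easy to reproduce with a toy model. The sketch below is a deliberate simplification and not dd's real parser: it assumes only that `x` between numbers means multiplication, as described above, and it ignores size suffixes entirely. The name `parse_dd_number` is hypothetical.

```python
def parse_dd_number(text):
    """Simplified model: treat 'x' as multiplication between decimal factors."""
    product = 1
    for factor in text.split("x"):
        product *= int(factor, 10)
    return product

if __name__ == "__main__":
    # What someone typing a hex block size probably meant:
    print(int("0x100", 16))          # 256
    # What a multiplicative reading yields instead: 0 times 100.
    print(parse_dd_number("0x100"))  # 0
    # The notation behaving as designed: 2 blocks of 512.
    print(parse_dd_number("2x512"))  # 1024
```

Since `0x100` is a perfectly valid product under this reading, there is no error to report, which is why a warning is the only signal available.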
Jonathan: Yeah, interesting. Alright Dan, you want to pick it back up? I've lost the connection to our back server, so we're having to do it in the open.
Dan: It's all going on today.
Jonathan: Yeah, it's been a fun one.
Dan: Yeah. One of the things that you actually mentioned, Pádraig, was the Unicode situation, and internationalization is another thing as well, for character sets and so on.
So what's the situation with that at the moment?
Padraig: Yeah. So that's a tricky kind of implementation thing that spans most of the utilities, especially the original text utilities. Personally, I was interested in doing that, and while I was working with Red Hat, I requested, say, a block of three months to go away and just implement that, which wasn't granted, which is interesting.
I was surprised at that at the time. I never really had the block of time that's required to go away and work on it. So that has kind of happened piecemeal over the last few years. The main Unicode functionality is currently encapsulated in Gnulib, and there's a lot of Unicode expertise among the developers there, including the main developer, Bruno Haible, working on Gnulib.
And those interfaces have evolved a little bit over the last couple of years. They've added new abstractions for dealing with characters and multibyte characters and cells and stuff like that. So that has gradually been added to core utils over the last while. And I created a planning document describing the work that had to be done there.
So at least we have an overview of what needs to change. So it will eventually happen. It's just happening slowly at the moment. And one thing that has changed over the last while as well: when we originally envisaged this, there were a lot of different character sets in use, but most things have consolidated on UTF-8 now. So that suggests different ways of handling things, like converting everything to UTF-8 before processing, and maybe having separate tools for sanitizing and working with UTF-8.
Those would then act as an interface to the other utilities, rather than having each utility deal with edge cases of mis-encodings and cases like that. So yeah, it's still a work in progress. I guess that would be the main functionality or feature that is outstanding in core utils at present.
Dan: Yeah, so it's a big one to deal with, I imagine. It's interesting, without getting into politics too much, that you weren't granted the time to work on that. You'd think a lot of people would be after that, but who knows? Now, this is a slightly left-field question, I suppose, because Jonathan mentioned BusyBox at the start, and I am going to show my ignorance now, because I'm not entirely sure of the relationship between core utils and BusyBox.
So seeing as you're here, I'm going to ask you: is there any kind of crossover between the two? I'm assuming not in code. How is the relationship between the two?
Padraig: Well, definitely not in code. There is a little interaction sometimes around suggested new options and stuff like that.
One interesting thing is that BusyBox is more geared towards embedded systems. And there's different licensing and stuff to deal with there. So the main difference there is licensing. Interestingly, a while ago, core utils was adjusted to be able to be built in the same single-binary setup as BusyBox.
So the standard way you would build BusyBox is as a single binary with symlinks mapping the various command names to the single binary, and core utils can be built exactly the same way. So you can install core utils in, I think it's a couple of megabytes, with symlinks to the single binary.
So from a functionality point of view, core utils provides the same things, I guess with more portability. But yeah, the licensing aspect is the main difference there.
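The single-binary-plus-symlinks arrangement works because a program can look at the name it was invoked under. The sketch below shows that dispatch idea in Python. It is illustrative only: the real projects are written in C, and the applet table and function names here are hypothetical.

```python
import os
import sys

def applet_true(args):
    """Stand-in for the `true` command: always succeeds."""
    return 0

def applet_false(args):
    """Stand-in for the `false` command: always fails."""
    return 1

def applet_echo(args):
    """Stand-in for a minimal `echo`: print the arguments."""
    print(" ".join(args))
    return 0

# Hypothetical table mapping command names to their implementations.
APPLETS = {"true": applet_true, "false": applet_false, "echo": applet_echo}

def dispatch(argv):
    """Choose an applet from the invoked name (argv[0]) and run it."""
    name = os.path.basename(argv[0])
    applet = APPLETS.get(name)
    if applet is None:
        print(f"{name}: applet not found", file=sys.stderr)
        return 127
    return applet(argv[1:])

if __name__ == "__main__":
    # Invoked through a symlink named `echo`, argv[0] would end in "echo".
    sys.exit(dispatch(sys.argv))
```

With one such file installed and a symlink per command name pointing at it, each symlink behaves like a separate utility while all the code is stored once on disk.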
Dan: Now, you mentioned licensing and my eyes lit up because I'm a bit of a licensing nerd. I don't know why I just find this stuff fascinating.
Now, I was interested that you're under GPLv3 with core utils, and it moved from GPLv2 or newer to GPLv3 or newer, I believe, about 2003 or something like that. So were you around at the time? And what was the process like?
Padraig: I wasn't involved in that licensing. I had a few earlier patches, and then I really joined the project a bit after that.
In a more involved way, I guess. So I wasn't involved in the licensing, and I haven't really been involved in any licensing issues or anything like that going forward. So I'm really the wrong person to talk about that. I'm more focused on the technical side, and to be honest, all the core maintainers were really focused on the technical aspects.
But we are aware of some of the restrictions of GPLv3. Even just from a political point of view, some people kind of shy away from it, keep things simple, and just avoid it. So that's not ideal.
Dan: Yeah, it's an interesting one. So the little background to this for me is I have some friends who work at the Software Freedom Conservancy who were involved in a lot of that kind of thing. That's Licensing Central right there.
And I know that they were very keen to get projects to move from v2 to v3 of the GPL, and some were keen and some were not, and some still haven't, like the kernel of course. And so I just thought I'd ask, but that's totally fair enough, that you weren't around or you weren't involved with the licensing at the time.
Now, one of the things you mentioned to us in your email was that you worked for Meta for a good long time, and their use of Linux in the back end and the stuff that you've done with them. So I'm really interested to dig into some of that, because I know that they have their own distribution. How does that work?
Padraig: Sure. I can't go into much detail about absolute numbers or anything like that, but I'm happy to go into general information, and a lot of this information applies to all the big tech companies like Google and Amazon. They all use the same sort of models.
So I guess the interesting general information with Meta is that they have huge scalability requirements, so there's a huge focus on performance. If you've got, for example, a 1 percent win out of a compiler, you're talking hundreds of millions of dollars. The scalability is immense. But it's interesting as well, not in particular to Coreutils, but maybe more aligned with a project like glibc or that kind of library code,
that when you're working on these things in open source, it's used by Meta and Google and everybody else. So it actually has much larger scalability concerns, but it's not really as apparent or, I guess, measurable to these people. So there is, I guess, a huge onus and responsibility on people working on performance.
Like, when you make a change in Meta, everything is very easily tested, and you can see exactly the dollar values of changes. And so that's cool as well, because it's easier to test. It's very hard to test things in the open source world, because you have a lot more disparate systems and you haven't everything as tightly cohesive in a whole test framework and stuff like that.
So there's kind of a symbiotic relationship both ways: you get really good testing in a place like Meta, and then you can feed that information back up and feed the code back up. But there's an interesting thing as well, that corporations like these get huge use out of open source code, but sometimes there isn't, I guess, the focus on sending code changes back up, because, just looking at the loop, there's a short term win from not spending the time to send your changes back upstream.
But then there's a long term loss by not doing that, because you become forked and you diverge away from all the improvements upstream as well. So a good bit of effort on my part in Meta was ensuring that all our internal processes and changes were a bit more geared towards getting code back out into open source.
So we didn't become diverged. And it's something that's really easy to fall into, because of the short term win of not doing it. Like, even open source first companies like Red Hat, back in the day, originally got into that sort of situation. They had a forked kernel, and they actually got into a bit of a knot, and they had a huge effort then to get out of that.
So it's just an interesting thing for any tech company, or any company these days, since tech is involved in most things. They have to be very wary of forking away from upstream too much.
Dan: It's great that it sounds like they were supportive, at least, in you contributing stuff back upstream.
Was that something that, like, the management, without meaning to cause any trouble here, and feel free to tell me if you don't want to talk about it. But I was just curious, were the management and so on supportive in that? Were they like, yay, go and do it, take a day to do this, or, you know, support it?
Padraig: Absolutely. In Meta, at a certain level especially, you were encouraged to go off, you were trusted to do the right thing as long as you presented the right arguments. They were happy as long as you were doing the right thing. So Meta was very open source friendly, and increasingly so going forward now with the AI ecosystems, where they're really leveraging the open source aspect of that.
So no, there was great support in Meta. I was more alluding to the general thought process of engineers within tech companies. They were focused on the short term wins, not on getting their changes back upstream. Like, we're working away in open source a lot,
and most developers haven't that mindset of pushing stuff back up.
Dan: Yeah. That makes sense. We actually have a question. So there are people listening and watching us at the moment. And we have a question from Mashed Potato, who says, What's the relationship like with developers of Core Utils replacement tools, such as Exa?
Could Exa find its way into Core Utils to replace ls? Is there a good reason to keep ls as it is? That's quite a big question.
Padraig: Well, absolutely. It's a long time since I looked at Exa. I do have a look every so often at tools like this, and if there is interesting general functionality that would be appropriate for everybody to use, absolutely we would incorporate it into ls. The big thing you have to be aware of with making changes is the interface.
You wouldn't move everybody to having to type exa; ls is kind of wired into everything at the moment. But you could add functionality, or the ideal thing is to adjust things
without requiring options or changes on behalf of the user. But you have to be careful. So, ls is an interesting one, as it's an end user tool. We made changes recently, and there were really good reasons for making the changes, which was to quote filenames that had problematic characters in them, like spaces or characters that are special to the shell. Unless you quoted the names, there would be ambiguity as to whether the spaces were delimiting file names or were within the file names.
Or you could have semicolons and you could put commands in there. So if people are copying and pasting, there are a lot of ways to hide stuff, so there were security implications there. So by default, we now quote the output of problematic filenames from ls. And there was actually quite a lot of pushback on that from various people.
And the main reason is because they weren't used to it.
And of course we always provided a way to go back to the old behavior if you really wanted. But yeah, you just have to be very careful about how you adjust these tools, especially user facing stuff like ls.
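To illustrate the ambiguity described above, here is a minimal Python sketch. It uses the standard library's shlex.quote as a stand-in for shell quoting; this is an assumption for illustration only, and the exact quoting style that ls emits may differ. The file names are made up.

```python
import shlex

def display_name(name):
    """Return a file name in a form that is safe to paste into a shell.

    shlex.quote leaves plain names untouched and wraps names containing
    spaces or shell-special characters in single quotes.
    """
    return shlex.quote(name)

# Hypothetical file names: one plain, one with a space, one with a semicolon.
names = ["notes.txt", "my notes.txt", "a; echo surprise"]

# Unquoted listing: a reader cannot tell where one name ends and the next begins.
print(" ".join(names))

# Quoted listing: every problematic name is clearly delimited.
print(" ".join(display_name(n) for n in names))
```

Splitting the quoted listing with shlex.split gives back the original three names, whereas the unquoted listing would split into six words, and pasted into a shell it would run the text after the semicolon as a command.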
Jonathan: Yeah, it's interesting. I just went and looked. The Exa command itself has been retired, and a fork,
Eza, is now the latest and greatest. And so yeah, it's interesting to look at it, because there are going to be some great ideas there. But what's also fascinating is that Core Utils has been around since 1992 and Exa only lasted for, you know, however many years.
And now the main developer of Exa is missing and cannot be contacted anymore. Just the fact that Core Utils has been successfully maintained for that long is a huge win, and not every project can say that they have that sort of a track record, that 20 years from now you can pretty much guarantee that Core Utils is going to be around and still putting out releases.
Padraig: Yeah, absolutely. And for companies investing in a platform, and writing shell scripts and stuff like that, depending on all these utils, there's a huge responsibility for these things to stick around and be stable. And just another aspect of that example, it's good to expand on a particular example and look at all aspects of it.
Like if we were to, say, take things out of Eza, now, is it? And add them into ls, there might be a little bit of extra overhead in every ls invocation, which gets back to the point about how there's a responsibility on us to mind performance, because it moves out into all companies and all users.
There was one example there recently where file capabilities were colored, and file capabilities are one of the things that never really took off in Linux. So maybe one in a million files has capabilities now, and there's no real advantage to coloring them. But there's an overhead there of every ls invocation looking at every file to see whether it has file capabilities.
And it's one of these kind of esoteric things, so the interfaces to detect file capabilities were never optimised over time. So as a rule of thumb, or what I'd call a back of the envelope quick calculation, I worked out that by taking that out of ls, nobody really noticed, but it saved about 75,000 worth of electricity a year, just estimating the use of ls around the world.
Which is, I don't know, maybe 40 households of electricity, just by not having this extra little bit of functionality in ls. So these things are important.
Jonathan: Oh, I'm so tempted to say something sarcastic about cryptocurrency there, but I think I'm going to not. That idea of performance, though, does bring to mind a really interesting question, and that is: say AMD pushes out their newest processor and it's got AVX-512 support for everybody now.
Of course, Intel has gated that off to their pro line of processors. Are there changes that get made to some of these core utils? Because, Hey, now we have AVX 512, and oh, you can do string comparisons in AVX 512. Like, are you guys sort of on the cutting edge of that, watching those changes in processors, and then therefore going in and making tuning changes in some of these commands?
Padraig: I wouldn't say we're on the cutting edge, but we're definitely incorporating changes such as that. There's two aspects to that. Again, we try not to implement everything ourselves. So, looking at the crypto side of things, or the hash side of things, rather than implementing assembly or SIMD versions of those ourselves, we push off to libcrypto, the OpenSSL libraries, because they're ubiquitously available as well.
And with version three, and the licensing changes there, we can link to those without issue. So we'll link to those by default if they're available and use the fast version. So cksum or sha256sum or whatever will use the latest version of those. But also within core functionality ourselves,
like for example wc for counting lines, that can be done efficiently in SIMD code and AVX code. And, yeah, we do have code for that. We have to be careful about portability, so there are special portability constraints in how we set up the build, and we actually added libraries to the build system to support that.
That's how you efficiently, and kind of definitively, separate all AVX instructions into separate compilation units in the Automake world. So we have internal libraries that we link to, to implement that for a few utilities now, and probably more going forward.
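As a rough picture of the line counting task mentioned above, here is a minimal Python sketch that counts newline bytes one block at a time. It is not the Coreutils implementation, which is written in C; the block size and function name are assumptions for illustration. The scan inside each block is the kind of loop that SIMD or AVX code would speed up.

```python
import io

BLOCK_SIZE = 64 * 1024  # hypothetical buffer size, chosen only for illustration

def count_lines(stream):
    """Count newline bytes in a binary stream, reading one block at a time."""
    total = 0
    while True:
        block = stream.read(BLOCK_SIZE)
        if not block:
            break  # end of input
        # bytes.count scans the whole block in native code; this inner scan
        # is the part a vectorized implementation would accelerate.
        total += block.count(b"\n")
    return total

print(count_lines(io.BytesIO(b"one\ntwo\nthree\n")))  # prints 3
```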
Jonathan: Yeah, I know with the kernel and GCC itself, you have AMD and Intel employees that come through and send these big patch sets in,
like, here's support, or here's the tuning, for the latest and greatest from our company. Do you guys get any of that in Core Utils? Are there AMD and Intel email addresses sending patches in?
Padraig: No, to be honest. I've had interaction with those guys from working at Meta and various places,
and various other projects, but not directly in core utils.
Jonathan: Okay. Let's see. I was going to ask about, Dan, what was I going to ask about? That was a short answer. I was going to look it up. Retiring commands, that's what it was. There are some core utils commands that have been around
for years, for decades. And some of these I don't think anyone has really used in decades. Is there a thought about, well, let's just retire these rather than making them continue to be part of the maintenance burden? Or are they going to be around forever?
Padraig: Good question. So there's two aspects to that, really. There's individual commands and then there are classes of commands. So answering classes of commands initially: for example, all the separate checksum commands, like md5sum, sha1sum, sha256sum, blah, I'm not going to go on, but you could go on forever there, and that's kind of a bad way to do it, to have a separate command for each of those.
So, going forward, we're consolidating those in cksum: cksum -a, then you select your algorithm there. So we won't have any more of that class of command. Everything will be consolidated in cksum. And I guess in 20 years' time we'll start removing sha384sum or whatever, just to clean things up.
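The design idea here, one entry point with the algorithm chosen by name rather than one command per algorithm (as with cksum -a), can be sketched in Python using the standard hashlib module. This is an analogy only, not how cksum is implemented, and the function name checksum is made up for the example.

```python
import hashlib

def checksum(data, algorithm):
    """Hash data with the named algorithm and return the hex digest."""
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    # One code path for every algorithm; only the name changes.
    return hashlib.new(algorithm, data).hexdigest()

for name in ("sha256", "sha384", "sha512"):
    print(name, checksum(b"hello\n", name))
```

Adding a new algorithm then means supporting a new name, not shipping and maintaining a new command.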
So individual commands then, yeah, there are some commands that are less used, like ptx, tsort, this sort of thing. I guess the main idea there is that there's less maintenance. We don't put much maintenance focus on them. Like if we get a compiler warning out of them, we'll fix it up. Or if we get some security warning or whatever, we'll maintain it.
But we won't put much effort into adding new features or changing functionality on those going forward. That's the main thing. As for removing commands, there's the big compatibility concern. These utilities are so widely used that there's some edge case on some space probe on Mars or something, so we probably can't remove commands. And it wouldn't be that much maintenance going forward.
And on the other hand, that's why we have to be very careful about adding commands, and even options in that regard, but in adding commands especially, we have to be very careful.
Jonathan: Yeah. All right. I do want to make sure and ask, is there anything that is coming that you wanted to let folks know about? Like, are there any future plans that you're particularly excited about?
Padraig: I guess the main one, you already asked about: the internationalization and Unicode support. So that is something that's coming gradually. And we're definitely focused on that, and it's happening as we speak. In the last release or two, there have been updates to expand and unexpand and utilities like that for them to handle multibyte characters, so that will be coming.
Perhaps there might be a new utility. We've mooted for a while a replace utility. So it's interesting. To replace a file on Unix with ACID principles is actually really difficult, and copying data around with ACID principles is actually really difficult. So it kind of makes a lot of sense to have that encapsulated in a separate command.
Like, rather than have sed -i, you can have the replace command and provide another command to do the actual processing, and then replace would do all the complicated stuff about temporary files, and moving files atomically, and all that anarchy. That's something that might be on the horizon.
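The "complicated stuff about temporary files" can be sketched as follows. This is a minimal Python illustration of the common write-to-temporary-then-rename pattern, under the assumption that the whole file fits in memory; the function name replace_file is hypothetical, and this is not the design of any actual or planned Coreutils utility.

```python
import os
import shutil
import tempfile

def replace_file(path, transform):
    """Rewrite path with transform(old_bytes) so readers see old or new, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        new_data = transform(src.read())

    # Create the temporary file in the same directory so the final rename
    # stays on one filesystem, which is what makes it atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".replace-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)  # atomic swap on POSIX systems
    except BaseException:
        # Remove the temporary file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

For example, replace_file("config.txt", lambda data: data.replace(b"old", b"new")) would rewrite a hypothetical config.txt in one step. Even this sketch leaves out ownership, ACLs, extended attributes, and syncing the directory, which gives a sense of why encapsulating it in one tool is attractive.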
At the end, there's always the ongoing maintenance for new kernel interfaces. For example, one thing that's changed recently, that might allude to things going forward, is we added copy offloading, and we had to be very careful in how we added it. Copying a file is actually really primitive in the POSIX interface.
You have to copy the metadata separately from the data, and there are atomicity issues there as well, which gets back to the replace command. But recently there's been copy_file_range in the Linux kernel, and similar functions elsewhere, to provide copy offloading.
And we looked at those maybe 10 years ago and had a deep dive on them, and they weren't stable enough either in interface or functionality. So we provided feedback to the kernel folks then, and more recently, in the last two or three years, we've been able to start using these things. So that allows for more efficient copying operations, but we still have to be careful and actively inspect and avoid older kernels and stuff like that.
So, there's going to be changes like that going forward with new kernel interfaces.
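A minimal Python sketch of the "use the offload if it works, otherwise fall back" approach follows. It uses os.copy_file_range, which Python exposes on Linux; the chunk size, the function name copy_data, and the particular list of error codes treated as "not supported" are assumptions for illustration, and a real tool needs more checks than this. Only the data is copied, not the metadata.

```python
import errno
import os

CHUNK = 1024 * 1024  # hypothetical chunk size, chosen only for illustration

def copy_data(src_path, dst_path):
    """Copy file contents, using kernel copy offload when it is available."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src_fd, dst_fd = src.fileno(), dst.fileno()
        if hasattr(os, "copy_file_range"):
            try:
                while True:
                    copied = os.copy_file_range(src_fd, dst_fd, CHUNK)
                    if copied == 0:
                        return  # reached end of file
            except OSError as err:
                # Older kernels or unsupported filesystems: fall back below.
                if err.errno not in (errno.ENOSYS, errno.EXDEV, errno.EINVAL,
                                     errno.EOPNOTSUPP, errno.EPERM):
                    raise
        # Plain read/write loop, continuing from the current file offsets.
        while True:
            block = os.read(src_fd, CHUNK)
            if not block:
                return
            while block:
                written = os.write(dst_fd, block)
                block = block[written:]
```

Because both paths work on the raw file descriptors, a fallback that happens partway through simply continues from wherever the offloaded copy stopped.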
Dan: Oh, awesome. So I'm curious, if somebody, say, listens to this and decides, you know, I'd like to get involved with core utils, that seems like my kind of thing. I mean, I'm always keen to ask people who've made a career out of this and who, you know, have contributed and become maintainers on these projects.
Do you have any advice for somebody? What's the best way to get involved and, you know, to come along? Is there anything you particularly need, say, from the community, that you think, it would be great if we had somebody who could do X, Y, Z?
Padraig: Well, there's a TODO file in the main repo.
If you look on the main GNU Core Utils mailing list, if you sit on that for any period of time, really, you'll get an idea of the work we're interested in doing and, generally, the work that needs doing. I guess I was going to say we don't have a bug tracker, but we do: if you go to bugs.gnu.org,
there's a core utils section there. And so there are some outstanding bugs that need to be handled as well. And just in general, we're always very accepting of interactions and patches on the list, and we'll try and guide people. We're very, very interested in getting code and new people involved in the project.
Dan: Excellent. Yeah, I mean, that's the lifeblood of every project, I suppose. So, Jonathan and I have just been having a little discussion in the back channel here about GitHub, because I didn't think you used GitHub, but apparently Jonathan says you are on GitHub. And being that we talked about contribution and so on, I was curious, do you accept pull requests from GitHub?
Padraig: We kind of do, unofficially, right? So we have a GitHub mirror, and we'll probably change that a little bit going forward, because GitHub has become more ubiquitous since we set the policy. So currently you can make a pull request against Coreutils there, which I set up about 10 years ago.
And, just to consolidate things on the mailing list, we give people advice when they make a pull request, in the main message it puts in, that you should send it to the mailing list, but you can still create the pull request. So we might change the policy there, so that we will support pull requests separately as well, because, yeah, I have looked at pull requests there over the last while, and I do monitor that all the time.
So, yep. That's actually, I'll add that to my to do list this evening to change the wording there. At least we're giving you work to do, I feel bad now.
Jonathan: Oh, no, no, that's our job. No, that's interesting to me. I very much get the hesitation that a lot of people, particularly free software projects and GNU projects, have about using GitHub, because it's not all open source software.
And I definitely get why people don't love that. But at the same time, it's so useful. And, like you said, it's become so ubiquitous that it makes things so much easier for people to come in and add, you know, a trivial or a small pull request without having to go to the mailing list. A lot of projects have wrestled with this.
Padraig: Yeah, I don't wrestle with it too much. I'm definitely on the side of the fence that I'm not very binary on you can't do this and you can't do that. The world is gray. So I'm on the side of the fence that whatever gets the best logic in to the most people is best, and most people are used to using GitHub, so I have no issue with using GitHub at all.
Jonathan: Yeah. That seems to be where a lot of projects are coming down. All right. Is there anything that we didn't ask you that you wanted to make sure and let folks know about? And I know that's a tough question, because you have to think about all the things you wanted to talk about, and all the things we've talked about, and do that set comparison, but Dan, do it, Dan.
Padraig: No, I don't, I'm just, I scanned my notes here, and no, I think that that was a very all encompassing set of questions, so I think you've got everything there.
Jonathan: Yeah, we try, we try. What's the weirdest thing that somebody has done with Core Utils that you're aware of? Have you gotten any user stories that have just surprised you?
Or maybe, you know, requests that have just been off the wall or very surprising?
Padraig: Look, you know, that's, Jesus, nothing comes to mind there.
Jonathan: Maybe the recurse up the tree was the answer there.
Padraig: That was the most off the wall one, I guess. There was an interesting set of videos I saw from, I think it's Robert Elder.
He went through a set of core utils recently, and he did a set of videos where he started off saying that every command was his favorite command. But he had a very interesting use of timeout for avoiding saturating his network with a backup. So he'd start off a backup, and the backup would keep copying stuff that it hadn't copied already, but he'd time it out after two hours.
So I just thought that was a very interesting insight: time out his backup. It would run the next night, and pick up where it had left off before, so it would eventually copy his files across. But you know, there's lots of esoteric uses. I guess so many esoteric uses that I can't remember
very many particular ones over time.
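The pattern described above, cutting off a resumable job after a time limit and letting the next run continue, can be sketched in Python with subprocess.run and its timeout parameter. This stands in for the coreutils timeout command, which exits with status 124 when the limit is hit; the function name run_with_time_limit is hypothetical, and the backup command itself is not specified in the discussion, so none is shown.

```python
import subprocess
import sys

def run_with_time_limit(command, seconds):
    """Run command; return True if it finished, False if it was cut off."""
    try:
        subprocess.run(command, timeout=seconds, check=True)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing keeps
        # running; a resumable job can pick up again on the next attempt.
        return False

# A stand-in for a long job: sleep for 5 seconds, but allow only 0.5.
finished = run_with_time_limit(
    [sys.executable, "-c", "import time; time.sleep(5)"], 0.5)
print("finished" if finished else "timed out, will resume next run")
```

This only helps when the job is resumable, as the backup in the story was: each run skips what has already been copied.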
Jonathan: Yeah, that's the thing. You have people that write these one liners where it's 15 different commands, and most of them are going to be core utils commands, and they do something, you know, ridiculous or really impressive. So there's a bunch of them out there.
Padraig: But on the other hand, looking at questions on Stack Overflow and stuff like that,
I don't know why people do things the way they do. I should have a section, "definitely don't do this with core utils commands", but, each to their own, you know?
Jonathan: That would be amusing to read that FAQ, the things you definitely shouldn't do with core utils. Oh, that'd be great.
All right, well, I've got to ask you before we let you go, what is your favorite scripting language and text editor?
Padraig: Interesting. Scripting language. Well, what is a scripting language? That's true. I'd say my favorite interpreted language is Python. So does that count? Yeah, absolutely.
Dan: Yeah, yeah, definitely. Yeah.
Padraig: That's one I've used for a long time now. And I get the most flexibility and functionality out of it.
Jonathan: And text editor.
Padraig: Text editor, I use vi. So, an interesting part of that, that's worth mentioning quickly, is the interfaces. I use the CLI a lot, I guess, obviously. And UNIX has not been designed cohesively; it's kind of evolved in separate factions, with SysV and BSD and stuff like that.
So the interfaces have kind of evolved to be vi versus Emacs in various commands. So I've set up all my systems to have vi key bindings, and it actually helps with the speed of the interface and helps with RSI, which I don't have, I've never had any RSI, but maybe that's the reason.
Jonathan: That's gonna be the secret why. It's funny you ask what counts as a scripting language. We had the creator of Bash on several years ago, and I asked him if Bash counted as a scripting language, and I think he was sort of offended that I asked the question.
Padraig: Yeah, it's one of these things.
Jonathan: Yeah. Yep. All right. Thank you so much for being here. The hour flew by and we had a great time, and I think we really covered a good corpus of information on core utils. It was a blast to have you. Appreciate it.
Padraig: Absolutely. Thanks, man. Much appreciated.
Jonathan: Yeah. All right. What do you think?
Dan: Yeah, I think, as I kind of said at the top of the show, everybody's used core utils, whether they know it or not. I say everybody, you know, I would imagine these days, with the amount of things we've got in devices that are running all kinds of different stuff. And one thing I actually found very interesting that Pádraig told us about was, because we mentioned BusyBox and embedded.
I've got a lot of friends who work in embedded Linux and in that kind of world. And I didn't realize you could get like a two megabyte implementation of, of, of core utils. Now, I don't know if they know that or if people from busy, because I also know people who work with BusyBox and they're going to hate me.
If I say don't use that, use core utils, it'd be better, but you know, it's just these options, isn't it? But very interesting stuff. And I think the ubiquity of this is so amazing, and the responsibility that it puts on maintainers like Pádraig and the other people involved, that this is used everywhere.
So I imagine that, you know, it, it, it must be quite a weight to kind of bear.
Jonathan: Yeah, it's like we said, it says a lot about the project and the way that they run it, that it's been around for so long and it's still a healthy project. So that's neat to see. And I also find it really fascinating.
I couldn't help but think, when they were talking about how they very carefully make changes, there's this line from the Lord of the Rings books where they discover some wonderful cavern, and Gimli tells Legolas that he wished he had some of his kin to come and work on it, and Legolas
is like, you would make changes to that? It's so beautiful. And Gimli says some line about, we would only remove one rock every hundred years to try to make it better. And that's kind of how I feel they're working with Coreutils: we make one breaking change every 20 years. We didn't get to ask him about it, but apparently Coreutils itself has
a really good record when it comes to security as well. Something like only five CVEs in the last, you know, 13 or 20 years, something like that. And I wish I'd thought of that during the show, to ask him about it, but they're really doing great work. Absolutely.
Dan: Yeah, definitely.
Jonathan: All right, well I think that is, that is it for Core Utils.
Do you have anything that you want to plug, Dan?
Dan: Yeah, I should mention, some longtime listeners of Floss Weekly will remember an event called OggCamp that I used to run in the UK, which is a free software, open source unconference, a barcamp style event. It was, at one time, the biggest in the UK.
I don't know if it still is. It's hard to say that it still is, because this is the first one for five years. So what I'm doing here, burying the lede, is there's going to be one in October, and it's been picked up, thankfully. We got stopped in our tracks by COVID, as a lot of people did.
But the event's coming back, and it's going to be in Manchester, in October this year. You can go to oggcamp.org. That's O G G C A M P. And you can get tickets on there. You can find out what's going on. And I'm not actually organizing it anymore, but Simon Phipps of this parish is on the organizing team.
And if you want to get involved, you want to find out more about it, you can come to Manchester, and if you want to talk or, you know, do a workshop or any of those sorts of things, then let us know and head to oggcamp.org.
Jonathan: Yeah, excellent. So things I want to plug: of course, we thank Hackaday for being the new home of Floss Weekly, and you should make sure and tune in next week, because we're talking to Carl Richell about Cosmic.
The new alpha release of Cosmic is out with their Rust based compositor, some really fun stuff there. I'm going to chat with him about it. And then of course you can follow my security column, which goes live Friday mornings on Hackaday. And then we've still got the Untitled Linux Show over at Twit. And that is always a blast.
That is every Saturday. And we have a lot of fun there talking about Linux news and some open source news as well, but it, that's a, that's a much more Linux flavored show than this one is. But you should definitely check that out as well. I very much appreciate Dan being here as co host and we appreciate everyone that watches and listens both live and on the download and we will see you next week on Floss Weekly.