Jonathan: Hey folks, this week Aaron joins me and we talk with Simon and Stefano about the open source AI definition. That's something that was just minted and did not come a moment too soon. It's a super interesting conversation and you don't want to miss it, so stay tuned. This is Floss Weekly, episode 816, recorded Tuesday, January the 14th.
Open source AI. Hey folks, it's time for Floss Weekly. That's the show about free, libre, and open source software. I'm your host, Jonathan Bennett, and we've got something that we've been teasing, it seems like, for months now. We're actually going to do a deep dive on the open source AI definition, and I've got a co-host that is not also part of the team that wrote the definition — you'll get this joke here in just a second.
But I've got Aaron with me. Welcome, Aaron.
Aaron: Hey, thanks. Thanks for having me.
Jonathan: And Aaron is going to be sort of our AI optimist. Is that how you described yourself?
Aaron: Yeah, a little bit of an optimist. I don't think it's taking over the world as quickly as people think, although it's doing a lot of crazy things these days.
And since I use it on a daily basis for work and for other stuff I'm working on, I tend to be — if there's one person in the room that's a little bit more optimistic — I tend to be that person.
Jonathan: Yeah. And I've explained this on the show before. My working theory, as of now, is that artificial intelligence falls into sort of the same place that the crypto bubble did, but also the same place that the dot-com bubble did. And that is that it's a bubble: people are doing dumb things with it, trying to figure out where it makes sense, but once the bubble bursts, it's going to stick around, and it's going to be something that sort of changes the way the world works, indelibly.
And that's obvious with dot-com; with cryptocurrency, we're still sort of in the process of figuring out what that looks like. I think it's a good analogy.
Aaron: Ten years from now, we'll kind of have to think back to what life was like before we had AI — just assuming that it was there and that we could ask it to do things.
Yeah.
Jonathan: It's the same difference of worldview that I have with my children. Because I remember before the internet was just always there, and my kids don't. They've always been connected, you know, in some way — the household has been, at least. So it's just totally different.
All right, well, let's bring the guys on. So we have both Simon Phipps — that is the joke I was making, that one of our co-hosts is also a guest — but we've also got Stefano Maffulli. Welcome, guys. And you two are the absolute experts, from what I can tell, in what it means for an AI, or for an LLM, to be open source.
And let's, I guess, go to Stefano first, and, taking that as the prompt — to use an LLM term — tell us: what is all of this about?
Stefano: Oh, thanks for having me. It's a pleasure to be here. So, what is open source AI? Yes, let's start there.
It's really not that different from other open materials or artifacts that we've been thinking of. We want to have freely available access to all the components and all the pieces that have made the artifact that you have received.
So it sounded really simple, almost trivial, at the very beginning of the conversation that we had almost three years ago. But we quickly realized that all the paradigms that we were used to applying — like the term "source," just to get started — didn't really match the technology.
And so we had to study the issue a little bit, and we had a long process, and the end result is that today I can simply say that an open source AI is a system that gives you free availability of all the pieces that made it. That includes the parameters — so, the results of the training — the code that does the training, the code that produces the training dataset, and the data itself, when it's possible to distribute that.
And that's it. Sounds simple.
Jonathan: I'm kind of thinking of — and I know this is not exactly the same thing — free software and the OSI; those are two overlapping but at the same time separate things. But I can't help but think of the four freedoms that the Free Software Foundation talks about: when it comes to software, the freedom to run, to copy, to distribute, to study, and then to change and improve a piece of software.
Are we kind of talking about the same thing when it comes to open source AI? Or is it a similar sort of overlapping definition, the way that the OSI definition and the free software definition are?
Stefano: Right. Let me start from scratch, because maybe the whole concept of an "open source AI definition" throws people off balance.
The principles are really the same. And in fact, if you look at the document that the OSI published at the end of last year, it really lists all those freedoms that you are talking about. An AI — and we talk about systems, I can talk about that later — an AI, in other words something that produces an output, that infers an output based on an input, needs to be made available.
You, as a recipient of that system, need to be able to study it, to share it, to understand it, to run it — you know, to execute it, to have those outputs generated for you. And you have to have the freedom to modify it, to change how it works, so that others can enjoy the same freedoms that you have received.
So those principles are really the same. What is really missing, when you look at the definition of free software, is that sentence that says the precondition to exercise the freedom to study and the freedom to modify the software is access to the source code. Those are the words that we were missing when we started looking into AI specifically.
We didn't have a very good way — we didn't have any way — of understanding how an AI system can really be studied deeply, modified to change its behavior. That's what we have researched for the past almost three years, and what we came out with is a way to describe the equivalent of source code for AI. In the open source definition, it's point number two; it's called the preferred form of making modifications to the code.
It's not just the source code, but also instructions on how to build: dependencies, libraries and versions, and of course language compilers and things like that. Knowledge about all those pieces needs to be shared in order to have access to the source code.
Jonathan: Yeah, it's interesting that you mention that.
I'm trying to find my exact notes on this, but there was a court case — I believe it was in Germany — here recently, where a router manufacturer used some code that was under the GNU Lesser General Public License, the LGPL, and they got sued. Not because the code itself was missing, but because that extra stuff — like the scripts needed for compilation and installation — was missing from their source code release. And someone sued them and said, no, you've got to provide this under the LGPL as well.
And I've commented before that every time the GPL or the LGPL — particularly those two, because they have, like, the strongest copyleft protections in them — goes to court, I'm kind of like, I hope this turns out okay. Because there's sort of this nightmare scenario where a court says, no, no, no, this is a contract, not a license, and it's not a valid contract — you know, there are various ways that it could go poorly. But the German court found that not only is it a valid agreement — and reading between the lines here, it looks like a conclusion was reached before the court case happened, and then the court case kind of rubber-stamped it — but the extra stuff had to be shared.
And so it was confirmed that, yes, you can force someone to share your compilation steps as part of the source code license. So it's good to see that confirmed in court.
Stefano: Right. It's really a fundamental component, a fundamental piece of the whole movement, to have access to the source code as defined as the preferred form in which a programmer would modify the program.
Simon: Mm-hmm.
Stefano: You know, obfuscated source code is not source code in the context of the open source definition. Missing information about, like, build scripts — that's really not open source code.
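As a toy illustration of that point (not code from the episode): both of these snippets run identically, but only the first is the "preferred form for making modifications" — the second is technically source, yet useless for study or change.

```python
# The preferred form: readable names, documentation, structure.
def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# The same logic stripped of names and documentation. It executes the same,
# but it is not the form a programmer would choose to modify.
a = lambda v: sum(v) / len(v)
```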
Jonathan: Yeah. So what OSI has done is put together a definition for what open source AI has to be — the minimum requirements. I guess the next step, then, or maybe this has already been done, is to begin to produce OSI-compliant licenses for AI models?
Is that what's next on the horizon? I'm jumping way to the end with this question, but it just comes to mind and I'm very curious.
Stefano: As OSI, we are ready to start evaluating licenses that do not cover squarely or exclusively software. Historically, the OSI has never taken into consideration licenses that cover, for example, content, music, or databases, or other things that are not necessarily software.
But we are ready to evaluate other sorts of licenses against the open source definition — the original one, the ten-point one. There are efforts that we are aware of, of groups that are writing new documents. They're not licenses, technically — or they're not necessarily going to be licenses — but they're terms of use and distribution. There are other legal terms that are squarely new and cover specifically parameters, data and datasets, and also code, all comprehensively put together.
So yes, we're ready to review that.
Jonathan: And this is another sort of strange, tangential question, but when we talk about open source beyond just software, what immediately comes to mind is open source hardware. Has OSI been involved with any of the open source hardware efforts, like defining what that looks like, or has that been left to others? You want to jump in, Simon? You have thoughts about open source hardware, don't you?
Simon: Yeah, well, so this came up before Steph's time. There is another organization called the Open Source Hardware Association, or OSHWA.
Jonathan: Mm-hmm.
Simon: And OSHWA, you know, did what frequently happens in open source communities: they paid us the great compliment of using our logo as the basis for their logo. And in trademark law, that's a big problem, because it means you have to ask them very politely not to do that. And so at that time, which was about 10 or 12 years ago, OSI and OSHWA had to reach a legal agreement that they would deal with open source hardware and we would deal with open source software.
And as a consequence, OSI has never actually got into defining what open source means in the world of hardware — in the same way that the Open Data Institute, ODI, talked about what open data was, and OSI, again, has never got into defining what open data would be. So the open source AI definition is something of a departure, because it's really OSI's first move into something which is not open source software.
But I think it was a necessary thing to do, because the boundary there is such a series of dotted lines. Whereas with hardware, you can tell — well, that's fairly clear — although actually even open source hardware is fairly unclear. One of the things I did when I was at Sun in 2006 was release all of the SPARC designs for the SPARC silicon chips as open source under the GPLv2.
We called it OpenSPARC, and we released all of the Verilog designs, because it turns out that silicon chips are actually software as well. They're just software that's compiled to silicon. So the dividing lines are kind of hazy there too. But OSI, as an organization, doesn't harbor an opinion about open source hardware and does not produce a definition in that region.
Stefano: Can I jump in? Because I want to expand a little bit on this part — why did we jump in? Why did we feel the urge, when we haven't done the same for hardware? I think it's what Simon just said: hardware is programmed by a human. In some sense, the chip design is done by a human and then compiled into silicon. So you can see the mapping is pretty easily translated: source code written by a human, compiled by a compiler into executable code.
For AI, what we noticed was that these systems look a lot like software, but they're not programmed by humans. They learn by themselves; they have capabilities that emerge semi-randomly. I mean, I'm not a technician, I don't understand exactly why, but what I've been told is that these things just start to execute and acquire new capabilities.
They're not programmed. And the question, therefore, is: how do you fix it? If it's consistently creating issues, or spitting out the wrong answers, or whatever — you want to fix it, you want to change it, you want to give it to others — what is it that you actually need to ask for?
It was easy for the hardware piece; it wasn't immediate for the AI piece. That's what triggered it.
Jonathan: Yeah, that makes sense. Before we get any further into this, we really ought to stop and define who we are talking to. I know that both Stefano and Simon are involved with OSI, and I honestly could not tell you what each of your exact roles are.
So let's go there next. Stefano, I guess: where are you in the OSI org chart?
Stefano: I'm the executive director of the Open Source Initiative. I started three years ago, and I'm in Italy now.
Jonathan: You're at the top of the chart.
Stefano: No, no, the board is actually at the top.
Jonathan: Well, okay, that's fair. Simon, where do you fall in this?
Simon: Well, I was the president of OSI for about a decade, give or take the odd year here or there. And when I quit the board of directors, I had foolishly started some work on open standards and public policy, and there was nobody to carry that work on if I just ran away and disappeared. So OSI hired me to be their director of policy.
For the last three years I've been OSI's director of policy and standards, and then we hired somebody else to look after US policy. So I've been the Director of EU Policy and Standards for the last year or so. And I actually don't do anything at all to do with AI. I look after making sure that things like the Cyber Resilience Act don't break open source.
And I make sure that the standards organizations create standards as if open source was real. Those are the two things that I actually do as the day job. But one of the things I'm going to have to do now — Steph made a very clear statement when he took over as executive director that OSI needed to do something about defining open source in the context of AI.
And I'm now going to be taking that definition forward into Brussels, and performing the necessary education to help people understand that, for example, Meta's Llama AI system is not open source, because the licensing includes field-of-use restrictions — and other insights drawn from understanding what AI is, as we work out how we should legislate.
And that's necessary because people have already started writing legislation about AI. It may be a really young field, but there is already, on the statute books in Europe, the Artificial Intelligence Act. And it contains within it an exception for open source AI. So somebody had to define what open source AI means.
And that was Steph.
Stefano: Yeah — no, no, no, no. Wait a second. Wait a second. It wasn't me who defined it.
Well, that's crucial. Because, differently from the free software definition and the open source definition itself, this is not the work of a lone person — you know, smart or otherwise — coming out of their garage with the sacred text. This process had to come from the community: AI developers, researchers, lawyers, copyright holders, subjects of AI systems. All of these different stakeholders had to be consulted, and we needed to find a definition that matched what was actually happening in the space — providing some guidance, of course, and bringing our expertise and experience from 30-plus years of free software. But it was definitely not my work alone.
Jonathan: We absolutely need now to go ask one of the AI image generators to generate an image of Stefano carrying the tablets of stone out of his garage.
Simon: You know, I think the point Steph's making there is actually really important. Because the question has been asked: what right has OSI got to define what open source AI is? And this then reads back to the open source definition itself — what right has OSI got to define what open source means?
And the answer is that OSI's role is to collect together the consensus on what it means. That's what we did with the open source definition. There was a definition that came out of Debian that became the open source definition, and over a fairly short space of time it became obvious that the consensus of the global community was that the open source definition was the canonical explanation of how you can tell that a license is an open source license.
And what Steph has done for the last few years is run this exhausting global program where he's held public meetings, he's hired facilitators, he's hired authors and writers. He's asked AI experts and open source experts and people who work in social development: what is open source AI? And over that time, he's gradually evolved the answer — not by being clever and working it out for himself, but by identifying the consensus of this huge crowd of people. And that means the definition really is a consensus definition in many ways.
Now, because the field is younger, there are some dissenting voices. But go out and ask anybody who actually works in AI on open source: nine out of ten of them — and I can tell you the name of the tenth — will tell you that this definition is pretty much right.
They might want to nuance some of the words or some of the concepts, and indeed OSI is putting in place a plan to evolve the definition to, you know, a v1.1 or a v2 in the future. But this really is a consensus definition rather than a definition imposed by Steph.
Jonathan: There you go — he's got it. There's Steph with the open source definition, coming out of his garage. That's great. I love the hair.
So I think we can take a moment and just say, on some level, that is impressive, right? That's pretty amazing — that we could go, hey, here's a cool idea, let's have a machine draw a picture for us, and it comes out that well. It is cool that we live in these times where we can do that.
Stefano: Yeah — can you imagine? Do you remember when you had to go to a bookstore to order a book?
Jonathan: I still enjoy going to bookstores, but yes.
Stefano: Me too. We take it for granted now.
Aaron: Yeah. Hey, I wanted to dig in, if I could, a little bit more into the nuances of what we're dealing with, because two things really interested me. One is that there could be a dependency on something that lives outside the source code for the thing to work right — as a problem statement. And the other is that it may have self-generation capabilities, and how do you license that?
So I'm curious — what made me think of this initially was that yesterday I discovered and started playing around with Whisper from OpenAI. I'm not sure if you're familiar with it, but it's a speech-to-text transcription tool. And it's under the MIT license, so you can go do whatever you want with it under that license. But I'm curious if there are some examples of things already out there where those two things would be, or are, problematic.
Stefano: Two things that are problematic? Sorry, I'm not sure what you mean.
Aaron: An example of some AI tool or LLM that's out there that is either self-generating — do you have an example of something like that, that could be problematic? Or, the other one, something that's totally dependent on things outside of what could be covered under a license, or what could be considered open source, and thereby kind of invalidates the idea.
Stefano: Self-generating — do you mean self-generating as in Skynet?
Aaron: I don't know if I'm going that far, but something like what you were talking about, Simon, where it would be difficult to categorize what was covered, because it wasn't necessarily programmed by a human.
Stefano: All right, okay, now I understand.
So the case of Whisper is interesting, because the license is extremely permissive — we're very familiar with it — and it covers the parameters, the trained weights of that engine. What we don't know about Whisper is how it's been built: how the training happened, what kind of data sources they used to get those results. Those pieces are missing.
And that is why we wouldn't consider Whisper an open source AI, despite the fact that the weights are openly and freely available. Because what we have defined in the document — the open source AI definition — that is brand new is the preferred form of making modifications to an AI system.
If you want to change the behavior of Whisper, sure, you can fine-tune it, you can add layers to it, or you can manually or randomly poke at the weights themselves inside the matrix. But you would have a much easier time if you knew what kind of data went in there and had a full list of it; if you had the training code, in order to understand how that training was done; and if you had all the code that was used to generate the training dataset — because the original data needs to be massaged, filtered, deduplicated, tokenized, and so on, before it can be fed into the training machine.
All of these pieces are required to have full access to the preferred form of making modifications to the system, to the AI. That's the new thing that's in the definition.
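For readers following along, here is a minimal sketch of the asymmetry Stefano describes, assuming the openai-whisper Python package (pip install openai-whisper); the audio filename is just a placeholder. The MIT-licensed weights let anyone run the model, but nothing in the release lets you inspect or rebuild the training pipeline.

```python
import whisper

# This works: the trained weights are freely downloadable under the MIT license.
model = whisper.load_model("base")
result = model.transcribe("interview.mp3")  # placeholder audio file
print(result["text"])

# What the release alone does NOT let you do:
#  - list the audio/text pairs the model was trained on
#  - rerun the data filtering and tokenization pipeline
#  - retrain from scratch to study or correct the model's behavior
```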
Jonathan: So would it be fair to say, then, that for that particular model, some of it is still a black box? You don't know where it came from; you just have this black-box artifact that's part of it. And to make something an open source AI, we want to get rid of all of those black boxes and be able to see inside all of them.
Stefano: Yep, that's exactly it.
Simon: Or at least know where they came from. Because some of the things that you do to make an AI are ephemeral; they are transient.
What matters more is that you've got an adequate description of how the AI acquired its knowledge, such that somebody else who is sufficiently experienced could take your recipe and do the same thing. They may not need exactly the same data, but they do need the same recipe that you had for how you trained it on, say, the health data of everyone in an emergency room for a month — with these inputs from the equipment, and with these sign-offs from the patients, and so on.
An expert can take that description and produce the same transparent box. They don't necessarily need all the exact same data to do that — but that's a corner case. As the general case, yes, indeed, we don't want any black boxes. We want to know what they were shown in order to be populated.
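To make that "recipe" idea concrete, here is a hypothetical sketch — every field name and value below is invented for illustration — of the kind of description that could accompany a model whose raw data, like medical records, cannot legally be redistributed. An expert with comparable data could follow it to reproduce an equivalent model.

```python
# Hypothetical training recipe; none of these values come from a real model.
training_recipe = {
    "architecture": "transformer encoder, 12 layers, 768-dim hidden state",
    "data_description": (
        "one month of de-identified emergency-room notes and equipment "
        "telemetry, collected with patient sign-off, identifiers stripped"
    ),
    "preprocessing": [
        "remove personal identifiers",
        "tokenize with a 32k-entry BPE vocabulary",
        "drop records shorter than 50 tokens",
    ],
    "training": {
        "optimizer": "AdamW",
        "learning_rate": 3e-4,
        "batch_size": 256,
        "epochs": 4,
    },
    # The filtering and training code itself would still be published,
    # even when the underlying data cannot be.
}
```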
Jonathan: And I suppose that's sort of a hinge point here, right? Medical is a really good example: there are certain times where you would want an LLM that has been trained on medical data, and the idea of releasing all of that medical source data is just a complete non-starter. Legally, you just cannot do it. But you would want that LLM to still be as open as possible, and ideally be able to fall under the open source AI definition. And so that's kind of the corner case that's been the sticking point, maybe, through all of this?
Stefano: Honestly, with a lot of the corner cases that are circulating, we must wait and see. A lot of the conversations that we're having about these corner cases require new science, or they may become obsolete tomorrow. A lot of the ideas that we had two or three years ago about the technology — specifically LLMs — are starting to go away, or are becoming less relevant.
So before we talk about corner cases, I would really love to see more work and more analysis of the actual good examples that we have today: groups, research institutions, nonprofits, hackers that are really releasing datasets with full instructions on how to build them, full code releases.
They're making attempts at creating platforms, including hardware descriptions of training clusters, to build AI systems themselves. All of these virtuous examples are the ones that we are losing track of. Everyone talks about Llama, while on the other side there are a lot of other groups releasing very compelling technology, with full access to all of the underlying components and pieces — respecting the concept of the preferred form of making modifications to an AI system.
Jonathan: Have you gotten a decent bit of contact from the industry — people saying, we acknowledge the work that you've done, and we would like to make changes to make our license or model, or whatever, open source compliant? Has there been some outreach?
Stefano: There has. We had collaboration with industry — large corporations, small corporations, startups — and with research institutions and nonprofit groups.
The nonprofit groups are the ones who have most happily endorsed the definition as it came out, because it really supports that idea of creating a framework for collaboration: a shared understanding of the principles for furthering the science, furthering the knowledge of how systems have been built and how you train them, and therefore how to improve them without having to reinvent the wheel.
But from the industry perspective — because of the way the technology is built, because of its complexity — it has to take into account multiple layers of the companies themselves. Just to give you an example: in the legal department, for software, copyright experts and maybe patent experts are sufficient. All of a sudden, for AI components and pieces, you end up having to involve the whole expertise of the firm, plus consultants from outside, because you have export regulation and privacy regulation across multiple countries.
It becomes a lot more complex. Companies generally haven't been very happy with the way the definition came out. It's too restrictive from their point of view.
Jonathan: And then you've got people on the other side that don't think it's restrictive enough, I'm sure.
Stefano: Right. Yes, there are some groups who say we should be asking for more.
Jonathan: The thing that comes to mind with that is that there's nothing in this that would prevent someone from writing a more restrictive license, right? And I would imagine that you could write a more restrictive license and it would still be considered an open source AI license under these guidelines. So someone could come along and try to bring the idea of copyleft into the open source AI definition. Someone could come along and try to bring in an Affero GNU Public License-style approach, where if anyone touches it, even on a website, they need to have a way to get the source. I would imagine that it would be possible to write these licenses now — they're not going to get a whole lot of uptake — but it would be possible to write these licenses and to release an LLM under a copyleft license, right? Like, that's an option, isn't it?
Stefano: It is an option, and it's a conscious one. I don't think it's a negative one, to be honest. And I don't classify the Affero GPL or the GPL as restrictive licenses — they're permissive. They just add requirements that you may want or not. They're not really restrictive.
So in the same vein, I do think it's a good idea to have the possibility of legal frameworks, legal documents, that would let someone — a user downstream — say: hey, this system is spitting out this output; I'm asking for a mortgage and it's consistently telling me that I'm not qualified. Why? Can I get access to all the instructions of how it's been built? Can I have my experts review it? That kind of stuff is good for society in general.
And let me also put it the other way. Think about the fact that we, collectively, have created a lot of content that has been crawled and spidered and archived into repositories like Common Crawl or the Internet Archive, and our code is in Software Heritage and GitHub repositories, right?
That content now can be used — and is being used — to train wonderful machines that create images and spit out code. Do we want, as a society, to have the possibility to say: hey, you're using my code; I want the parameters to come back to me, under the same conditions I gave the code to you — or my pictures?
I don't think it's wrong to think about it that way. It's a choice that we need to allow.
Jonathan: Yeah. And that actually kind of touches on a much bigger question with the way AI works right now. There's this sort of legal theory that putting data into a large language model is so transformative that it pretty much removes the original copyright, right? You put it in, and the language model trains on it but does not inherit the copyright of the training data. That's essentially the way that it's being used. I mean, you look at something like Copilot on GitHub: it is trained on a whole bunch of GPL code, but then you can say to Copilot, write me code, and there's no expectation that the code Copilot writes carries the GPL license.
So just as an example, that's the sort of thing that I mean. And I know that there are some legal theories out there that say that's not going to survive. We talked last week with a lawyer, and his comment was: the genie is out of the bottle, and I don't think we can ever put the genie back in the bottle. But it is interesting to think that there is sort of this push that we should inherit some of the copyright of the original training data.
And I guess, what is your thought on that? Do you think that's a reasonable thing to think about, or is the genie just entirely out of the bottle with no way to go back?
Stefano: I don't have an easy answer. It's a complicated and nuanced conversation, and I have a dual approach to this.
On one hand: first of all, these theories are being proven in American courts. And I'm saying American courts specifically because copyright law is fairly uniformly applied around the world, but some things are different in the United States, in Europe, in Great Britain, in China, et cetera.
So, yes — the genie. I guess the lawyer may have been referring to the theory that was proven by Google Books. If you remember, 10-plus years ago, Google started scanning books and making them searchable — search through the content of the books.
They were buying the books on paper, scanning them, doing character recognition, and offering those as search engine products. And they got sued — Google got sued by the American publishers — and Google won that case, because the judge said basically that what Google was doing was not damaging the copyright holders and was transformative work. And so I believe that Copilot and GitHub and Microsoft are building their cases, their defenses against the lawsuits, pretty much on similar theories. We'll see if that survives the courts.
But there is another angle that I keep thinking about: the fact that, collectively, society has created content. Every blogger, every podcaster — we are creating content, making it available under permissive licenses, like Creative Commons licenses and others, and telling the world: use it, do whatever you want with it.
And now, if we start saying individually, you can use it but not for training, or you can use it for training only under these conditions — then all of a sudden the larger copyright holders, the larger corporations who can license content from content providers, large aggregators, et cetera, will have an edge. They will have an advantage, because they have the money and the resources to get access to large quantities of data. And the groups that are building their systems openly, that want to have access to freely available content, will have to jump through hoops and license individually from millions of people and copyright holders.
So it's a more complicated question, in my mind, than what it looks like on the surface.
Jonathan: With some of those being more complicated questions, and with the way all of this is still being developed and changing so much, do you foresee the possibility of changing the open source AI definition? Do you think there are going to be some updates to it, where you go back and say, this needed to be stronger, or this doesn't need to be in there?
Stefano: Absolutely. We need to really pay attention to what's happening in the field: not only the technologies, how they're evolving and changing, but also how the developers — the builders of AI, the builders of datasets — are behaving. How they're adjusting their tooling and their expectations, how they're releasing things, where the collaboration is happening or what the potential for collaboration is, where the safety is coming from. All of these habits, we need to watch.
Consider the fact that the open source definition appeared almost two decades after the free software definition. The free software definition was basically a statement of principles, a manifesto, while the open source definition is more like a checklist — and the checklist we have today was based on 20 years of experience, right? So we're basically watching the space as it evolves. The more systems are released, the more datasets are released, the more experience we gain, and we can generalize from there.
What we have right now in the open source AI definition version 1.0 is a stake in the ground. It's basically a conversation starter, if you want — the thing we can use, like someone was saying, to have conversations with policymakers, with researchers and developers, with corporations.
To say: these are the principles that came out of a large conversation. Where do you disagree? What do you think is wrong? We'll keep collecting that information. And we're also going to keep watching the data space, because that's where I think we need to pay more attention. Right now we've really been talking about open data as if it's the only and most important thing we have, but open data alone is not sufficient to describe the complexity of training datasets and training data.
There's more nuance to that.
Aaron: So I've got kind of a reverse question. We talked about this a little bit when Jonathan was talking about protecting creators' rights — protecting the rights of the people whose data, whether it was images or whatever kind of data, was used for training.
I'm reading through the definition now — and we should probably also distinguish between the definition and a license. One of the nice things about open source licenses is that the license does provide protection, at various levels depending on which license you choose, for the creator of the software: to make sure that they get credit, for example, as the code is copied by others or used in other projects, et cetera.
Is there also that same protection in the definition? Or is that left up to the various licenses that will come along, that meet the definition, to provide that type of protection?
Stefano: I think it will have to come from the legal terms as they get developed.
Simon: So keep in mind that credit to the author is a license term. It's one that's permitted by the open source definition, but it's not actually required. Actually, a little while ago somebody — I think it was Jonathan — talked about restrictive licenses. There are no restrictive open source licenses.
Jonathan: I knew that was coming. Simon's had a bee in his bonnet ever since I said that. And look at him go.
Simon: There are no restrictive open source licenses, because a restriction is something that you have to negotiate removal of. Open source licenses do include conditions.
They say: you can use this license if, blah. But that's not a restriction; that's a condition. A restriction is: you can't — you guys who don't have white hair, you can't use this software. That's a restriction. And the only way you can get rid of that restriction is to go to the person who owns the copyright and ask them to waive it, probably in exchange for some money.
So all open source licenses are permissive, because they contain no restrictions. They only contain conditions, and credit to the authors is a condition — an optional one. There are open source licenses that don't include that condition. And so if there are going to be those sorts of conditions placed on open source AIs, they will have to be license terms, because they're not required by the open source AI definition itself.
Aaron: That's a good point. The other question I had, as you guys were talking: was there ever any thought put into this — or maybe it's just outside the bounds and doesn't apply, because it's too high a level — about intelligence itself? What happens when we have an AI that — I don't want to go all the way to sentience, but as these AI tools become more and more intelligent and able to do things on their own, was there any thought, in terms of the definition, that we need to somehow accommodate that, or think about that? Do the rights change at that point, when AIs become, or approach, sentient?
Simon: That's got to be your question, because I don't believe that can ever happen. I'm a weak-AI guy. I read Marvin Minsky in the 80s, and I believe that the society of mind does not lead to emergent intelligence. I believe that the evolution of machine learning only goes to just below that point. So it's got to be his question.
Stefano: My question? Your question? I'll take it. We have this sticker that says "Skynet won't be open source." I love that. But it's just a working theory right now.
Jonathan: Oh, Aaron, I think that is a great question. But I think that's probably also a great answer. That's pretty funny. All right, let's see — where do we want to go from here? Simon told me where one of the bodies was buried. Do I want to dig around in that? Somebody's like, I don't care. I know, Simon.
So there's this thought — there are exceptions for where, for whatever reason, all of the data cannot be provided. We talked about the medical exception, and there are some others. Is it sort of a challenge just to explain the open source AI definition, because those exceptions are so prominent? And are we maybe going to see a version in the future that puts the exceptions sort of in the back half of that section?
Stefano: We might. So, okay, let me put it this way: if I started today, rewriting everything from scratch — with the help of a few people, without having to go through the whole process of interviewing hundreds of people, et cetera, and traveling the world, blah blah blah — probably the wording might change.
But honestly, I think we are panicking a little bit, and we need to take a step back. Really — we're going to release a paper next week specifically about the issue of data. Because, like I said, an open source AI is one that gives you all the means to understand exactly how it's been built, and to be able to build something that works exactly like the one that you have received.
With software, we have decades of experience. We have understood it well enough that we now have reproducible builds, or at least a theory of them. We can really take source code and rebuild a binary bit by bit, exactly equivalent to the one that we have received.
It took us decades to get there. With AI — with LLMs, with this sort of neural network technology — we still don't have the science to do the same thing. I've been told by a notable AI developer, who releases everything as open source, that they tried to take the same dataset — okay, take one dataset, feed it through —
sorry, there was an ambulance; I live near a hospital, so ambulances go by every now and then, and I lost my train of thought. So: they had this one dataset, and they were feeding the same dataset through two different pipelines into the same cluster, split into halves. And those two streams produced two different models.
So it's not easy. It's not a solved problem how to replicate a training run so that you get an exactly identical model, identical weights. Knowing that this is different from software, and that this is a very young discipline, we need to take a step back, and we really need to understand the issue of data.
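A small illustration of the reproducibility gap Stefano describes, as a sketch assuming PyTorch. Seeding makes toy cases repeatable, but in real training runs, GPU kernel scheduling, data-loader ordering, and parallel pipelines (as in the two-pipeline anecdote) can still yield different weights from the same data.

```python
import torch

torch.manual_seed(0)                      # fix the RNG: weight init, dropout, shuffling
torch.use_deterministic_algorithms(True)  # raise an error on nondeterministic ops

model_a = torch.nn.Linear(16, 1)
torch.manual_seed(0)
model_b = torch.nn.Linear(16, 1)

# The analogue of a reproducible-build check: bit-for-bit weight comparison.
identical = all(
    torch.equal(p, q)
    for p, q in zip(model_a.parameters(), model_b.parameters())
)
print("identical weights:", identical)  # True for this toy case; rarely true
                                        # for full-scale distributed training
```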
The issue of data is not going to be solved quickly. It requires a much deeper understanding than what I was just talking about — right now we're thinking of open data versus not. And I'll give you a very quick example. I was talking to a developer who's been thinking about building a dataset that is purely made of copyright-free, unencumbered content. They were adamant about using public domain movies only. But then they step into the public domain issue: in the United States, a movie enters the public domain 70 years after the death of the director, while in France a movie goes into the public domain 70 years after the last person who worked on the movie dies. It's impossible to calculate.
So a dataset that looks like it's built on public domain material in the United States may not be in the public domain in Europe — in France, or in Germany. Completely different story.
We need to take a step back and say: okay, this is different from source code. We need more. We need to talk about changing policies, maybe having new laws, or building new habits, or changing the technology, right? All of these questions are open; all of these possibilities are open.
Jonathan: One of the things, as we get towards wrapping up — this is something that's kind of been in the back of my mind the whole time. When we talk about open source software, generally all it takes is a computer to be able to make changes and compile. When it comes to open source AI, the barrier to entry to really play with these things is much higher. In some cases you can do it on a CPU, in some cases a GPU, but for these models on the cutting edge, you need multiple GPUs to be able to do anything interesting. And I guess that doesn't really change anything as far as what the definition says, but surely that changes things for how accessible it is, on a practical level, to actually get into and mess with some of these things.
Stefano: Yeah, there is definitely something that changed: the scale of some of this training is prohibitive. But again, in a couple of years I've seen some difference. Small language models and other technologies seem to be making things more accessible. Where before OpenAI, the company, trains on clusters that are worth billions of dollars, there are smaller groups doing something on tens of thousands of dollars and getting similar results.
And I'm not saying they can build something as big and powerful as OpenAI's GPT, able to respond that quickly. But, you know, Mozilla is doing something extremely fascinating with smaller models running inside the browser, even on mobile. So, training versus execution — that's another big conversation.
It's early. It's really early. We need to give it time.
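As one example of that shrinking barrier to entry, here is a sketch using the Hugging Face transformers library to run a small, openly licensed language model entirely on a CPU. The model name is just one plausible choice for illustration, not one mentioned in the episode.

```python
from transformers import pipeline

# SmolLM2-135M is a ~135M-parameter model, small enough for a laptop CPU.
generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM2-135M",
    device=-1,  # -1 selects the CPU
)

out = generator("Open source AI means", max_new_tokens=40)
print(out[0]["generated_text"])
```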
Jonathan: Oh, yeah. We are definitely moving even faster than Moore's Law, right? There's this idea from early computing that every — what was it, 18 months? — computing power would double, something like that. And we're moving even faster than that with AI at this point. So give it a year or two, and who knows where we'll be as far as the accessibility part of it goes.
Aaron: And it definitely speaks to the reason why something like this is needed, in my opinion. Even though I'm an optimist, I'm not saying we don't need any of these protections and definitions.
Speaking of definitions, I'm kind of curious. You've been saying it's early days, and version 1.0 of the definition is what's out today. By the way, if people are looking for it, I think it's at opensource.org/ai if they want to check it out — so definitely go there and have a read, as I've been doing while we've been talking here.
But it's still early days, and I'm kind of curious: for example, I only see really two definitions in version 1.0 — these are definitions within the definition — one of an AI system, and the other one for machine learning. And of course AI is lots of different things, right? LLMs and different types of things. So I'm kind of curious, where do you go from here? Will there be other definitions that you'll have to add? What's the roadmap for the definition? Where do we go?
Stefano: Yeah, very good point. So why did we include the definition of an AI system in there? Because we needed to have an anchor to understand what we were talking about. Three years ago it wasn't really clear what we were defining — what is it, what's different here?
So we used the definition from the OECD, the Organisation for Economic Co-operation and Development, which is also what's been used in the Artificial Intelligence Act — a very similar definition. And why are we targeting machine learning specifically when we talk about the preferred form to make modifications? Because the new LLMs — the ones that require training, that have the dependency on data, et cetera, the ones that "magically" learn — those are machine learning systems. So we said, okay, we don't want to define the preferred form for everything, for all time. Let's focus on what we know today that needs clarification.
Where do we go from here? In the next year, year and a half, we will need to understand better how groups like LLM360, AI2 (the Allen Institute for AI), the Falcon Foundation at TII, and others like OpenLLM France — these groups that are releasing software, parameters, and datasets as much as possible to the commons, to the open source communities — how they operate and how they work, and what they need in terms of legal frameworks, legal documents, and opinions. And we generalize from there.
Jonathan: How soon do you think we'll see the first license that is OSI-approved specifically for AI?
Stefano: I know that the Linux Foundation is working on a new license specifically for that. And I would love to see even the current ones that would not pass the definition — I would love to see the debate. Like the Responsible AI License: I would love to see one of those being submitted. But I would also love to see the data licenses submitted — for example the CDLA, I think it's called, from the Linux Foundation; they have developed a couple of licenses that are suitable for datasets. It would be a nice exercise, to start building the habit, but also to understand where the open source definition terms require clarification.
Jonathan: All right, very interesting. Good stuff. We have basically reached the bottom of the hour. Aaron, was there anything that you desperately wanted to ask before we wrap?
Aaron: Yeah, I've got two, and they weren't immediately applicable to the discussion, but I'm curious about your thoughts on them.
I'm kind of curious if you have any thoughts on what other open source projects can do — besides getting more developers involved — to incorporate AI tools into existing open source programs. The ones that come to mind for me, because I use them all the time, are Inkscape and GIMP. I find myself using Photoshop more than I want to these days, because I just need a tool that can quickly remove the background and generate a forest behind my image or something, and they do that really well. And I'm just concerned that people will stop using these great tools that have been in the community for so long, because they're missing this AI functionality, because they don't have the development bench to go build it.
Any thoughts on that?
Jonathan: That's interesting.
Stefano: Yeah, I'd love to see Collabora and LibreOffice get some summarization features. Mozilla is doing some very interesting work on that front. I think it's just a matter of time: on one hand, to get developers with the skills and understanding of these tools, who know how to do things like reducing the size of a model — taking something that looks really gigantic and works only on high-end GPUs, and making it run on CPUs, for example — so that it can be distributed freely by Debian.
But also — speaking of Debian, right — I think we need to understand a little bit better what the legal frameworks are. Because I don't think Debian is going to very happily distribute Whisper, for example, or something based on Whisper, right? So we will have to have more conversations about that reluctance. Or maybe let me put it this way: we need to think about how we're going to resolve that tension between "oh my God, this thing is stealing my content, it's stealing my stuff" and "oh my God, this is useful, I would use it, my friends are using it." There's a tension in there that we need to resolve.
Jonathan: And you had one more, Aaron.
Aaron: Yeah, one more quick one, and maybe this can be part of the wrap-up instead of our usual questions. I'm kind of curious what your favorite AI tool is that you're using at the moment.
Jonathan: Oh, that's a really good one.
Stefano: Yeah, I really don't use them that much. But we do use Google Workspace at OSI, and Gemini is included in it, so every now and then the temptation is to go and check what Gemini does.
Jonathan: Simon?
Simon: You know, I'm not knowingly using any AIs at the moment. I do go and kick the tires on them every now and then — so, like Steph, I kick the tires on Gemini, because you can do that without needing to buy tokens to put in the slot machine.
Right. And I'm going to be quite interested — to go back to Aaron's earlier question, the biggest challenge with doing that is indeed the belief that the copyleft is carried into the statistical model. And personally, I am of the view that the courts are going to find that ridiculous. They have done so far in each attempt in the U.S.; we've really not seen very much happening elsewhere. But until we get over the idea that the copyleft has been carried into the model, we're not going to see anybody using the model in some software. And I think that's a big obstacle to seeing AI ending up in open source tools.
I also think that we're going to see another generational change in AI coming, where the technology is going to be made differently and deployed differently from the way it is now. Quite likely, those cool things making their way into open source tools is going to need to wait for that.
Having said that, there are already AI hooks in an awful lot of places. I don't know if you've realized this — look in Home Assistant, for example; I do use Home Assistant, and there's a great big AI hook in the middle of it so that it can do voice recognition. There are great big hooks in Mastodon for going and doing AI translation.
So actually, the question you asked — when are we going to see open source supporting AI — it's already happening. But the way it's happening at the moment is by providing hooks to go use external systems, rather than by building the capability into the product itself.
There's a lot of that already happening.
But as a community, or a community of communities, we've got some very hard conversations to have sometime soon, if we're going to see freedom-respecting software do AI, rather than just call out to freedom-disrespecting software that does AI.
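The hook pattern Simon describes is simple in code: the application ships no model at all, only a call to a user-configured service. A hypothetical sketch; the endpoint URL, payload format, and "text" response field are invented for illustration.

```python
# Sketch of the "AI hook" pattern: the application contains no model;
# it only knows how to POST data to a user-configured service.
# The endpoint URL and the "text" response field are invented here.
import json
import urllib.request

def transcribe(audio_bytes: bytes, endpoint: str) -> str:
    """Send raw audio to whatever speech-to-text service the user
    configured: a local Whisper server, a cloud API, anything."""
    request = urllib.request.Request(
        endpoint,
        data=audio_bytes,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]

# Swapping AI providers is a configuration change, not a code change,
# which is why hooks spread faster than built-in capability.
if __name__ == "__main__":
    with open("doorbell.wav", "rb") as f:
        print(transcribe(f.read(), "http://localhost:9000/transcribe"))
```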
Aaron: Yeah, right. The "agentic" model, I guess, as it's known. It seems like that's a big word that's thrown around a lot these days.
Jonathan: To your point there, Simon, about the inheritance of the copyleft when it's part of the training data, and how that's gone in the courts, it's the same question that I asked. I imagine what we're eventually going to see is essentially a test.
Right? There's going to be a legal test, and I don't know for sure what it's going to look like, but something like: if you can create a prompt that gives you this many words in a row matching the input, then you have inherited the copyright from the input. It may not be exactly that, but I have to assume there's going to be some court case that gives us a test that everybody can agree on. Because, obviously, you could just copy a file through a black box that doesn't do anything, call it AI, and...
Simon: I mean, there was a cartoon to that effect, wasn't there?
Jonathan: Well, someone tried to do that with Beatles music years ago. They ran the Beatles records through their sonic maximizer.
I don't know if they even used the AI buzzword, but they tried to make the point that, oh, this transforms it so much, it's a completely new work. Of course, the courts completely shot that down. But there's got to be a happy medium in there somewhere, so I expect that at some point there's going to be an agreed-upon test of some sort, a guideline for when your AI is transformative enough that you're not inheriting that copyright.
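For what it's worth, the "this many words in a row" idea Jonathan floats is essentially an n-gram overlap check. Here is a speculative sketch of such a test; the ten-word threshold is an arbitrary placeholder, not any court's standard.

```python
# Speculative sketch of the hypothetical "N words in a row" test:
# flag a model output if any run of N consecutive words also appears
# verbatim in the source. N = 10 is an arbitrary placeholder.
def ngrams(text: str, n: int) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(source: str, output: str, n: int = 10) -> bool:
    """True if the output contains n consecutive words copied from the
    source; a crude proxy for 'not transformative enough'."""
    return bool(ngrams(source, n) & ngrams(output, n))

# A paraphrase passes; echoing the source verbatim does not.
source = "the quick brown fox jumps over the lazy dog and then naps in the warm sun"
paraphrase = "a speedy auburn fox leapt across a sleepy hound before resting in the sunshine"
assert not verbatim_overlap(source, paraphrase)
assert verbatim_overlap(source, source)
```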
Simon: Yeah. Well, I'm not going to second-guess it. But as an accident of what I do for a living, I know a lot of lawyers, and I have yet to meet one who thinks that the statistical model is a derivative work of the source data. Now, maybe that's going to change sooner or later, but at the moment, if you want to make a court case that some AI has copied your work, it's going to be a really tough case to even get legal counsel to take on, and I draw your attention to the fact that there's a lot of no-win, no-fee work going on in this area.
But Stef disagrees. I can see him waving.
Stefano: No, I don't disagree. I think that's the less interesting question for me, because the more interesting question is: should it be considered a derivative or not? I think we should run that exercise a little more consciously and think about the consequences either way.
Either way, what happens, and what's the worst outcome possible on either side? And maybe we can even influence the courts. But we should do it with consciousness, right? Not just the gut reaction of, hey, that code is mine, I should be getting some remuneration for Copilot's capabilities, for example,
or DALL-E, the fact that it, you know, reproduces works that look like mine, so I need to get compensated.
All right, should you? And what happens if we do?
Jonathan: Yeah, and we do not have time to go down this rabbit hole, but there is an entire rabbit hole about whether it's a good thing that we can have a machine produce an image and no longer be paying an artist to do it, right? That's its own entire conversation that I think is important to have, but as I said, we are not going down that rabbit hole today. Maybe someday in the future. I am required to get a couple of final questions in myself, and that is: what are each of your favorite text editor and scripting language? Stefano!
Stefano: Yeah, I'll pass on the scripting language. I really don't code anymore. You know, I played with Bash and Python, but right now I'd probably ask Copilot or ChatGPT or something to do it for me. Is there a text editor in there? For text editors, Vim is one that I usually fire up when I do some quick stuff, but I'm not really a text editor guy.
Simon: And I'm very disloyal to my text editors; I've been using different ones throughout my career. The one I fire up most often at the moment is Nano. But hey, when I was at IBM we were using E, and I used to work on word processors; I worked with WordStar and WordPerfect back in the day.
Jonathan: Yeah.
Simon: And I've got everything loaded on various computers around here.
Scripting language is more interesting, though. I'm doing it all in YAML at the moment, because I'm doing all this Home Assistant stuff.
Stefano: You consider that a scripting language?
Jonathan: I'm not sure that yet another markup language really counts as a scripting language, but...
Simon: And yet I'm programming the whole of my Home Assistant deployment using YAML pages.
Jonathan: All right, that's fair, I suppose. Thank you both so much for being here; we appreciate it. It's been a fascinating dive into some of the questions, and some of the answers, about open source AI. Appreciate it very much. All right, man, what do you think?
Aaron: Yeah, I'm glad that it's being done. I feel like with other areas in tech, we kind of missed the boat: open source hardware and some of those other things, social media, for example. It feels like now there's all this concern around social media, and it's like, well, where were you five, ten years ago?
Simon: Yeah.
Aaron: When people could see some of the problems it was going to have. At least with AI moving so fast, I'm glad that somebody is thinking about these things and coming up with these definitions, because otherwise, who knows what could happen? And like Stefano said, we need to have the discussion, right? Even if you don't like 1.0 of the definition that's out today, talk about it. Have the discussion. It's what we've always done, going back to the early days of networking, for example: think of all the discussions and all of the groups that were around to try to define how we were going to make this thing work. Without discussing it, you're opening the door for, not necessarily bad things, but unwanted things to happen. So yeah, I'm glad it's taking place.
Jonathan: I think it was Simon who told us something that absolutely terrifies me: that in Europe there are laws that refer to open source AI, and they did so while there was no definition of what open source AI is. That encapsulates the central problem many of us have with over-regulation, with governments stepping into things they really don't understand. It's almost humorous. All of that to say, I'm glad that there are people working on this, people who have some idea of what they're doing.
Aaron: Exactly. People who understand these things, instead of politicians who don't, just coming up with random things, throwing darts at the wall sometimes.
Jonathan: Yes. Yes. All right. You have anything?
Aaron: You know, we should talk about our favorite AI tools. What's your favorite, Jonathan? What are you using?
Jonathan: About the only one I use these days is when you Google something and you sometimes get the AI answer to your question, and I guess it's Gemini, right? That is getting to the point where it's useful. I do not consider it trustworthy, but it is useful. And then I've done a little of what you did: here's a prompt, give me an image that looks like this. I've done some playing around with that, with varying degrees of success. If you're good at writing the prompts, sometimes you can get really good results; and then sometimes you ask for something a little esoteric or off the beaten path and you just get weird results, which I guess is kind of fun too, but not really what you were looking for.
What tool do you make the most use of?
Aaron: I've probably got to pick ChatGPT. I do pay for the Plus license and I use it multiple times a day, for personal and business use. It impresses me more than it disappoints me, that's for sure. I use it to summarize notes from meetings: I'll have hour-and-a-half-long meetings, we get the transcription, feed it into ChatGPT, say, can you summarize this? And it does a great job. There are other tools, of course, that do that. I also use it as part of my YouTube channel, sometimes for creating thumbnail images if I just can't find anything that I'm allowed to use. And of course I use it for, like I said, removing the background on things, shortcuts I can get it to do that would take me ten times as long, like generating that image today, for example, of the guy coming out of the garage with the open source tablets, right?
Jonathan: I mean, yeah, that was pretty good.
Aaron: Yeah. Things like that, that you can do just as time savers, really, is the biggest way that I use it today.
Jonathan: I think I'm a little gun-shy about using AI because I write for Hackaday, and I do not want there ever to be even the appearance of crossing those streams. So every word that I write for Hackaday comes directly from me; the most an AI is ever involved is that Google will tell me I misspelled something, and that's it. We've had some conversations internally at Hackaday, and that's pretty much the conclusion we've come to as well. I think once one of our writers used an AI image as the headline image, and it kind of got shot down internally: let's not do this unless we decide we're going to be okay with it. So I'm pretty careful not to use much AI, just because of that.
Aaron: Even as a research tool, though? For the background for the piece that you're writing?
Jonathan: About the only thing is, if I Google something, it'll come up with the Google summary of what it thinks the answer is. And again, I've found that that's not necessarily always trustworthy. It's getting better, obviously, but sometimes it'll tell you the exact opposite, or it'll pull something from an article that's not really about the thing you asked about. So even then, I'm pretty careful to go in and try to find actual hard sources.
Aaron: I use it for research, and I tell it to give me sources, and then I can go take a look at the sources and validate.
Jonathan: Yeah, and that's apparently a really useful hack: you make your AI assistant way more accurate if you tell it to cite its sources. It's kind of like with people: if you ask someone, hey, what's the answer to this, and give me your source, they'll work a little harder to make sure they give you the right answer. Apparently that works for AI too.
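The sources trick Aaron and Jonathan describe is just a prompt change. A hypothetical sketch using the OpenAI Python client; the model name and prompt wording are assumptions, and, as they say, the returned citations still need manual checking.

```python
# Sketch of the "give me your sources" trick: the only change from a
# plain question is a system prompt demanding checkable citations.
# Requires the openai package and OPENAI_API_KEY in the environment;
# the model name here is an assumption.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": (
                "Answer the question, then list the sources you "
                "relied on as URLs so they can be verified."
            ),
        },
        {"role": "user", "content": "When was the open source AI definition published?"},
    ],
)

# The cited URLs still have to be opened and checked by hand; models
# sometimes invent plausible-looking references.
print(response.choices[0].message.content)
```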
Aaron: Exactly. Exactly. Yeah.
Jonathan: All right, what do you want to plug before we let folks go?
Aaron: Well, I mentioned the YouTube channel. Check it out. I've got two channels; that's important because I've actually been publishing more on the second channel than on the first. There's RetroHackShack, of course, which is the main channel. I do more history videos and bigger projects there, and they come out a little less frequently. And then there's also RetroHackShack After Hours. Of the last two videos I did there, one was a repair on a 1990s motherboard where the keyboard didn't work; I couldn't get any keyboards to work. That was a short one, and you can go watch it and figure out what happened. And the one before that, well, I do a lot of e-waste stuff on the second channel. I find stuff at the e-waste and just bring it home and start the camera, and I go through and figure out, on the fly, what it is, what's working, and what's not. That's a lot of fun too. So check out both channels, and hopefully there are still people out there who like vintage computers.
Jonathan: Oh, definitely. I watched that last one about the keyboard not working, and I won't give it away, but I will say the component you replaced, I would have looked at it and probably just thought it was a weird resistor. So that's actually a pretty useful bit of knowledge to add to my own fix-it toolkit.
Aaron: Right. And not long ago, ten years ago, I would have been in the same boat; I would have been like, oh, what's that thing? So yeah, it's fun. What's fun about it is you learn what things are, how they work, and how they used to work.
Jonathan: Absolutely. All right, so next week we've actually got something really interesting, according to the calendar, hopefully it's right. We have another Stefano: we're talking with Stefano Zacchiroli about the Software Heritage project, and that's going to be a lot of fun. And then the week after that, hopefully we're going to have someone from CIQ to talk about Rocky Linux. We've talked to AlmaLinux, so we figure we might as well talk Rocky; they were there at the very beginning as well. And the week after that, we're talking Thunderbird. So all kinds of fun stuff coming up.
You don't want to miss it. If you want to follow me and my work, of course, there is Hackaday; we appreciate Hackaday being the home of Floss Weekly. I've got the security column that goes live pretty much every Friday morning there, and you can check that out. I've also got a YouTube channel that you can find.
There's also the Untitled Linux Show, still over at TWiT; that's twit.tv, and I think it's twit.tv/uls. We have a lot of fun there talking about what's going on with Linux and lots of open source stuff, but it's more news-of-the-week over there, as opposed to the long interview format here. We appreciate everyone who watches, both live and on the download, and we will see you next week on Floss Weekly.