AWS Went Down, Sora Under Fire (Again!?), Claude 4.5, Music Without Musicians

Murilo (00:07)
Hi everyone, welcome to the Monkey Patching Podcast, where we go bananas about all things outages, haikus and more. My name is Murilo, and I'm joined by my friend Bart. Hey Bart, how are you?

Bart (00:18)
Hey Murilo!

I'm doing very good. How about you?

Murilo (00:24)
I'm doing good today. I think last week was actually an intense week for me, and today was also a long day, but we're here. We're good.

Bart (00:33)
My day, we'll get to it, but my day was messed up by AWS. Really.

Murilo (00:38)
Oh really? We will get to it indeed.

Bart (00:43)
A lot of people's

days were messed up by AWS today.

Murilo (00:47)
Yeah, indeed, indeed. We'll cover that, I don't want to spoil too much. And today we have a maybe slightly shorter episode, I think. But let's get to it, right? We have

Major Belgian chains like Brico and Carrefour are swapping licensed playlists for AI-generated background music, a shift that could reshape in-store soundtracks. Rights agency Sabam says musicians could lose roughly a quarter of public play income as retailers chase lower costs and bespoke royalty-free tunes. So this article, yeah, I think it was a bit...

I was a bit surprised. Basically, there are a lot of retailers here in Belgium, where we live, that are saying: we don't want to pay royalties to music producers. So instead, let's just use GenAI music, right? And then they can even use GenAI music for seasons like Christmas and Easter or whatever. And that way they can save a lot of money.

Bart (01:25)
Yeah, I didn't know that was going on.

Murilo (01:51)
which as a business argument I get, but the feeling that I get, and I'm curious what you think, is that it's a bit unethical, right? Because on the other side, Sabam, I think, also lays out how much it will cost the actual music producers, which is a quarter of their income. So it's a huge impact. There's already a lot of, let's say,

controversy around how this AI-generated music is generated, right? It's still based off of musicians as well. And now this bypasses them, which feels very unethical. I get it as a business case, you're saving money basically, but it feels like we should be tilting the other way, right? It's almost like you lose faith in society a bit. I don't know how you feel about this, Bart.

Bart (02:22)
Mm-hmm.

I have a bit of a different point of view on this, maybe just as a shopper. If I go to the grocery store, I've never even noticed that there's music on. So I don't really care, and I don't think it adds much value for me personally. And I think Sabam, which is the agency that these retailers need to pay and that then distributes the money to the artists, Sabam is also not without controversy themselves.

Murilo (02:46)
really? Tell me.

Bart (03:08)
Like, money went to artists that they don't have any formal ties with. There were some deep dives into this in the past. I'm not a major fan of Sabam.

Murilo (03:18)
really?

So basically they make money off of... random artists?

Bart (03:21)
Well, basically

for the retailers here it was probably a big fee, but almost every company in Belgium, if you have a public space, needs to pay a forfait, a fixed fee, to Sabam, because potentially you play some music that someone hears. And then Sabam basically pays this out to the artists, which is good, that the artists get paid, right? I'm definitely not against that.

Murilo (03:38)
potentially.

Okay.

Bart (03:46)
I'm a bit... To me the problem is not... Because you could make the same argument that there's no live music anymore, right? Where are the live artists? They're not there. But to me the problem is not necessarily that it moves to AI-generated music. I think it's a smart move business-wise to skip Sabam, right? The problem is that there is no royalty payment for the music that AI models are trained on.

And there is also no recurring mechanism like, okay, this note was used and it's probably linked to that song, so every time we use or generate something, we need to pay that artist. That doesn't exist yet, and I hope it will someday. That's why this not only f**ks over Sabam, which I don't care about a lot, but it also f**ks over the artists. Because there is no clear copyright on this, right?

Murilo (04:33)
Yeah, but why would anyone care if it fucks over Sabam? Like, who cares about Sabam, I guess. OK, no, but is there anyone... do we know if there's anyone besides them? Because the way I understand it, you're explaining that they're the means to the artists, right?

Bart (04:40)
Well, I fully agree. Yeah, yeah, I agree.

Sabam themselves, probably.

Yeah, they're the organization that takes all these funds and distributes it to the artists.

Murilo (04:58)
Yeah, but it's not like that organization itself has value aside from compensating the artists.

Bart (05:04)
Well, it probably also pays people wages, but indeed there's limited added value in the whole supply chain of delivering music.

Murilo (05:07)
Yeah, yeah. But it's not like they have a product... or they do? Yeah, OK,

OK. Yeah, I see. I think they also mentioned the neighboring countries are moving away from this as well, right, from the AI-generated music. So.

Bart (05:22)
Yeah, well, Sabam is a Belgian thing. Most European countries, and definitely the US, also have this type of agency that does this.

And what I would be wondering about: they mentioned a big number, like 25% of public income, but I would wonder, for a popular artist in Belgium, how much does it actually translate to? Because I can imagine that public income is probably only, what is it, 5%? It probably isn't a lot of their annual revenue.

But yeah, it's still a loss, right? They were playing their music. And the problem is of course, again, in AI-generated music there is probably stuff of theirs being used where they are not correctly attributed and don't make any money. That is the problem here, right? Not that...

Murilo (06:04)
Yeah, but I think that's not a new problem necessarily, right? It's the same thing we saw with... yeah, yeah. No, I fully agree. I fully agree.

Bart (06:11)
No, no, that's

But

I do think that it's the fact that you don't have this attribution of artists in GenAI that makes what the retailers are doing feel unethical. Otherwise people would probably care less, aside from the cultural aspect of this.

Murilo (06:26)
For sure, I fully agree. I feel like if that... I agree.

Yeah, I feel like if that problem was solved, then people wouldn't care about this, indeed. And maybe just a clarification for me: with Sabam, if you potentially play music from an artist, you have to pay. So even if you have a space where you pretty much never play anything, if it's a public space...

Bart (06:38)
No, I think we're on the same page here.

Murilo (06:58)
You already have to pay? Is that how it works? Or is there some criteria, like you have events and you have this and you have that. Okay.

Bart (07:04)
Definitely not an expert on this, but if you

are a company and you have public space, then your accountant will probably say, it's better to pay like a small fixed fee and we're not going to get into difficulties with them.

Murilo (07:14)
Okay. Also, you mentioned something I thought was funny. You said when you go to Carrefour or do your groceries, you don't care that there's music. Do you think there would be as much of a backlash if they just said, we're going to stop playing music?

Bart (07:16)
That's how I look at it.

I don't think I would notice.

Like, my goal is to get the stuff that I need to get, and just below that, my second goal is to get out very quickly. To me it's not like, let's go experience the Top 40, right? Like, just chill out between the melons, the watermelons.

Murilo (07:38)
Exactly, right?

It's like you just go in there, take headphones off.

Yeah, exactly. Like in the frozen section on a warm day, you're just vibin', right? Yeah, true.

Bart (07:57)
So yeah, maybe

there's an audience for that, right? But then, if there's an audience for that, it probably should be live music.

Murilo (08:03)
Yeah, I mean, if you want to give the full experience, probably live music, yeah.

Bart (08:07)
Yeah, full experience,

Murilo (08:09)
What I was also thinking is: is this Belgian artists only? Because when I think of Christmas, for example, the first songs that come to mind are not Belgian Christmas songs.

Bart (08:19)
No, I think Sabam, again, not an expert at this, but I think they represent more or less all artists that are associated to them, definitely not all, but associated through big publishers. They get the public fees from Belgium and they distribute them.

They are a Belgian organization.

Murilo (08:37)
I see.

All right, all right. What else do we have Bart?

Bart (08:41)
A new community paper claims large language models pick tools more accurately when instructions are written in plain English instead of JSON schemas. Across 6,400 trials on 10 models, reported accuracy rose about 18 percentage points and variance dropped 70%, though tests were single-turn and parameter-free.

That's interesting, right? Maybe you can explain a bit, for the people that are not into the whole LLM tool-calling thing, what is the JSON part here?

Murilo (09:01)
Indeed. ⁓

Yes. So basically, when we say LLMs can call tools, what we really mean is that LLMs request tools to be called by the developer, right? The LLM never actually calls tools. And the way it's done is that we expect something structured, usually JSON. So for people following the video, it's something like this, right? It basically lists the tool calls, and each one has a name and a description. And then the LLM...

And again, we don't really know, because it's not open, but we assume the LLM really writes down the tokens for the JSON response. And because we can validate this JSON response on our side programmatically before returning it to the user, we have some guarantees of what it's going to look like. So tool calling today, the traditional way, let's say,

the way to provide tools and get the tool results back, is via this JSON, right? It's semi-structured, so it's not really a table, but there is a pattern, like key-value pairs, right?

Bart (10:12)
So

for example, I have this JSON schema that says I have these tools available, and this tool is called web search, and this tool is called send a message through Slack. These types of tools, and then there's a description of how the tools can be used.

Murilo (10:29)
Exactly.

There's also a description of the inputs, right? And the types. A very classical example is get_weather. The name is get_weather, the description is to get the weather based on coordinates, and then you usually have a latitude and a longitude, which are floating points, right? And the result is going to be a number, in Celsius or Fahrenheit or whatever. So basically you specify everything in the JSON. It's kind of like a schema,

and then the model will know: okay, I need to call this with these inputs. Then you run it on your side, give the tool result back to the model along with the conversation history, and the model continues processing everything.
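The JSON tool-calling loop described above might be sketched like this. It's a minimal illustration, assuming an OpenAI-style schema convention; the exact field names and the `dispatch` helper are our own, not any vendor's actual API:

```python
import json

# A minimal, OpenAI-style tool schema for the get_weather example.
# Field names follow the common convention; vendors differ in details.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a set of coordinates.",
        "parameters": {
            "type": "object",
            "properties": {
                "latitude": {"type": "number"},
                "longitude": {"type": "number"},
            },
            "required": ["latitude", "longitude"],
        },
    },
}

# The model emits a tool call as JSON text; the developer parses and runs it.
model_output = '{"name": "get_weather", "arguments": {"latitude": 50.85, "longitude": 4.35}}'

def dispatch(raw: str) -> str:
    call = json.loads(raw)  # validate the JSON before acting on it
    if call["name"] == "get_weather":
        args = call["arguments"]
        # Stub: a real implementation would query a weather API here.
        return f"21°C at ({args['latitude']}, {args['longitude']})"
    raise ValueError(f"unknown tool: {call['name']}")

print(dispatch(model_output))  # this result goes back to the model as a tool message
```

The key point from the discussion: the model only produces the JSON text; the developer validates it, runs the actual function, and feeds the result back into the conversation.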

Bart (11:12)
And what this paper, this new paper on arXiv, is saying is: fuck this JSON schema, do this in natural language instead. Like, write down: if the user requests the weather, then call this thing with these and these parameters. That's how I understand it.

Murilo (11:30)
Exactly. That's also how I understand it. Basically, they say that when you ask the model to output stuff in JSON, the model focuses more, and I'm saying "focuses" with air quotes for people listening, because it's not a person, right? The model is more focused on getting the JSON format correct, and the content of the JSON output is not as accurate as it could have been.

So they say: if you forget the JSON, and they cite research here demonstrating that the more structured a model's output is, the more its performance tends to degrade, there's a paper linked, then the actual tool selection is much better. Because if you ask for JSON, the model focuses more on the JSON than the actual content. So instead of the JSON contract, they propose what they call natural language tools. And the way they propose it to look

is down here. It's basically just text. It says "thinking", where you expect the model to give brief reasoning, then "tool one: yes/no", "tool two: yes/no", "tool three: yes/no", and then "assessment finished". So they really want the model to say just yes or no for each tool. And there'll probably be another LLM call to parse that out, because the decision of calling a tool or not is already made; you just need to parse it out and say: call this, this, and this tool. So they break it down a bit.

And they said that with this, the performance actually went really, really high. It boosted the accuracy by more than 18 percentage points, from 69% to 87%. Yeah, indeed. And if you think about agentic AI, which depends a lot on calling the right tools, this is very significant, right?
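The natural-language tool format being discussed might look something like this. The prompt wording, tool names, and the regex parser are illustrative assumptions, not the paper's exact template:

```python
import re

# Illustrative sketch of the natural-language tool (NLT) idea: tools are
# listed in plain prose, the model answers yes/no per tool, and a simple
# parser extracts the decisions afterwards.
NLT_PROMPT = """You are an assistant with these tools:
Tool 1: get_weather - look up the weather for given coordinates.
Tool 2: web_search - search the web for a query.
Tool 3: send_slack_message - send a message on Slack.

Think briefly, then answer in exactly this format:
Thinking: <brief reasoning>
Tool 1: yes or no
Tool 2: yes or no
Tool 3: yes or no
Assessment finished.
"""

def parse_tool_decisions(model_reply: str) -> dict:
    """Pull the yes/no decision for each tool out of the model's reply."""
    decisions = {}
    for number, answer in re.findall(r"Tool (\d+): (yes|no)", model_reply, re.IGNORECASE):
        decisions[int(number)] = answer.lower() == "yes"
    return decisions

# A hypothetical model reply in the requested format.
reply = """Thinking: the user asked about the weather, nothing else.
Tool 1: yes
Tool 2: no
Tool 3: no
Assessment finished."""

print(parse_tool_decisions(reply))  # {1: True, 2: False, 3: False}
```

This also shows the trade-off mentioned later in the conversation: the yes/no decision still has to be turned into an actual tool call by a second step, so you trade JSON boilerplate for extra parsing or an extra LLM call.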

Bart (13:01)
That's a lot, right?

Yeah, and they're also saying here they used this with DeepSeek V3, jumping from 78 to 94, which is already a good model, right? You could say maybe there's a bit of a weird side effect on a small model, but cool.

Murilo (13:20)
Exactly.

Yeah, no, indeed. The thing I

saw somewhere as well is that they had two domains: mental health and customer service. So again...

Bart (13:36)
Interesting, maybe

it's also specific to the domain.

Murilo (13:38)
Maybe, but I

think in any case it's an interesting finding. They give a full template here of what you should prompt the LLM with, saying: you're an assistant, these are the tools, output in this format.

Bart (13:48)
Good.

Murilo (13:50)
One thing: there are some limitations, in the sense that they are measuring tool call accuracy here, so there's already an assumption that there is a tool call. And sometimes, when you're actually using models, the model doesn't know if it needs to call a tool or not, right? Also, by doing this you need more LLM calls. Because the way I could see this working, with DeepSeek V3,

Bart (14:06)
Hmm.

Murilo (14:15)
is that basically on every LLM call, you have to output something like what's shown here, right? Just to say: do we need tool calls or not? If not, then you just answer the thing. But every time you need that extra call. And if one of the answers is yes, you still need to parse it out afterwards: okay, it's a yes for one of these tools, now make the tool call. So you probably need an LLM to assess the output as well. So you do increase the number of LLM calls.

Bart (14:21)
Mm-hmm.

Yeah.

Yeah.

Murilo (14:45)
So it's not super practical either.

Bart (14:47)
Mm-hmm.

Yeah. Maybe also to make a parallel here to something we discussed shortly last week: Claude Skills. Maybe it goes a bit too far to go in depth, but Claude Skills are also described as text, right? And it's by injecting this text into the context that the model knows it can use these things. So you can maybe make a parallel here. Yeah, interesting.

Murilo (14:54)
Mm-hmm.

It's true. It's true. Yeah, I've actually seen this.

Bart (15:14)
I think

it's a sample size of one, but even having a model know that it needs to call a tool... You notice this very well with Gemini. Because Google offers Gemini as the web app or mobile app, and as AI Studio, and you can use the models in both. But in the web app, you need to really say: come on, buddy, use the image generator. You have an image generator, try it out. You really need to keep it

motivated.

Murilo (15:43)
It's like a shy buddy with a shiny tool, you know. It's like: no, use it. Yeah, you got this, you know, you can do this. I also didn't have this myself, but I was working with a colleague and I said, oh yeah, you can give tools to the model and do this. And then I saw the code afterwards and it's like, "use this tool for this". You shouldn't have to tell it, right? But the reality is: come on, I swear it works, I built it myself, I swear. You know?

Bart (15:49)
Exactly. ⁓

Pretty please, pretty please.

Supervibe-o!

Murilo (16:11)
Yeah, indeed.

One other thing I thought was interesting before we move on: they actually said that by using the NLT format, you reduce the context length. They said 47.4% fewer input tokens on average. So basically, half of the thing is just JSON boilerplate, which I was...

Bart (16:29)
Jason is very verbose.

Murilo (16:31)
I knew it was verbose, but half is just boilerplate? I was like, that's a lot, no?

Bart (16:36)
It adds up a lot, yeah. But it depends maybe on how they calculate context, because... it depends also a bit on what the full thing is that you're trying to do. For example, if you also want structured output and you include a JSON schema for that, which is even way more verbose, it quickly adds up, right?

Murilo (16:54)
Yeah, that is true, that is true. Well, I thought it was interesting, but I was still like, well, sure... Okay. It is a big...

Bart (17:03)
It's a big jump, right? I feel like this needs to be bigger news if it's such a big jump.

Murilo (17:08)
Yeah, yeah, indeed. So you gain accuracy, you lose on the number of LLM calls, but then you also gain in terms of context reduction. So I'm wondering where the costs actually end up, right? To be seen.

Bart (17:20)
Yeah,

Murilo (17:23)
Moving on: Japan formally asked OpenAI to curb Sora videos that imitate manga and anime, escalating a broader fight over copyrighted characters in generative media. Minister Minoru Kiuchi called the art forms, quote unquote, irreplaceable treasures, as OpenAI faces backlash over its opt-out policy and waves of lookalike content.

Bart (17:49)
Yeah.

Everybody saw this coming, right?

Murilo (17:51)
Yeah, I think I'm surprised that it took this long, if I'm being honest.

Bart (17:56)
Yeah, true.

I think with Sora, in the last six months, there was a bit of a turning point where at some point everybody said, fuck it, when it comes to generating copyrighted visuals or characters or whatever.

Murilo (18:09)
Yeah.

Yeah,

I thought it was funny that, I think, OpenAI sued someone because they claimed that someone used their models to train their models. And it was like, are you sure? And with the whole Ghibli thing, when the Ghibli-style images came out, it was something so open and in the air that I really thought Japan didn't care, to be honest.

Bart (18:25)
Yeah.

Bit laughable,

Yeah. And what happened here now, more specifically, is that Japan's cabinet office, via their minister Minoru Kiuchi, formally asked OpenAI to stop infringing on Japanese-related artwork like manga and anime works. And

Sam Altman also acknowledged through some forums that OpenAI has a debt to the creative output of Japan, that based on it they were able to pull a lot of stuff off. We'll see what comes from it, right? It's again the same problem that we discussed with AI-generated music: there is no attribution to the original...

So that is the basic problem here,

Murilo (19:26)
Yeah, fully agree. I think any creative...

Bart (19:31)
No, maybe that's

making it too light. That is a significant problem. The other problem is that no one asked them if their material was okay to be included in this training data.

Murilo (19:42)
Yeah, no, for sure. Creative fields like art, right? I think they all kind of suffer from this problem: there's no attribution, and the law is kind of catching up as well. One thing I thought as I was reading this: it feels like a very Japanese response, right? They take it and take it until they can't take it anymore, and then they politely ask, can you please stop doing this?

You know, and Sam Altman is like, I mean, we're in your debt, but it's a figurative debt, right? Like, I'm not going to pay you, but thank you. It's very polite and civilized, but I'm not sure that's the way to go either.

Bart (20:13)
Yeah.

Yeah, it's just one example among a lot of different examples of copyright issues, right? When it comes to GenAI. We'll see how it turns out. I think that when it comes to GenAI artwork, the only quote unquote good player today is Adobe. They have their AI engine called Firefly.

And they trained it basically on a very, very big set of stock images and visuals that they owned prior to this whole LLM hype, for which they paid the artists at that point. The artists probably didn't have the idea that their images were going to be used this way; they didn't know the future usage. But that is probably today the best example of how it should look.

Murilo (21:12)
Yeah, I think they're also in a unique position, in that they were one of the only ones that could do it, right? So, I mean, props to them. They cashed in their chips at the right time.

Bart (21:22)
But they also could have said, fuck this, there's much more creative stuff out there, let's include it, right? They didn't do it.

Murilo (21:31)
True, true, true. They could have just said, I'm going to take this and that, right? So no, it's very true. Still, do you think we'll ever have a good answer for music, for images? Do you think we'll ever get to a point where we feel like, okay, this is fair?

Bart (21:35)
Exactly.

I'm hopeful, but I think we're at this point very far off from a technical solution,

Murilo (21:54)
Yeah, technically. I think legally we're also catching up, right, with all these new things.

Yeah.

Bart (22:01)
Yeah, but I think it would be good to hear if there are advances, to connect with an expert on this, to see if there are advances from a technical point of view to have clear attribution, because I don't know of any that are truly robust.

Murilo (22:14)
Yeah, indeed. I mean, I'm also not in the space, but I'm sure there is a lot of stuff happening there, right? Maybe we just get the headlines, but I would imagine there are more people thinking about these things as well.

Bart (22:27)
Yeah, I think the challenge is also this technical challenge of going from something the model created back to understanding which content enabled it to produce that. That's a very big technical challenge. I think a lot of people that are not in the space don't understand how hard it is. Because I hear this a lot: yeah, but you just pay something to the original artist. But it's not that easy, right? Like, I actually heard...

Murilo (22:44)
Yeah.

Yeah.

Bart (22:55)
a week or two, three ago, Scott Galloway, on Pivot I think, say something like this. It was about writers. But I mean, it's not that easy, right?

Murilo (23:03)

Yeah, it's almost impossible. I mean, not impossible, but it's not just "not easy", it's super hard, right? Even when the models were not as large, this was already a super difficult problem. It's not something we can just fix. And yeah, I think that's the thing. I think a lot of the people that are

Bart (23:16)
Super hard, yeah.

Yeah.

Murilo (23:30)
trying to come up with legislation and laws and precedents have little to no knowledge of how these things work. And these things are evolving very fast as well. So it's a very tough position to be in.

Yes.

Bart (23:43)
I'll move to the next one. In early tests, the $19.90-per-month experience felt muddled, with verbose answers, misfires on page interactions, and confusion over which bot to ask for a task.

Opera launched their new browser. Yes, Opera still exists. This one is called Neon, and you need to pay a monthly subscription for it. It basically has three big AI features, called Chat, Do and Make. Chat allows you to interact with a page via a chat window: can you give me a summary of what is on this page, what dates are mentioned, which people are mentioned, these kinds of things.

Do allows you to say: I know I want to follow this event on Meetup, please go to the page and RSVP for me. These kinds of things; it can automate it for you. And Make lets you build very small applications, HTML/JavaScript applications.

Murilo (24:38)
Well, applications as in, kind of like no-code, low-code applications, or...?

Bart (24:43)
Through a chat interaction, yeah. What I understand is that it creates a small virtual machine within the browser, so it's relatively sandboxed. It's basically a mini-agent you can make. And what I understand from this review is that everything is a bit meh. That's what I take from it.

Murilo (25:01)
Did you try this yourself?

Bart (25:02)
I did not, actually, but the feeling is recognizable, in the sense that I recently tried two other approaches. One is an open source one, I think it's called MCP Browser or Browser MCP or something. And last week I tried Perplexity Comet, which is Perplexity's AI browser.

Murilo (25:19)
Hmm. Browser MCP, is that the one you were thinking of? It's an open source project that connects to existing browsers, no?

Bart (25:26)
Yeah, yeah, that's the one.

Yes, you can basically talk to your browser via Claude, or via whatever MCP agent. But I think Perplexity's Comet is more in line with what Opera launched. To me, the problem with this automation stuff is that it's very, very, very slow.

So I went to a networking event and I had four names of people I wanted to connect with on LinkedIn. And you can say this to Perplexity Comet, and then it goes to LinkedIn and tries to log in, and logging in is already a bit difficult; you need to intervene in between. And it works, you get logged in, and then it starts typing, and then it types in the wrong field,

and then it tries to redo it. By the time you get to the first profile of the four, it has already taken significantly longer than if I had just typed in the name and pressed enter, right? It's very, very slow. And then connecting, it clicked the wrong button. It doesn't feel more efficient, right? Even if you could say...

Murilo (26:27)
Yeah.

Bart (26:34)
I don't need to be involved, and I actually have a big task, not four but a thousand people that I need to connect with. Even then it's so inefficient, because it takes so much time at the current stage we're in, because it tries to understand the DOM and makes screenshots to understand what is where on the page. The idea feels extremely promising: my browser is completely automated, it does everything for me. But the execution today is not there.

Murilo (26:39)
Yeah.

Yeah.

Interesting.

So it's slow because it's taking screenshots, because it's analyzing the DOM? Because I would imagine that even if it makes mistakes, just analyzing the DOM would still be pretty fast, no?

Well, so you've been very disappointed.

Bart (27:13)
Yeah, I'm very excited about this idea, because you can do this for a lot of things. Like: go to this web store and order this for me. If I know what I want, I don't need to do it myself; I'm fine just saying it to ChatGPT. I would probably prefer that, actually. But how it is now, with browser automation, it's not there.

Murilo (27:36)
Yeah.

Well, but I don't know if you agree, but I think that maybe next year it will be much better.

Bart (27:47)
I hope so. I do hope so, yeah.

Murilo (27:49)
Yeah, because I do agree with you on the promise there. I think there's a lot of people racing to get there. It's not great yet, but, like you mentioned ChatGPT, right? I know they recently talked about creating another protocol for commerce, competing with Google. So there's the example you mentioned of buying stuff.

Bart (28:08)
Well,

the approach that ChatGPT took is actually: I'm going to bypass the browser completely, I'm going to make these mini-apps within our controlled environment. So they said, forget this problem for now, we'll solve it ourselves. And the upside for them is that they get this whole walled ecosystem that's very, very difficult for other people to recreate.

Murilo (28:25)
Yeah, indeed.

Yeah, but going via the browser also feels a bit like you're trying to replicate what users do today, with agents. And the way OpenAI set it up is like: we already have agents, so why are we trying to replicate the old way? Now that we have this new setup with agents, there are more efficient ways to do it. But because it's a new way, you're going to have to swim upstream a bit, right? Everything's on the browser today. But I think when you talk about

Bart (28:38)
Exactly, that's what they do.

Hmm.

Murilo (29:00)
protocols and these new things, it's going to take some time to get there, right? And I'm not even sure it's ever going to fully get there. Even with an API, if someone doesn't want to allow it... like, you cannot interact with LinkedIn via an API if they don't want to allow you. So I think this way is the safer bet, but it's clunky. I'm also wondering, with new models, if you can train models to do this, right? Like if

Bart (29:17)
Yeah, true.

Murilo (29:27)
they're also collecting the data from the users of the browser, and they can fine-tune models or get models to work more efficiently there as well.

Bart (29:36)
Yes, specifically for those use cases.

Murilo (29:37)
Yeah,

because if you think of the way LLMs are supposedly trained today, which is more natural language, with reinforcement learning from human feedback and all these things, it's very different from analyzing the DOM and answering questions about it. Even if there is data about it, it's probably very little, right?

I'm hopeful.

Bart (29:56)
I'll sign up for that.

Murilo (29:58)
All right, and maybe as we talk of new models: Anthropic introduced Claude Haiku 4.5 as a model aimed at fast, low cost coding and agentic tasks. The company says it delivers near Sonnet 4 coding quality at one third the price and more than double the speed, priced at one dollar per million input tokens and five dollars per million output tokens. Yes.
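To make the quoted prices concrete, here's a minimal back-of-the-envelope cost sketch. The per-million-token prices come from the episode; the example request size (a 20k-token context, 2k-token response) is invented for illustration:

```python
# Rough cost sketch at the prices quoted in the episode:
# $1 per million input tokens, $5 per million output tokens.

INPUT_PER_M = 1.00   # USD per million input tokens
OUTPUT_PER_M = 5.00  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at the quoted Haiku 4.5 prices."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Hypothetical coding-agent call: 20k tokens of context, 2k tokens of output.
cost = request_cost(20_000, 2_000)
print(f"${cost:.3f}")  # $0.020 input + $0.010 output = $0.030
```

At these prices, a heavy agentic session that burns through a million input tokens still costs on the order of a dollar, which is the economics the "mini model" positioning is about.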

So a new model. I mean, we had the 4.5 Sonnet, I want to say. Sonnet and Opus, I want to say.

Bart (30:31)
We don't have Opus

Murilo (30:33)
Ah yes, yes,

So we had the Sonnet 4.5, correct? Yes.

Bart (30:37)
Yes, exactly. A

month ago, no, two months ago, something like that.

Murilo (30:42)
Yes, so the Haiku is the small, the tiny model of the three, right? The Opus is the big one, or the reasoning one as well. So I guess we can expect that one next. Not much to say here. I mean, it's better than the previous ones. They compared the 4.5 Haiku with Sonnet 4, because here, putting up an image for people following on video, it's 73.3% accuracy for Haiku 4.5 and 72.7%

on Sonnet 4. So it's a smaller model, but just as much coding expertise, let's say, or coding efficiency. I think it's really cool because they also use this in Claude Code, right? So your terminal agents and all these things are also going to improve a lot, right? So it's a small change for model users, but if you're a Claude Code user, you should expect also big...

performance increases, like on the accuracy for coding stuff as well.

Bart (31:37)
It depends a bit on for which task and they do some optimization on which model they use for what.

Murilo (31:44)
But I think a lot of it is Haiku 4.5, no? That's what I actually... no, okay.

Bart (31:48)
I don't think for code writing, to be honest. No,

I think it's Sonnet 4.5.

Murilo (31:54)
Anything that surprised you here Bart?

Bart (31:56)
No, I have a bit of a double feeling on this. In the sense that I don't really care for my day-to-day, because for my day-to-day you need to have the best model, which today for coding is arguably Sonnet 4.5, which I use heavily. I tried Haiku 4.5 today; for coding I wasn't really impressed.

I also did a little bit of searching, like on Reddit, on this. I think that kind of mimics what the community is saying. Like it's good, but it's not like, if you have the choice, you're gonna go for Sonnet, right? My other point of view on this is that I'm very happy that we have this as a mini model, right? Like this type of performance that is up to par with, sometimes better than, Sonnet 4, which was by far the best model six months ago.

So I think we're still at a stage where we're not in diminishing returns yet. Like if there's a better model, everybody wants to use it for their day-to-day coding. But we will get to where there are diminishing returns for day-to-day coding tasks, and then efficiency becomes a much bigger player, I think.

Murilo (33:06)
Yeah, I agree. I think.

Bart (33:07)
And what we're seeing

is that it's possible to do this. What was state of the art for us a year ago, it's possible to do way more efficiently, and that's exciting to see.

Murilo (33:17)
I think also you see a bit of the specialization, right? I think they just talked about software engineering, right? Whereas other announcements talk about math, software engineering, a whole bunch of things, right? So I agree, I think it's good. Maybe a question also: when you're coding, do you ever have prompts or tasks where you're like, ⁓ Haiku 4.5 could probably deal with this? So maybe, I don't know, for example,

Simple things like, create a Pydantic schema for this JSON. You know, something that is really just reading, almost mechanical, right? You don't need to think about it. Do you ever waste mental energy, quote unquote, by saying, ah, this I'm going to ask Claude Haiku 4.5 because it's very simple and it saves some money? Or do you just say, I'm just going to use the same model for everything?
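The kind of mechanical task Murilo describes, deriving a schema from a JSON payload, might look like this. A sketch using stdlib dataclasses standing in for Pydantic (which adds validation on top of the same idea); the payload and field names are invented for illustration:

```python
import json
from dataclasses import dataclass

# Hypothetical JSON payload the schema should mirror (invented for illustration).
raw = '{"id": 42, "name": "widget", "price": 9.99, "tags": ["new", "sale"]}'

# The "mechanical" output: one typed field per JSON key.
@dataclass
class Product:
    id: int
    name: str
    price: float
    tags: list[str]

product = Product(**json.loads(raw))
print(product.name)  # widget
```

With Pydantic the class would subclass `BaseModel` and coerce/validate types on construction, but the translation step itself is exactly this kind of rote mapping, which is why it's a plausible job for a cheaper model.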

Bart (34:06)
To be honest, I used to do this. Not necessarily for the pricing or something, but back in the old days, which is six months ago, I used to say that for building implementation plans, I used Opus. And for the actual writing of code, I used Sonnet. But to be honest, not always, but most of the time, I leave that up to Claude Code now.

Murilo (34:26)
So today, you use mainly Claude Code? Or Claude Code in VS Code, I guess.

Bart (34:31)
Nordic win with 4 is good.

Murilo (34:33)
Ah, right, just on the terminal.

Okay, cool. What else we have?

Bart (34:36)
Let me have a look. A widespread AWS outage tied to DNS knocked major sites, banks, and some government services offline before Amazon said it mitigated the issue. Apps from Coinbase and Fortnite to Signal and Zoom were hit, underscoring how a single infrastructure choke point can ripple across the internet.

Murilo (34:41)
Yes.

Bart (34:59)
Yeah. Um, so this is very recent, right? Like this is news of today. I arrived at the office this morning and said, I actually need to finish off that slide deck. And then I went to Canva, and Canva was down. I said, fuck it. And then I refreshed it like 10 times and it was still down. And then I thought, maybe I need to turn the laptop off and on. And then I thought...

Murilo (35:20)
It's like, didn't work. Turn off your laptop. Turn off your laptop. Do it again.

Bart (35:26)
And then I thought, okay, I'm working on a small project. It has a Postgres database. I'm hosting the database on neon.tech. I go to Neon. Neon is down, and then something clicked. I need to Google this. Like something more is happening, right? And then it turned out like half of the internet was down. Yeah, like we were saying, Coinbase was down, Fortnite was down, Signal was down, which can be a very important communication tool.

Murilo (35:40)
Ha

Bart (35:51)
Apparently Ring was down, security cameras were down. A lot of banks were down. Yeah, crazy, right?

Murilo (36:00)
And

do they say why they went down?

Bart (36:06)
It's a DNS issue. I looked at it when it just came out, the news was still coming out. I haven't looked into it yet; we'll try to do a debrief next time to get a bit of understanding. I think the news is still coming out as we speak.

Murilo (36:23)
but it's not like there was a cable in the ocean that was cut or something, right? Like with Azure.

Bart (36:29)
No, not that I know of. Apparently it was a regional thing, it was US East. It wasn't even Europe.

And it also shows that even if you use... like, my example: my Neon stuff is hosted in the EU region, but it still depends on some things, probably for routing and proxying, that are in the US, and the whole website was down. And they very quickly fixed it and it came back up. But still, you're still tied to regions, right? Even if your specific resources are not hosted there. And again, from my...

Murilo (36:48)
Mmm.

Hmm, interesting.

Bart (37:02)
I mean, it just messed up my day a bit, but for me, not a problem. But like, these were big. Like Signal being down is big, right? Like Ring being down, this can have major implications.

Murilo (37:14)
Yeah. Yeah. It also...

Bart (37:16)
And this is

especially against a backdrop where we're all saying, oh yeah, I'm not going to host my own stuff on something like Hetzner because AWS's uptime is way better, right? Like we need the better uptime, and then they take the whole internet down.

Murilo (37:32)
Yeah, it's like the dimension of it. Like you mentioned one region in AWS and it's like, wow, their footprint is huge, right? So, yeah. Does that make someone say, I want to self-host because I don't want to depend on AWS? Do these types of episodes change your view on things? Because it doesn't happen often still, right?

Bart (37:55)
Yeah

It doesn't happen often though.

Murilo (37:58)
Do you think it happens... and actually, I don't know, really, I don't have a good estimate. But do you think if you had self-hosted something, it would be less likely for something like this to happen?

Bart (38:08)
No, I don't think it's less likely. But the question... No, no, I don't think it's less likely, to be honest. But you do have more control over it.

Murilo (38:15)
You never control. Very true.

Bart (38:17)
Right, you know what is happening under the hood. Or you should know, probably. But like these type of uptime things, they are very difficult in a sense that...

They call this... in this world they call this the march of nines. Where you say, like, 99% uptime results in... I don't know the math by hand, but it's something like three days a year that you're down, right? 99.9, it's already way smaller. But the effort to do that is way bigger.

And then 99.99, only very small downtime, but the effort to reach that is really exponential. ⁓
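The "march of nines" arithmetic Bart is gesturing at can be made concrete; a minimal sketch, assuming a 365-day year:

```python
# Yearly downtime implied by each "nine" of availability.
# downtime = (1 - availability) * hours_in_a_year

def downtime_per_year(availability_pct: float) -> float:
    """Return yearly downtime in hours for a given availability percentage."""
    return (1 - availability_pct / 100) * 365 * 24

for nines in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_per_year(nines)
    if hours >= 24:
        print(f"{nines}% -> {hours / 24:.2f} days/year")
    elif hours >= 1:
        print(f"{nines}% -> {hours:.2f} hours/year")
    else:
        print(f"{nines}% -> {hours * 60:.1f} minutes/year")
```

So 99% is about 3.65 days of downtime a year, 99.9% is under 9 hours, and 99.99% is under an hour, while each extra nine costs far more engineering effort than the last, which is exactly the exponential-effort point being made.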

Murilo (38:53)
Yeah. Yeah.

Yeah, it's like really

like you add so much more effort to get that little bit more guarantees, right?

Bart (39:02)
Yeah, exactly.

And I think what happens in practice a lot is that it is very easy to say, let's go for AWS because they promised 99.99, maybe even 99.999, I don't know, percent uptime. The question is whether you even need it, right? That is the question, I think. Because if you know that it's 99.9...

Murilo (39:23)
Yeah, that's true.

Bart (39:30)
And you can also have impact on it: okay, we need to switch out this hardware, or we're gonna put some new routers in place and there will be downtime between 6 a.m. and 6:30, and I'm gonna let my users know three months in advance that this is coming, and I'm gonna send them a lot of reminders. I mean, the impact of that is way less dramatic than AWS going down and everybody being surprised. Even the people hosting the solution, right? Not just the customers.

Murilo (39:55)
Yeah, that's true.

Yeah, that's true.

That's true. That's true. Yeah.

Bart (40:00)
So yeah, I don't

really have a good answer, but I think that you can look at it in different ways.

Murilo (40:05)
But I think the planned outages, I'm not sure if it's a fair comparison though, because I feel like for this it was definitely not planned, right? And I'm sure that AWS probably also has planned outages, but then they route traffic to other servers and stuff, right? So there is no disruption of service. I think it's more like if you have an unplanned outage and how often that would happen versus something like this, right?

Bart (40:29)
Yeah, I think that's fair. But it doesn't change the thought experiment, right?

Murilo (40:32)
Right.

No, for sure, I fully agree. I also think it's like: if you can host it yourself, what level of guarantees do you need? You do have more control over it, right? I do agree. And I think...

Bringing this up now as well: I think earlier in the year, now not as much, I don't hear as much, but there was a lot of discussion over the distrust in the US and the US clouds, and I think this adds to it, right? And I mean...

Bart (40:54)
Yeah, there's another point here.

People start considering now what direction should we take while two years ago people would never think about this.

Murilo (41:08)
Exactly. I think even

like DHH, who I think we talked about some time ago, like the guy there, he had a post as well saying that it's way more financially interesting for them to host their own stuff, and they were moving away from the cloud. So there were also some people... I think it depends a lot. I mean, he also had big server usage, big server costs, right? So he explained a bit of the math, like buying this hardware, using it for five years, and so on. So it's definitely not for

Bart (41:20)
think it depends on the use case.

Murilo (41:35)
everyone, right? But for me it was also interesting hearing that, because, whether he's right or wrong, I think he was the first person that really came out being really against the cloud. And up until that point, I hadn't really seen anyone say going for the cloud was a mistake or is the wrong way. So it also kind of was like, whoa, a question mark, right? Like, what? Let's hear him out. Yeah. And some people actually, I mean,

Bart (41:50)
Yeah, ⁓

Yeah, but I agree, like there has been, over the last 15 years, a whole generation brainwashed into thinking the cloud is the future.

Murilo (42:09)
Yeah, I mean, it's the only way, right? It's like... so, yeah. Was there anything you want to share about what you're working on, Bart? On your Neon?

Bart (42:12)
Is it the only way? Yeah, that's what we were taught.

⁓ but it's still very, very early days.

Murilo (42:24)
Okay, maybe we'll leave it for another day then. Or do you want to share now? I can put it on the screen as well. Okay, you can just tease it now and then we'll do another one.

Bart (42:27)
Yeah, yeah, yeah, I can tease it a bit. I can tease it a bit.

Oh no, that's too early. That's too early. So

I'm making... but it's really in the hours in between that I have here and there, because I think it's just fun. And it's not even the concept that is fun. Okay, I'm just gonna say what it is, otherwise it's gonna be very vague. I'm making basically a website, which is a job board for developer-related jobs. So let's say full stack, back end,

data engineering, machine learning, whatever kind of jobs, right? You can post them there if you have a verified account. Not too enthusiastic about that part, right? But the thing that is fun is I'm making this retro UI, and it's like you're going back a bit to the OS/2 Warp days, which was my first operating system. And you're there again, like everything is a bit old school and retro, and you have a task bar and icons and like...

In the end, it's like a job board, right? But I'm more having fun with making the whole retro UI. And I have an idea to make a terminal and stuff like that. Yeah, it's kind of fun.

Murilo (43:30)
With the UI. Yeah, yeah, yeah, yeah.

It looks cool. Maybe next time we can think a bit about what we want to show, but it looks really cool. We showed PostHog, I think, the website as well, which is also very cool.

Bart (43:47)
Yeah, yeah. It's actually,

it's actually PostHog that triggered it. So PostHog's main website, they have a little bit of this retro feeling, and that, I don't know, it kind of wormed into my brain. And then I thought, I need to do something with this. Yeah.

Murilo (44:01)
⁓ Which

I'm showing it on the screen for the people here, you know, it's really... again, not gonna put...

Bart (44:09)
It looks a bit like

it, but actually, like, I'm not trying to brag here, but I can have multiple applications open online.

Murilo (44:15)
man. You can have multiple windows. You can have AI chats in your browser, right?

Bart (44:18)
Yeah, yeah, yeah.

Yeah, and there's a community chat. Like you can type something and everybody sees it. Not sure if that's smart, like...

Murilo (44:30)
instantly no moderation just like

Bart (44:32)
Yeah, yeah.

But what I did to safeguard it a bit: you can have max 50 messages, and then it's first in, first out.
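The cap Bart describes is a classic bounded FIFO buffer; in Python that's essentially one line with `collections.deque`. A sketch of the idea, not his actual implementation:

```python
from collections import deque

# Bounded chat log: once full, appending a new message evicts the oldest (FIFO).
chat = deque(maxlen=50)

for i in range(60):
    chat.append(f"message {i}")

print(len(chat))  # 50 — the buffer never grows past maxlen
print(chat[0])    # "message 10" — the first ten messages were evicted
```

The nice property is that eviction is automatic and O(1) on append, so the "50 messages and Murilo's misbehavior scrolls away" behavior falls out for free.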

Murilo (44:39)
Okay, I see, I see. So if someone says something, Bart's just gonna be there like...

Bart (44:41)
So if Murilo misbehaves,

it only takes 49 other people and then you're out.

Murilo (44:45)
or Bart typing like crazy, right? Like, no, no, no, no, no, no, no. Cool. Cool. That was it for this week. So like I said, slightly shorter, but yeah, anything else that you want to say, Bart? Anything I didn't...? ⁓

Bart (44:47)
Yeah, yeah, yeah. You're not allowed to say this. You're not allowed to say this.

Don't think so.

Murilo (45:05)
think so. Then maybe a little update on the newsletter.

Bart (45:10)
yeah. ⁓

Murilo (45:11)
You wanna go for it, bud?

Bart (45:12)
Like here what do you want to.

It exists? I forgot, because we also sent out an article, we did some updates, ⁓ async from the podcast, I don't know where we're at... but basically we haven't... no, we don't have a newsletter person. We did, we do. We have a first... we do have a first email sent, right? Yeah, yeah, we have a first email sent. That was the update you were asking about. Okay, okay, you're completely right. So, let me reset.

Murilo (45:19)
Yeah, yeah.

Me too! Yeah, that's what I was... That was the news!

You have to eat.

Bart (45:37)
So, for the people that don't know, we have a mailing list now. And the link is under our videos. We should actually also put it on our website; it's not on the website yet. We actually do have people that subscribed, which is very cool. So thanks a lot to these people. We have a very first version of the newsletter sent out, which today is just, let's say, a summary, a bit of a rehash of the articles that we discussed,

which is, I want to say, the minimal effort that we can do to have a newsletter. We do have some ideas to improve it going forward. But it's cool to have these different touchpoints. It's also cool to have a bit more personal interaction with these subscribers, I want to say.

Murilo (46:14)
Yeah, for sure. I think, like we were talking about earlier, it does bring a different vibe to it, right? Because before, we have numbers on the podcast and YouTube, and it's just numbers, right? And there, it feels... I don't know, it feels more personal. It feels like we're talking to people, right? Which is cool.

Bart (46:36)
Yeah, true. So, yeah, enthusiastic about that.

Murilo (46:37)
Right.

I think we sent out the first one. We had a lot of ideas, a lot of great ideas, a lot of not so great ideas, but ideas nonetheless. Right. And we'll try things out and... Exactly. But we'll try things out. And I think, again, we're excited about it. If you have any feedback as well, feel free to share or any ideas or something you would like to see. Feel free to let us know.

Bart (46:51)
of our lives.

Murilo (47:03)
Same thing for the podcast as well, right? If you have any thoughts, comments, feedback, or you want to leave us a review, anything, it's very much appreciated. Especially the five star ones.

Bart (47:13)
I'll maybe close with a bit of a humble brag.

Murilo (47:16)
Alright, please do.

Bart (47:17)
I just opened the dashboard for our newsletter. So we're hosting this on Beehiiv. Still very humble days, humble beginnings. But our open rate is still 100%. That's impressive, right? So yeah, it's a good start, right? It's a good start.

Murilo (47:28)
⁓ There we go. Yeah, it's true. It is a good start. It is a good

start. So thanks for everyone that subscribed and stay tuned. I think we'll play a bit more with the newsletter as well in the future. Cool.

Bart (47:41)
I'll hear you all soon.

Murilo (47:43)
Yes, yes, yes. Thanks, everyone. Ciao!

Bart (47:48)
Ciao!

Creators and Guests

Bart Smeets
Host
Mostly dad of three. Tech founder. Sometimes a trail runner, now and then a cyclist. Trying to survive creative & outdoor splurges.
Murilo Kuniyoshi Suzart Cunha
Host
AI enthusiast turned MLOps specialist who balances his passion for machine learning with interests in open source, sports (particularly football and tennis), philosophy, and mindfulness, while actively contributing to the tech community through conference speaking and as an organizer for Python User Group Belgium.