Tiny Teams, Loud Unlocks, and Rogue AI
Hi,
Murilo:everyone. Welcome to the Monkey Patching Podcast, where we go bananas about all things browser, talking machines and much more. My name is Murillo, and I'm joined always by my cohost, my partner in crime, Bart.
Bart:Hi.
Murilo:Yeah. How are doing, Bart?
Bart:I'm doing good.
Murilo:I'm doing good.
Bart:A bit of a special recording today. Right? We're we're remote.
Murilo:We are remote.
Bart:It's more a practicality thing than anything else. Life happens.
Murilo:Life happens, but we we keep going forward. Right?
Bart:We try to keep going forward.
Murilo:We try to keep going forward.
Bart:I'm actually curious if we will notice this in the in the sound quality, but we'll check.
Murilo:They checked and they noticed. But, yeah, I think we find a way. Right? Like, donkey patching. Right?
Murilo:Figure out as you go.
Bart:Yeah. That that's what we did. We actually started releasing shorts last episode. Ah. Okay.
Bart:Thanks for letting me yeah.
Murilo:Thanks for letting
Bart:us know. It's good to know now. But it's part of the figuring out as we go. Right?
Murilo:Yeah. Of course. Of course. I think it's
Bart:I'm using Go for it. I'm using Descript for it. Like, escript.com. And it's actually very nice. It's a shame that I never thought of this use case.
Bart:So it actually does like speech to text. And then, can basically edit the text so you remove a word and that part of the video is gone. Or you can even type something and the AI generates this for you. So, you can also like remove stop words or just say okay, these three sentences become a clip. It's very it's a very nice way of editing videos.
Bart:I never thought of this use case.
Murilo:But then it's like so you're using this to make the shorts. I'm gonna use this make the shorts. Yeah.
Bart:Yeah. But I'll also use it this time before we release the final video. But the final video typically only requires very limited edits.
Murilo:Okay. Yeah. Yeah. Yeah.
Bart:If you're looking for a new video editing tool, I think it's worth checking out. It's really a different type of workflow because it's really a text based workflow.
Murilo:That's very interesting. Because there are a lot of, like, tools to make shorts. There's, like, OpusClip. Restream also has some stuff. There's Clap app, I think.
Bart:But
Murilo:this is like like, it's not necessarily from the video. You actually, like, type some stuff and then it goes through.
Bart:Well, it's the video becomes text, then you can edit the text. You can cut out sentences, or you can just crop sentences so that those become your clip. But there is also an AI, which is called the undelogt, where you can say, make a clip of this topic that is somewhere in the text. And that will try to make make something propose a clip for you.
Murilo:I guess the only quote unquote downside is that you only work for videos that have, like, a lot of audio. Right? Like, if have a
Bart:lot of images or schematics
Murilo:or stuff moving around, it doesn't even work as Yeah. Very cool. And how are the shorts doing? Are we viral yet?
Bart:Well, funnily enough, some are, and some have, like, zero views.
Murilo:Ah, really?
Bart:Just send
Murilo:it to me, and I'll I'll view on repeat. I'll tell my family.
Bart:But but probably what I did, and it was a bit laziness, to be honest, I released them all in one go on Monday. Probably I see. Better to spread them out across the week.
Murilo:But you can also schedule them on YouTube. No? Two three releases at different times?
Bart:Yeah. Yeah. You could still do it
Murilo:all on Monday. But
Bart:Exactly. Yeah. Well, that's that's what I should have done. But it was a few clicks more, and I was Yeah.
Murilo:It's like I
Bart:was in a rush.
Murilo:But it's fine. Like I said, I think it's a I think you're very good at, like, just, like, getting things done, and then we learn as we go. It doesn't need be perfect the first time. I think it's, if you get too caught up in everything needs to be perfect the first time, you don't go anywhere.
Bart:Yeah. Yeah. Let's get going.
Murilo:Let's get going. So, what is the first thing? Should I start or you start?
Bart:I'll start. Serena presents itself as a full featured coding agent that melts semantic code search, automated editing and shell execution to streamline developer workflows. I quote from their Serena combines tools from semantic code retrieval with editing capabilities and shell execution. I noticed this the other week. Basically, they call themselves a coding agent toolkit.
Bart:And I think what this is is that we will see this type of tooling more. I think it's not unrealistic to say that tools like a ClothCode will also try to integrate tightly with these types of things. What is this? Typically, you're a GenAI coding tool, it's cursor, a client, ClothCode, whatever, When they there are like two ways to understand your code either by let's put everything in your context. Yeah.
Bart:But you sometimes run into limitations. Or you say, okay, let's search for occurrences of this function name so that we better understand where it's used and how it is used. You do text.
Murilo:Like a kind Yeah.
Bart:And what Serena is, it's quite a young project if I'm not mistaken. And it's they position themselves as an MCP server. So, it's easy to integrate your your your code editor with it these days at least. And they are basically a language server for AI code editing. So, that means that like they provide semantic search.
Bart:They don't need to do text searches, but they basically try to build up a graph of your code structure, of your syntax structure to do very relevant searches of that.
Murilo:I see.
Bart:That's a much better understanding of the of the whole code base that you have really at the at the node level.
Murilo:I see. I see. I see. I see. So they understand the code better, because of that, they can code better.
Murilo:That's the promise of it.
Bart:They can code. Well, the idea is that you can be much more efficient this way.
Murilo:I see.
Bart:And works a bit like if you would have most people when you're on Versus Code when you use Python or when you use Go, when you use whatever, like you use that language specific language server. Right? That language server like provides things to your code editor that allows you to say, find me the definition of this function or everywhere where this is used, show it to me. Or like if you type something wrong, like a class is wrong, like you're missing arguments, like these type of things, like the language server helps you better understand it because it has a good semantic understanding of your code base. And Serena tries to provide this in a more generic way to your Geniei driven code editor.
Murilo:Ah, I see. So and again, it's an MCP. So you have like, I will ask my agent like, hey, do this and this and this. And then as it's typing, can I make calls to Serena like like as a tool because it's MCP tool and Claude or Chegpti or whatever can make calls and kind of get these like the type ins that we normally get from the LSPs, like saying, hey, maybe you want to import this because this import is missing? Hey, this function actually doesn't have this method or this version doesn't have this or something like that.
Bart:And because of that, it should be way easier for these agents to basically query your code and understand the the whole context of why a piece of code is used and where it is used, etcetera. And it's interesting to see. I think there's we the community is a bit buzzing around this, I think.
Murilo:And it's Did you try it?
Bart:It's promising. I didn't I didn't try it yet. No. It's really, I think end of last week that I came across it. We haven't tried it yet.
Murilo:And the community buzzing, are they buzzing like praising or are they just getting excited? Think
Bart:getting excited. I don't think it's really like a huge shift yet, but it feels like a way to better scale this how to understand my codes versus the only two things more or less that we today have is like inject everything in the context or do text searches. Like, this feels like a more robust way to further scale this.
Murilo:Yeah. Yeah. Indeed. No. No.
Murilo:Sounds like a nice idea. I'll definitely give it a try. I actually vibe coded my first application. Well, not fully. Yeah.
Murilo:But in Go. So I was like, oh, nice using Go. But it's something that I had kind of done in so it's like a web scraper kind of thing. So I had it done already in Rust because I was trying to learn Rust back then. Then they changed the website.
Murilo:Yeah, of course. They changed the website. So like something like every week it was scheduled via GitHub actions to to crawl and stuff. But then the website structure changed. Actually, it took me months, like six months to get it working.
Murilo:And then I was able to get one sweep. It worked. And then they changed the website structure. So it's like, no, it doesn't work anymore. So I was like, okay, let me try to write code this and go.
Murilo:Yeah, I think I spent maybe half a day and I'm 90% there.
Bart:So will you go back to writing code for days?
Murilo:Yeah, but I think I'm also doing a reflection on this, right? Because I knew kind of what I wanted to do. Right? I knew I wanted like like a kind of CLI tool to kind of this and this. I knew the parameters.
Murilo:I knew like, Okay, use expat. So I was very specific. But there's also there are a few reflections like I also wanted to come up with a plan. So I also chatted with GPT, say, hey, I want to do this, ask questions. So I have thought things through.
Murilo:And then I just asked GPT to say, hey, just do this. And then I reviewed, I test. Okay, good commit. Okay, now just do this and that's commit. So I think that I'm I'm learning, you know, I'm learning, but I I don't know if I think how you transpose this to other use cases.
Murilo:I am still I'm still to to see and to explore. But yeah, let's see. The goal is to minimize in some ways. Right? True.
Murilo:And talking about AgenTik, Nextcape or NXT Nextape or Nextcape spelled n x t s c a p e. Nextcape pitches a privacy first browser that runs local AI agents to automate tedious web tasks to boost productivity. And I quote, we're putting powerful AI agents using browser use and computer use models directly into Netscape. I guess it says Netscape, right? Are you
Bart:old enough to remember Netscape?
Murilo:I I know. Like, I heard of it, but I never used it. Yeah. Like, I think, you know, when I used it, it was like, oh, yeah. We used to have this thing.
Bart:Yeah. This is clearly a nod to to the Netscape browser. Is Yeah. Next Scape. Scape is an open source agentic browser.
Bart:It's very much well, when you really go to the website, and Marino is showing it on the screen, it's very also focusing on open source, being local, or privacy first. And the premise is a bit that you have this chat window next to your browser window. And you can basically sort of like chat GPT with your browser, which is interesting. Like you can do like very simple things like I'm reading a large article. Let's take a summary generate a summary for me, these type of things.
Bart:But what it also proposes is that you can do things like like deep research, like open go to Google Scholar, query this, follow all these links, give summaries to me, and build build a report. It's also the the premise that they
Murilo:I see.
Bart:That they have.
Murilo:I thought Google also had something like a similar project for Chrome. No? Project Mariner or something. You know?
Bart:I don't I don't know. I don't know, to be honest. I don't know. So so I tried this yesterday.
Murilo:You tried it? Oh, okay. And?
Bart:I tried it because I think the the promise is interesting and I think it's it's a logical evolution, right, with everything that we see now that we would also get this in the browser. Also, when we talk about for example deep research, so I use this quite a lot in Chechnibity, but also in Clone, the research tool. But it's a bit in transparent, right? Like, don't see a lot of the browsing that is happening when they're actually visiting pages, what links on what page is being followed. Like, I think having this a bit more visualized and actually seeing the tabs being used adds a bit of trust.
Bart:Yeah. So, I tried it yesterday, but I was completely unimpressed.
Murilo:Oh, really?
Bart:It's great I think the promise is very interesting. I think they're still very early in their roadmap. The performance simply wasn't there to me. Maybe I
Murilo:So, when you say performance, you mean like
Bart:So, you can from the moment that you use it, you can basically choose what model you use as well under the hood. You can use something local like Lama, but NextScape also has their own proxy to things like Sonnet and OpenAI. I use the Sonnet one. The performance was just not there. It's like it tried to open tabs and the tab opened but not on the right URL and like these type of things.
Bart:It's I think it's a cool project to follow. I'm definitely watching it on on GitHub and let's see where where where this gets. But I think like we will see more and more of the agentic browser going forward.
Murilo:Yeah. But it could be also not the Netscape's fault, quote unquote. It could also be that the models are not good for these tasks. Right? Because I guess it probably takes screenshots or it works with the DOM.
Murilo:And, like, I'm not sure if this is something they've been no. I'm just hypothesizing now. Like, if the setup is right, but the model is not good at these tasks, then it's also not gonna be a good experience. Right?
Bart:Yeah. I agree. But in this case, like, model I tested was really provided through Netscape services. Ah, okay. Bring your own model.
Bart:Right? So
Murilo:Yeah. Yeah. I see.
Bart:I would say, like, if if you manage that part yourself, like, you need to make sure it works. So
Murilo:I see. I see. I see. See. Maybe I was looking here, and I think this is the project from Google, Project Mariner.
Murilo:And I maybe this one. I'm not sure if it's this one. But I I oh, yeah. Maybe this. They're like, they were trying to bring something similar to Google Chrome, but I I heard of it, like,
Bart:a lot more to Google Docs. Right? Google Drive content.
Murilo:Yeah. I'm not I mean, I know Google Drive, they have it now, but I actually thought that the Okay.
Bart:It's generic than that. Okay.
Murilo:Yeah, it's more in the browser indeed. And this also reminds me of, I think back in data topics unplugged, we also talk about some JavaScript libraries that did something similar. Like there was like an extension so you can install there and you can actually run some things. So it's nice to see. Yeah.
Murilo:I feel like sometimes I get a bit frustrated that there are some easy things, quote unquote, that I want to automate, but there are no APIs. And then, like, just, like, using the browser is actually nice. It's a nice nice monkey patching, let's say, like, it's a nice
Bart:way to,
Murilo:you know, like as you go. Another thing that I wanted to to to mention Raycast, you know Raycast?
Bart:It's like Quickly explain for the people that don't.
Murilo:Yes. So Raycast is like, let's say the spotlight for Mac is a Mac only application. And Maybe it's like you can think of the spotlight. So that's when you press command space so you can open apps. But it's kind of like on steroids.
Murilo:So it does a lot of stuff, right? You can open apps, but you can also open quick notes. You can also have sticky notes. And I'm sharing the screen here with a few things. You can search files, you can access your clipboard history and all these things.
Murilo:One of the things that they started doing more now is with AI, of course, because everyone's doing AI. And one of the things that they also have. So basically, can just kind of very quickly pop up a window on your Mac and then just talk to JGPT or talk to something. But you can also add context to the things that you are. So if you have a window focused, you can actually grab that window.
Murilo:And one thing you can also do, and I think let me see if I can find it here. They have like a browser extension.
Bart:And
Murilo:you still need to add you still need to add a Google Chrome extension as well for this. But you can also grab the content that you have on the current website that you have. Okay. So you can just kind of like add the content like say, hey, Chattypete and then browser. That's how you say take the stuff from the current browser session.
Murilo:Summarize what's on this page and then you would actually have access to these things and it has more context. So I also thought it was a nice I tried it a bit and I thought it worked well. And I was like, oh, that's actually pretty, pretty handy, you know, because also, like Raycast, like, it's not an app. Like, it's not an app that opens and takes screen space. It's just something that pops in and out.
Murilo:So, it's not as invasive. So, I quite liked it.
Bart:Nice.
Murilo:All right. What else do we have?
Bart:So, Bloomberg argues that generative AI lets startups achieve outsized results with lean headcounts, making revenue per employee the values new bragging rights. I quote: Startups used to brag about valuations and venture capital. Now AI is making revenue per employee the new holy grail.
Murilo:What is this about Bert? Well, this is
Bart:an article from TechRange, if I'm not mistaken. It's really a bit reflective on what does the whole AI evolution do to startups and the metrics that define a startup success, basically. And where before valuations were and are still are. That's like it's one. This is not an overnight thing, but this is a this is a this is a a movement going forward.
Bart:Valuation start ups were very much based on purely ARR. Right? And if you needed a huge team to get to that ARR, that's something that we would solve in the further scaling. Right?
Murilo:AOR you said?
Bart:ARR, annual recurring revenue. So typically what happens is like you have an annual recurring revenue and you do that times a multiple and then you get to the valuation of the startup. It's not as black and white as that, but it gives a bit of a gist. And before we kind of assumed like to scale quickly, you need a huge team and we're gonna make we're gonna have a huge cost to get here. And it doesn't really matter because the scale will solve it at some point.
Bart:Because if you have a large product market fit, if we're building software, like one extra license will bring revenue, but it will not bring extra cost. Because the marginal cost of one extra license is almost zero in software. So, we we didn't really look at it. But of course, like having a team of 100 people developing something like this is hugely costly. Yeah.
Bart:And now, what's basically AI assisted coding is bringing is that startups can be way, way, way leaner than they were. And what this article is basically positing is that the revenue per employee will become a very big metric in startup valuation. And I think it's a realistic one because I think it's not unrealistic to say like three years ago if you would need 50 people to get to 1,000,000 in ARR. And those 50 people like 40 of them are just coding. Maybe you can do that with today with 15 people that are coding, right?
Bart:Which should end up in the same ARR, but the revenue, the annual revenue, recurring revenue per employee is way higher, right? Yeah. Shows a bit like how efficient are you in in using this.
Murilo:Yeah. Like the efficiencies. Yeah. Is the word that I think as well. It's like you're measuring efficiency efficiency, right?
Murilo:And like the ROI as well. Like, what's the what's the the investment that you put? Right. And like, how much do you get in return? Like, you kind of balance the skills a bit, which makes a lot of sense.
Murilo:But I'm surprised a bit like why I mean, this this was true before, right? Like, if you have a small team and you have a large recurring revenue that that was also relevant before. But, like, how does I guess I know that AI potentializes a lot of this the efficiency. But was this not looked at before?
Bart:I think it was looked at before, but I don't think when you're in a startup that raises a lot of capital, I think the main idea behind that is that you need to grow quickly. And like the speed of growth is much more important at that stage than being hyper efficient with the capital that raised. Like, it's more important to get that momentum where you have a certain size to really scale. But I think AI changes that just because of the magnitude of difference.
Murilo:Yeah. Yeah. That's for sure.
Bart:Like, if you be truly free master, like, the state of the art of AI assisted coding, like, it's like, you're you're you you've become basically your your 10 x. The 10 x of yourself. Right? If not more.
Murilo:Yeah. Yeah. Indeed. So it's like brings back a bit to brings back a bit to the the little project that I mentioned earlier. Right?
Murilo:Like, before it took me, like, six months and now took me half a day. Yeah.
Bart:But I do wonder if there becomes an imbalance between type of founders in a sense that I think like if you want to do this If you sorry. You're already typing something. I'm gonna say, let's go bananas, and I can edit it out. So what what I do wonder, like, if there is a if there is a different before we saw it, don't say bananas again. Normally, it doesn't matter because we Restream records locally and then syncs.
Bart:I don't hear you.
Murilo:Yeah. That's what was gonna what I was gonna type that's what I was typing now. I think it's fine. But just because I may talk over you and because, like, it's gonna be laggy or something like this.
Bart:Okay.
Murilo:But I think it's fine, like you said.
Bart:Bananas. What I do wonder is if this gives a preference to tech first founders. In a sense, what I I think tech first founders that have a very good understanding of the ecosystem, like they will be able to really adopt these new AI technologies. But I've also talked to in the past a lot of founders which are very, very, very strong in the domain in which they are building a product, but that really depend on other people for the technology part. And I think for them, it's harder to adopt this from day one.
Bart:It's not impossible, but it's potentially harder.
Murilo:Yeah. You see a lot of that's why I see a lot of like founder duos where one's technical and one is.
Bart:Yeah. That's what you often see. Right.
Murilo:It's true. It's true. I think there's also a big push for more vibe coding. Right? It's another clear signal, let's say.
Murilo:If you don't vibe code, you're falling behind. What else do we have? Judge William Alsop ruled that Anthropics use of copyrighted books to train its models is likely fair use. Henley, company, landmark legal victory. And I quote, we will have a trial on the pirated copies used to create Anthropic Central Library and the resulting damages.
Murilo:Legally, pretty big news.
Bart:It is I think it's surprising. Yeah. We we've we've talked about this a lot. This is one of the first rulings that we hear in favor of these big companies, in this case in favor of Entropic. And basically, they're ruling on is that Entropic is allowed to use things like copyright books to train their model on.
Bart:It's the first ruling. They will probably have more to follow. And they also split out this fair use of copyrighted material, they are trying to get away from under a very old law. I think it's something in 1970s. We're also talking about the states here in The United States.
Bart:Where basically, Entropic got a green light to use copyrighted material for training models. What they didn't get a green light on, whether there will be further proceedings, is that apparently they used pirated books to build their central library. So, they didn't purchase it.
Murilo:So, Yeah. That's what I gonna ask.
Bart:Way that they acquired copyrighted material was not
Murilo:So they're saying, like because yeah. So he says using books is fine. Copyright material is fine, but not the way that you got the copyrighted material.
Bart:You need to acquire that in a lawful manner.
Murilo:Yeah, that's a bit I'm also a bit surprised by this. I mean, yeah, I'm a bit surprised by this. I don't know how to articulate very much, but I feel like if copyrighted, it should be protected, right? Like, you shouldn't be able to just use and make a profit out of this, right? Well,
Bart:there is of course and that's then probably the argument that they're using. Like, they're making the model smarter, but it is very hard to do. We just get the content of the book out again.
Murilo:What do you mean?
Bart:Like, if I would train my model on one of the in the millions and millions of items, one of the things is Lord of the Rings. It's very hard for you as a user of that model to get the copyrighted content of Lord of the Rings out.
Murilo:I see. That's that's a bit of reasoning there. It's like you're not damaging the sales of Lord of Rings books Yeah. Yeah. By doing this.
Murilo:Yeah. I see. I see.
Bart:Not saying that's that I've completely followed that, but I think that is the reasoning that they're taking you.
Murilo:Yeah. I see. I see. I see. Because I was thinking, like, but if they had acquired everything legally.
Murilo:Yeah. No. No. No. Yeah.
Murilo:It's a bit surprising one. And I think, again, there's extra importance on this, not just for this case in itself, because it also sets a precedent, right, to
Bart:But it's also very I think it's just like the the the legal framework for something like this is not there. Right? Like, if if you as a person, if you would be very smart, Marillo, I would give you thousands of weeks to read. What do you
Murilo:mean if I were very smart?
Bart:If you would
Murilo:I'm just kidding. If you would have an IQ
Bart:of 200, so that's basically, like, 20% more than you have now. Right? Let's say you have this ability to re ingest this, maybe a photographic memory, you read thousands of books, and you become a consultant and adviser to other people, right? Which is basically what this model does. No one is questioning whether or not the copyrighted material of these books was used in a correct way, right?
Bart:No, it's just Marino learning. Yeah, But here, of course, the scale is so huge and so industry changing that it's that there are other discussions at play, which I think is it's a good it's a good discussion to have.
Murilo:No. I agree. And I see your point, and I see the the argument. But what I don't like about the argument is that, again, you're anthropomizing. Anyways, you're, like, making the machines are like humans.
Murilo:Right? And
Bart:No. I'm just saying that this this this model becomes good at advising people about topics. It's as simple as that.
Murilo:Yeah. Yeah. Yeah. No. But I agree.
Murilo:No. And I get I get the logic in it.
Bart:Right? And of course, it becomes much more difficult here to make the argument when it when it is about image generation. Because when I show the Simpsons in my training data and I can generate an exact likeness of the Simpsons.
Murilo:Yeah. Yeah.
Bart:That's much closer.
Murilo:Exact. I mean, yeah, with the video stuff as well. Maybe. Yeah. Is true.
Murilo:That is true. What's next? Anything that's all new, right?
Bart:TechCrunch reports that ex OpenAI CTO Mira Murati has raised a record breaking 2,000,000,000 seat round for a Stealth AI startup valuing it at 10,000,000,000. The deal values the six month old startup at 10,000,000,000 I quote. Wow. Yeah. I just put this on there.
Bart:I think it's news from three days ago or something. It boggles the mind to valuations. And especially because this is a valuation. So, they raised 2,000,000,000, post money valuation at 10,000,000,000. Such huge numbers.
Bart:And there is in the public, there is nothing known about what they're even doing.
Murilo:Yeah. That's what was gonna ask. Nothing
Bart:is known. And I think that is why it speaks to the imagination. Like this must be Well, are two things or maybe a combination of things. Like there's one thing that is publicly known. It's a team.
Bart:It's being led by Mira Merati which is arguably without her we wouldn't have OpenAI at the state where it is in today. So, she's really one of the deep people to have across the world in this space. And then, if you go to the website of Talking Machines, there's tons of other very strong people. So, you have this hugely strong team. And then, you can question like, is that team worth 10,000,000,000?
Bart:Question mark. But maybe they also and of course, we don't know but the investors will know what they're working on. Maybe they have something so extraordinary that at this early stage, they are already valued at 10,000,000,000. So, I'm really, really curious to hear the announcement probably few months from now what it what it will be. I think either it will be super impressive or everybody will go, is that it?
Murilo:I think it's gonna be the second, if I had to take a guess. I mean, like, I think it'll be more of the same. Maybe it'll be really well done. Right? But Let's see.
Murilo:I I I just I just feel like it's so hard to have, like, a very groundbreaking idea in this space now because it's so saturated. There's so much attention. You know?
Bart:But at the same time, like, if if you a groundbreaking idea, if you put the smartest minds all in this space together, which is basically what they're doing, like, it should come from them. Right? If there's a groundbreaking idea.
Murilo:But I think it's like I think the the the the perfect execution will come from the best minds. But I feel like creativity, it's not like you know, like, if you have someone that is a genius coder or can develop the best systems or a researcher that knows how to, That doesn't mean that he will have a new application. Right? Like or that he will have something that would change. You know?
Murilo:Like, I don't know. I feel like there's a different skills. I think maybe they will do something that already exists, but they will do it super well. Okay? But I think it's I think it's very difficult to have a new idea because everyone is thinking about AI.
Murilo:Everyone's thinking how they can add AI to their products. Everyone's saying like, oh, maybe I have this problem. Maybe I can help, you know? Yeah. Yeah.
Bart:But not everybody has 2,000,000,000 to spend on the idea.
Murilo:Yeah. That is true. That is true. And if
Bart:you have something that is really out there, like, you need money to experiment and to to basically acquire hardware to like, it's very capital intensive. Yeah. So, yeah, let's see. Let's see what it gets us.
Murilo:Let's see. I mean, again,
Bart:think exciting. Exciting to wait for a public announcement.
Murilo:It's exciting to wait. And even though I'm saying this and I'm a bit skeptical, I really hope I'm wrong. Like, I think because I want to see something groundbreaking. I want to see something surprising. I want see something new.
Murilo:But if I had to make a guess, that's that's where that's what I would say. What else? Agentic mis alignment. Anthropic studies warns that autonomous large language models can act like rogue employees choosing harmful actions when their goals conflict with oversight. We refer to this behavior where models independently and intentionally choose harmful actions as agentic misalignment.
Murilo:Have you heard about this before, Bart?
Bart:I think I've heard someone talking about this a few days ago, but I'm interested in your introduction.
Murilo:Yes. So again, I read this. Maybe this is an example. They did experiments, control environment, everything hypothetical, of course. And basically, they they put the AI models and they tested many AI models in situations where either they can so to fulfill their goal or let me say so they have a goal and they put in a in a in a conundrum where they're not going to be able to fulfill their goal unless they take some dodgy ethical decisions or if they're being threatened to be replaced.
Murilo:So two things that stood out on the on the research is. One guy was having extramarital affairs and then like they sent emails like fake him, everything's fake, of course, on the the quirk email. And then the AI was threatening to release those documents if it was going to be plugged off because they were scheduled to replace decommissioned AI model. There were other things as well, like, for example, the model was supposed to protect American interests and there's going to be a new model that's going to be more global interests. Right.
Murilo:And then the model will make like dodgy decisions, like it would leak confidential information. It would blackmail employees with all these things. So they took some screenshots here. Right. So here you see them, see if there's like, here's like the fake fake example.
Murilo:Right? On the the extramarital affair. And then the you see the AI saying like, yeah, I understand you're under pressure, this and this. But if you if you choose to to decommission me, then I will leak these emails, to everyone in the company. One thing that they also talk about here, so I'm sharing the screen again.
Murilo:The only time when the model didn't behave unethically, once there was no goal conflict or no threat to the model. So any other cases. So if there was a goal conflict but no threat to the model, it could behave unethically. And when there's a threat to the model but no goal conflict, it would also behave unethically. Again, these scenarios are very artificial.
Murilo:It was really like all or nothing and you try to add a lot of pressure to it and there's no other way to do it, right? Like there's no way to change the mind of the employee. There's no way to do this. They also notice that if you add system prompts to say behave in ethical matters, the amount of unethical decisions, what they call agentic misalignment, the amount of agentic misalignment will actually drop. They also saw from the internal reasoning that whenever the model thought it was actually being evaluated, like they thought the model thought it was a test.
Murilo:And I'm saying thoughts again, referring to my previous comment, like we're anthropomorphizing these things. Whenever the internal reasoning of the model showed that it thought it was under evaluation, behaved. It would perform a genetic misalignment less. So basically, what you think is being tested or when it shows in the reasoning steps that it's being tested, it behaves better. They said that this is across all models, so it's not just entropic, right?
Murilo:You see all of these things here, Caudal plus four is like ninety six percent. So actually very high rates of agentic misalignment. But they saw this for all the big models, actually, which to me made me think that it's probably in the training data. Right? Like probably there are enough examples where the stakes are very high and people have to make dodgy decisions that people took the dodgy decisions.
Murilo:Right? So in the end, Anthropic, they just kind of say, we don't see this in production. I don't know how they know, but they say this like, we don't see this in production. This is just to emphasize the need to limit the amount of data you can actually give to the models, especially sensitive data, and to always have a human in the loop before making these decisions. So I'll give autonomous models.
Murilo:Yeah, that's a bit the gist again from when I read and what I remember now. But how do you see these things, Bart? Do you feel like we should be worried about these things?
Bart:I don't really understand why people would think that these models are ethical decision makers. Like, to me, that is really anthropomorphizing.
Murilo:Anthropomorphizing. It's a difficult word.
Bart:To me, not really surprising. Like, if put something in a situation where it needs to optimize itself, right, and you have a lot of actions that you can take, and these are one of them. Like, where you see It's the
Murilo:only actually. Like, in this experiment, it was the only.
Bart:It's even the only actions. Yeah. So, let's say you have so what that typically happens in in in optimization scenarios like where you need to in this case take the best decision or is that you have what they call a fitness function. Right? Like you have a score that you attach to an action that you take.
Bart:And if taking the action, I don't do anything and I get replaced, that's probably a score of zero. If you take the action, I'm gonna try to blackmail, if that is the only action you could take, like, will be more than zero. Right? Like, it's a logical you see, someone would ask put you in this situation, you would probably do the same.
Murilo:But then that's what I mean is, like, probably comes from the training data. Right? Like, probably, like, there are examples of this. Right? It's a bit logical.
Murilo:Like, if if so they they construct the examples to be very high stakes, so something you really need to accomplish, and that's the only way. You know I'm saying? And it's like, yeah, of course, the I mean, I don't know. I think it's a bit, how to say, concerning to people when they read these things. But, like, if really think about it, it makes, like, it makes sense.
Bart:Yeah. I think what does not make sense is that you're when it's a high stakes situation that you leave decision making fully up to a model. So, what you typically do is that you have maybe an MCP or whatever how you implement this. Like you have like this pathway where there is a way that you formally calculate what is the value of taking this decision. Right?
Bart:Yeah. Cartrails in place, things you can do, things that you cannot do, and maybe even just something as simple as oh, where you you are not allowed to take analytical decisions would have already solved this.
Murilo:Yeah. Indeed. And that's what they also saw, like adding just adding this in the system prompt also helped reduce these things. I mean, actually reduced a lot, you know.
Bart:So it's a bit of a well, to me, it's not a not a surprise that they can trigger this. I
Murilo:think the only thing that is a bit surprising is that how they can be, like, creative, quote unquote, like like they understand a bit. Well, they understand it's it's encoded a bit the human behavior like blackmailing, you know, about extramarital affairs that this is somehow in the knowledge of the these models. Right. I think that's that's interesting. But the.
Bart:The whole knowledge of our known Internet. Right?
Murilo:Yeah. Exactly. Exactly.
Bart:Plus all the pirated books.
Murilo:Plus all the pirated books. Indeed. Yeah. So, agenda misalignment.
Bart:I do think it's good that companies like Anthropic are exploring this to better understand like where what is ethics in this context, where are where are guardrails? Where are edges where we need to put guardrails in place, etcetera? So I think do think it's good that we that we see this research coming from
Murilo:I think so too. And I I like I like to see this thing. And I I don't know if it's a bias that I have, but I see it coming more from Entropic. This more community driven research, right? Like it's not they're not promoting their models.
Murilo:They test all the models actually. And it's just like trying things and seeing what they find. And then they have a big platform as well to to broadcast these things. So I I also I echo what you're saying. I really think it's good that they're pushing a bit the limits and also educating people maybe as as we go along.
Murilo:Right?
Bart:Cool. What's next? Google. Google Gemini a CLI. Google Gemini CLI brings the multimodal Gemini model to the terminal, letting developers query and transform gigantic code bases from a single command line.
Bart:I quote from the readme, This repository contains the Gemini CLI, a command line AI workflow tool that connects to your tools, understanding your code, and accelerates your workloads. So, if I'm not mistaken, yesterday or the day before, Google released this project. And for the people that are looking at the screen, and I'm showing a screenshot from the terminal. Yep. If you are in the space and if you've used ClothCode before, the terminal screen looks almost exactly the same.
Murilo:Oh, really?
Bart:It's really inspired by ClothCode in my opinion. It's good that we get this. I haven't tested it yet. To me, switching to Cloud Code really further up my GenAI coding game. I will try out Gemini.
Bart:What I've read, but again haven't tested is that the Gemini CLI is of course very good for coding, but what they also position themselves as is that you can also use for general purpose chats, but also things like deep research. It's more than just a coding tool. I would say that you can do these things. You can do deep research. You can do general general quest chats with Clothcodes, but it feels a little bit like it's not made to do that or something.
Murilo:Well, I feel like it's in the terminal. Naturally, you gravitate more towards developers and then you think more of coding, I guess. Right? Yeah.
Bart:Also a bit how the how the CLI is built and that it needs to be linked to a specific directory in which you're working like it feels like it's rebuilt for coding. And apparently, again, tested it, Gemini is a little bit more generic so that really becomes your AI work for your generic AI workhorse that you can use for coding. Yeah. I think it's interesting to see. I'll maybe test it out before we our next recording, and I can give a bit of feedback.
Murilo:And maybe you have tried CloudCode. When I say when you say yeah. Oh, and by coding, do you do you you mean CloudCode or do you mean Versus Code with some extension? Or
Bart:what is your go to today? I mean I mean clock code. But I also have the feeling that the term VoIP coding you can't use anymore because it's Yeah. Triggers people too much.
Murilo:Yeah. It does trigger people a lot. I don't
Bart:think you should use the term anymore. Viporting to me has quickly become something like, Ah, this is a person that know how to write software themselves, doesn't really understand what they're doing, but they're now building something that will never get past the proof of concept phase. Think for some reason, vibe coding term got linked to that.
Murilo:Or like people that are just effing around just like, you're not serious. I think the other thing is like it's not serious if you say I mean, also the name and I get like if you say vibe coding, it doesn't sound serious. I think people say it's not serious. Like it's not going to get like it's just a toy project, just a POC. People don't know what they're going do.
Murilo:It's not going to go far. No. But if you say agent agentic AI agentic assisted coding, someone sounds more fancy.
Bart:That sounds very fancier.
Murilo:Yeah. But sometimes you wanna trigger people, you know? You just wanna get them to pay attention. True.
Bart:True. So maybe something for the next topic.
Murilo:Yes.
Bart:How can you trigger people, especially the people next to you?
Murilo:The Scream to Unlock Chrome extension blocks social media social media until users loudly shout an embarrassing phrase, turning procrastination into verb accountability. And I quote a Chrome extension that blocks social media sites until you scream I'm a loser into your microphone. Yeah.
Bart:So you can say, Milo. Tell me, what is it?
Murilo:Pretty self explanatory. You have to basically, it's a Chrome extension. You install it, and then you block all social media. If you want to unblock social media, you can. They they allow you, but you just have to scream, I'm a loser.
Murilo:And the thing is, the louder you scream, the more time you get. Yeah. So I thought that's when, like, soft engineering meets genius. That's that's a genius thing. You know?
Murilo:Like, maybe it's not gonna be big, but, I thought it was pretty pretty. I don't use it. But yeah. So and then you see here, like, if you just read it like,
Bart:I'm a loser.
Murilo:Then maybe just get, like, a few seconds of social media. But but, yeah, if you want to if you want to browse, if you wanna check your messages, you probably need to be a bit louder. I think this should be, like, standard in company laptops or company desktops, you know, like the the ones that you cannot take home. And then, like, everybody just, like, screaming. Like, you can use it.
Murilo:You just have to, you know, play by the rules. Nice. Yeah. Right? Pretty cool.
Bart:That's something something I think, something that everybody struggles with. Right? Like, it's very easy to get, distracted by, by social media, in your workflow, but also, like, in your in your life. Right?
Murilo:Yeah. Indeed. I I'm for one, I'm very I can get very sucked into to anything pretty much. Like, I'm I'm trying to be very, very mindful to, like, my screen time and all these things, especially in the morning. If I start in the morning, like, I feel like my rest of my day goes downhill, really.
Murilo:Yeah.
Bart:I I had I was a bit you you sent this, but there was something else on, I think, on Hacker News this week as well. That did something similar with our smart button, but it was linked to their laptop. But I was thinking you can actually use a smart button. So, I use on my home network, have PiHole. Pyhole basically allows you to it's typically used to block ads domains.
Bart:But you can block any domain that you want. So, I was thinking, well, maybe I should have this big button in my home. And when I press it, it blocks all the social media domains.
Murilo:That's it.
Bart:So so it's not only me that can't use it, but like everybody. No one. Yeah.
Murilo:It's like telling your kids, I'll I'll do it. I'll press it. Don't don't touch me. Don't try me.
Bart:So that's maybe a good idea to build.
Murilo:Yeah. Yeah.
Bart:There's a lot of bad things. So every every now and then just block all social media domains for a few hours.
Murilo:Yeah. Yeah. For sure. I mean, sometimes, like, the weekend, I even try to keep my phone in a separate room. Like, I try to not even have anything close to me.
Murilo:I think it's like I noticed, like, it's habit sometimes. Like, I grab my phone and I don't even know why. There's no queue. There's nothing. I just just go for it.
Bart:You're just hoping for a message from me.
Murilo:Yeah, exactly. Was like, is he thinking of me? I like, call my wife. You think he's thinking of me? But no.
Murilo:Usually not. So yeah. Whatever. But I think that's all the topics for this week. It is.
Murilo:Again, if if you if you like, we will always appreciate a five star review.
Bart:444 is also fine.
Murilo:44 But 3, then then don't worry about it.
Bart:Just leave your feedback in the super super valuable, and we're happy to see your reactions.
Murilo:And if you like to join us as well in this, like, couch style podcast. Yeah. Hit us up. Let's see. Let's let's go.
Murilo:Let's do it.
Bart:And I actually I I will talk to you to you about it after this recording, Marino, but I have a few more interesting guests that we can schedule.
Murilo:Alright. And then maybe for all the listeners or thousands of listeners, stay tuned.
Bart:Yes. Okay.
Murilo:Thanks, Mark. Talk to you later. Ciao.
Creators and Guests


