Agents Everywhere: OpenClaw, Codex, and the Post-Chatbot Shift

Hi everyone, welcome to the Monkey Patching Podcast, where we go bananas about all things
lobsters, vibes and more.

My name is Murilo, it's been a while.

I'm joined as always by my friend Bart.

Hey Bart, how are you?

Long time.

Yeah, long time.

I'm good.

It's a bit...

It's a holiday for the kids.

It's a bit of a juggle today.

But I did just come back from a short ski trip in Lévoche.

So that was nice.

Just last weekend.

It was actually snowing while we were there.

It was definitely good enough.

It's not that high.

It's like 1,100 meters at the top.

So you need to be a bit lucky to have good snow, but it was actually a good weekend.

So you were lucky.

That's nice.

I went to South Africa for almost two weeks.

So we went to Cape Town and then we went to Johannesburg for an Indian wedding.

So whatever you picture when you think of an Indian wedding in South Africa, that's exactly how it is.

You know, the music, the outfits, the animals, you know, whatever you think, you're right.

Like dancing to traditional Indian music with, in the background, a zebra walking by.

Exactly. That's exactly how it was. I could say much more about that, but that's the thing: I feel like I had to get my biases in check as well. There were a few moments where I thought, if someone records me right now, I'm canceled immediately. It looks bad.

But in the context, at the time, with the people, it was fine, right?

But anyways.

I actually heard from the guy whose wedding it was that there's quite a big Indian community in South Africa.

I didn't know that.

yes.

And it was also, I mean, it was fun.

Cape Town is really beautiful.

But it was also a lot of, I don't want to say educational, but there were a lot of interesting insights, right?

Because the history of South Africa is very recent, right?

With apartheid and all these things.

So a lot of the people we talked to actually have that very close, you know, like how it was to live through it. Like the parents of the groom, right, at the wedding I was at.

So yeah, it was very interesting.

It was a very nice trip.

But...

Happy to be back.

Happy to... We had some listeners, people pinging me like: we want Monkey Patching back. And I was like, okay, enough is enough.

So yeah, exactly.

Exactly.

So wait.

Yeah, exactly.

Yeah.

So we know more.

We're back.

So this is it.

And you also have news regarding your new endeavor, right?

Like you're looking for help.

Hiring.

We're hiring.

Yeah, yeah.

Tell us about it. Who are you looking for?

Well, you can actually find the vacancy.

Make some advertisement.

Do we need like a ka-ching?

Like a sound effect or something?

We should have these things, huh?

When we make advertisements.

But maybe self-advertisement is okay.

But, so you can find the vacancy on Cambryo.

With a C: C-A-M-B-R-Y-O, dot com. Cambryo is our...

quote unquote official venture studio.

The quotes are something to explain another time.

But the first venture that we're now building, we're calling it Top of Mind.

Top of Mind is a tool to remember, let's say, the small things about your network.

Like when you go to a networking event or when you have a lunch or when you have whatever,
like you typically remember the faces, but all the details, like you very quickly forget,

right?

Like, you had a lunch with someone, and let's say it's a potential client of yours, and they give you a small tidbit of information.

Like: two weeks from now I'm having a small surgery on my knee.

It's not super relevant for the commercial thing that you're doing with this client.

But like if you see them in three weeks, it would be nice to remember this, right?

Like all these things.

And we're making it very, very simple to record any input in whatever way: notes, a recording of a meeting.

If you're afterwards in the car saying something to the app, everything gets recorded very easily, whether it's text, video, or images.

And what our solution does is, from all that raw input, we have a layer that structures it all into a structured knowledge base, basically.

So we make that translation automatically.

And then, the next time you meet this person, we surface it for you.

So you get all this information at the moment that you need it.

And that's what we're doing with Top of Mind.

And we have a vacancy open for an AI native developer, which is ridiculously hard to find
actually.

Yeah, so we talked about this a bit before we hit record, right?

So this is the job posting, for people following along on video, right?

This is it.

cambryo.com, and then you can apply here.

But you're looking for a very specific software developer, right?

Like you call it AI software engineer.

How would you describe that person?

It's actually because we've talked about this a lot on the podcast for, I don't know, the last year.

I think AI native development basically means just using Codex or Claude Code to build whatever you're building, not looking in an IDE, not doing any traditional development, just orchestrating your coding agents.

To me, that is AI native development today.

And I think it's also something that's still relatively recent, in the sense that Opus 4.5 in, what is it, November? It really enabled this; the performance went up a lot.

And I'm looking for someone that basically doesn't write code anymore.

I'm looking for someone that is very good at orchestrating coding agents.

And...

The question I had before was: I do use Claude Code today.

I'm not at the point that I don't look at the code anymore, but for most cases, the default, let's say 90%, is that I still manually review, right?

So in Claude Code you have auto-approved changes, or you manually review.

So it says: I'm going to do this, is that okay?

And I just go, yeah, okay.

Yeah, okay.

Would I still qualify for this position?

I think I could work with you.

Maybe if you did a podcast or something, not an app, maybe something.

No, but I think it's fine. What you're mentioning is that in Claude Code you have this verification step, like you need to say accept or do not accept for every step, every edit it does.
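For listeners who want a middle ground between reviewing every step and full auto-approve: Claude Code lets you pre-approve specific tool patterns in its settings file. A minimal sketch, assuming the permissions format documented at the time of writing; the file location and field names change quickly, so double-check the current docs:

```
# Sketch: pre-approve a few safe, specific actions so you are not prompted
# for every step, while riskier commands still require manual review.
# Field names are assumptions based on Claude Code's documented settings
# format; verify against the current docs.
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "permissions": {
    "allow": [
      "Read",
      "Bash(npm run test:*)",
      "Bash(git diff:*)"
    ],
    "deny": [
      "Bash(rm -rf:*)"
    ]
  }
}
EOF
```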

I personally don't think it's very efficient.

I don't have anything necessarily against it either, because I think it also helps to
build up trust.

Like if you're making this full switch to AI native development, like this is a good
intermediate step to...

build up that trust.

Yeah.

Also not for me.

I agree.

There's also building up a bit of a mental model of the code, like, okay, these are the things that are happening here. Especially in the beginning, because you have a lot of rules, right?

Like you want to tell Claude: don't make functions that are really big or whatever, or don't use this type of pattern, use that one, you know, don't use a lot of nested for loops.

And I think in the beginning, as you actually look at the code, you say: actually, don't do this, do that, and also add a note for yourself to never do it again.

And I noticed that in the beginning of a project that happens a bit more, but then as time goes on, it's less necessary.

I can just say, okay, yes, yes, yes, yes.

And then I get to the point that, once it gives me the plan, I just approve.

I also think there's a bit of... so there is getting more confidence in the model, but I think there's also a bit of a learning curve as well.

Because sometimes I do request some things, and it's less and less, but still, sometimes I think the prompts are too broad.

And then I notice that it's producing stuff where I'm like: actually, that's not what I want.

Right.

So for me, it's been helpful.

And I still sometimes do auto-approve, but it is still an exception today.

Right.

I think for me, at least, and for a lot of people, it's like: there was typing code, and the tab-completion models.

And then you went to: I can just do everything, I'm just going to auto-approve everything.

And then you just get this huge pile of code.

And then you start to notice that there's a lot of code duplication, things are not really organized anymore, and you kind of have to find a bit of an in-between, right?

Like, it doesn't just work if you just say: build me an app, right?

So...

True.

I think there are actually... It wasn't planned this way, but actually today I published a new blog post on this: how to get started with AI native development, specifically with Claude Code, though.

And the reason is that over the last two weeks there were three moments where I had to onboard someone onto a project and switch them from traditional development to AI native development.

Really?

And it wasn't for Cambryo?

With...

No, it was not for Cambryo.

I'm supporting some students.

Wow, cool.

And they still see very little of this in their curriculum.

So we switched them to Claude Code, and what you notice is that people do research themselves, but then you very quickly find sort of old information, like four, five, six months old, on this.

And it says: ah yeah, when you go to plan mode you need to pick this model, and this goes in the CLAUDE.md, and you need to have this full spec file before you can start.

And at that moment, that was relevant, but so much has changed since November, December, right?

And I think a lot of it just works out of the box now; it works very intuitively.

And these are just some pointers that will probably be outdated again in two months.

Yeah, it's true.

This stuff goes by very fast.

So if you want to get into this, AI native coding, just give it a try.

And maybe just one thing, because I do have this question about juniors, people that are less experienced: is it different for them to pick these things up?

Or when you coach them, do you realize that actually... do you need to be, quote unquote, senior to be efficient with AI native coding?

Or do you think they just need to be taught?

I think they pick it up very quickly, to be honest.

These are also, let's say, computer science students, right?

Like it's not that the underlying concepts are new to them.

What they of course do not have, but what you also don't have with traditional programming: they have very little architecture experience.

Hmm.

And, I'm not sure how I feel about it yet or how to tackle it, but with traditional coding, you become one person in a team.

And the team has an idea: you're going to do this, you're going to build this.

And this is the solution that we're building together.

But with AI-driven coding, you suddenly become the team lead of five different agents.

And that implies that you need to have a bit of a broader view on what the architecture looks like.

Yeah, it's a difficult problem.

And maybe I'm also overthinking it, because actually picking it up and just starting seems to be very easy for them.

Okay, well, that's good.

That's good.

It's a real data point, right?

And I think that's also what we were talking about before.

You mentioned it's hard to find these profiles.

And it's also interesting to hear, because we see a lot of this in blog posts, and there are two types of people.

I mean, today I think it's a bit less extreme, let's say, but there are the people who are 100x developers, who are true believers and really evangelists.

And then there are the people, and this group has been decreasing, who say it's just hype.

But apparently the signals we get are not yet the reality.

Well, I'm seeing a lot of people now for the vacancy that we have open, and why that's also interesting is because you hear they're currently working somewhere else, you hear what they're doing there, and I think the reality is that, maybe tech startups aside, a lot of established companies are very far from adopting an AI native workflow.

They maybe give their developers a Copilot license or something, right?

Like, it's a bit underwhelming.

So I think there is still quite a way to go.

Interesting. Very cool.

I think maybe, so for people that want to...

This wasn't planned, right?

But maybe for people that want to get started, you would say this is the go-to for today, tomorrow?

That's maybe a bit overstating it, but I think it's a very short document, right?

You only lose, I don't know, five minutes reading it, and I think it's a good starting point to really go all out on AI native development with Claude Code.

Sounds good, and good luck filling the position.

Again, it's cambryo.com, and then you can just look for jobs and apply there.

All right.

So what do we have for today?

VentureBeat argues OpenAI's OpenClaw move is a pivot from chatbots to agents that take actions across apps and systems.

The tension is that OpenClaw's fast and loose openness helped it go viral.

So what happens when enterprise guardrails and safety expectations move in?

A few things to unpack there.

Yes, please do.

So, first things first: OpenClaw.

Previously Moltbot.

Going even further back, Clawdbot.

Now OpenClaw.

The title here says OpenAI's acquisition of OpenClaw.

That actually did not happen.

Peter Steinberger, the author of OpenClaw, basically got hired by OpenAI.

OpenAI has also dedicated resources to further develop OpenClaw.

For the people that don't know, OpenClaw is basically an agentic loop that you deploy on your local-ish system, like your own system, it doesn't have to be local, and that can call a lot of different models to do a lot of different things.

And because it runs on your system, it also has access to do a lot of things, which creates a lot of opportunities, but it also creates some security issues.

But it is extremely powerful.

Also in the sense that it can create its own skills.

So if you say, for example, I find it important that you know how to generate PDFs from markdown, it's going to research how to do that and create a skill for it.

And in the future it can just do that.

If you say: I want you, from now on, to manage our GitHub issues in this and this and this way. Please do that, notify me when something goes wrong, give us an update on how it goes. It then does that.

We, for example, use this for Top of Mind, for issue management.

So it is extremely strong.
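To make the skills idea concrete, here is a rough sketch of what a self-created skill could look like, loosely following the SKILL.md convention these claw bots borrow from Claude skills. The directory layout, frontmatter fields, and contents are illustrative assumptions, not OpenClaw's exact output:

```
# Hypothetical skill the agent might write for itself after researching
# "generate PDFs from markdown"; names and fields are made up for illustration.
mkdir -p skills/markdown-to-pdf
cat > skills/markdown-to-pdf/SKILL.md <<'EOF'
---
name: markdown-to-pdf
description: Convert a markdown file to PDF. Use when the user asks for a PDF export.
---

1. Check that pandoc is available (`pandoc --version`); if not, ask before installing.
2. Convert with: pandoc input.md -o output.pdf
3. Reply with the path of the generated PDF.
EOF
```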

And now, with the pseudo-acquisition of OpenClaw, VentureBeat is basically arguing that it's a bit of an end to the traditional chat era, as in ChatGPT, and that we will move much more towards something that we will probably still chat with, but that can take way more actions. Because ChatGPT these days is still very much: I request something and it gives me some text back. But it's not going to be able to send an email for me, right?

Or remind me tomorrow at five o'clock that I need to go to the doctor. There are a lot of actions it still cannot take, right?

Do you agree with this opinion? Or do you think... Maybe before you answer: I think I also saw somewhere that OpenClaw was already sponsored by OpenAI, but that it's still going to be something separate.

I think still open source.

I'm not sure.

But I remember reading that it was still going to be.

Sorry.

Yeah.

yeah.

It will be open source.

But in the end, it's very cool what Peter Steinberger did.

It's very impressive, but it's more of a, let's say, creative effort than a super hard engineering challenge.

Even if OpenAI doesn't exactly implement OpenClaw, they will implement an alternative.

Because the codebase is not super complex; it's just a very creative way of looking at this, what Peter Steinberger did.

So I definitely do think that, where ChatGPT is now kind of your Google-Wikipedia combination on steroids, what we will very much go towards in the future is your personal assistant on your phone.

Yeah.

Also, I mean, it also said in the preview that you read: OpenClaw became very popular because it's very loose, very flexible, right?

Like, the setup, you can connect with WhatsApp, with Telegram, with this, with that.

There's a whole bunch of stuff out of the box already.

In some ways I also feel like it's...

Again, taking a lot of steps back, and maybe this is a hard transition, but I remember when we talked about generative AI years ago, I was thinking that the models that were going to win were the models specialized for one task.

And ChatGPT came out, and it was a bit the opposite, right?

Because everyone could use it, everyone would just go there and be impressed.

And that's what kind of built the momentum.

And it feels a bit the same with OpenClaw, in the sense that it's so flexible. For example, if I create an OpenClaw instance, I probably don't need WhatsApp, Telegram, iMessage, whatever, right?

I probably just want to feed one or two, but the fact that it's very, very flexible is probably what made it so popular, right?

Do you think that this is the way for AI products, trying to be as generic as possible?

Well, the underlying infrastructure is generic, but because you have this evolutionary skills aspect, you can ask it to define skills, and it very quickly becomes niche-focused on you.

True, it's very adaptable.

It's extremely adaptable.

And even though the underlying LLM is very generic, the actual application of something like OpenClaw for you as a person is super niche for you.

I think that is the extreme strength of this.

Yeah.

And also, I saw a lot of the security concerns, right?

Because again, there's a lot of stuff, right?

It has a lot of access.

And there were a lot of people on Reddit and in blog posts calling out the security concerns, right?

And I think, even if you go to OpenClaw, they have an announcement here that OpenClaw partners with VirusTotal for skill security, right?

Is this something that worries you at all?

Like, do you think about these things or?

I think you need to be worried if you have people using this that completely ignore it.

One of the attack vectors is that there is a skills hub, and people start downloading skills from other people.

That one is very easy, because I can basically inject stuff into those prompts that does adversarial things.

You can also be dumb: make your OpenClaw gateway open to the internet, to the whole world, so everybody can log in. That's just dumb, but it has happened a lot.

You can give it access to stuff it shouldn't have access to, and that creates security issues. All of these just mean that you need to understand what you're using and the powers it has, and if you don't understand it, then you probably shouldn't use it, right? From a security point of view.

So, to nuance that a bit: I mean, in some ways OpenClaw is so easy to set up that people who don't really know what they're doing can actually kind of do it.

But that's where I think the danger is, right?

You don't really know what you're doing, but you get something out of it, so it feels like it's fine.

When it really isn't.

So, for example, the claw bots that we're using are set up in an isolated environment with just the access they need for the skills they have.

So the security issues are very limited.

And how many OpenClaw bots do you have?

Two.

Are these all for Top of Mind, or do you have one for personal use as well?

One personal, one for Top of Mind.

One personal, one for Top of Mind.

Cool.

Maybe this is a side question.

There are a lot of OpenClaw, Clawdbot, Moltbot alternatives.

There's like nano-something, ZeroClaw.

Did you try Nanobot?

Nanobot or Nanoclaw, I think it's called.

And...

It's a bit less flexible, but the good thing is that it runs on Apple containers, so it's a bit more isolated.

What's also very good is that it runs on the Anthropic SDK, which means you can actually use your Anthropic subscription.

OpenClaw you can only use via the Anthropic API, which is way more expensive.

But yeah, I think OpenClaw is, what, like six weeks old?

So we'll see a lot of these things popping up in the coming year, I think.

And you see the stars, like 200,000 stars.

The popularity is insane; it's not even a hockey stick, it's just a vertical line.

If you look at it, it's really crazy.

So there's that.

And also, just to name a few others that I saw: PicoClaw.

This one is written in Go.

And there was another one that I heard about on the Changelog podcast, actually: ZeroClaw, which is in Rust.

It shows how easy it is to implement this, right?

Exactly. But that's also what I was... And where are you actually running this? On a Mac Mini, or...?

No, well, I do some testing on my Mac Mini, but the ones that I actually use, I run on Google Cloud VMs.

I was also looking, I mean, they're making these smaller and faster, like just a few megabytes of RAM, right?

All these things.

So I was also thinking, if I have a Raspberry Pi and I just want to have something running, you know, like a personal little system or something, you could just mess with it, right?

True.

But realistically speaking, you're not going to run the models locally.

The models that you can run locally today are not good enough.

I think to really have good performance, you still need to go the whole Sonnet 4.6 or Opus 4.6 route.

Yeah. So your advice for people who want to go on the OpenClaw journey: just stick with OpenClaw?

For now, yeah, I would say stick with OpenClaw, and with respect to the security issues, just think about what you're doing and don't be dumb.

And don't just say: do everything, right?

Say: just do this, do this, do this, one by one.

Don't try to go too fast as well, right?

Don't try to go too fast as well, right?

I think that's the thing.

All right.

If you're normally going to install something from the internet, and the install script says, I'm only going to install if you give me sudo access to everything, I mean, you're probably not going to install it, right?

I mean, just keep thinking.

And like, some people are like: yes, I read and agree to the terms, you know?

Yes, yes.

Just give it to me, just let me keep going.

Yeah, yeah.

It's a bit harsh, but I know what you're saying.

It's like natural selection in the 21st century, right?

It's like the next generation of it, it's really natural selection, right?

Like, you can't feed your family anymore, the genes stop, you know, like that's...

All right.

What is next?

Next, we have a new paper that tests whether repo-level context files like AGENTS.md actually help coding agents finish tasks, and finds they can backfire.

The punchline is a double hit: lower success rates and more than 20% higher inference cost, hinting that, and I quote, more context means more confusion.

That's a good question.

Here are the names.

Are you gonna list all the names?

No, I thought it was gonna be like MIT or something like this, but I don't know

We don't know who the authors are, right?

I'm not sure.

What is this paper about?

Wait, let me just quickly see if I can actually see where is which research institute is
coming from.

I can't see quickly.

So, it is about: if you are doing AI native coding, or at least a hybrid version of that, you often have files in your context.

AGENTS.md is an example, where you specify I want these and these types of agents, and then your CLI can pick that up.

The quote unquote default standard, right? Like from the Linux Foundation.

And Claude implements it as CLAUDE.md, is that what you mean?

And Claude implements it as Claude Omnid, that's what you mean?

No, because weren't there standards that were donated to the Linux Foundation, a new Linux Foundation thing?

I thought that AGENTS.md was... MCP was... but I forgot the name.

Let me just...

Okay, yeah. In my mind, AGENTS.md, and it could very well be that it was donated, is just a way to define: these are the sub-agents that you can call and that can do tasks.

Right.

Another example of stuff you have in your context is CLAUDE.md, which is sort of your memory.

I think in Codex it's actually called memory.md.

Not sure there.

Other things are, let's say, I have a to-do.md, or I have some description like: this is the diagram of the architecture we're using, and I have that as a Mermaid diagram.

You have a lot of things, extra information, that we think is valuable to the LLM to help us do better coding.

So it's basically files that you include in your project, but they're metadata in a way, and it's really just for the AI.
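As an illustration of the kind of context file being discussed, here is a minimal sketch of a CLAUDE.md. The file name is the tool's convention; the contents are made-up examples of the rules and pointers people typically put in it:

```
# A made-up example of a memory/context file; every rule below is illustrative.
cat > CLAUDE.md <<'EOF'
# Project notes for the coding agent

- Python 3.12 backend, React frontend.
- Run `make test` before considering a task done.
- Keep functions small; avoid deeply nested loops.
- Architecture diagram: docs/architecture.md (Mermaid).
EOF
```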

Yeah, and we think that by having this in the context, they will perform better.

I think that is a bit the hypothesis, right?
I that is a bit the hypothesis, right?

And this paper actually looked into that.

More specifically in the agents.md.

Like, is it actually valuable?

Look at this.

Is it?

There are probably a lot of asterisks needed here, but I think I can summarize what the paper says as: it typically is not, because it very quickly increases the context that you're working with, significantly, which actually does not improve performance if the context is too big.

And your inference costs very quickly go up.

Why does your inference cost go up?

You build up this chat history, and with every new request you send that history along again.

Hmm, okay, I see. Do you agree with this, actually?

I'm a bit surprised that they're saying this.

Do you agree with this?

What I would be interested in, and maybe it's actually in the paper, is when they tested it.

Hmm.

And why am I saying that?

Because what we see is, with every model iteration, improvements in memory retention across larger context windows, improvements in caching, for example, and improvements in the actual usage of subagents.

So I would be interested to know, because to me...

There was a big change when Opus 4.5 was released.

When you...

A lot of things just work out of the box.

Yeah. Very true.

It depends a bit on what the timeframe is.

My personal opinion, because I use Claude Code a lot: I typically don't create specific sub-agents anymore.

I do not, no.

I do sometimes specify that, and I did use them in the past, but to me it didn't feel like it gave me anything, like it improved anything.

I used a UI/UX agent and a test suite agent and a product management agent, but I don't think the performance necessarily became better.

Maybe it's not worse either, but...

I was a bit indifferent to it.

So you did it kind of to try it out, quote unquote, but you didn't feel that big of a difference and you switched back.

It's not because the models evolved? Or... okay.

Interesting.

But maybe one thing, because I saw on your blog, the one that we talked about in the beginning, you do mention a code simplifier.

As a plugin, though.

Yeah, this is maybe a good point.

Maybe it actually overlaps a little bit with agents, because a plugin is also sort of a markdown file.

The difference, and it's the same with skills, is that they're a bit passive, in the sense that your LLM knows that they're there, but it's only going to sort of mount the skill when it's needed, when you request it.

So it doesn't fill up the context if you don't need it.

So can you recap that?

It's basically saying...

So, the paper is on AGENTS.md, which is always in your context, or should always be in your context, right?

Like, all the instructions are there.

But with these types of plugins or skills, the way it's done is that there's a very small entry in your context which says: know that these skills are there, and these are the skill descriptions.

But only from the moment it actually uses them does it mount the skill, and only then does it come into the context.

So it's not really an issue that you need to think about, like something you always drag along with every inference request.

I see.

I see.

But, you're talking about AGENTS.md, and maybe just to make sure we're all on the same page, including the people listening: AGENTS.md is basically a markdown file that describes more specifically what a sub-agent should do.

Right.

And the idea is that, by being very specific about what one agent is going to do, it's going to perform better than just being very generic.

That's the assumption.

Yeah.

And then

This AGENTS.md is always in your context, so it always adds to it. But the code simplifier here, which is a plugin, is not always in the context.

It's just listed as: it's there if you want it.

And if the agent, quote unquote, double-clicks, then it actually expands, and then it consumes context.

Claude also has sub-agents, which don't use AGENTS.md.

Do you know what the mechanism is for Claude?

Is it more like a plugin, or is it more like AGENTS.md?

I think plugins are preferred over AGENTS.md, but you could use AGENTS.md.

But Claude also has subagents, like...

Well, before plugins, because plugins are still relatively new, you actually do have something in Claude: if you do /agents, you can define your agent, and then it creates a markdown file per agent, if I'm not mistaken.
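For reference, a sketch of such a per-agent markdown file, following the .claude/agents convention as documented at the time of writing; the frontmatter fields and the agent itself are assumptions for illustration:

```
# Hypothetical sub-agent definition; one markdown file per agent.
mkdir -p .claude/agents
cat > .claude/agents/code-simplifier.md <<'EOF'
---
name: code-simplifier
description: Simplifies working code. Use after a feature passes its tests.
tools: Read, Edit, Bash
---

You simplify code without changing behavior: remove duplication,
flatten deep nesting, keep functions small, and run the test suite
after every change.
EOF
```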

So kind of like the plugins.

It's more like the plugins than AGENTS.md?

No, I think it's very similar to AGENTS.md, with the difference that you have this per agent.

I see, I see.

Okay, interesting, interesting.

Yeah, these things evolve very quickly.

Yeah, so then you think about it, maybe more generically: does it make sense to have all this in your context?

Not necessarily AGENTS.md, but, especially before, again, November, December, when Opus 4.5 came out, there was a lot of discussion: it's best practice to have your spec file there, and to have that there, and that there, and...

I think a lot of that doesn't really hold anymore. Start from scratch, and only if you really need something to be remembered, put it in your context; otherwise, don't.

Yeah, no, I agree with that statement, but I also think, if someone is starting today, I would say: still have a spec file.

Think a bit about...

You disagree?

Like, for people that don't know, that have never used this before.

Yeah, well, it depends a bit. I think if you want to do a greenfield project, you need to think, and that's a big mindset shift...

You should read my blog post.
You should read my blog post.

Like, before, in traditional development, you were going to think: I'm going to build this technical feature, I got assigned to build this technical feature.

Right.

And now you need to think: I'm going to build this functionality that the user wants.

Because it's your agent that's going to build it technically, but you are going to act as the user: this is the functionality that I want you to build.

And that's a bit of a different way of looking at it.

And if, for you, it's easier to prepare that in a spec sheet, I think that's fine. But really focus on what the functionality is that you're building.

Yeah. I think, again, I don't know, when we say spec sheet, I'm not sure we're saying exactly the same thing, but one thing that I'm still a fan of is having software requirements, you know, just something like: what are the steps that we want for this project?

Because it's also easier to chat with your agent and say: okay, what should we tackle next, or where are we here?

And it kind of...

It's easier for me to just have everything like: this is what I want to finish, and this is where we are right now.

Right.

And also for me to reflect a bit, or if I change my mind: actually, this should work like that.

It is something that I still find useful.

Right.

Which in a lot of ways has nothing to do with agents so much, but also: don't keep things in your head, like where you want to go with this.

Right.

Having a bit of the path: okay, you want to have this app at the end, so these are the steps that we need to take to get there.

But to me, that's project management, right?

To me, how I do this is: these become backlog items, or these are items that we're working on in this sprint.

To me, that's not in the codebase.

For what we're using, for example, it's in GitHub Issues and GitHub Projects.

Or it could be in Jira, or it could be in Trello, or whatever, right?

Like...

Yeah, I think for me, a lot of times I'm doing things myself.

So that's why I just...

Yeah, I just put it in the codebase because...

It's not something that you necessarily need to do in the codebase.

No, but that I agree.

I don't think it needs to be in the codebase, but for me, having it in the codebase also makes it easier to chat with Claude and say: okay, what do you think should be the next task?

You know, sometimes I don't even make that decision. I say: what should we tackle next?

And then it says: ah, we should do this, this, and this.

And I'm like: okay, maybe this is too big.

Maybe let's do this first.

Or let's, you know...

But for that, I actually ask it to check the GitHub issues, and to use the gh client, the CLI, to do that.

But that's a fair point.

Having that knowledge of what's next easily accessible, whether it's in the codebase or somewhere else. Give your agent access to the things that you as a developer would also want access to.
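For context, these are the kinds of standard GitHub CLI calls an agent can run for this workflow; the issue number is a made-up placeholder:

```
# Standard gh invocations for browsing and updating issues.
gh issue list --state open --limit 20        # see what's on the backlog
gh issue view 42                             # read one issue in detail (42 is a placeholder)
gh issue comment 42 --body "Started working on this."
gh issue close 42 --comment "Fixed; see the latest commit."
```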

Yeah, no, but then I fully agree.

I think we're pretty much saying the same thing, just in different places, and I'm also not opinionated about where it should be.

Who?

No, not today, but how?

Not today today.

Cool.

What do we have next?

ETH Zurich, the Swiss university, that's where the paper comes from.

Interesting that they didn't include... interesting that it wasn't easy to find, because it's a...

Well, actually, if you look at footnote number one under it, it says Department of Computer Science.

So we were dumb.

Yeah, it's a nice problem.

Yeah.

Yeah, exactly.

Exactly.

cool.

What else do we have?

Simon Willison highlights research suggesting AI can increase work intensity instead of easing it, especially when productivity gains mask burnout.

The provocative angle is managerial: if AI boosts throughput, how do organizations prevent that extra capacity from turning into an always-on expectation?

I've noticed this as well.

Same way.

So I was working on a very big refactor yesterday.

And actually, I already knew that this was a problem before yesterday.

So it's not really yesterday's problem; it came up a few weeks ago.

But yesterday is a good example.

Yesterday, in the afternoon, in, I don't know, four hours or something, I did a refactor that would normally very, very easily have taken me two weeks.

Mm-hmm.

And our application is in the very early stage of its life cycle, so a lot of things are changing at the same time.

Right.

And you fire off the agent, work on this, and then I'm going to split my pane and say: also work on this, this is separate in the codebase, you can already build a plan for that and also execute it.

I'm going to split the pane again, and I'm going to have a third one working on that.

And then a fourth one working on that.

And then suddenly, okay, four is not enough, I'm going to open a new tab, I'm going to...

And you're juggling these, I don't know, at some point six different, let's say, main agents at the same time, and you need to hold the context of all these things in your mind.

And you're hyper-focused on this.

And it's also a bit addictive, because you move very quickly.

But after four hours, like I said, I was very happy with what I'd done, but you're mentally tired.

Way more than if I had just done traditional coding for four hours.

Yeah, but I think it's more because of the context switching in your case, right?

Like, you have more context to manage, and that drains more of your energy, right?

It drains more of your energy, but it's also because of AI coding that this is possible, right?

Because you fire off an agent and it's going to take five minutes to do something, so that gives me five minutes to do something else with another agent.

Yeah, yeah.

Yeah, I see what you're saying.

I think for me, well, two things.

That only happens if you auto-approve.

I think if you still need to manually approve stuff, it's a bit less, right?

There is a window, right? Where you can still stay in front of the computer and wait.

And I think if it's five minutes, that's too long, right?

For me, when that happens, still...

It's just that I have messages, I have emails, you know, so my context switching doesn't snowball as much, right?

Because I need to reply to this message on Slack, I need to do this, ah, there's this email, then I go back: okay, now do this.

But it is interesting. I never thought of that, actually.

But it's maybe not a huge problem, in a sense...

Because this is a bit like blowing the whistle, right?

Like, we need to make sure that not everybody is going to burn out.

But I think this is mainly a problem that you can have working on a project yourself.

Like a hobby project, where you're going to go all out on it.

Or as, like, maybe a technical founder, because you yourself have this complete vision of the product, and you can work on a lot of these things at the same time.

But if you're working in a team and you're assigned to these and these tickets, it's less of an issue, I think, because you're going to work more sequentially, by the nature of the work that you do and the type of responsibility that you have.

Yeah, maybe that's it, I think, the responsibility part, right?

Because also, I mean, your scope as a founder or whatever, the scope is basically the whole thing, right?

And if you have a team, your scope is going to be less.

And even if you could do more, you probably don't want to, because someone else is going to take ownership of that.

And there's also... so it's true.

It's true.

But yeah, I'm also wondering, when you say this... so let me organize my thoughts a bit.

Because a lot of times I feel like, at work, you're paid for the value you bring.

Right.

And it's like, if you can get two weeks' worth of work done in four hours...

Yeah.

You're drained, but then you can just work four hours and that's it.

Right.

Yeah, but that's not how it works. You want to be paid for the value that you bring, but that's not reality, right?

Like, you're paid for the hours that you do.

And yeah, I must say, I'm only becoming more and more pessimistic about the future employability of software engineers.

So I think, with these new models, the performance that you get out of the box with Claude Code... I am literally as efficient today as five traditional engineers would have been two years back, three years back.

Even if you have five windows, five agents running in parallel for four hours, even if it's just for those four hours, with stuff on auto-approve, the pace of writing and deleting is faster than a person as well.

Yeah, yeah.

And maybe to make it more concrete: the application that we have now, we're going to have test users going live next week.

We worked on this for more or less a month.

This would have easily taken a team of a few people six months, three years ago.

Right?

Like, there's a huge difference, right?

And I know a lot of large companies are not there yet, but they're going to switch at some point, right?

I think it's inevitable, and the models will only become better.

There's no going back from this.

And I think the reality is that you will need fewer people, because you're not going to write lines of code, you're going to orchestrate.

You're going to become the team lead of your agents.

And the only way for this not to have an impact on employability is if the economy moves way, way quicker.

And companies start doing way, way more.

But I don't think the economy is going to heat up 200 times in the coming two years, right?

So the only alternative is that you need fewer people.

Like, I think it's...

I'm also wondering if there are a lot of nice-to-haves that were always in sight but never got done.

So people are just going to start doing more stuff.

Even more than the nice-to-haves, you know?

But I think that only holds if the economy moves quicker, because if you as a company are going to create more output, you need to be able to sell it to the market.

Right?

And that's only possible if people have higher purchasing power; everything needs to follow that.
to follow that.

Because what you're saying, like, maybe this and this is also valuable but I never got the time for it, that doesn't apply at this scale.

At the scale that actually has an impact on the economy.

Like, can the economy follow this? If not, who will lose a job?

Yeah, I'm just wondering if there are things that people are not doing today.

I mean, yeah, but that's what you're saying, I think. Yeah, I'm not sure.

Let me organize my thoughts first.

I think software engineering is still going to be there. Like you said, software engineering is changing, right?

So what you're saying now is that people are more efficient, but they're still going to be software engineers, right?

It's not like we're saying there's not going to be anyone, because managers are just going to talk to their agents and have applications.

You still need someone.

Yeah.

I also think everyone's going to be more efficient, but maybe not everyone's going to be as efficient as you.

Because I do think you're...

You're very knowledgeable on, I mean, just how a system works and all these things.

And I also, I mean, I think even in, I don't know, 10 years, not everyone's going to be 5x what we are today, I would say.

You think?

I think everybody will be 5x.

Five years from now, everybody will be 5x.

And you will maybe have very good engineers, the ones that even in traditional coding we would call the 10x or 100x engineers, and you will have them in the AI coding era as well, right?

But I think if you compare your 1x traditional developer versus your AI developer five years from now, they will be 5x.

Hmm.

Okay.

Yeah.

I mean, we can discuss, but I don't think we can conclude anything.

We'll just put an event on our calendars five years from now.

And we'll just see.

Because, yeah, I do think it's the knowledge of how a system works and all these things.

Not everyone's going to have as much.

Right.

I mean, even today, you see a lot of people that know a lot about how one thing works, but if you go a bit broader... I think for you to really be 5x, 10x, you also need to have that broader view.

And I think that will always take time.

We'll see in five years, Murilo, we'll see in five years.

Have you started on an alternate career already?

Aside from podcasting?

Okay.

Maybe I'll become a chicken farmer.

There is actually a guy on LinkedIn who was, I forgot, like CEO or something, or principal engineer or something.

And then the next item says, he's like senior, principal, whatever, CTO, and then: goose farmer.

Have you seen that?

That's going to be you.

I'm very fond of cheese, so maybe some cows.

What?

Yeah, maybe. Yeah, you're Dutch, so I think it's in your genes.

Yeah, yeah, like, at any given moment, we're either ice skating or we're eating cheese. Those are the two options.

That's what I hear.

By the way, this is the post, you see it: principal performance architect at Microsoft, and then goose farmer.

That's cool.

I'm going to see that from you soon: podcaster and chicken farmer.

Yeah, that's nice.

All right, cool.

What is the next item we have?

We have SWE-rebench, which is trying to solve a messy problem in agent evaluation.

Benchmarks go stale, and models get, quote unquote, contaminated by training on the tasks.

The leaderboard format makes it feel like a live sport, but the real question is whether continuously refreshed tasks can keep results honest as models ship faster.

You shared this one, but I thought it was actually quite interesting, because we see this SWE-bench benchmark on every model release, right?

And every model is doing better and better.

But the question is...

Are the models actually using these benchmarks to train?

Right?

And maybe they are, maybe they're not.

There's leakage of these benchmarks over time into the training data of the models, so how realistic are these benchmark values?

And yeah, exactly.

Do you know more about it?

Not in detail.

The only thing that I do know is that the tasks that they give to the models as tests are continuously evolving.

Meaning they're continuously mining for new tasks, and that also means these tasks cannot have been seen yet by a model.

That's a bit the approach, to basically take away this risk of contamination of the training data.

Yeah, so this is why, even on the leaderboard, there's a sliding scale.

So you only get the newest tasks, right, within your window.

And maybe you see here, Claude Code is the first one, which is, let's see, where is the orange one?

Yeah, because Claude Code is not really a model, right?

But they also put it here.

Well, it's not really a model; it says there that it's an external system, and Claude Code is basically the CLI, right?

And it can do much more than just the model, because it can also execute scripts and stuff like that, right?

It's more powerful that way.

Also, one thing that is interesting: Claude Code is the first one, and the resolved rate is 52%.

So only about half, actually.

I would expect it to be more.

How far do you think we'll get?

Maybe linking to our previous discussion: do you think this will ever get to, like, 80%?

Like, any problem you throw at it, it'll fix?

Well, I think 80% should be doable.

80% should be doable.

Okay, cool.

And then...

A lot of people are always doubting, like: yeah, but it's not good enough for what we're doing.

And the answer to that is always: let's just wait a bit until the next version comes along, and it's suddenly solved.

Yeah, yeah.

That's also why... I don't know if you remember, but there was one article where they were saying the guy was just hype, and he almost owned that he was just hype, that the things didn't work, but he was kind of betting on the models getting better.

Like: I don't need to do the work, I just need to wait, the models will get better, and then it will work.

With the application that we're building now, it's a bit the same: we're relying very heavily on an AI agent to do stuff in the application.

And it's not perfect, right?

And the question is a bit: are you going to build a lot of scaffolding around it to catch these errors?

Or are we just going to wait three or four months for Opus 4.7, right?

Which will probably solve it.

Yeah, true.

That's what we've seen in the last two years continuously, right?

Yeah, and it's easier to wait than to try to change the whole application and try to catch all these things.

Yeah. Is the new fast Codex model in here?

Spark, it's called, Codex Spark.

It'll be interesting to see.

Codex Spark got released, 5.3 Codex Spark, a model by OpenAI, which is super fast.

Like what I was saying about needing to wait a lot for Claude Code, which is true, also in Codex: this is an almost instantaneous reaction.

You ask it to do something, and it immediately does it.

It's quite impressive, but I was wondering how this performs on the benchmarks.

Because to me, from a user experience point of view, that's where AI coding is still a bit of a... it doesn't feel great, because you need to wait so long.

Yeah, maybe this is what you're saying, right?

This whole Codex Spark is a way more performant version of 5.3 Codex, right?

So they have...

On the screen now, on the right side of the split screen, they use Spark, and it's like: okay, here the application is already done, yes, and the other one is still preparing what it's going to do.

That's impressive, right?

That's impressive, right?

The demo here, of course, was probably picked specifically to work well as a demo, but...

And I do think it's a trend, right? We'll see more of these models trying to be faster, because like you said... Maybe also linking back to the previous discussion we had: do you think this, quote unquote, solves a bit of the burnout problem, because you don't switch context as much?

Maybe. It's a point.

Definitely related to it.

Right?

Very cool.

I also heard a lot of very nice things about Codex these days, actually.

Like, people even saying that they prefer Codex over...

Definitely, yeah.

I think it comes down a bit to preferences.

I think it can be on par with Claude Code.

Maybe it's also interesting how they do this.

So they have a collaboration with Cerebras, which apparently has very high-performance hardware to do inferencing on.

So this is the route they're taking.

Anthropic also has a fast mode, and what they do, I'm not exactly sure on the implementation, is run on the same hardware but use smaller batch sizes to speed things up.

So it's a bit of a different approach: OpenAI really uses a different type of inference hardware to serve these fast models.

And when you say batch, are you batching other people's requests together, or how does it work?

I think it's batches of your own requests to Claude, to give faster feedback instead of waiting for everything to finish.

Interesting. So yeah, okay, we'll see.

I'm not sure how that exactly gets implemented.

Yeah, and maybe, since we're talking about it, it's been a while since we chatted, right?

This is 5.3 Codex Spark, but 5.3 Codex was also released, right?

So this is technically new since the last time we spoke, and it was like 30 minutes after Opus 4.6 was announced, right?

So it's like cat and mouse.

4.6 was the best at a lot of things, and then Codex was slightly better 30 minutes later.

If you were there...

Anthropic was state of the art for 30 minutes.

I do still think, and actually the rebench benchmark shows it, that Claude Code the CLI, with the way it's set up, with tool usage, with plugins, with everything, is still the best way to go these days.

If you need to choose, you can't go wrong with Claude Code.

Yeah, yeah, no, I also think... The only thing is, I was talking to some colleagues, and they were using Claude Code and OpenCode, I think.

So the CLI interface changes a bit.

But yeah, according to them, they didn't really prefer one over the other.

They didn't feel like you need to, quote unquote, specialize in one CLI tool over the other.

Yeah, it was more because you cannot use OpenAI models or Gemini models with Claude Code. Or you can, but it's a bit funky, right?

Yeah.

Cool.

Maybe one last thing I just want to share.

I thought it was pretty funny.

There's a rentahuman.ai, because I think what they were saying is: everything is agents.

So now, if you want a human to do things, you hire a human for your agent, right?

You have agents for everything; now you hire a human just for these things.

I thought it was pretty funny.

I'm not sure if it's actually... I mean, it looks like it's a serious business.

People connect, people are putting up their rates and stuff, but I'm not sure how serious this is.

And...

I think it's also being used by bots.

Right. I think it's an idea that originated on Moltbook, if I'm not mistaken.

I'm not 100% sure about what I'm going to say now, but this is an idea that agents discussed, well, bots discussed, on Moltbook, and then they created this platform so that other bots can now rent humans.

Yeah, there's even an MCP integration here.

So that was pretty funny.

So I just wanted to share that.

It looks funny, but it could very well be that we're going to see these things in the future, right?

And it's not necessarily... I think a lot of people, you and me included, will have their own bot assistants.

And when you and I want to find a date for lunch, and we were actually exchanging messages on WhatsApp earlier today, I'm just going to ask my agent to align with your agent, and our agents are just going to put it in our agendas.

Right, I think that's not unrealistic.

I think we're not that far off from that.

That is true.

And then we're going to get there.

There's going to be someone who's like: who are you?

Your agent hired me to be your agent.

Yeah, it's going to be...

Like having a discussion.

That would be nice for them, right?

Like, a person.

Yeah, it's been a hard week for Murilo.

Yeah, it's fine.

He's going to, yeah.

Yeah.

It's the podcast anniversary.

So we wanted to give you... we organized a treat for you, or whatever.

Yeah.

So let's see.

I was also thinking of, yeah, doing something with a claw bot for the podcast as well.

Something like this.

I mean, I think we automated a lot of stuff already, but...

Show notes and stuff, so maybe I'll...

Think about it.

That's a good point, yeah.

Because we haven't released a lot of episodes since Clawdbot was released.

And you see how much I use it now, personally. We need to think a bit about what we can automate with OpenClaw.

The problem with OpenClaw is that it's still very costly; it consumes a shit ton of tokens.

What if you use, you know, MiniMax?

The thing with the MiniMax models is that they're very cheap, but...

OpenClaw doesn't work well if you don't use at least Opus 4.5, unfortunately.

It's also recommended in the docs. I'm not sure if it's that explicit in the new docs, but...

I saw it somewhere.

It's the consensus.

I actually switched now, because Sonnet got released.

Sonnet 4.6 got released, which is on par in terms of tool usage performance.

And it looks quite okay.

So I think that already makes it a lot more affordable.

The problem is, as we were discussing earlier, you can't use it with your Anthropic subscription.

So you need to pay via the API, and that quickly adds up.

So yeah, we need to think of something.

We need to think of something.

Maybe our loyal fans can sponsor it, right?

Yeah.

A Buy Me a Coffee.

Maybe one last question, and this is my personal curiosity.

You mentioned you code with like five different windows, with five different coding sessions in parallel.

Do you use Git worktrees or something, or how do you do it?

I don't.

A lot of people do, right?

Yeah, how do you do it?

You just let it run? Because I think it also catches it, right? Like, it says: this file has been changed since I read it.

Yeah, it does that quite well.

So Git worktree basically means something like: you open another directory on another branch, that's a bit of a way to think about it.

So you're working with two agents, and they're each on their own Git branch.

But that also means that they're not aware of what each other is doing, right?
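For listeners who haven't used it, the worktree setup being described looks like this in standard git; the paths and branch names are made up:

```
# One checkout per agent, each on its own branch.
git worktree add ../myapp-backend -b feature/backend-refactor    # agent 1 works here
git worktree add ../myapp-frontend -b feature/frontend-polish    # agent 2 works here
git worktree list                     # show all active worktrees
git worktree remove ../myapp-backend  # clean up after merging
```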

I typically don't do that.

I think it's a bit of a hassle to merge everything together.

So when I parallelize stuff, I make sure that the agents are working on different things.

So one is working on the backend, another one is working on a specific frontend aspect, one is working on the agentic stuff.

They're working on different areas of concern, basically, so that they don't overlap.

To try to minimize.

I mean, there's always a chance, quote unquote, that they are editing the same file, but minimally.

Yeah.

So that the concerns are spread out enough.

But it also means that the context switches are big.

Because you're constantly working in a really different domain.

Yeah, yeah.

Yeah, I thought the Git worktree approach was interesting, because then it's really like developers, you know: they create a new branch and they're working on things, and it's true.

Sometimes they don't know what the other people are working on.

So it's really just going off of a ticket.

And then sometimes, when you merge, you do have conflicts.

Interesting.

OK, cool.

That was it for today.

So yeah, we covered four topics, and we covered a few extras.

I think this also shows we have a slightly different format. Largely similar, right?

But I think we cover a few less items.

But people can expect... that's also something we discussed, right?

In our, let's say, mini sabbatical, right?

A few changes.

So there are a few things that we still need to plan.

But there'll be some small changes, I think, right?

Things we're trying out.

Do you want to say anything more about that, Bart, or do you want to leave it at that?

At our dinner, our last non-agent-planned dinner a few weeks back, we were discussing... we're maybe going to do less frequent news updates.

Well, that's the working draft at least, let's call it that.

Less frequent news updates, meaning once a month.

And in the other weeks, we're going to have interviews with tech startups and/or investors.

Yes, exactly.

So it's just making more space for having more interviews.

I think we had a few last year that I thought were really good.

So, doing a few more things like that.

So keep an eye out.

We might also want to rebrand a little bit, for the new setup, but we don't know yet.

It should become a bit more concrete over the coming, let's say, two months, something like that.

Is that a realistic timeline?

I think two months is realistic.

But then again, if you're following us on Apple Podcasts or whatever, you shouldn't have to look for us again.

It should just be the same.

Yeah.

If we rebrand, it's just a switch of the name of the existing podcast.

Exactly.

All right, cool.

Looking forward to that.

Thanks Bart.

Tell me, what's the name of your agent, actually?

What was the name?

You had a name for it.

It wasn't Jarvis.

It was like...

I have one called Barnaby, and the other one is called Binky.

But you had another one before, known as Jeeves.

No, it wasn't even Jeeves, was it?

It was like a longer name.

Like, I don't know, but let's say Jeeves.

Let's say Jeeves.

I'll ping Jeeves and then we can set that dinner.

And then we should try.

Yeah, we should definitely try.

I'm also very curious about the different OpenClaw flavors as well.

I was thinking of playing a bit with that, but I was like, if it doesn't work, I'm a bit... you know, part of me just wants to get it to work.

Right.

So that's, yeah.

Okay.

Cool.

Maybe I'll give it a try.

Just give it access to, like, just your WhatsApp or something.

Thank you, Murilo.

I'll see you all next time.

Ciao!

Ciao.

Creators and Guests

Bart Smeets, Host
Mostly dad of three. Tech founder. Sometimes a trail runner, now and then a cyclist. Trying to survive creative & outdoor splurges.

Murilo Kuniyoshi Suzart Cunha, Host
AI enthusiast turned MLOps specialist who balances his passion for machine learning with interests in open source, sports (particularly football and tennis), philosophy, and mindfulness, while actively contributing to the tech community through conference speaking and as an organizer for Python User Group Belgium.