Palestra - bitcoin++ Floripa 2026 - From Vibe Coding to Agentic Engineering

26 Feb 2026

Reading time ~2 minutes

Esta talk foi apresentada na bitcoin++ Floripa 2026, edicao exploits, em 2026-02-26. O ponto de partida e uma mudanca de postura: em vez de tratar IA como atalho para gerar codigo no improviso, a proposta e tratar agentes como parte de um processo de engenharia com contexto explicito, criterios de qualidade, testes, evals e workflows mais disciplinados.

O arco da apresentacao liga referencias classicas de engenharia de software e open source a praticas mais recentes de programacao com LLMs. A talk passa por vibe coding, agentic engineering, contexto, harnesses, skills, comandos e a importancia de transformar conhecimento tacito em artefatos legiveis pelo humano e pelo agente.

Parte do valor da palestra esta em enquadrar o uso de IA como um problema de sistema e nao so de prompt. A pergunta deixa de ser “qual prompt magico gera o codigo certo?” e passa a ser “como montar um ambiente em que o agente tenha contexto suficiente, feedback suficiente e restricoes suficientes para produzir algo que sobreviva ao uso real?”.

Capítulos por assunto

00:00 IA ja esta em todo lugar e o custo esta despencando
03:56 Do discurso publico ao contraste entre vibe coding e agentic engineering
07:52 O que modelos generativos fazem: tokens, aleatoriedade e embeddings
13:46 Bitter Lesson, contexto e por que o gargalo mudou de lugar
19:40 Context rot, dumb zone e quando recomecar a sessao
23:36 Compound engineering, specs e workflows praticos
31:28 Fechar o loop com testes, saida observavel e otimizacao para o agente
37:22 Perguntas sobre matematica, provas e quando usar outras ferramentas
41:18 Perguntas sobre prazer em programar, oficio e identidade do desenvolvedor
49:10 Perguntas sobre treinamento, RL e o que realmente faz um agente

Links relacionados:

Pagina oficial da talk: bitcoin++ Floripa 2026 talks
Entrada de trabalho da palestra: Palestra - BTC++ - From Vibe Coding to Agentic Engineering

Transcription (experimental) 242 trechos · 3 vozes

Fonte: YouTube

00:02 SPEAKER_01 First of all, I want to know here who uses AI here.
00:10 SPEAKER_01 Yeah, everyone, nice.
00:12 SPEAKER_01 It's uh actually if someone said it didn't actually everyone does because your EBA used it, your elevator has AI.
00:21 SPEAKER_01 Catches actually it's the AI using you and not the opposite.
00:26 SPEAKER_01 But uh even dialogue models had AI in it.
00:31 SPEAKER_01 Uh when they connect to the internet, they use some AI to regulate how to regulate the the the line, the uh signal.
00:47 SPEAKER_01 So but we're not talking here like AI in general, we're talking of course about language models.
00:53 SPEAKER_01 We're talking about code.
00:56 SPEAKER_01 Um first of all, why should I care?
01:02 SPEAKER_01 And uh it's uh especially after this great presentation, like there's so many problems.
01:09 SPEAKER_01 But uh but there's also some really interesting things going on.
01:15 SPEAKER_01 For example, the company side, uh the Shopify Shop if I say AI is no longer optional.
01:21 SPEAKER_01 You should be using AI AI is a baseline expectation so there.
01:25 SPEAKER_01 Um companies are expecting for you to use um Spotify says it's best developers haven't written a single line code since December.
01:37 SPEAKER_01 It's uh it's a lot.
01:39 SPEAKER_01 Um there's a guy that um wrote re-rolled next.js
01:47 SPEAKER_01 in byte or VT, I don't know how to pronounce this.
01:52 SPEAKER_01 And uh just one guy in took one thousand dollars and uh now it's building uh
01:58 SPEAKER_01 And now it's doing uh far thanks lesser57% uh smaller models.
02:04 SPEAKER_03 And there's actually um site.
03:25 SPEAKER_01 Sorry, guys.
03:26 SPEAKER_01 Um, so there he you built this next JS uh uh using V and uh there's a website using it already, uh government website is C IO box website.
03:43 SPEAKER_01 Um he did it over a week more than uh I I forgot right now, but uh it took like a little bit more than a week to do everything.
03:56 SPEAKER_01 Everything this is like when we look to people using it.
04:02 SPEAKER_01 There's a Toby for from Shopify, there's Jack.
04:07 SPEAKER_01 Everybody knows Jack here.
04:09 SPEAKER_01 You can see in their uh in their GitHub that they are shipping more code.
04:14 SPEAKER_01 And uh it's no small and uh more more more more specifically like this monster guy that uh he built open club we talked about open calling recently like he did32,000 contributions in two months in this year.
04:38 SPEAKER_01 Yeah, it's a it's a movie he works like16 hours a day, he's he's crazy, he's completely nuts, but uh he's well open AI bought him, right?
04:49 SPEAKER_01 So so he is uh great guy.
04:51 SPEAKER_01 Um there's uh some some some some some some some some some some some statistics some some there's a company that did uh more exactly statistics there's a looking for more than a hundred thousand developers,92% are using covenant assistance,26% of the code now is AI altered, and uh they're saving about four hours a week.
05:17 SPEAKER_01 So it's uh interesting stuff.
05:20 SPEAKER_01 And uh machine learning engineer scientist, I'm using UNED I never know since2010.
05:31 SPEAKER_01 I worked in spiritual Satoshi, work on Bitcoin VIP, wrote that book, and and this month is one year anniversary of by coding.
05:42 SPEAKER_01 Uh one year ago and re converts wrote um there's a new kind of coding I call vibe coding, and uh he in his words it's fully fully giving to the vibes, embrace potentials and forget that the code even exists.
06:01 SPEAKER_01 And now one year later he wrote that lots of people are talking about this, and uh he actually likes the word agentic engineering as in uh opposite way to by coding.
06:17 SPEAKER_01 In uh like my coding is kind of like he was doing something like for fun, not like serious work, and now that there are new people doing real thing with serious work, he really likes the the the name agentic engineer because agentich means that you're using the agents actually and uh the engineering part is the real part, there's an art and science and expertise to it.
06:46 SPEAKER_01 And um so let's rewind a little bit, remember the basics.
06:54 SPEAKER_01 So if we want to like really do this better, understand it better, we should understand what is another level.
07:02 SPEAKER_01 Uh like first of all difference between classical and edge and AI, like the classical AI usually you you recognize you you don't generate data, you recognize a cat, the GI you draw a cat.
07:15 SPEAKER_01 So I usually it's a classification model in the G and AI you you create new stuff that didn't exist before.
07:26 SPEAKER_01 And uh so it's can you can you see this this is the architecture very similar to GPT2 and uh and this shows a little bit it's too much, the like this is the attention mechanism, you we don't need to go like really full detail, but the the interesting part is the last part here that it predicts that that it should be.
07:59 SPEAKER_01 That it predicts uh distribution of possible tokens, and then you use a random function to big one.
08:09 SPEAKER_01 And basically that's how that's the difference between Gen AI, you use a random function in the end, you predict the distribution and big one, and and then you go and incredibly this works somehow.
08:25 SPEAKER_01 So going a little bit deeper into LLMs, you have something called the context window.
08:35 SPEAKER_01 The context window is your your AI has a limited amount of input that you can put it inside.
08:46 SPEAKER_01 And uh I put slash rot there because there is something called context rot.
08:52 SPEAKER_01 When you don't maybe when when when when when when when when when when when you don't use well enough your context window, you can get context rot.
09:00 SPEAKER_01 Where it uh for example adding too much noise, adding too much uh input.
09:09 SPEAKER_01 One way you can see like if we go back to C, it's uh context windows are like array, so you're mallocating inside already, and the other thing is a sliding window.
09:21 SPEAKER_01 I I made this small video here uh to show so if your our problem is like one to three, and uh we want the result to be like the sum of everything, so it's six, and now the sum of everything is twelve, and now the sum of everything should be24, but it gets23 because one is already outside of the context window.
09:42 SPEAKER_01 So now we are getting context rot.
09:45 SPEAKER_01 And uh this is uh a way to to visualize a little bit better what. going on but the actual the actual um function that it's using it's a little bit more complex it's uh there there's uh embeddings and and there's uh lots of multiplication this is the attention mechanism how how how it works it's the you multiply everything by everything and that's why it's so small and expensive and uh and um actually the token itself like when we put there lots of people don't doesn't know that we're using tokens and not specifically words or characters they like like for example how many R's times strong area the AI doesn't doesn't see there's one R here and two R what it's is it's straw it's one token it's uh an index to a vector and very it's another index to another vector so that's why there are so many mistakes in in this sometimes it's important to know these specific things because this is uh if you know that they use this kind of uh if the tokenized like this you know that this is a terrible thing that for two esque for a actual and like for example I have wrote this the car is not rot but there is a token it is a token carrot like there's a token car in carrot and uh and if you see the token space rod here is different from Roth over there are two different tokens and uh how AI tracks these kind of things is
11:48 SPEAKER_01 Next these kind of things is that it gets the the your sentence it tokenizes like getting all chunks and every token is an index and this index translates to a um higher uh... uh very high dimensional very high dimensional um vector, usually hundreds, sometimes even thousands of uh dimensions.
12:18 SPEAKER_01 And uh if you understand if you have any idea of how this works, you can you can it can help you to work better with AI.
12:33 SPEAKER_01 And now let's get to an agent.
12:35 SPEAKER_01 What actually is an agent?
12:37 SPEAKER_01 Actually the an agent is just that that a lines of code is an agent or the of course when we are using an agent in uh that the there's more code to it.
12:52 SPEAKER_01 Uh the more but this alone is an agent by itself.
12:56 SPEAKER_01 It's just another level in a while that execute tools, just this.
13:03 SPEAKER_01 So it's uh it's not very difficult to to write one, but it takes uh art and uh expertise and practice and actually you you have to test a lot to make the best agent possible.
13:26 SPEAKER_01 Um another thing that I still in the in the quote unquote basic basics is the better lesson from Sutton.
13:38 SPEAKER_01 So is uh AI messer is uh he has a book on reinforcement learning and he wrote this bitter lesson which is a very short paper but it very interesting but in in short it says that we uh we don't know how we think that's it so we don't know exactly if we try to make lots of um if we try to arrange the best way we possibly in the thing in how we think it's going to be out competed by a general method that just draws a lot of data there.
14:31 SPEAKER_01 So one thing that should be learned from the bitter lessons the great power of general purpose methods or methods that continue to scale with increased computation basically the transformer now that we use today for LMs and agents are simpler than the ones before and they work better and usually a simpler and more general method with more data more computing works better and uh this is for building LMs but also we can apply this on the other way around if we are using LLMs if we have higher structure if you start to tell it to do this this and that and do like this and and if you try to constrain it too much it's better for a less intelligent AI but sometimes it's worse for more intelligent AI.
15:31 SPEAKER_01 Usually you get if you're if you're uh working with a highly capable AI if you give a little bit less structure um it performs better.
15:42 SPEAKER_01 This is called also hardness engineering
15:45 SPEAKER_01 When we are using a CLI, for example, cloud code is called the harness, it's an AI harness.
15:52 SPEAKER_01 And how you structure this is called harness engineering.
15:55 SPEAKER_01 So when we are using more capable AI, more intelligent AI, usually having a little bit less structure, could work better, so you need to understand and kind of feel how it works.
16:14 SPEAKER_01 And uh there's for those who doesn't know, there's a slash commands is just a prompt.
16:20 SPEAKER_01 MCP is kind of just a prompt with calling the API.
16:24 SPEAKER_01 Skills also everything is kind of prompts, like the commands you you invoke directly from the CLI.
16:31 SPEAKER_01 MCP you connect to an NCP server and kind of execute the APIs, and you need to be thoughtful about this as well because it could get context mark because it some KFCPs are not well designed and inject so much into your context that it becomes unusable.
16:51 SPEAKER_01 Skills are more interesting way that there are folders with with those with instructions of for your AI to do stuff.
17:03 SPEAKER_01 And agents in D is just a file that uh the your AI reads every time when it starts to run in your in a specific order.
17:16 SPEAKER_01 Another thing is quantifying human AI synergy.
17:20 SPEAKER_01 This paper is very interesting because it could measure that AI performance of games that it called synergy depends strongly on the human's ability to understand that natural AI.
17:33 SPEAKER_01 So the what it's talking about here is that even if you're very good, a very good technical person, you uh you could be you could not be so good using AI because if you don't know how to collaborate with peers, so so kind of then you... you have to apply the the paper actually uses this word theory of mind, you have to have a theory of mind of the mind of AI to understand and kind of treat it like a co-worker.
18:05 SPEAKER_01 So usually people that works really well, like for example uh Peter Seinberg uh that I showed before, it it had it was a CEO of a company, it there were it was a technical founder that he was needing lots of engineers, so he he is really good to understand how to to manage other people and and that's one possibility of why he's so good using it.
18:36 SPEAKER_01 So yeah.
18:38 SPEAKER_01 Let's go to some some some some some some some some some some some the workflows to how how how how people are actually using um this agent engineering stuff.
18:51 SPEAKER_01 The most simple case uh uh I I use it a lot already, it's uh it's the plan feature.
19:02 SPEAKER_01 You lots of lots of uh AI is already implementing this, if you use cursor, if you use vault code, there's a plan code that you it doesn't write any code, it just plan stuff.
19:16 SPEAKER_01 Uh sometimes it writes a markdown file and um there there's lots of different ways to do it, but uh the thing is usually performs better, uh AI lots of time performs better if you write the plan.
19:37 SPEAKER_01 Because if we but get back to the context window as
19:40 SPEAKER_01 Actually, the context window.
19:41 SPEAKER_01 As we are building stuff, the contact window gets more and more walled.
19:47 SPEAKER_01 And the AI might not attend to the specific parts of your cont.
19:57 SPEAKER_01 So it might forget in the end what you ask in the beginning, and it might lose focus.
20:04 SPEAKER_01 So when you when you have a plan, usually it can go back to the plan, you know, okay.
20:11 SPEAKER_01 I did the first one, now let's read again.
20:14 SPEAKER_01 Okay, now let's do the second one.
20:17 SPEAKER_01 And it steers the model a little bit better.
20:22 SPEAKER_01 And also if the plan is too big, sometimes we need to break into some phases.
20:28 SPEAKER_01 A plan is also good because making this plan is also interesting because you can create a plan and read the plan.
20:38 SPEAKER_01 So instead of like do this and wait for the model, like uh create this app, make me a thousand dollars, make me a million dollars, make no mistakes, you... you actually see what the the model is planning to do, and you can change, can open the plan, change it.
20:57 SPEAKER_01 Uh on the execution phase I usually ask you to do GDD.
21:00 SPEAKER_01 Works really well for me because it well, you know, the DD works.
21:06 SPEAKER_01 And uh and in the end there's the code review part, which is really useful uh a lot for me as well.
21:14 SPEAKER_01 Uh it compares your code with the plan and see if if if if it executed the plan and can find some bugs, some mesh cases.
21:23 SPEAKER_01 Um human layer guys, the the guys from human layer, they they created something else.
21:32 SPEAKER_01 It's not that different, but uh they call frequent intentional compaction, and the thing is that they see that. using your plan there, it's good, but it works better if you are using uh a new code base, but if you have a no code phase with lots of millions of lines of code, it it wouldn't work so well, so they add another part which is the research phase.
21:59 SPEAKER_01 So the research part is like you talk to a mobile to say oh I'm playing to do this, this and this, research, and uh to try to understand how the system works, where should I uh uh what what what are the parts that I should change, what is the style, what's the format.
22:17 SPEAKER_01 Um don't try to bug hunting and stuff like this.
22:21 SPEAKER_01 So it it's uh it's the the idea is the same, but then the thing is it's a frequent intentional compaction.
22:28 SPEAKER_01 You're compacting this um you're you're compacting the the your contact.
22:37 SPEAKER_01 So instead of putting everything inside your context window, you research, make a compaction, intentional compaction like a file with the the important things and then using this file you plan what you're going to do, you write again another file, and using using those now you implement now you write the code and trying to keep context under40% usually it's also baseline.
23:11 SPEAKER_01 Uh it seems not it seems not a lot, but uh after after going over40%, there's lots of times that they call it that the model goes into the dumb zone.
23:26 SPEAKER_01 The context is get so big that the model starts to confuse themselves.
23:31 SPEAKER_01 And if for example if the model says, Oh of course, you're absolutely correct.
23:36 SPEAKER_01 You're in the dumb zone. just like erase everything starting in the section.
23:42 SPEAKER_01 The guys from every build they did something that they call compound engineering.
23:49 SPEAKER_01 There's these four phases.
23:51 SPEAKER_01 Again, plan, word review, but there's the compound part the again that they kind of try to understand what follow went wrong, and kind to try to document everything.
24:05 SPEAKER_01 So the model in the next time the model is going to work, it already knows how where it fails, usually fails, and it won't make the failures again.
24:15 SPEAKER_01 There is of course all of this, there's a little bit more.
24:20 SPEAKER_01 There's the links there where you can uh go and everything's marked down, it's marked down files with scales, so you can just open and read it.
24:30 SPEAKER_01 You don't need to know a specific language for it.
24:33 SPEAKER_01 But yeah, there's there's uh lots of uh specific markdowns for like well craped the prompts to to do it as well, but the the idea is after after you find the mistakes that it had in the review and then you use the compound uh idea to now create some documentation or to be better the next time.
25:00 SPEAKER_01 Um there's also um something that people call spec driven development.
25:08 SPEAKER_01 Uh and the idea is that we keep the generated code and the letter prompt.
25:16 SPEAKER_01 It's like deleting your code and version controlling uh binaries.
25:23 SPEAKER_01 When we're working with AI, it's interesting for us to version the plans, the prompts, your specs.
25:36 SPEAKER_01 And there's there is as I as I said there is no one best practice yet.
25:43 SPEAKER_01 There's people are discovering how to work well with this.
25:48 SPEAKER_01 But um GitHub created the spec kit, which they basically use a file called the constitution, which is something that it's not intended to change, like this is the spec we want for the the we want the product to be this is the the what we want it to be, and then we... we specify some some some some some some some some some some some some some some some some some some some some some details of this spec and then we plan we create tests and then we execute the tests um there's other guy that used uh this create this VMAD methods which is also kind of uh spectrum development but they created some kind of agile role playing where you you invoke some some some some some some some some some some some some some some some some some some some some some some some some some some some some some some some agents they they have names and oh you are this you are that and but there is uh one thing that I find it interesting from BMAT is that it's uh it's a good flow because it's it you... you can ask and and they reply more naturally so you can you can literally have a conversation and in the end it writes a good spectrum and there's Rolf Rolf it's uh there was a guy that created this methods and it's basically the this while
27:34 SPEAKER_01 You create a prompt and you make a while and go through.
27:41 SPEAKER_01 Why why is that?
27:43 SPEAKER_01 Let's go to the show a little bit better what is this if we get back here to the GitHub spec, for example.
27:53 SPEAKER_01 You have you have the constitution, you specify your plan, and then you have a list of tests.
28:02 SPEAKER_01 The idea here is to execute one task at a time, you give it to him and say okay.
28:10 SPEAKER_01 Create the most important one, and that's it.
28:14 SPEAKER_01 So basically you when you have a specification in uh a good plan in a big list of tasks, you can just put it to him, go and get you row and say, okay, choose the most important one and do it.
28:33 SPEAKER_01 And then it goes one by one, and that's one way to do contact engineering, because if you're doing test by task, it's it's getting it's getting the plan, and it's uh getting the prompt, and creates a plan and execute in in a in a limited context window, and then when it goes again in the loop, or the next next text task, it has a new fresh session.
29:04 SPEAKER_01 And uh yeah, there's lots of people using this, and actually Peter Cyber, it says that it sucks.
29:15 SPEAKER_01 It's a it's very dumb.
29:17 SPEAKER_01 Um this is using Roth, using all of those, it's it's the they are made for people using Opus, and uh when you use context.
29:30 SPEAKER_01 And uh when you use codex, you don't need this.
29:33 SPEAKER_01 Um I I don't use locals that much.
29:36 SPEAKER_01 I use sometimes in cursor, but it's very different from using cloud code.
29:41 SPEAKER_01 So it can have different different um ways, but um he says that usually using codex is much better than any of this because it's a better model.
29:54 SPEAKER_01 So when gets back to the better lesson, yeah, he what he does is he just he doesn't even create a plan in talk to codex brainstorming features.
30:05 SPEAKER_01 So what do you think about this?
30:08 SPEAKER_01 Uh is there another way that you would do talking uh with the model, and then when he thinks it's okay, okay, now execute run.
30:21 SPEAKER_01 And uh he has he doesn't even use branches or work trees or whatever, he just uses lots of checkouts from in virtual main, but also like he's working alone most of the time, so it doesn't it's a little bit different working alone from working with with a team.
30:42 SPEAKER_01 And uh one thing that is also important is that uh like if you to be productive, like use your mode select the nature of Z skip permissions because it doesn't need to like ask every time there's ways oh but it's it's not secure and this kind of stuff.
31:06 SPEAKER_01 You can do it in uh SSH in a remote computer for example.
31:12 SPEAKER_01 Uh but one thing that uh really important is the closed loops to use uh your models to for the for them to work really really really well, you shouldn't close the loops.
31:27 SPEAKER_01 That means that uh the model generates a code and have generate the test and can test and see the output of the code or the test.
31:39 SPEAKER_01 So if the model can see the output of what it's generating, it can self-correct and try again and try again and try again until it works.
31:49 SPEAKER_01 And actually it works a lot.
31:55 SPEAKER_01 Much better when you do something like this.
31:58 SPEAKER_01 I could put it already to work like40 minutes in a row and build complete stuff in like just a single single prompt.
32:11 SPEAKER_01 And uh also is as I told you before, there's a he's uh he was a manager, he he is the creator of PSPDF kit, and uh so he had the company he was managing lots of people, so sometimes he he sees how models do it and he says okay I will do it a little bit different, but um he let it go and uh so the idea is a little bit to be more like an architect and less of a coder uh so has he has some control he reads the code but not like it doesn't need to read like the the boilerplate he reads the... the most important architecture part so in summary um the the the main ideas that I could be steal from from from from from from from from from from from how people are using today is like better models usually needs less structure complex windows are finite compaction is important also the times try to cut the noise with context engineering if if if you're building something then you're going to build another thing just create a new session, close the loop, try to get the model to see the output it created, and optimized for the agent.
33:38 SPEAKER_01 Not so much for the human maybe or maybe for both.
33:41 SPEAKER_03 But yeah, thank you.
33:58 SPEAKER_01 Well, depends uh uh I I'm not I'm not talking too much about safety.
34:04 SPEAKER_01 He was the guy talking about safety more I'm a little bit more about productive, being productive productive and effective.
34:12 SPEAKER_01 Of course, safety has plays a role about it, but uh... uh for sure if you have uh good uh... uh engineering skills already, you're going to be you will be more safe and uh and probably more effective as well.
34:31 SPEAKER_01 However, there's the synergy part that is important to understand how how your is working there, so it it uh it will help you to to write better code like this.
34:48 SPEAKER_03 And um yeah, I have a question two questions actually one of them is like regarding the usage of different models, what changes for example from solids to open it.
35:03 SPEAKER_03 There are two different models um I I cannot tell you exactly but like for example uh it's faster but less accurate open things also more topics nice though, but that's more useful general.
35:24 SPEAKER_03 Um changes like the name of these models.
35:34 SPEAKER_01 There are different several different architectures of the models, for example, there are models that are MOE MOE's mixture of experts.
35:44 SPEAKER_01 So instead of having uh kind of one in terms of going you having your input and going through everything and getting your output, this feature of experts has a router.
35:59 SPEAKER_01 So it's a bigger model in general, but the round to a smaller specialized model, so it's so usually need more data, more memory, but but the you can get your your your your can get faster results.
36:17 SPEAKER_01 So this is one kind of architecture.
36:21 SPEAKER_01 You can have like a different number of uh of parameters, you can have smaller models, bigger models, wider models.
36:36 SPEAKER_01 And also the training is very relevant as well.
36:39 SPEAKER_01 You can you can um um when you're training your basically optimizing model for for a sum objective so sometimes you can train one data set for one model and another data set for another model.
36:55 SPEAKER_01 Um there is it it's um every model there is uh is a little bit different, and also there's always randomness.
37:06 SPEAKER_01 So even if you get the same architecture twice and you train it they're going to be slightly different.
37:16 SPEAKER_01 So it's hard for me to say exactly any other questions regarding mathematics. like I can try yet but the LMs they are not very good mathematics in general but I want to know like how can I use an LM to make uh optimization with a function that is function like uh yeah in the function like is it possible to do it or they're just gonna be bad like optimizing function yeah um so why wh... wh why LMs are usually bad in mathematics they are bad in mathematics because they don't see numbers they see tokens and uh actually compute like tuples so they don't compute no they don't compute to they they they had they kind of more or less remember the logic so if you but since they are all models are really big right now they're really good at smaller mathematics if you're if you're like computing what is I don't know three hundred times four hundred they will get it right probably because it's not that difficult but if you're going uh having very big numbers or something more about like differential equations like yeah so so the thing is that um these AI companies they are trying to do this in uh there are already some AI that did some some some some some some some some some some mathematical discoveries uh install it's uh mathematical researcher in us using LLMs in his research as well and uh so so they are they are starting to be useful but you need to understand the pitfalls to the
39:33 SPEAKER_01 Like why you do it.
39:35 SPEAKER_01 Maybe maybe it could work, but maybe it would be better, for example, to use uh there are specific languages for um what's the name?
39:47 SPEAKER_01 It's uh um uh proofs or generating proofs, mathematical proofs.
39:55 SPEAKER_01 So there are languages for for generating mathematical proofs.
39:58 SPEAKER_01 So that's how the medical experts are using it uses the the language to write and then it works this is the loop while doing it.
40:12 SPEAKER_02 Yeah, yeah, the... the loop like generate the... the idea test the idea generate the... the proof and uh test the proof see it works or not and then generate another idea and then you have this whole that it can get somewhere I think should read it first in the house feelings and uh I I have like you should uh when your color and your enjoying coach and then like these huge tools appear you have to like change your way of working to maybe you more plan more and I struggle with like enjoying these like change where you're like holding less and reviewing and planning home more and do you have like any advice to strive like how to uh struggle with that?
41:15 SPEAKER_01 Uh it's hard for me because
41:18 SPEAKER_01 Because I I enjoy more the outcome than the coding itself.
41:24 SPEAKER_01 I I find it very amusing to get in the end and like it seeing the results and and I I don't um not that I don't find joyful to like work in the problems I I also find it but maybe if you think a little bit more on the on the architecture part I don't know if maybe you enjoy this part as well but uh the coding has not getting literally easier it's it's like the the the problem kind of shifts like writing code is faster it's easier but having the the architecture in your mind and planning how you're going to do the things it's become uh... uh more important and relevant one.
42:15 SPEAKER_01 Uh do you know the uh Richard um I forgot his name, the guy from Cathedral in the Bazaar What's the name of the guy?
42:31 SPEAKER_01 It's uh it's a famous programmer uh he wrote his book Category in the Bazaar.
42:36 SPEAKER_01 And uh I'm I'm um... um if you follow him on Twitter he's talking about this a little bit but he's kind of in the same team as me very brave um he he he he w w w w wrote recently like this this this this this this this this this this this week or the last week that uh I was even thinking about putting him here in the in the talk but I think it was a little bit out of scope but um he told that he had this same idea as you he thought that he was someone that he was a as a person his in identity was to write code like I like writing code this is what I like writing code.
43:17 SPEAKER_01 This is what I like to do in with LLMs.
43:19 SPEAKER_01 He's kind of having he's realizing right now that it's not actually what he likes.
43:27 SPEAKER_01 He likes like solving problems.
43:30 SPEAKER_01 And uh that's that's how he's he's doing how that's how he's approaching this and he says that he's having a blessed.
43:39 SPEAKER_01 Um I I I don't have like a a better better idea than this.
43:44 SPEAKER_01 I hope it helps a lot.
43:45 SPEAKER_01 Maybe maybe uh looking for him on Twitter and and asking him like searching where he talked about this and asking something, but he's he talks a lot on Twitter, so I think it's kind of approachable.
43:59 SPEAKER_01 Thank you very much.
44:01 SPEAKER_01 Um I will build a little baby question, uh which is basically like as a developer, right?
44:08 SPEAKER_01 We want to be up for game, like solving problems, like we care about the outcome, and again the outcome, but we have to be good at by coding right now.
44:18 SPEAKER_01 So the question is uh what kind of durable things should we learn uh for example, uh the specific techniques for context engineering, they change a lot.
44:32 SPEAKER_01 You know better models come out and the techniques change.
44:35 SPEAKER_01 You know, new techniques arrive, old techniques become irrelevant.
44:39 SPEAKER_01 So uh what things should we focus on so that we stay on top of our game, you know what skills are durable and which skills are just you know uh more something more of a tiny, you know, it's gonna be what should we focus on?
44:57 SPEAKER_01 Um it's hard to say hard to say uh I I believe that no one has the answer to this question.
45:07 SPEAKER_01 N not the precise answer, uh I believe that we should like play just like
45:15 SPEAKER_01 J just like uh old programmers, they they they they they they they they they they they played make games in their computers they they they were doing some side projects and trying stuff and like oh I wrote this in C I wonder how how would be writing this in Hobo or Trump or whatever they write in those days so I think the the first thing is that you should embrace this playful nature and uh try things the other the other thing is to for me what I do is that I um I follow like great people that share how they are working, how they what are they workflows, what they what is working for them, what is not working, what they're learning.
46:08 SPEAKER_01 There are some people they are they're they are genius and uh and and not only this they they they share what they talk about about what they're doing and uh they like they have plots.
46:23 SPEAKER_01 So I'm always like taking a look, seeing what they're doing, and sometimes I'm I ask them, I say I... I see the this guy line and there I just ask them like oh I'm trying I tried this like you said, but it didn't work.
46:39 SPEAKER_01 What do you think could be wrong and usually answer?
46:44 SPEAKER_01 So it's it's uh that you have to try a lot and and practice, and there's so many new things that are going to be reinvented, we need to like re-approach how how we're going to use AI should be to be more effective and like the the fundamentals are really important, but how how we're using this and uh like we're using the there are people now like
47:21 SPEAKER_01 The guy from Basecamp is keeping CI doing everything locally because it's it's faster and like there're so many things are are like changing and and moving so like let's I I I just like look at them and try to see what fits with my workflow, what I I was using a lot this is the first one that I told you and then I started like trying something a little bit more close to to the intercymer what what he does and and sometimes I go there and come back.
47:57 SPEAKER_01 It's it's complicated because it there's too many things, nobody nobody knows anything, everybody's trying new things.
48:04 SPEAKER_01 But yeah, we need to just try there there there's a guy in uh in that company every he like works the morning and on the afternoon he just try new stuff, just plays.
48:21 SPEAKER_03 You have to play.
48:27 SPEAKER_03 So when when when when when when when when when when when you show the basic agent there was this uh step where uh I think the tool usage, right?
48:38 SPEAKER_03 Yeah, so for this to work, we need to know how to implement that response of once tool, right?
48:47 SPEAKER_03 So how... how does could you explain more how does that work?
48:50 SPEAKER_03 And how... how can we train the models because I think one thing is like okay, here's like all the text in the internet or whatever, and then another thing is like okay, at what point do you do something that makes the model they should be using a tool and how do we invoke that tool and stuff like that?
49:11 SPEAKER_01 Yeah, uh so there's two steps for for two steps to answer the this question.
49:18 SPEAKER_01 First, how they train the model.
49:21 SPEAKER_01 There are several steps in training.
49:24 SPEAKER_01 There's uh the first step is called pre-training, where they get the all data from the internet and input to the model.
49:33 SPEAKER_01 There's uh unsupervised training, they are trying they are training just to uh uh predict the next token.
49:40 SPEAKER_01 The second step is called supervised fine tuning, which is they have a data set of inputs and outputs.
49:54 SPEAKER_01 And now when you give the input to the model, you expect an output to the model, and then you compare the the the output of the model with the output that you expect it, you get this loss difference, and then you have back propagation, and uh and there's usually a third step, which is called um reinforcement learning with human feedback, which uh lets and less humans are involved.
50:25 SPEAKER_01 But um it's it's also uh a little bit similar, but instead of using the supervised training, it uses reinforcement learning.
50:35 SPEAKER_01 You have uh there's lots of different ways, um but generally you in reinforcement learning you have the agent, you have the environment, and the environment provides to the agents the state and the reward and the agent know that agent here in reinforcement learning is different from agent there.
51:01 SPEAKER_01 It it's it's true, it's the same name, but we're talking about two concepts.
51:05 SPEAKER_01 So the the agent gets the state and the reward and uh it's an action in uh language models what we are using uh the agent is the LM itself, the state is the the prompt that it receives, and uh the action is the next token, and the reward we create another model to simulate the rewards.
51:33 SPEAKER_01 So we train uh weaker model, we we we train weaker uh smaller, a little bit smaller models to kind of uh predict the size of the rewards that we should give to the model.
51:51 SPEAKER_01 So we input this depending on the on the what the model puts to the reward model, and it says okay it was good, it was bad.
52:05 SPEAKER_01 So for example, let's say that um let's say that you ask a question um what is Bitcoin, and you run this question four different times, and you get four different answers, and you give to a person, and the person looks at this question what is bitcoin and selects okay this is the best one, is the second best one, this is the third best one, this is the fourth best one, and uses this ranking to train uh reward model.
52:40 SPEAKER_01 So it starts to rank the the answers of the other models to provide uh numerical reward.
52:51 SPEAKER_01 So it's uh it's basically this this was the first part that we sorry uh uh regarding the tool usage.
52:59 SPEAKER_01 Okay, the two uses.
53:00 SPEAKER_01 So the two usage is usually on the second and third part, and uh so so you can use the
53:06 SPEAKER_01 So you you already have a foundational model, which is the pre-tain pre-training stage, and now you find tune in with tests.
53:19 SPEAKER_01 We want it to do I don't know, search online for this.
53:24 SPEAKER_01 And then there's a technique called react, which is it reasons what it should do, and then it makes an action, and then it observes the action.
53:39 SPEAKER_01 So it reasons, okay, I'll do this, I'll do that and the other thing.
53:43 SPEAKER_01 And then it creates the the second part with which which is the action part, and it says oh I'm going to use this tool, and the tool is defined in the system prompt.
53:54 SPEAKER_01 And uh it chooses from the system prompt and say oh I'm going to use this tool with these arguments because of the plan that I did.
54:02 SPEAKER_01 And then we have we when we see that it finishes this act phase, we stop the model, we get the the code that it generated, we uh uh invoke the tool, we we we get the code, we... we get the result from the pool and pull it back and says okay this is the result of the tool that you just invoke it.
54:30 SPEAKER_01 What it's uh what do you see what do you think and uh and then he says okay um I didn't get um I'm going to play it again uh maybe looking for this other keyword and and then it goes until we it uh believes it's completed.
54:48 SPEAKER_03 So the the model is like trained with some token that is like bash on something, and then we just like look for the specific string like when the model outputs this token it means that it wants to involve the
55:08 SPEAKER_01 Yeah, yes, and uh basically this we... we says okay you have these list tool for example to list the full the files in your directory uh you and uh and way and and then there is a you can do it in several different ways like react it was the first the first way to do um... um uh uh when it gets to the action phase you you you can make for example a code block and when it pauses the code block you get the code and execute the code the code could be this list uh list uh files for example and then you... you use uh you put it into bash or something do this get uh the the results back and continue to look at the so I guess the specific question is like how do you distinguish between the model outputing uh specific comment versus the model one thing to execute the one do you want to it's uh the structure it's a structure or some tags and it's like okay why you see this we try to execute yeah yeah exactly could be for example a code block with three vectic uh up and down could be um could could be like a markdown header like oh this is um plan this is action and when when when when when when when when when when when we get to to the header observe we stop and and look what's inside of action and execute what's there and uh that's basically it any more questions well I hope you like it so