> Despite what the culinary-industrial complex wants you to believe, you only need two knives: the biggest one and the smallest one. You use the biggest one unless you can't, in which case you use the smallest one.

(source)

I have come to think of AI as a big knife: it does a lot of things well enough, but you still need the small knife of human thought.
Approaches such as "baby AGI" use only a big knife. In fairness, you can cook a lot of things with one big knife, and while the results might be edible, they're not _that_ good. Sometimes you get good results straight from the LLM, but its output is stochastic, and who knows what the next result will be. Self-correction also only goes so far: you see bots get stuck in an endless loop of trying the same things (or even trying to spawn a new LLM agent to fix the problem). I've found Flux quite interesting for visualizing the different trees a response can go down, but the point is that reliability of output is still challenging.
Simple math: imagine an LLM that returns a good answer 90% of the time, but has a 10% chance of returning a bad result that taints all subsequent results, forcing you to start over. Say you have a complex task that needs 10 iterations. How many times will you need to run the task for it to succeed, in expectation?
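Quick back-of-the-envelope, assuming failures are independent and any bad result forces a full restart: a complete 10-iteration run succeeds with probability 0.9^10, so the number of runs until the first success is geometric.

```latex
P(\text{run succeeds}) = 0.9^{10} \approx 0.35,
\qquad
\mathbb{E}[\text{runs}] = \frac{1}{0.9^{10}} \approx 2.87
```

Nearly three full attempts on average for just ten steps - and at 20 iterations it's already about 8.2 runs in expectation. Chained stochastic steps compound fast.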
As the complexity of a system increases, our ability to understand it decreases. This is especially true of LLMs, which rely heavily on the context window to identify relevant pieces of information. The more complex and end-to-end a demo is, the more attention it gets on Twitter - e.g. "I shipped a Next app through GitHub and Vercel using just my voice". That's fair, but it's also just more and more swings with a big knife.
One world I do see is one with fewer big apps - e.g. who needs a customized Salesforce implementation when AI can build a company-specific CRM for you? So maybe each app is easier to maintain: there's just _less_ code to maintain, since you can bake in company-specific assumptions and ignore the ones you don't need. I don't know if that's the end state, though - companies grow and shit happens and things change, and software will need to adapt too.
Our god is the god of speed
I've been writing a lot of code with GPT and Copilot. What I've found is that it's great for 'relatively' simple things, both because the logic and resulting code are simpler, and because there is more training data for 'quickstart with XYZ framework.' It does not do well with even relatively simple ML tasks in poorly documented libraries, for example (unless you supply the documentation or the code itself). It also does quite poorly at complex ML / scientific logic (e.g. 'write a custom optimized operator for this ML operation').
My general workflow is to use GPT-4 and ask it for an outline / scaffold of what I'm trying to do. This helps avoid a ton of boilerplate and makes the task much more enjoyable - blank canvases are hard to cold start from (according to a probably un-replicable psychology study I read in college). Instead of an empty `src/documents/[id].tsx`, you are basically starting with a filled-in template, and you can jump straight to the fun stuff instead of trying to remember whether `import React from 'react'` needs destructuring braces or not.
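For flavor, here's roughly the kind of scaffold you start from - a made-up Next.js page, not anything GPT-4 specifically produced:

```tsx
// src/documents/[id].tsx - hypothetical scaffold, roughly what the model hands you
import React from "react";
import { useRouter } from "next/router";

export default function DocumentPage() {
  // Pull the dynamic [id] segment out of the URL
  const router = useRouter();
  const { id } = router.query;

  // Query params are undefined on the first render in Next's pages router
  if (!id) return <p>Loading…</p>;

  return (
    <main>
      <h1>Document {id}</h1>
      {/* the fun stuff goes here; the boilerplate is already done */}
    </main>
  );
}
```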
My learning rate has also dramatically increased. My professional front-end / JavaScript experience is basically zero, but I've learned enough to put together fairly complex apps like Prestige in less than two days, complete with users, real-time AI chat integration, and properly animated chat messages. Instant feedback is incredibly helpful: when I get an error, I ask ChatGPT and fix it, and the explanation helps me remember not to make the mistake again. Infinite prep (more to come here) applies as well. It's like having an L5 engineer who will answer all your questions, instantly, on your schedule. I'd rather have a highly experienced human around to help me not only execute but also build my mental model for why things should work, but this is orders of magnitude cheaper and easier.
I also really resonate with Simon Willison's point that ChatGPT makes him more ambitious with his projects. I definitely feel that way too, not just for learning new languages, but for taking on more complexity. If you have a fixed time budget, and you can use AI both to knock out the simple stuff/boilerplate (big knife) and to help you work through complex stuff (small knife), then you can take on more of the complex stuff.
One thing I'd love is if GPT-4 would create PRs for each of your issues, overnight. GPT-4 is quite slow for interactive use (I use that time to go get some water), but I could just write up some issues I want to tackle and review the PRs in the morning. Kinda pricey today, but what if GPT-4 token costs drop 90%?
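A rough sketch of what that overnight loop might look like - the repo names and the `gpt-overnight` label are made up, and the patch-applying step is elided; the OpenAI and Octokit calls are the standard Node SDKs:

```ts
// overnight.ts - hypothetical sketch: turn tagged open issues into draft patches
import OpenAI from "openai";
import { Octokit } from "@octokit/rest";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
const owner = "your-name"; // placeholder
const repo = "your-app"; // placeholder

async function main() {
  // Grab the issues you tagged before going to bed
  const { data: issues } = await octokit.rest.issues.listForRepo({
    owner,
    repo,
    state: "open",
    labels: "gpt-overnight",
  });

  for (const issue of issues) {
    // Ask the model for a patch; a real version would include the relevant source files too
    const completion = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [
        { role: "system", content: "You produce minimal unified diffs." },
        { role: "user", content: `Issue #${issue.number}: ${issue.title}\n\n${issue.body ?? ""}` },
      ],
    });
    const patch = completion.choices[0].message.content ?? "";

    // Applying the patch to a branch and pushing is left out (e.g. shell out to
    // `git apply` in a local checkout); after pushing you'd open the PR:
    // await octokit.rest.pulls.create({ owner, repo, base: "main", head: branch, title: issue.title });
    console.log(`Drafted a patch for #${issue.number} (${patch.length} chars)`);
  }
}

main().catch(console.error);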
Our hearts are the battle drums
It's easy for me to say that 'big knife' approaches such as AutoGPT/BabyAGI or 'Salesforce for dentists but with GPT-4' are uninteresting. So what _is_ interesting?
A lot of the first applications of AI can be seen as extensions of 'refinement culture' - the idea that we must make things 'perfect'. Students revising course papers, ESL speakers fixing grammar, or changing voice accents - these all fit existing behaviors to a particular Western norm (particularly an American one, at least until guys start adding English accents to their Tinder audio lines). The fact that the internet is mostly in English (or at least the internet accessible to big AI labs) accelerates that cultural dominance.
But what does 'perfect' actually mean? Is it _actually_ the milquetoast, RLHF'd, upper-middle-class American style? I doubt it. A running trend of this blog is that every status marker is simply a countersignal: you do not have to 'fit in,' you can forego perfection, you do not need a ludicrously capacious bag. When the entire world speaks American English, what will the elite speak? Will they bring back the mid-Atlantic accent?
What I find interesting about AI is that it allows us to create and consume imperfect and unique goods. Reducing the friction of creation has always increased the variety of content, and AI is simply that on steroids. Reduced costs mean that content needs to appeal to fewer people to be worth its cost of creation.
So one thing I've been hacking on is an absurd meditation app (perhaps 'personalized' would be a better term?). Are these meditations weird? You bet! Will 2 to 4 million people like all of them? Almost certainly not! Who cares!
The generation is entirely AI-driven end-to-end -- from scripts, to subtitles, to audio, to images. Human discretion mostly comes in via choosing a good image; I might be using Stable Diffusion wrong, but it is quite tricky to craft a good prompt and choose the final image.
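The shape of the pipeline, roughly - a sketch, not the app's actual stack: `generateMeditation` and the theme are made up, the subtitle timing is naive, and I've stubbed the image step since the Stable Diffusion hosting varies; the chat and TTS calls follow the OpenAI Node SDK:

```ts
// generate.ts - hypothetical end-to-end pipeline for one meditation
import OpenAI from "openai";
import { writeFile } from "node:fs/promises";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function generateMeditation(theme: string) {
  // 1. Script: ask the model for a short guided meditation
  const chat = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      { role: "user", content: `Write a two-minute guided meditation about ${theme}.` },
    ],
  });
  const script = chat.choices[0].message.content ?? "";

  // 2. Audio: text-to-speech over the script
  const speech = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: script,
  });
  await writeFile("meditation.mp3", Buffer.from(await speech.arrayBuffer()));

  // 3. Subtitles: naive sentence split; real timings would come from the audio
  const subtitles = script.split(/(?<=[.!?])\s+/);

  // 4. Image: prompt for Stable Diffusion, generated however you host it - this
  //    is the human-in-the-loop part: render several candidates, pick one by hand
  const imagePrompt = `serene, dreamlike illustration of ${theme}`;

  return { script, subtitles, imagePrompt };
}

generateMeditation("doing your taxes at the bottom of the ocean").catch(console.error);
```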
Sing, of delight drink deep!
I'm overall super, super invigorated by the possibilities here; every day there's a new problem and a new solution, and it's just so fun. We have big knives and small knives, but more importantly, we're constantly sharpening them all. Always want to hear counterarguments too. Hit me up!