AWS and other hyperscalers sell commoditized compute to run applications and add value through add-on services (e.g., DynamoDB or BigQuery). They’ve spent a lot of money (“capex”) to build out the underlying infrastructure and write the core software, and now make more or less pure profit. In theory, at least; in practice, the hyperscalers tend to reinvest profits into increasing customer value further.
I see OpenAI and other large language model providers as fundamentally selling compute. Sure, some compute is better than other compute, but it’s compute. Instead of writing and running code to scrape websites, for example, you can run the page through GPT: that API call offloads compute from your instances to OpenAI’s. Customers can also offload compute from humans to a machine, perhaps to label data today, perhaps to process loan applications tomorrow (joking…). AGI, if we get there, is similarly just offloading compute to the model. OpenAI’s recent actions also suggest to me that they’re looking to play the cloud game and focusing on customer ROI.
The value of large language models is that they can do things which aren’t valuable enough to automate otherwise. People already do the valuable things, and we’ve written code for the most valuable ones, but LLMs make it possible to automate the less valuable things too. It’s not economically feasible for me to constantly pair program with an expert JavaScript engineer, but it is to have Copilot by my side. Constantly updating website scraping logic is tedious and annoying, but an LLM can be flexible enough to get the right answer when layouts change, even with the same prompt.
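To make that concrete, here’s a rough sketch of what “scraping by prompt” can look like, assuming the 2023-era (pre-1.0) openai Python SDK and an API key in the environment; the prompt, function name, and extracted fields are mine for illustration, not anything OpenAI prescribes:

```python
import json
import openai  # pre-1.0 SDK; reads OPENAI_API_KEY from the environment

def extract_product(page_html: str) -> dict:
    """Offload the 'parsing' compute to the model instead of maintaining selectors."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Extract the product name, price, and "
                                          "availability from this page HTML. Reply with JSON only."},
            {"role": "user", "content": page_html},
        ],
        temperature=0,
    )
    # The same prompt keeps working when the page layout changes, because the
    # model, not hand-written scraping logic, is doing the parsing.
    return json.loads(response["choices"][0]["message"]["content"])
```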
In this world, LLM providers are in the business of maximizing the value of compute, both from their raw GPU resources and from what they offer to developers. Some do it by building huge foundation models on specialized hardware, some by trying to shrink models to run on cheaper hardware. There is a nice accounting relationship here: OpenAI’s revenue and costs are both roughly proportional to their GPU usage. So all the LLM companies are functionally in the business of maximizing the value of their GPU resources, much like traditional hyperscalers are in the business of maximizing the value of their cloud resources.
Building your own models is then fundamentally the same question as on-prem vs. cloud, a.k.a. build vs. buy. I think the DHH argument against clouds is mostly sound: his company has predictable and stable enough usage/growth (and low enough margins) that they can afford to host their own servers. If you have an LLM-based product that’s similarly stable and predictable, I think it would make sense to run your own model. On-device coding assistance actually sounds like a viable example: the product pattern seems mostly straightforward and stable, there’s lots of training data, and very small models have shown impressive results. But otherwise, how many people can really claim they are certain enough in their product to not use GPT?
Long term, a hyperscaler maximizes the value of its resources by maximizing customer value. Mathematically, think of that as: number of customers × (value to customers − spend per customer).
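A toy illustration with entirely made-up numbers, just to pin down the terms:

```python
# Made-up numbers: 1,000 customers, each getting $50k of value from the
# platform while paying $10k for it.
customers = 1_000
value_per_customer = 50_000   # value a customer gets out of the platform ($)
spend_per_customer = 10_000   # what that customer pays ($)

surplus = customers * (value_per_customer - spend_per_customer)
print(surplus)  # 40_000_000 of value left with customers

# If better tooling doubles value_per_customer, the provider can raise prices
# and the customer still comes out ahead.
```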
If OpenAI drives tons of value for their customers, the customers will be happy to pay more. Don’t make me tap the sign:
OpenAI is starting to move more explicitly toward developer value-add with their function calling implementation. The implementation is great: it makes working with structured data much easier, and it reduces developer costs by making the service more reliable, with fewer formatting instructions and retry loops needed to get exactly the right output format.
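As a rough sketch of the flow, assuming the 2023-era (pre-1.0) openai Python SDK; the function name and schema here are invented for illustration:

```python
import json
import openai

# Describe the structured output you want as a JSON Schema "function".
functions = [{
    "name": "record_loan_decision",
    "description": "Record a decision about a loan application",
    "parameters": {
        "type": "object",
        "properties": {
            "approved": {"type": "boolean"},
            "reason": {"type": "string"},
        },
        "required": ["approved", "reason"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "Applicant has steady income and no outstanding debt."}],
    functions=functions,
    function_call={"name": "record_loan_decision"},  # force a structured response
)

# The arguments come back as a JSON string shaped by the schema, so far fewer
# "reply ONLY with JSON" prompts or retry loops are needed.
args = json.loads(response["choices"][0]["message"]["function_call"]["arguments"])
```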
I view this as essentially developer tooling that helps developers get more value out of the raw compute. The more value you get out of the OpenAI platform, the more likely you’ll stay on their platform, even if GPT-4 level capabilities get commoditized.
Commoditization
Pure conjecture – suppose OpenAI suspects that other startups are catching up to their level of compute quality, or that they have reached a ceiling on their own quality. How can they continue to drive more value for their customers, if that level of compute is commoditized?
One way is by offering better tooling. This is the cloud playbook - build a strong core product, get customers and distribution, upsell additional products around that initial wedge. Another way is by dropping pricing – in the words of one successful hyperscaling CEO: “your margin is my opportunity.” OpenAI has constantly dropped prices while remaining best in breed; why would you even remotely consider using Cohere or Anthropic when they’re 10x more expensive than ChatGPT? Notably, Sam Altman has explicitly said that OpenAI’s goal is to drive down “the cost of intelligence.”
There are a lot of VC-funded startups now trying to do “LLMops,” or, more broadly, trying to help you get more value out of your API calls. I see some of this as synergistic with OpenAI, but it could also serve OpenAI well to give away virtually all of that value, if doing so prevents customers from using other LLM providers that don’t offer the same tooling. Best-in-breed companies can definitely still exist: Snowflake and Datadog both beat strong incumbent cloud offerings. But there is probably not a lot of room left for the also-rans; a red ocean if I’ve ever seen one.
Would it make sense for OpenAI to offer a hosted embedding store? Certainly at some scale, right? Pinecone, for example, might be better than any open-source offering or anything OpenAI could reasonably ship. But a good-enough implementation that captures 80% of use cases would be compelling, and would probably capture most of the net-new business. If I were trying to sell shovels, I would be seriously considering whether it still makes sense when there’s only one mine in town.
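For a sense of how small a “good enough” version can be, here’s a sketch of a naive in-memory embedding store on top of OpenAI’s embeddings endpoint, using numpy and the 2023-era (pre-1.0) SDK; purely illustrative, not a claim about what OpenAI would actually ship:

```python
import numpy as np
import openai

class TinyEmbeddingStore:
    """Brute-force nearest-neighbor search over OpenAI embeddings."""

    def __init__(self):
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def _embed(self, text: str) -> np.ndarray:
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(self._embed(text))

    def search(self, query: str, k: int = 5) -> list[str]:
        q = self._embed(query)
        mat = np.vstack(self.vectors)
        # ada-002 embeddings are unit-normalized, so a dot product is cosine similarity.
        scores = mat @ q
        top = np.argsort(scores)[::-1][:k]
        return [self.texts[i] for i in top]
```

Brute force like this obviously falls over at serious scale, but for a large fraction of apps it is already “good enough,” which is the point.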
Competition
We’ve established that OpenAI is renting out compute and offering tooling to help customers better use that compute. “Does it help you get more value from the LLM?” should basically be the north star for every LLM product manager (spoken as a former Ads ML product manager who was supposed to optimize for customer ROI but mostly optimized for revenue anyway).
I would love it if a competitor went all in on the dev experience and tried to play some offense, instead of letting OpenAI define the developer market. Here’s an example: one use case I have is generating a reply, immediately getting an embedding for that reply, and then indexing it (in a list, not a vector store). A better experience would be an option to do the two calls together: generate the reply and immediately start embedding the response. This saves a network round trip and could even be cheaper for the model to infer, since the provider could reuse the cached query results/tokenization and save compute. Maybe this use case is too niche to be valuable, but maybe it’s not.
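Here’s roughly what that two-round-trip pattern looks like today (again assuming the 2023-era, pre-1.0 openai SDK); a combined “generate and embed” option would collapse it into a single request:

```python
import openai

def reply_and_embed(prompt: str):
    # Round trip #1: generate the reply.
    reply = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )["choices"][0]["message"]["content"]

    # Round trip #2: embed that same reply, sending it back over the wire.
    embedding = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=reply,
    )["data"][0]["embedding"]

    return reply, embedding

# Index the results in a plain list, as described above.
replies = [reply_and_embed("Draft a short reply to this support ticket: ...")]
```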
OpenAI’s advantage here is that they have developers using their platform, and while they might not be training on customer data, they are definitely looking at how people use the platform and learning how to make the developer experience better. I think this is a problem for Anthropic: by gating API access, they are missing out on seeing how people would actually use their platform. I’m sure this is not intentional; they would love to expand more but are gated on GPUs.
My sense is that OpenAI is essentially looking at the most common or lowest-effort LLM tooling and building it in-house. I think this is a lesson from ChatGPT and Stable Diffusion, where so many low-effort copies ran around the App Store that simply should not exist. It’s a brand risk to have crappy ChatGPT apps around, and it’s a brand risk to have people raise at a $50M post-money valuation for roughly 40 lines of code, if those are adjacent to your platform. I heard a rumor that ChatGPT is enabling file uploads natively for chat (not just for Code Interpreter), going after the “chat with a PDF powered by GPT” clones.
For a while I’ve thought about running a plugin store: essentially hosting a platform for chat interfaces to access plugins, where we would manage API keys, auth, and brand safety in a secure and neutral way. But this business only makes sense if there are competing models and chat frontends. A world in which OpenAI is the only winner is one that sucks a ton of capital out of the rest of the ecosystem.
For any competitor, the framing needs to be: how do we maximize compute value for our end users? Could Replicate or Mosaic build a similar value-add experience with function calling on top of open-source models and offer that as a hosted service? Could GGML or Replit use local compute efficiently enough to offer an experience comparable to Copilot? Will multi-modal models win out by delivering value across multiple modalities?
To be honest, my main takeaway is to go all-in on Nvidia stock. The key to the entire market is maximizing compute value-add, and Nvidia is the biggest ingredient in that right now. If you believe in the open-source, on-device world, then Apple, with its M1 chips, is the prohibitive leader and could also somehow sell even more computers and phones.
But really, I am not certain what to do with this paradigm. Maybe it isn’t actually a new idea, maybe I’m dead wrong. The DHH article has a great quote:
The cloud is sold as computing on demand, which sounds futuristic and cool, and very much not like something as mundane as "renting computers", even though that's mostly what it is.
Cross-apply that to LLMs: “artificial general intelligence on demand” sounds futuristic and cool, but it’s mostly just “renting compute.”
But I think it’s a useful lens to look through. Is Langchain the Terraform of LLMs? Is Terraform the Terraform of LLMs? What lessons from the cloud wars should we already know?