Brad Lightcap (OpenAI COO) has a quick test for this: “Ask the company whether a 100x improvement in the model is something they are excited about”. (source)
The question really stuck with me: it appears simple but is not easy to answer, and it applies more broadly to all “build” investment decisions in AI (e.g., corporate use case decisions as well). Answering it is tricky, as one risks either being overly generic or having to predict specific future LLM/AI capabilities.
Below are my first thoughts on when I would be excited, depending on what the key value contribution of my product/offering would be. I would love any form of challenge and additional thoughts.
When would I be excited?
- Key value driver: Multi-purpose and multi-tool automation
- Rationale: With growing capabilities, LLM-based systems can better integrate with numerous tools (e.g., ERP/CRM systems, code execution, databases) and use them adaptively across multiple required steps. This reduces effort compared to precisely defining a specific set of instructions for each possible action/tool an LLM can use (e.g., code for extracting specific sales data from a CRM).
- Example: Complex customer service request requiring interactions with multiple systems and adapting to customer input
- Key value driver: Using LLMs on proprietary or highly curated data
- Rationale: Value comes from providing access to data or corresponding insights, which are not readily accessible, even in better models. Yet, improved model quality will help to more precisely understand user intent and provide the corresponding output.
- Example: Answering HR and other admin questions based on changing comp. policies
- Key value driver: Deep workflow and/or UI integration
- Rationale: If the solution provides value by integrating access to AI seamlessly into the workday of users, better models will only further improve the experience
- Example: Collaboration across multiple researchers on a scientific paper and allowing different ways to interact with the AI (e.g., from chat to highlighting text and selecting an action)
- Key value driver: Multi-step reasoning/planning/action
- Rationale: Using AI-based systems in more complex situations often requires multiple LLM-based steps. Because each step depends on the previous ones, faulty LLM behavior at any step can derail the whole chain. Even minor reliability improvements in LLMs can therefore substantially reduce this compounding risk (e.g., assuming independent errors, a 2% vs. 5% error rate per step across 5 dependent steps leads to a 90% vs. 77% chance of successful completion)
- Example: AI-based market research, requiring problem framing, data acquisition, analysis, interpretation, and communication
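The arithmetic behind the 2% vs. 5% example in the multi-step rationale above can be sketched as follows. This is a simplification that assumes every step fails independently with the same error rate; the function name `chain_success_rate` is my own, purely for illustration.

```python
def chain_success_rate(error_rate: float, steps: int) -> float:
    """Chance that all dependent steps succeed.

    Assumption: each step fails independently with the same probability,
    so the per-step success probabilities simply multiply.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return (1.0 - error_rate) ** steps


if __name__ == "__main__":
    # Compare the two per-step error rates from the example over 5 steps.
    for rate in (0.02, 0.05):
        success = chain_success_rate(rate, 5)
        print(f"{rate:.0%} error per step, 5 steps -> {success:.0%} success")
```

Under the same assumption, the gap widens quickly with longer chains, which is why small reliability gains matter more the more steps a workflow has.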
When would I not be excited (unless the exception applies)?
- Key value driver: Domain-specific models (based on public data)
- Rationale: While current models may lack accuracy in niche domains, this can improve substantially with future models, reducing the need for specific domain models (incl. language and jargon specificity)
- Exception: Need for high efficiency/size (e.g., mobile version) or requiring fundamentally different architecture (e.g., biological/chemical model)
- Key value driver: Finding and providing/visualizing the right data
- Rationale: With growing context windows and a better ability to utilize specific data from the context (a reduced needle-in-a-haystack issue), a growing set of situations might be solvable with LLMs plus a data pipeline instead of more sophisticated RAG-based (retrieval-augmented generation) solutions. This still requires access-rights-dependent data ingestion, yet limits the need for searching, chunking, etc. The visualization of data could also be further commoditized by out-of-the-box functionalities (e.g., the LLM creating the visualization code by itself or integrating with a respective library with little effort)
- Exception: Need for high efficiency (cost scaling with context length).
- Key value driver: Guardrail-constrained output generation
- Rationale: Current models often require numerous guardrails and mechanisms to ensure that the content follows certain rules (e.g., not including any medical information). This often leads to complex, partially rule-based structures. With increased controllability and reasoning capabilities, future models might more easily and reliably follow rules.
- Exception: When deterministic rules are required to ensure the highest level of safety or compliance with emerging regulatory requirements, yet the total system complexity would likely still be reduced
- Key value driver: Memory-based personalization
- Rationale: With growing context length, declining cost, and increased text understanding, as well as emerging context functionalities in solutions like ChatGPT, personalization based on previous input of a user can be increasingly commoditized
- Exception: Main value coming from providing additional context from outside the boundaries of the application (e.g., device data, CRM data, etc.)
For now, several of the points come down to whether the value contribution stems from workarounds for LLM shortfalls that will hopefully disappear in the future. While I am quite torn on a few of the statements, I think it is a good starting point for discussion and subsequent refinement. In addition, when thinking about buy vs. build use cases in corporates, there are many further factors to consider, which are not addressed here (e.g., competition & differentiation, data ownership, learning effect, existing architecture & systems, …).
Hear more related statements from Brad Lightcap and Sam Altman on the topic: 20VC Video