From the Lex Fridman podcast with the Cursor team when they discuss The Future of Programming
I would say… Let’s just revisit this thread in 6 months and see where we are,
why have an API key set up like this instead of using dotenv?
This is an example of what I do with plain-language logic in just a standard ChatGPT 4o instance.
This is o1-preview, trained in session. Just some copy-paste logic I made.
My programmed model knows its function at hello.
That was just a quick single-page app. There’s no Node.js server, so the only thing you have to work with is local storage. The key gets stored locally, so it’s actually safe…
If this was something I intended to share or publish I’d obviously go the Node.js route. This app is personalized to me, so why do I need to complicate it with a server and everything? Local storage is fine.
oh it’s not shared, I see! thanks for the clarification
there are a lot of scams recently where they ask for a key and that results in it getting leaked to third parties, even some that the mods have taken down, hence my questioning
Great point… I probably should have the UI point out that the key is only saved locally.
I should point out that OpenAI’s realtime console sample does the same thing. It prompts you for a key and then stores it in local storage.
welp, if you are going to have someone put their keys into something, you most likely want to have them create a .env
having a text box that asks for a key, even if it just saves locally, always raises suspicion
one alternative is to host it as a freemium and provide the key.
another is to use an open-source LLM, for example via Ollama; that way there is no API key involved (and also no cost associated with it)
Point taken… this was a quick sample, and if you were ever to actually run it, it would be very clear that your key isn’t going anywhere. There’s no web server involved. It’s a static file called “invoice-builder.html”. I could put the key in a .env file, but the browser’s not going to let me read it.
If you’re worried about the web page posting it to some other service then you shouldn’t load that page. Putting the key in a .env file isn’t going to magically prevent the code that’s using the key from doing something evil.
ah, I see, so I guess you are using an inline JavaScript script in the HTML file to send a request to the API
Yes… just fetch(). There are zero third-party scripts even being used.
Your points are valid. If it wasn’t obvious to you then it likely won’t be obvious to others. I should have better described what was happening. My bad.
The actual generated code wasn’t the point of the post. The point was that I simply described what I wanted the program to do in pseudocode and then asked the model to convert that pseudocode to HTML. What I got out was a fully functional app that I actually used to generate and print an invoice.
your conversation programming language is awesome man, hope you get to release it
sorry to get a bit too much into the security of it all; it’s somewhat important, so hopefully it’s somewhat valuable
so, to get back to the topic, do you plan on expanding it? or perhaps releasing it?
No, security is important, so please don’t think the feedback isn’t appreciated. It’s secure, but I can see how just looking at the screenshot you might question things…
There’s not really anything to release because it already works in every LLM powered chat experience. This is simply a prompting technique if anything. If you’re using tools like cursor, v0, ChatGPT+Canvas, Claude Artifacts, etc. to generate code, then you’re already doing this. You’re just (likely) using those tools without any real structure to how you prompt them for code. Convo adds that structure.
As for expanding it… That’s what I’m currently doing… I’ve started a Sections folder where I’m building up collections of reusable Convo parts. Sections are the equivalent of modules in most programming languages. If you’re building a UI and want a sidebar that opens & closes you can just add the Sidebar Layout section to your prompt and the code that comes back should have a fully functional sidebar expressed in whatever language/framework you’re working with. Again, there’s no magic that I’m doing here. The LLM just knows how to do this. You should be able to take that section and append it to your v0 prompt and it should add a collapsible sidebar to your v0 component.
Here’s a more explicit example that uses a section called LLM Support:
If you’re using Python you can send the model a prompt like:
Create a .py file for making model calls:
Section: LLM Support
Use OpenAI for LLM support:
- Add a configuration setting for specifying the API key (hidden). Link for creating keys is https://platform.openai.com/api-keys
- Add a configuration setting for specifying the chat completion model:
- chatgpt-4o-latest (default)
- gpt-4o-mini
- o1-preview
- o1-mini
- Note that o1-preview and o1-mini don't support "system" messages or settings like "temperature".
You’ll get back a Python class like this:
import os
from typing import Dict, List, Optional

import openai  # note: this generated code targets the pre-1.0 openai package API


class LLMSupport:
    # Supported models
    SUPPORTED_MODELS = {
        "chatgpt-4o-latest",
        "gpt-4o-mini",
        "o1-preview",
        "o1-mini",
    }

    # Models that do not support "system" messages or "temperature"
    LIMITED_MODELS = {"o1-preview", "o1-mini"}

    def __init__(
        self,
        api_key: Optional[str] = None,
        model: Optional[str] = None,
    ):
        """
        Initializes the LLMSupport instance.

        Args:
            api_key (Optional[str]): OpenAI API key. If not provided, it will be read from
                the `OPENAI_API_KEY` environment variable.
            model (Optional[str]): The model to use for chat completions. If not provided,
                it will be read from the `OPENAI_MODEL` environment variable.
                Defaults to "chatgpt-4o-latest".

        Raises:
            ValueError: If the API key is not provided or the model is unsupported.
        """
        self.api_key = api_key or os.getenv("OPENAI_API_KEY")
        if not self.api_key:
            raise ValueError(
                "OpenAI API key not provided. Please set the OPENAI_API_KEY environment variable "
                "or pass it as a parameter."
            )
        openai.api_key = self.api_key

        self.model = model or os.getenv("OPENAI_MODEL", "chatgpt-4o-latest")
        if self.model not in self.SUPPORTED_MODELS:
            raise ValueError(
                f"Unsupported model '{self.model}'. Supported models are: "
                f"{', '.join(self.SUPPORTED_MODELS)}"
            )

    def get_chat_response(
        self,
        messages: List[Dict[str, str]],
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None,
        **kwargs
    ) -> str:
        """
        Generates a chat completion using the specified model.

        Args:
            messages (List[Dict[str, str]]): A list of messages for the chat completion.
            temperature (Optional[float]): Sampling temperature. Ignored for limited models.
            max_tokens (Optional[int]): The maximum number of tokens to generate.
            **kwargs: Additional arguments to pass to the OpenAI API.

        Returns:
            str: The generated response from the model.

        Raises:
            openai.error.OpenAIError: If an error occurs during the API call.
        """
        # Adjust parameters based on the model's capabilities
        if self.model in self.LIMITED_MODELS:
            # Remove system messages
            messages = [
                message for message in messages if message.get("role") != "system"
            ]
            # Remove temperature
            temperature = None

        try:
            response = openai.ChatCompletion.create(
                model=self.model,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens,
                **kwargs
            )
            # Extract the assistant's reply
            return response.choices[0].message["content"].strip()
        except openai.error.OpenAIError as e:
            # Handle OpenAI API errors
            raise e
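In case it helps, here’s what calling that class might look like (my own sketch, assuming the generated code above is importable and OPENAI_API_KEY is set in your environment):

llm = LLMSupport(model="gpt-4o-mini")
reply = llm.get_chat_response(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."},
    ],
    temperature=0.7,
)
print(reply)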
You could ask for that same component as JavaScript, Rust, C#, C++, etc… and you will get the appropriate code back. Again, nothing special I’m doing other than that I’ve worked out the exact chunk of text that will give you a fully functional component to add to your project.
If nothing else, just think of Convo as a library of snippets that can add well-defined behaviors and components to any coding project.
The thing I’m trying to work out is whether you can compose these chunks of pseudocode together to build whole applications, like you can compose modules/functions in a more traditional programming language.
The answer is yes but there are some best practices so I’m just trying to work out what those best practices are.
I’ll share more details after I have this fleshed out a bit more, but using Convo I created a more capable version of Swarm tonight… I started about an hour ago and have a working client and server without having to write a single line of code. In fact, the client & server would have been able to talk to each other on the first shot, but I didn’t specify the websocket port to use, so it coded the server to port 3000 and the client to port 8080. I had it re-gen the client code to use port 3000 and it just connected and worked. Both sides written in pseudocode…
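To make the port detail concrete, here’s a minimal sketch (my illustration, not the actual generated code) of that kind of client/server pair using Python’s websockets package, with both sides agreeing on port 3000:

import asyncio
import websockets

WS_PORT = 3000  # shared constant; the first generation failed because client (8080) and server (3000) disagreed

async def handle_agent(websocket, path=None):
    # Echo each agent message back (the `path` arg keeps older websockets versions happy)
    async for message in websocket:
        await websocket.send(f"server received: {message}")

async def run_server():
    async with websockets.serve(handle_agent, "localhost", WS_PORT):
        await asyncio.Future()  # run until cancelled

async def run_client():
    async with websockets.connect(f"ws://localhost:{WS_PORT}") as ws:
        await ws.send("hello from agent")
        print(await ws.recv())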
Ok, it’s not the full implementation of Swarm yet, but it’s a transport system capable of rich multi-agent communication. So what features does it have?
My goal with this is to model every interaction that humans might have while performing tasks. All without having to write a single line of real code, because why not…
For a bit more background… I designed this framework, this framework, and this framework, plus a half dozen more not published to the web, all of which do some variation of this. This will be like my 20th pass at such a framework, which probably means that if I haven’t gotten it right yet, I’m probably not going to
I’ll share more when I have something more concrete working…
UPDATE:
I’m working on a central agent directory service that agents can register themselves with and use to look up other agents. They’ll use natural language for both tasks…
That sets up the ultimate in flexibility. For example, an agent will be able to register itself for a particular task but then specify that it only works M-F from 9am - 5pm PST.
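To picture it, here’s a hypothetical Python sketch of such a registry; every name and field here is my assumption for illustration, not the actual design:

from dataclasses import dataclass

@dataclass
class AgentRegistration:
    agent_id: str
    task_description: str         # natural language, e.g. "summarizes legal contracts"
    availability: str = "always"  # natural language, e.g. "M-F 9am - 5pm PST"

class AgentDirectory:
    def __init__(self):
        self._agents: dict[str, AgentRegistration] = {}

    def register(self, registration: AgentRegistration) -> None:
        self._agents[registration.agent_id] = registration

    def lookup(self, query: str) -> list[AgentRegistration]:
        # Placeholder substring match; the real service would presumably ask an
        # LLM to match the natural-language query against each task description.
        return [r for r in self._agents.values() if query.lower() in r.task_description.lower()]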
If anyone would like to discuss how identity in a decentralized agent system could work I’d love to chat. I’ve been exploring a PGP based system with some Proof of Work and key revocation extensions that GPT recommended.
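For anyone picking up that thread, here’s a hedged illustration of the signing half of the idea using Ed25519 keys from the cryptography package (my sketch only; the PGP web-of-trust, proof-of-work, and revocation pieces aren’t shown):

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Each agent holds a long-lived key pair; the public key acts as its identity.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# The agent signs its directory registration so others can verify authorship.
registration = b"agent-42 | summarizes legal contracts | M-F 9am-5pm PST"
signature = private_key.sign(registration)

# Verification raises cryptography.exceptions.InvalidSignature on tampering.
public_key.verify(signature, registration)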
You funny man, I made my own type of version of that last night too lol
Inspired by this topic, here’s a brief outline of a book that I am contemplating writing (well, with LLMs), generated from my brief description
Title: Bridging the Gap: Enhancing Human-LLM Communication Through Pseudocode Refinement
Table of Contents
Introduction
1.1 The New Era of Human-AI Collaboration
1.2 Challenges in Communicating with Large Language Models (LLMs)
1.3 The Role of Pseudocode in Natural Language Communication
1.4 Objectives and Structure of This Book
Part I: Understanding the Dynamics
Chapter 2: Large Language Models Explained
2.1 What Are LLMs and How Do They Work?
2.2 Tokenization and the Explosion of Possibilities
2.3 Strengths and Limitations of LLMs in Code Generation
2.4 Reflecting on AI Interpretation of Human Instructions
Chapter 3: The Nature of Pseudocode in Natural Language
3.1 Defining Plain Natural Language Pseudocode
3.2 Differences Between Formal Code and Pseudocode
3.3 Common Pitfalls in Pseudocode Communication with LLMs
3.4 Case Studies: Misinterpretations and Misalignments
Part II: Enhancing Communication Through Detail
Chapter 4: Identifying Missing Pieces in Pseudocode
4.1 Material vs. Nitty-Gritty Details: Understanding the Spectrum
4.2 Techniques for Spotting Omissions
4.3 The Impact of Missing Information on LLM Outputs
4.4 Examples of Critical Missing Elements
Chapter 5: Strategies for Effective Detailing
5.1 Prioritizing Information: What’s Essential?
5.2 Balancing Brevity and Clarity
5.3 Using Hierarchical Structuring in Pseudocode
5.4 Incorporating Contextual Clues for LLMs
Chapter 6: Collaborative Refinement with LLMs
6.1 Interactive Prompting Techniques
6.2 Asking the Right Questions to Elicit Details
6.3 Utilizing LLM Feedback to Improve Pseudocode
6.4 Iterative Refinement Processes
Part III: Bridging Gaps Before Code Generation
Chapter 7: The Explosion of Token Space and How to Manage It
7.1 Understanding Token Variability in LLM Responses
7.2 Controlling Output through Constrained Inputs
7.3 Examples of Managing Token Diversity
Chapter 8: Stitching Missing Pieces Effectively
8.1 Techniques for Material Additions
8.2 Addressing Nitty-Gritty Details
8.3 When to Let LLMs Fill in the Gaps
8.4 Avoiding Over-Specification
Chapter 9: Refining Pseudocode for Robust Interpretation
9.1 Standardizing Language and Terminology
9.2 Clarifying Ambiguous Instructions
9.3 Using Examples and Analogies
9.4 Testing Pseudocode with Multiple LLMs
Part IV: Practical Applications and Case Studies
Chapter 10: Real-World Examples
10.1 Case Study: Building a Simple Algorithm
10.2 Case Study: Complex Systems and LLM Collaboration
10.3 Debugging Pseudocode with LLM Assistance
10.4 Success Stories in Human-LLM Partnerships
Chapter 11: Tools and Resources
11.1 Platforms for Experimenting with LLMs
11.2 Pseudocode Editors and Linters
11.3 Communities and Forums for Collaboration
11.4 Further Reading and Advanced Topics
Conclusion
Chapter 12: The Future of Human-LLM Interaction
12.1 Emerging Trends in AI Communication
12.2 Ethical Considerations and Best Practices
12.3 Continuous Learning and Adaptation
12.4 Final Thoughts and Encouragement
Appendices
A.1 Glossary of Terms
A.2 Sample Pseudocode Templates
A.3 Reference Guides for LLM Prompts
A.4 Bibliography
Can’t wait to read it! One thing I’d add is that it’s not just about using pseudocode to “instruct” machine-based agents. We already use it to instruct other humans. “Go clean your room” is just a pseudocode-based instruction that starts a room-cleaning program in another human.
I saw that in your screenshot… Microsoft asked me to explore creating a new multi-agent framework, and Convo was just something that emerged as part of that exploration. The fact that Swarm was just released is an interesting coincidence. I dug into Swarm and there’s not really anything overly groundbreaking in the ideas. The Routines idea I figured out in the AlphaWave Agents framework I built over a year ago; I called them Scripts, but it’s the same step-based instructions. AlphaWave does a limited form of their in-process handoffs as well.
True. In the introduction below, I think I should nuance that a little more.
Introduction
1.1 The New Era of Human-AI Collaboration
The rapid advancement of artificial intelligence has ushered in a new era where humans and machines collaborate more closely than ever before. Large Language Models (LLMs), such as GPT-4, have become powerful tools capable of understanding and generating human-like text. They assist in drafting emails, writing code, creating art, and even offering recommendations. This symbiotic relationship holds tremendous potential, but it also presents unique challenges that stem from the fundamental differences in how humans and AI interpret and generate language.
1.2 Challenges in Communicating with Large Language Models
While LLMs are adept at processing vast amounts of information and generating coherent responses, they rely heavily on the input they receive. Ambiguities, omissions, or imprecise instructions from users can lead to outputs that deviate from the intended goals. Unlike humans, who can infer context and fill in gaps based on shared experiences or intuition, LLMs interpret language based on learned patterns from their training data. This difference often leads to an “explosion of token space,” where the AI explores numerous possible interpretations, sometimes resulting in undesirable or unexpected outcomes.
1.3 The Role of Pseudocode in Natural Language Communication
Pseudocode serves as a bridge between human logic and machine execution. It allows us to express algorithms and processes in a way that is abstract enough to be easily understood by humans yet structured enough to be translated into actual code. When communicating with LLMs, using pseudocode in plain natural language can enhance clarity. However, the lack of precision and the omission of critical details can hinder the AI’s ability to generate accurate and functional code.
1.4 Objectives and Structure of This Book
This book aims to explore the intricate dynamics of human-LLM communication, particularly focusing on how we can refine our use of pseudocode to improve the collaborative process. By identifying common pitfalls and providing strategies to enhance the clarity and completeness of our instructions, we can harness the full potential of LLMs in code generation and other complex tasks.
The book is structured into four parts:
Part I: Understanding the Dynamics
Part II: Enhancing Communication Through Detail
Part III: Bridging Gaps Before Code Generation
Part IV: Practical Applications and Case Studies
By the end of this book, readers will have a comprehensive understanding of how to communicate more effectively with LLMs, leading to more accurate outputs and a more fruitful human-AI partnership.
This probably isn’t directly related to your book, but I thought I’d point out that we want to start capturing all of these pseudocode-to-code example pairs so that we can fine-tune better coding models. All of the current generation of coding models are trained off just the code half of the equation. They’re missing the intent, which is captured in the pseudocode.
If you start capturing the intent-to-code mappings, I suspect that not only will you end up with a model that’s SoTA at coding tasks, but it will likely be SoTA at reasoning, because the two tasks are closely intertwined.
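As a concrete illustration (a hypothetical record format of my own, not an established schema), an intent-to-code pair could be captured as a chat-style fine-tuning example like this:

import json

record = {
    "messages": [
        {
            "role": "user",
            "content": "Pseudocode: read numbers from a file, one per line, and print their average.",
        },
        {
            "role": "assistant",
            "content": (
                "with open('numbers.txt') as f:\n"
                "    values = [float(line) for line in f if line.strip()]\n"
                "print(sum(values) / len(values))"
            ),
        },
    ]
}
print(json.dumps(record))  # one JSONL line per captured pair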
I’ve been working on some pretty cool stuff for new consoles around multi-agent memory graphs that are editable through drag and drop, to make effective decisions in the future from the past