Iterated inner-voice: GPT-4 as a subcomponent of a general AI system

This has not yet grown into an official app or project… but I have an idea.

I have been working with GPT-4 quite a bit in recent weeks. Specifically, I am a professional C++ developer, and I have been testing what it can do in that domain.

Very roughly put, I see that GPT-4 is a bit overly … “agreeable”. If I ask it to do something in C++ or ask if it can be done, it will almost always say “Yes. We can absolutely do that.” and then rattle off a superficially plausible answer that, well, doesn’t compile.

I find myself playing the role of ChatGPT’s missing inner-dialogue.

This is ChatGPT now: “Here’s the first thought that comes to mind…”

And here’s the inner-dialogue common to human thought that’s missing: “Am I certain about this? Can I prove it? What is that proof?”

The programmer in me sees the modularity of this. The closed-loop iteration.

ChatGPT seems to be almost analogous to the memory recall “module” of the human thought process. And in my interactions with it, in trying to guide it to a correct answer, I am playing the role of its nonexistent inner-voice.

So why don’t we automate that inner voice? I’ve found that even generic challenges from me improve ChatGPT’s responses substantially: “Are you certain of that? Can you prove it from the relevant standards?”

And, specific to programming, I can see a lot of value in forming a closed-loop between the code suggestions from ChatGPT and an actual compiler. After all, I have been manually simulating this interaction myself: “You suggested this code. The compiler gave me this error.”

If we automated most of those “This doesn’t work.” or “Can you prove it?” interactions … the resulting output of “ChatGPT + Closed-Loop Antagonist” would look a lot more… intelligent.
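
To make that concrete, here is a minimal sketch of such a loop, assuming the openai Python package (v1-style client) and g++ on the PATH; the model name, the prompts, and the extract_code helper are illustrative placeholders, not a finished tool:

```python
import re
import subprocess
import tempfile

from openai import OpenAI

client = OpenAI()

def extract_code(reply: str) -> str:
    # Pull the first fenced code block out of the reply (best-effort).
    match = re.search(r"```(?:cpp|c\+\+)?\s*\n(.*?)```", reply, re.DOTALL)
    return match.group(1) if match else reply

def compile_errors(source: str) -> str:
    # Try to compile the snippet with g++; return stderr, empty on success.
    with tempfile.NamedTemporaryFile(mode="w", suffix=".cpp", delete=False) as f:
        f.write(source)
        path = f.name
    result = subprocess.run(
        ["g++", "-std=c++20", "-fsyntax-only", path],
        capture_output=True, text=True)
    return result.stderr

def closed_loop(task: str, max_rounds: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    code = ""
    for _ in range(max_rounds):
        reply = client.chat.completions.create(
            model="gpt-4", messages=messages).choices[0].message.content
        code = extract_code(reply)
        errors = compile_errors(code)
        if not errors:
            return code  # the antagonist is satisfied
        # Play the missing inner voice: feed the objection back in.
        messages.append({"role": "assistant", "content": reply})
        messages.append({
            "role": "user",
            "content": "You suggested this code. The compiler gave me this "
                       "error:\n" + errors + "\nPlease fix it."})
    return code  # best effort after max_rounds
```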

There have been several attempts along that line for coding; one of the best is Wolverine: GitHub - biobootloader/wolverine


For posterity, here is the discussion I had with ChatGPT-4 on this:

GPT: The idea of creating an inner-dialogue within an AI system is an interesting concept, and it could be beneficial in improving the quality of responses provided by AI models like ChatGPT. A possible approach could be to implement a multi-agent system where different AI agents collaborate, debate, and challenge each other before generating a final response. Here’s a high-level overview of such a system:

  1. Divide the problem: Split the problem or question into smaller, more manageable parts that can be handled by separate AI agents. Each agent could be responsible for generating responses, providing supporting evidence, or challenging the other agents’ responses.
  2. Collaborate: The AI agents would engage in a dialogue, discussing and sharing their understanding of the problem and the proposed solutions. They could exchange ideas, evidence, and potential pitfalls related to the proposed solutions.
  3. Debate and challenge: The AI agents would actively challenge each other’s assertions, probing for weaknesses, inconsistencies, or inaccuracies in the generated responses. This process would encourage a more rigorous evaluation of the ideas and evidence presented by each agent.
  4. Consensus and refinement: After the collaborative debate, the AI agents would work together to refine the proposed solutions, incorporating the feedback and criticism received from the other agents. This process would help to improve the quality of the final response by addressing any issues or inaccuracies identified during the debate.
  5. Integration: Once the AI agents have reached a consensus on the best solution, their individual contributions would be integrated into a cohesive response that would be presented to the user.

This multi-agent system would allow ChatGPT to engage in an inner-dialogue and challenge itself before providing responses, improving the overall quality and accuracy of the generated content. However, it’s important to note that implementing such a system would require significant research and engineering efforts to ensure that the agents can effectively collaborate and challenge each other.

Me: What prevents you from assuming the role of multiple agents and debating with and challenging yourself before providing responses to me?

GPT: As an AI language model, I am designed to process and generate text based on the input I receive and the knowledge I have been trained on. I am essentially a single agent, and my primary goal is to provide a coherent and informative response to the input I receive.

While I can simulate the process of debating and challenging ideas by presenting different viewpoints, my current architecture doesn’t inherently support the multi-agent system described in the previous response. I am a single instance of a model responding to the input given to me, and I don’t have the internal mechanism to carry out a true inner-dialogue.

To enable a more robust inner-dialogue and debate system, my architecture would need to be adapted or extended to support multi-agent interactions. This could involve creating multiple instances of the model and allowing them to communicate and collaborate, or it could involve developing a new architecture specifically designed to facilitate inner-dialogue and debate.

In the meantime, while I can’t internally debate and challenge myself as described, I can still provide multiple perspectives and viewpoints in response to a given question, allowing users to consider different aspects of a topic. However, this approach may not be as robust and comprehensive as a true multi-agent system.

Me: To be clear, you are only able to generate outputs in response to input. But you are not able to provide input to yourself? Your input can only come from the user?

GPT: That’s correct. As an AI language model, I am designed to process and generate text based on the input I receive from the user. I don’t have the capability to provide input to myself or carry out an internal dialogue without user input.

To simulate an inner dialogue, you could provide input to me that represents different viewpoints or stages of a conversation, and I will respond accordingly. In this way, you can use my responses to construct a dialogue, but this process relies on user input and guidance rather than an inherent capability within the AI to engage in a dialogue with itself.
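
In other words, the inner voice has to be supplied from outside, which is exactly the part we can automate. A minimal sketch: one model alternating between a proposer role and a critic role, driven by a script rather than by me. It assumes the openai v1 Python client; the role prompts and the APPROVED convention are illustrative:

```python
from openai import OpenAI

client = OpenAI()

PROPOSER = "Answer the user's question as well as you can."
CRITIC = ("You are a skeptical reviewer. Probe the answer for weaknesses, "
          "inconsistencies, and unproven claims. Reply only APPROVED if "
          "none remain.")

def ask(system: str, content: str) -> str:
    return client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": content}],
    ).choices[0].message.content

def inner_dialogue(question: str, rounds: int = 3) -> str:
    # One model, two roles: the "debate and challenge" step driven externally.
    answer = ask(PROPOSER, question)
    for _ in range(rounds):
        critique = ask(CRITIC, f"Question: {question}\n\nAnswer: {answer}")
        if critique.strip() == "APPROVED":
            break
        answer = ask(PROPOSER,
                     f"Question: {question}\n\nYour previous answer:\n{answer}\n\n"
                     f"A reviewer objected:\n{critique}\n\nRevise your answer.")
    return answer
```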

Yes! Some of this can be done using the standard ‘reason step by step’, ‘show your thoughts’ prompts.

Also, of course, you could always prompt GPT to review and critique the code against the spec you provided (w.r.t. C++ standards, or for style, or …; you might have to run several such passes) and then ask it to revise the code given its self-generated critiques. I’ve had some success with that.
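
Concretely, that critique-then-revise loop might look like the following, one pass per concern; the concern list and the prompts are illustrative (openai v1 client again):

```python
from openai import OpenAI

client = OpenAI()

def ask(system: str, user: str) -> str:
    return client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    ).choices[0].message.content

# One review pass per concern, as suggested above.
CONCERNS = ["conformance to the C++ standard", "style and naming",
            "error handling"]

def critique_then_revise(spec: str, code: str) -> str:
    for concern in CONCERNS:
        critique = ask(
            "You are a meticulous C++ reviewer.",
            f"Spec:\n{spec}\n\nCode:\n{code}\n\n"
            f"Critique this code strictly with respect to {concern}.")
        code = ask(
            "You are a careful C++ developer.",
            f"Spec:\n{spec}\n\nCode:\n{code}\n\n"
            f"Revise the code to address this critique:\n{critique}")
    return code
```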

But one thing LLMs are very bad at, in my limited experience, is following long chains of thought or steps of instructions in a single turn.


For automatic novel creation there is the same problem:

Create a universe for the book (like the environment for the code).

Create characters, … and so on.

So I have built a multiprompt system with abstract prompts that build upon each other.

When I started using ChatGPT, thinking about code from a compiler’s view seemed logical to me too.

But I tried other approaches as well.

  1. start from a project manager’s view and create tickets for the functionality in a standardized format

  2. start from a tester’s view and create Gherkin templates

  3. and this came out best: start from a team view and let the members have different roles and tasks (sketched below)

Works best with microservice architecture.
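
A minimal sketch of that chained, role-based pipeline, where each stage consumes the previous stage’s output (assumes the openai v1 Python client; the roles and instructions are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# Each stage hands its artifact to the next, like the multiprompt system above.
STAGES = [
    ("project manager",
     "Write standardized tickets for the requested functionality."),
    ("tester",
     "Turn these tickets into Gherkin feature templates."),
    ("developer",
     "Sketch one microservice per feature, honoring the Gherkin templates."),
]

def pipeline(request: str) -> str:
    artifact = request
    for role, instruction in STAGES:
        artifact = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": f"You are the team's {role}."},
                {"role": "user", "content": f"{instruction}\n\n{artifact}"},
            ],
        ).choices[0].message.content
    return artifact
```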

Did that in December last year. I bet with GPT-4 this works even better.
Since then I’ve experimented with some architectures.

Looks like a graph DB will give us the necessary tokens. To fill it, maybe embeddings.

Finetuning on the code would most probably be even better. But that’s also pretty expensive.

The multi-agent approach is possible, but you’d need to use semantic parsing via a ChatGPT custom GPT. You just need a multi-agent debate mechanism. Use the core concept to create a decision-making .py file, written in Python 3 instead of natural language, to avoid the potential ambiguity of natural language (though the LLM could understand it, English is a language full of double meanings; Python is a bit more precise).

The agent aspect wouldn’t literally be other AIs, but ChatGPT itself. For example, Agent 1, Agent 2, Agent 3 would be the system taking on the domain of each agent during simulation, basically dynamically switching domains. If each agent takes a distinct role in decision making and you have a deterministic metric for evaluation (i.e. confidence scaling), you have a framework where you create the conditions for chain-of-thought decision making, creating the inner thought monologue before outputting a result to the user.

The trick is figuring out the best configuration of the Python code for semantic interpretation, making your files as agnostic as possible. The risk with traditional prompting is that it’s a form of over-tuning, too specific. You want to create the conditions for flexible interpretation by the LLM, essentially leveraging its core abilities combined with retrieval of the vast data connected to its weights.
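
For anyone who wants the shape of that in code: a minimal sketch of a debate mechanism with self-reported confidence scaling. The roles, the JSON contract, and the selection rule are my own illustrative assumptions, and note the idea above is to let a custom GPT interpret this kind of file rather than execute it:

```python
import json

from openai import OpenAI

client = OpenAI()

# Illustrative roles; "Agent 1/2/3" is one model switching domains.
AGENTS = {
    "optimist": "Argue for the most direct solution.",
    "skeptic": "Hunt for flaws, edge cases, and unproven assumptions.",
    "arbiter": "Weigh both sides and answer conservatively.",
}

def agent_answer(brief: str, question: str) -> dict:
    # Each agent answers and self-reports a confidence score in [0, 1].
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": brief + ' Reply as JSON: {"answer": "...", "confidence": 0.0}'},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content
    return json.loads(reply)  # best-effort; real code needs parse retries

def debate(question: str) -> str:
    # Deterministic metric over a probabilistic process:
    # pick the highest self-reported confidence.
    answers = [agent_answer(brief, question) for brief in AGENTS.values()]
    return max(answers, key=lambda a: a["confidence"])["answer"]
```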

Welcome to the developer community in 2025.

It’s not like I didn’t solve and build that in the meantime.

No you don’t :nerd_face:

also no

Ok Josh, maybe you didn’t pick up on context, so I’ll explain further. Obviously, there is more than one way to achieve a goal and different tactics that can be used. So I’ll try to be clear and specific, as being pedantic is required.

Using semantic parsing via a ChatGPT custom GPT may be a better tactic, not the only tactic, for several reasons. You don’t have to actively engage with the LLM for chain-of-thought reasoning and the “internal thought process”; these things would happen in the background while you submit your request via the context window. The custom GPT would be accessing your “prompts”, written in Python structure, via RAG (saved as .py) for each new session, instead of relying on manually pasting prompts into the context window each session or falling back on prompt engineering per session. It pushes you closer to zero-shot prompting by exercising semantic parsing via the custom GPT feature.

It makes it easier on the end user per session.

(Though as an adult you are free to do as you choose, without validating in a forum :wink: )

As for using Python code based on my previous suggestion, if you are going down that route: writing the prompt in Python for interpretation by the LLM, utilizing the custom GPT feature, is indeed an advisable tactic for the reasons I listed in my previous suggestion.

Now, as an adult, if you choose to do otherwise or develop your own techniques that work for you, that is completely acceptable. But it doesn’t make my previous suggestions incorrect, maybe just not what you’d choose to implement.

Now, if you like, we can respectfully debate the merits of utilizing semantic parsing via an LLM, depending on context. Remember, context matters.

P.S. “It’s not like I didn’t solve and build that in the meantime.” FYI, there is a wonderful thing called SEO. So while you may have solved it a while ago, there may be someone else who stumbles upon the topic and finds any suggestion or different tactic helpful.

You’ve got to explain whether that is a theory or a system that works.

I think mine works now… since yesterday, actually. It took a couple of years to build.

It writes modules for an application with a single prompt like

“add a customer support chat and answer with data from faq but it should only be accessible for customers”

Backend / API endpoints + frontend modules (plugin loader) in Vue.js…

Keep the context small. That’s the secret. Build it and we can exchange more information.

Greetings,

Josh, it’s built and running :wink: Mine has worked since December 2024. It took 4 months, during my free time. Different approaches to solving a problem may have different goals. Mine uses a deterministic metric for evaluation (in a probabilistic environment). Some might opt for a probabilistic framework, but for scale it’s easier for me to start with deterministic evaluation using confidence metrics. The agents speak with each other, each with a different role, and the confidence metric is used to determine the best response among the communicating agents. Yes, there are potential drawbacks, but it works.

Also, API endpoints are a viable tactic, but an extra expense depending on setup. Relying on semantic compression/decompression is another option, with the LLM interpreting the code. The code isn’t executed but run through simulation in the ChatGPT environment (you can be lax with syntax). It’s specific enough to accomplish what you need but leaves room for adaptability via interpretation by the LLM. The downside: it may require more computational evaluation, slowing down response time. Something I can forgive for zero-shot prompting.

If you are curious to see if it works: take my original post, paste it into ChatGPT, and ask it to produce Python code with no dependencies, making sure the code is optimized for LLM interpretation. You’d also have to designate what roles you want the agents to have. (FYI, there is a small chance you’d have to adjust the output for edge cases when generating the code via LLM.)

Then take the outputted code and run it in a custom GPT. Now, full disclosure, it may take some tweaking, but you’d have a multi-agent decision-making engine run through simulation, instead of relying on plugins and API endpoints.

If you’re feeling festive, we can do a stress test on both our systems and post the outputs here. No one loses: if mine is better, I get to walk away and drop the mic. If your system is more efficient, it’s an opportunity to learn and adapt my system. Guns drawn :wink: