Hi Community!
There are so many cool coding tools and agents out there, but unfortunately I can’t leverage them for my projects because I need an autonomous coding agent that can reliably generate production-ready C++ code (at least, this is the promise). So some time ago, I decided to start an educational (and fun!) side project to build something myself.
std::rave is a long-horizon AI programmer that, given a concise project description and test cases, can synthesize complex algorithms.
Imagine a system that can work for a day or two and synthesize tens of thousands of lines of working code. Here is one use case
I’m currently working on two things:
A benchmark for long-horizon coding tasks (similar to the one in the link)
A system to synthetically and fully autonomously generate data to reinforce the models
BTW, GPT-5 and the GPT-OSS family of models shine in std::rave.
It’s specifically designed for long-running tasks: it minimizes token usage while maximizing the documentation, code, etc. included in the context window (i.e. a “floating” or “moving” context window).
This is in Python, but obviously the LLM could write in any language and test compilation, etc.
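To make the “floating”/“moving” window idea concrete, here is a minimal sketch of one way such trimming could work; the names, the token budget, and the pin/drop policy are all hypothetical, not the actual implementation:

```python
# Minimal sketch of a "floating" context window: pinned items (instructions,
# key docs) always stay in, older material is dropped once a budget is hit.
from dataclasses import dataclass

MAX_TOKENS = 100_000  # assumed budget, not a real setting

@dataclass
class ContextItem:
    text: str
    tokens: int
    pinned: bool = False  # e.g. developer instructions, planning docs

def build_window(items: list[ContextItem]) -> list[ContextItem]:
    """Keep pinned items, then fill the remaining budget newest-first."""
    pinned = [i for i in items if i.pinned]
    floating = [i for i in items if not i.pinned]
    budget = MAX_TOKENS - sum(i.tokens for i in pinned)
    kept: list[ContextItem] = []
    for item in reversed(floating):   # newest material first
        if item.tokens <= budget:
            kept.append(item)
            budget -= item.tokens
    kept.reverse()                    # restore chronological order
    return pinned + kept
```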
lucid.dev, looks solid! Can you share what kind of mechanisms the agent uses to enforce syntactic correctness and to debug when things go wrong?
Syntax checking and debugging are inherent to a multi-turn system with developer instructions and a “floating” context window. The system operates as described below (a minimal sketch of the loop follows the steps):
A) User sets initial conditions:
Set of codebase documents (including any new documents created in the previous assistant response)
Set of instructions to the LLM for using the system (create planning documents and implementation documents, this is how you execute terminal commands, this is how you modify code, etc.)
Other settings relevant to managing context window truncation and automation
B) User prompts the LLM with the above data and receives a response
C) The LLM response contains instruction blocks that the middleware picks up on receipt and uses to “modify the state of the system” (i.e. create/modify docs, execute commands and capture output/logs, etc.)
D) The system then modifies the context window as necessary, depending on the user’s settings, and automatically re-prompts the LLM to continue the task
E) After any number of iterations, either the user or the LLM (through an instruction block) tells the system the task is complete and halts the automation.
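A minimal sketch of that A-E loop might look like the following; every name here is hypothetical and only illustrates the control flow described above, with the real middleware standing behind the injected callables:

```python
# Hypothetical sketch of the A-E automation loop described above.
from typing import Any, Callable

def run_automation(
    build_context: Callable[[dict], list[dict]],      # D) assemble the window
    call_llm: Callable[[list[dict]], str],            # B) one model call
    apply_instructions: Callable[[str, dict], dict],  # C) parse instruction
                                                      #    blocks, edit docs,
                                                      #    run commands, log
    state: dict[str, Any],                            # A) docs + settings
    max_iterations: int = 1000,
) -> dict[str, Any]:
    for _ in range(max_iterations):
        context = build_context(state)
        response = call_llm(context)
        result = apply_instructions(response, state)
        state["last_result"] = result
        # E) either the user or the LLM (via an instruction block) halts
        if result.get("task_complete") or state.get("user_halt"):
            break
    return state
```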
With regard to syntax/debugging specifically:
Any time the LLM makes a code change, a linter is run on a per-file or per-project basis. Any integrated compiler, or any method that can lint or run a debug/compile pass on the code, does the necessary work, and the output is simply stored in the “code block change” data (i.e. as the “result” of the LLM’s last code change). Alternatively, the LLM can execute a custom “terminal command” to use existing utilities to test-compile a file, start a server, or interact with the codebase in question via the CLI.
The linter/compiler output, or the output from the terminal, is then collated and included in the automatic re-prompt data used to call the LLM again with the newly modified context window.
Thus the LLM iterates indefinitely until it gets it right, or fails and chooses to halt to await user review.
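As a rough illustration of that lint-and-re-prompt step (the choice of clang-tidy and the collated report format are my assumptions, not necessarily what the system actually runs):

```python
# Hypothetical sketch: run a linter/compiler over changed files and fold the
# collated output into the data used for the next automatic re-prompt.
import subprocess

def lint_changed_files(paths: list[str]) -> str:
    """Run a lint/compile pass per file and collate stdout/stderr."""
    reports = []
    for path in paths:
        proc = subprocess.run(
            ["clang-tidy", path, "--", "-std=c++20"],  # any linter/compiler works here
            capture_output=True, text=True,
        )
        reports.append(f"=== {path} (exit {proc.returncode}) ===\n"
                       f"{proc.stdout}{proc.stderr}")
    return "\n".join(reports)

def build_reprompt(previous_context: str, lint_report: str) -> str:
    """Attach the collated tool output as the 'result' of the last code change."""
    return (previous_context
            + "\n\n[Result of your last code change]\n"
            + lint_report)
```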
The user can of course manually share data obtained concurrently through an IDE, or “push” it to the “conversation” (thread), so that it’s provided to the LLM either during the next re-prompt (if automation is running) or manually while the conversation is paused.
By design, std::rave only builds complex algorithms with C++ and the STL (it can’t build a web site, for example), BUT it must have full autonomy; there are many domains where autonomy is not optional (games, robotics, etc.). As such, EVERY instruction from the prompt is heavily verified. There is quite a bit of machinery to support that: a bespoke development environment with a linter, debugger, tracer, logger, build system, and more lean tools, designed around two core principles (a rough sketch follows these two points):
High-throughput interaction: Keeps pace with LLMs, which process information orders of magnitude faster than a human developer.
Contextual precision: Delivers high-quality, relevant information/feedback to the models at the optimal moment, for both high-level overviews and, when necessary, granular details.
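To illustrate how “every instruction is heavily verified” could look in practice, here is a hedged sketch; the specific tools (clang-tidy, cmake, ctest) and the lint, build, test order are my assumptions, not std::rave’s actual pipeline:

```python
# Hypothetical verification pipeline: each code-producing instruction passes
# lint -> build -> test before it is accepted, with a short summary for the
# high-level overview and full logs kept for granular detail when needed.
import subprocess

def run(cmd: list[str]) -> tuple[bool, str]:
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def verify_change(changed_file: str) -> dict:
    """Gate a code change behind lint, build, and tests; collect feedback."""
    steps = {
        "lint":  ["clang-tidy", changed_file, "--", "-std=c++20"],
        "build": ["cmake", "--build", "build"],
        "test":  ["ctest", "--test-dir", "build", "--output-on-failure"],
    }
    report = {"ok": True, "summary": [], "details": {}}
    for name, cmd in steps.items():
        ok, output = run(cmd)
        report["summary"].append(f"{name}: {'ok' if ok else 'FAILED'}")
        report["details"][name] = output   # granular detail, when needed
        if not ok:
            report["ok"] = False
            break                          # fail fast, keep feedback concise
    return report
```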