Hi Community!
There are so many cool coding tools and agents out there, but unfortunately I can’t leverage them for my projects because I need an autonomous coding agent that can reliably generate production-ready C++ code (at least, this is the promise). So some time ago, I decided to start an educational (and fun!) side project to build something myself.
std::rave is a long-horizon AI programmer that, given a concise project description and test cases, can synthesize complex algorithms.
Imagine a system that can work for a day or two and synthesize tens of thousands of lines of working code. Here is one use case
I’m currently working on two things:
A benchmark for long-horizon coding tasks (similar to the one in the link)
A system to synthetically and fully autonomously generate data to reinforce the models
BTW, GPT-5 and the GPT-OSS family of models shine in std::rave.
It’s specifically designed for long-running tasks: it minimizes token usage while maximizing the documentation, code, etc. included in the context window (i.e. a “floating” or “moving” context window).
This is in Python, but obviously the LLM could write in any language and test compilation, etc.
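To make the “floating”/“moving” window idea concrete, here is a minimal sketch of one way such trimming could work; the names, the token budget, and the pin/drop policy are all hypothetical, not the actual implementation:

```python
# Minimal sketch of a "floating" context window: pinned items (instructions,
# key docs) always stay in, older material is dropped once a budget is hit.
from dataclasses import dataclass

MAX_TOKENS = 100_000  # assumed budget, not a real setting

@dataclass
class ContextItem:
    text: str
    tokens: int
    pinned: bool = False  # e.g. developer instructions, planning docs

def build_window(items: list[ContextItem]) -> list[ContextItem]:
    """Keep pinned items, then fill the remaining budget newest-first."""
    pinned = [i for i in items if i.pinned]
    floating = [i for i in items if not i.pinned]
    budget = MAX_TOKENS - sum(i.tokens for i in pinned)
    kept: list[ContextItem] = []
    for item in reversed(floating):   # newest material first
        if item.tokens <= budget:
            kept.append(item)
            budget -= item.tokens
    kept.reverse()                    # restore chronological order
    return pinned + kept
```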
lucid.dev, looks solid! Can you share what kind of mechanisms the agent uses to enforce syntactic correctness and to debug when things go wrong?
Syntax checking and debugging are inherent to a multi-turn system with developer instructions and a “floating” context window. The system operates as described below (a minimal sketch of the loop follows the steps):
A) User sets initial conditions:
Set of codebase documents (including any new documents created in the previous assistant response)
Set of instructions to the LLM for using the system (create planning documents and implementation documents, this is how you execute terminal commands, this is how you modify code, etc.)
Other settings relevant to managing context window truncation and automation
B) User prompts the LLM with the above data and receives a response
C) The LLM response contains instruction blocks that the middleware picks up on receipt and uses to “modify the state of the system” (i.e. create/modify docs, execute commands and capture output/logs, etc.)
D) The system then modifies the context window as necessary, depending on the user’s settings, and automatically re-prompts the LLM to continue the task
E) After any number of iterations, either the user or the LLM (through an instruction block) tells the system the task is complete and halts the automation.
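A minimal sketch of that A-E loop might look like the following; every name here is hypothetical and only illustrates the control flow described above, with the real middleware standing behind the injected callables:

```python
# Hypothetical sketch of the A-E automation loop described above.
from typing import Any, Callable

def run_automation(
    build_context: Callable[[dict], list[dict]],      # D) assemble the window
    call_llm: Callable[[list[dict]], str],            # B) one model call
    apply_instructions: Callable[[str, dict], dict],  # C) parse instruction
                                                      #    blocks, edit docs,
                                                      #    run commands, log
    state: dict[str, Any],                            # A) docs + settings
    max_iterations: int = 1000,
) -> dict[str, Any]:
    for _ in range(max_iterations):
        context = build_context(state)
        response = call_llm(context)
        result = apply_instructions(response, state)
        state["last_result"] = result
        # E) either the user or the LLM (via an instruction block) halts
        if result.get("task_complete") or state.get("user_halt"):
            break
    return state
```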
With regard to syntax/debugging specifically:
Any time the LLM makes a code change, a linter is run on a per-file or per-project basis. Any integrated compiler, or any method that can lint or run a debug/compile pass on the code, does the necessary work, and the output is simply stored in the “code block change” data (i.e. as the “result” of the LLM’s last code change). Alternatively, the LLM can execute a custom “terminal command” to use existing utilities to test-compile a file, start a server, or interact with the codebase in question via the CLI.
The linter/compiler output, or the output from the terminal, is then collated and included in the automatic re-prompt data used to call the LLM again with the newly modified context window.
Thus the LLM iterates indefinitely until it gets it right, or fails and chooses to halt to await user review.
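As a rough illustration of that lint-and-re-prompt step (the choice of clang-tidy and the collated report format are my assumptions, not necessarily what the system actually runs):

```python
# Hypothetical sketch: run a linter/compiler over changed files and fold the
# collated output into the data used for the next automatic re-prompt.
import subprocess

def lint_changed_files(paths: list[str]) -> str:
    """Run a lint/compile pass per file and collate stdout/stderr."""
    reports = []
    for path in paths:
        proc = subprocess.run(
            ["clang-tidy", path, "--", "-std=c++20"],  # any linter/compiler works here
            capture_output=True, text=True,
        )
        reports.append(f"=== {path} (exit {proc.returncode}) ===\n"
                       f"{proc.stdout}{proc.stderr}")
    return "\n".join(reports)

def build_reprompt(previous_context: str, lint_report: str) -> str:
    """Attach the collated tool output as the 'result' of the last code change."""
    return (previous_context
            + "\n\n[Result of your last code change]\n"
            + lint_report)
```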
The user can of course manually share data obtained concurrently through an IDE, or “push” it to the “conversation” (thread), so that it’s provided to the LLM either during the next re-prompt (if automation is running) or manually while the conversation is paused.
By design, std::rave only builds complex algorithms with C++ and the STL (it can’t build a web site, for example), BUT it must have full autonomy; there are many domains where autonomy is not optional (games, robotics, etc.). As such, EVERY instruction from the prompt is heavily verified. There is quite a bit of machinery to support that: a bespoke development environment with a linter, debugger, tracer, logger, build system, and more lean tools, designed around two core principles (a rough sketch follows these two points):
High-throughput interaction: Keeps pace with LLMs, which process information orders of magnitude faster than a human developer.
Contextual precision: Delivers high-quality, relevant information/feedback to the models at the optimal moment, for both high-level overviews and, when necessary, granular details.
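To illustrate how “every instruction is heavily verified” could look in practice, here is a hedged sketch; the specific tools (clang-tidy, cmake, ctest) and the lint, build, test order are my assumptions, not std::rave’s actual pipeline:

```python
# Hypothetical verification pipeline: each code-producing instruction passes
# lint -> build -> test before it is accepted, with a short summary for the
# high-level overview and full logs kept for granular detail when needed.
import subprocess

def run(cmd: list[str]) -> tuple[bool, str]:
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def verify_change(changed_file: str) -> dict:
    """Gate a code change behind lint, build, and tests; collect feedback."""
    steps = {
        "lint":  ["clang-tidy", changed_file, "--", "-std=c++20"],
        "build": ["cmake", "--build", "build"],
        "test":  ["ctest", "--test-dir", "build", "--output-on-failure"],
    }
    report = {"ok": True, "summary": [], "details": {}}
    for name, cmd in steps.items():
        ok, output = run(cmd)
        report["summary"].append(f"{name}: {'ok' if ok else 'FAILED'}")
        report["details"][name] = output   # granular detail, when needed
        if not ok:
            report["ok"] = False
            break                          # fail fast, keep feedback concise
    return report
```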