Codepilot: GitHub Copilot on Steroids

What you actually want to do (and I’m not doing this yet) is show the model a little bit of relevant code and then as much of the architecture (symbol information) as possible.
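To make that split concrete, here is a minimal sketch of a prompt builder that spends a small share of its budget on code and the rest on symbol information. This is not Codepilot's implementation: `take_within`, `build_prompt`, the `code_share` parameter, and the use of character counts as a stand-in for tokens are all assumptions made for illustration.

```python
def take_within(items, budget):
    """Greedily keep items, in order, until the next one would exceed the budget."""
    taken, used = [], 0
    for item in items:
        if used + len(item) > budget:
            break
        taken.append(item)
        used += len(item)
    return taken


def build_prompt(question, code_snippets, symbol_lines, budget=4000, code_share=0.25):
    """Spend a small share of the budget on code and the rest on symbol info.

    Budgets are measured in characters here (an assumption); a real tool
    would count tokens with the model's tokenizer.
    """
    code_part = take_within(code_snippets, int(budget * code_share))
    code_used = sum(len(s) for s in code_part)
    symbol_part = take_within(symbol_lines, budget - code_used)
    return "\n\n".join([
        "Symbols:\n" + "\n".join(symbol_part),
        "Relevant code:\n" + "\n".join(code_part),
        "Question: " + question,
    ])
```

Both lists are assumed to arrive already ranked by relevance, so the greedy cut keeps the best items first.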

@jochenschultz and I had a discussion about approaches that could achieve this. I also had a great call with an engineer on the GitHub Copilot team on Friday, and they’re thinking the same way. GitHub already extracts the symbol information from everyone’s code, so they just need to work out which symbols to show the model.


If I were to compare Codepilot with GH Copilot, I would say that GitHub is attempting to tackle this problem at an unearthly scale. They’re seeking a general solution for how to reason over 115TB of code and do it in a way that’s cost-effective. Lofty but admirable goals, and I have no problem working with them to help them achieve those goals…

Codepilot, by contrast, is per-repo, has a significantly larger context window, and has the ability to watch your code (coming soon) so it can update its index before you even commit your changes. But that added freshness comes at a cost: Codepilot needs to borrow your PC for most of its compute, and you, the developer, need to pay for all indexing and inference costs.

Both have a place in my mind.


It’s interesting because what you almost want is to present the model with the equivalent of TypeScript’s type definition files to show it the overall structure of the codebase.

Wheels turning…

I think I’m going to update Codepilot to create 2 indexes. One for code and a separate one for the type information. I suspect that this will dramatically improve the quality of the code that Codepilot generates.
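As a rough illustration of the two-index idea, the sketch below walks source files and fills one index with full definitions and a second with signatures only, similar in spirit to a type definition file. It uses Python and its standard `ast` module purely as an example; the function name `build_indexes` and the `path:name` key format are hypothetical, and Codepilot's real indexes may look nothing like this.

```python
import ast


def build_indexes(files):
    """Build a code index and a separate signature ("type") index.

    `files` maps a path to its source text. Keys are "path:name", which is
    an assumption for this sketch; same-named methods in different classes
    would collide and need a qualified name in a real tool.
    """
    code_index, type_index = {}, {}
    for path, source in files.items():
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                key = f"{path}:{node.name}"
                # Full definition goes in the code index.
                code_index[key] = ast.get_source_segment(source, node)
                # Signature only goes in the type index.
                returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
                type_index[key] = f"def {node.name}({ast.unparse(node.args)}){returns}"
            elif isinstance(node, ast.ClassDef):
                key = f"{path}:{node.name}"
                code_index[key] = ast.get_source_segment(source, node)
                type_index[key] = f"class {node.name}"
    return code_index, type_index
```

The type index stays small enough that far more of it fits in a prompt than the equivalent amount of raw code, which is the point of keeping the two apart.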

Another long weekend I see…

Looking forward to trying this, since it’s almost word for word how I’ve marketed my plugin on here :stuck_out_tongue:

But for real, how cool is it that we can build something that we can use to make itself better?

I have a version of this that runs on ChatGPT and I’d LOVE your feedback, since it seems there are VERY few of us on here who have built this particular kind of thing. It’s called Recombinant AI™.


Viktor,

I’ve been meaning to message you, but it applies here too! I think we’re overthinking it a little… at least for development on ChatGPT.

The model can reason a TON without needing to see everything. What I’ve done that has filled a bunch of these gaps is just to use a cloud file that the user creates.

Then you have some instructions on your endpoints that tell ChatGPT to make a task list and update it at regular intervals, as well as create a summary of the codebase. Then you have an endpoint whose only purpose is to loop through the process until it hits its context window.
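Roughly, that loop could look like the sketch below. Everything in it is an assumption for illustration: the notes layout (`todo`, `done`, `summary`), the `work_on` callback standing in for a model call, and the character count standing in for the context window. It is not the plugin's actual code.

```python
import json


def run_until_context_full(tasks, work_on, context_limit=1000):
    """Work through a task list, updating shared notes, until the budget is spent.

    Returns the notes as JSON, standing in for the user's cloud file.
    """
    notes = {"todo": list(tasks), "done": [], "summary": ""}
    used = 0
    while notes["todo"]:
        task = notes["todo"][0]
        output = work_on(task, notes["summary"])  # hypothetical model call
        if used + len(output) > context_limit:
            break  # out of context; remaining tasks stay in the notes
        used += len(output)
        notes["todo"].pop(0)
        notes["done"].append(task)
        notes["summary"] = (notes["summary"] + " " + output).strip()
    return json.dumps(notes, indent=2)
```

Because the unfinished tasks and the running summary are written back to the notes, a later session can pick up where this one stopped.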

@stevenic The difference that giving the model a set of examples makes is wild. I just added some prompt chains that pull a relevant cheat sheet and then dynamically create a file with more project-specific instructions. I can only imagine that indexing all that would make it stupid smart.

First test run:
https://chat.openai.com/share/bff1a848-642b-4a4c-8932-69df42782880
