I can't get ChatGPT (or any LLM) to actually contribute beyond examples

Just sharing some thoughts/frustrations here, hoping to get feedback and hear other people's experiences, and hopefully find a solution at some point.

I have noticed that LLMs are extremely laissez-faire about what they do, and it prevents them from actually being useful assistants. Let me explain. While I do believe the algorithmic and code-producing capabilities of these tools are likely better than at least 80% of people, they have this problem where they don't seem to grasp that we're not necessarily using them for fun or for learning; I believe most people are trying to get some real-world benefit out of them.

An example: I have a large project I am building. I can easily develop a practical architecture and development plan with an LLM. If I want an algorithm or a specific function created, or a bug fixed, I can do that. The big problem comes after defining all the specifics of a large project, when I actually try to build it.

I have given LLMs extremely detailed mockups of my architecture, database schema, project flows, actions, services, etc. Then, when I instruct it that we want to actually build real-world code (which it is capable of), it always does something like "here is an example of how you might achieve this," even though I informed it that this is supposed to actually work for my project. Or it will create a class and then say // your code here.

It's pretty frustrating, because even after giving it pages of detailed documentation and project specs, I basically have to yell at it to make it understand this isn't just some tutorial or something I am doing for fun; I am actually here because I want to build something, even if it will take a long time and a lot of steps. Then the bigger issue comes when I finally get it to produce production code: the chat gets too long and freezes, and I have to give the same instruction all over again.

Has anyone had a similar experience? Any tips?

Thanks

I've definitely had a similar experience, though I've been using the API exclusively (completions, not responses) for a while now.

My experience is this:

  1. You have to actually know what the context window contains.
  • If you're not using the API and are working in a chat interface instead, it can be hard to tell what's actually being passed: the context window is limited, and sometimes data you THINK is being passed is in fact being truncated, especially in long conversations with multiple documents. (There's a small token-counting sketch after this list.)

  2. You do have to pass explicit instructions like "don't use examples, placeholders, or pseudocode", which it sounds like you're already doing.

  3. Depending on what you're asking, it's important to also recognize the output limit. Depending on the model, you'll get at most somewhere between 5k and 10k tokens of actual output, so if the response would require more than that, you're going to get a truncated result.
  • I've gotten around this in the past by first instructing the model to BLUEPRINT the changes and "make a plan" for multiple code sprints, each addressing a set of changes, and then setting up a prompt-response series where it "moves through a checklist" to produce one or two code docs at a time (see the second sketch after this list).

  4. My largest success parameter is getting the model to produce at most 1,000-1,300 lines of working code for a single doc, or several 100-200 line documents or code snippets/diff files. It does sometimes do more than this, especially the o1 and o3-mini models, but that's about what I shoot for as a "really good" result.

  5. The model definitely plays a role. I used 4o models for coding in the past but now exclusively use o3-mini or o1 models for coding.
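
To make point 1 concrete, here's a minimal sketch of counting what's actually about to be sent when you call the API yourself. It assumes the tiktoken package and the o200k_base encoding used by recent OpenAI models; the spec.md file is just a stand-in for whatever documents you're passing.

```python
# Minimal sketch: count exactly what is in the context window before sending.
# Assumes tiktoken is installed; o200k_base is the encoding used by recent
# OpenAI models (swap it out if your model uses a different encoding).
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

def context_tokens(messages: list[dict]) -> int:
    """Rough token count over the message contents (ignores per-message overhead)."""
    return sum(len(enc.encode(m["content"])) for m in messages)

messages = [
    {"role": "system", "content": "No placeholders, no pseudocode."},
    {"role": "user", "content": open("spec.md").read()},
]
print(context_tokens(messages), "tokens about to be sent")
```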
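And here's a rough sketch of the blueprint-then-sprint loop from point 3, using the official OpenAI Python SDK through chat completions. The model name, prompts, file names, and fixed number of sprints are illustrative rather than my actual setup.

```python
# Sketch of the "blueprint, then sprint" loop: plan first, then produce
# one or two code docs per request so each response stays under the
# output-token limit. Assumes the official OpenAI SDK (pip install openai).
from openai import OpenAI

client = OpenAI()
MODEL = "o3-mini"  # illustrative; use whichever model you prefer

RULES = (
    "Produce complete, working code only. "
    "No placeholders, no pseudocode, no '// your code here' comments."
)

def ask(messages):
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content

# 1. Blueprint pass: get a numbered checklist of small sprints,
#    each covering at most one or two files.
history = [
    {"role": "system", "content": RULES},
    {"role": "user", "content": (
        "Here is the project spec:\n" + open("spec.md").read() +
        "\n\nDo not write code yet. Return a numbered checklist of sprints, "
        "each covering at most two files."
    )},
]
checklist = ask(history)
history.append({"role": "assistant", "content": checklist})

# 2. Sprint passes: walk the checklist one item at a time.
for step in range(1, 6):  # however many items the checklist actually has
    history.append({"role": "user", "content": (
        f"Implement checklist item {step} in full. "
        "Output the complete file(s), nothing else."
    )})
    code = ask(history)
    history.append({"role": "assistant", "content": code})
    with open(f"sprint_{step}.md", "w") as f:
        f.write(code)
```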

If you want to discuss specifics any further, let me know. I've ended up designing an entire system around document management, document sharing, context window management, etc. (through the API in a custom home-built interface) in order to manage these processes and get good coding-assistant results with the models I mentioned above.

I've actually reached the stage where I'm looping in "orchestrator" and "agentic" functionality, so the system can simply start with goals, documents, write access, etc. and proceed with creating and testing sets of code for various purposes. I'm getting close to a fully autopoietic system in the in-house environment. In the meantime, I've built up a lot of experience with the systems that allow effective coding across large document sets and an integrated code base: I'm working with around 100 code files (not counting libraries/modules/etc.), probably totaling around 50k-60k lines of code. I never share it all at once (that would be beyond the context window limits), but I do have a system where either I, or the model (by reviewing README files for the various levels of the system), can choose/request the relevant documents for a given issue or development trajectory, and then proceed with review, analysis, and implementation fairly seamlessly.
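
For what it's worth, the README-driven selection step works roughly like the sketch below. It assumes a README that lists each file with a one-line description; the function name, model, and prompts are just illustrative.

```python
# Rough sketch of README-driven document selection: ask the model which
# files it needs before any code is shared, keeping the request well
# inside the context window.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def select_relevant_files(issue: str, readme_path: str = "README.md") -> list[str]:
    index = Path(readme_path).read_text()
    resp = client.chat.completions.create(
        model="o3-mini",  # illustrative
        messages=[{
            "role": "user",
            "content": (
                "Here is an index of the code base:\n" + index +
                "\n\nIssue to work on:\n" + issue +
                "\n\nList only the file paths needed to address this issue, "
                "one per line."
            ),
        }],
    )
    return resp.choices[0].message.content.splitlines()

# The selected files (usually a small fraction of the code base) are then
# loaded and passed to a second request for review, analysis, or implementation.
```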