Hi!
I’m increasingly using the chat completion API to sequentially process larger amounts of data (say, a couple of tens of thousands of tokens) for tasks that are easy to describe to an LLM with some background knowledge, but would be much harder to actually automate in code - so I’m automating these with the chat completion API and, e.g., gpt-4o-mini. But when the amount of data grows, I’ve noticed several times that the LLM’s output degrades into a loop after a while, presumably because the LLM loses track, or it simply aborts. Have you tried similar things and found some tricks to improve that?
For example, this list of OSGi configurations is generated automatically by my AI code generation pipeline from the @ObjectClassDefinition and @AttributeDefinition annotations in all Java files that contain them, like this. Now, it’d be nice if you could just concatenate all Java files with those annotations and run a single prompt that generates those tables. But what often happens for me is that the request either aborts or degenerates into a loop.
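To make the setup concrete, here is a rough sketch of that single-request approach in Python (the glob pattern, prompt text, and model are placeholders for what my pipeline actually does):

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are given Java source files containing @ObjectClassDefinition and "
    "@AttributeDefinition annotations. Generate a table of the OSGi "
    "configurations they declare."
)

# Collect every Java file that actually uses the annotation (placeholder path).
files = [
    p for p in Path("src").rglob("*.java")
    if "@ObjectClassDefinition" in p.read_text(encoding="utf-8")
]

# Concatenate all files into one user message - this is the variant that
# aborts or degenerates into loops once the input gets large.
joined = "\n\n".join(
    f"// File: {p}\n{p.read_text(encoding='utf-8')}" for p in files
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": joined},
    ],
)
print(response.choices[0].message.content)
```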
There are obvious workarounds:
- Process the files one by one and combine the output. That takes more time and raises the token count considerably, since the prompt and background knowledge are repeated for every file. On the bright side, if you create temporary files it’s easy to skip files that have already been processed and only handle new / changed files (as aigenpipeline implements).
- Process, say, 10 files at a time (roughly as sketched below), which is somewhat cumbersome to implement and loses some of the advantages of the first approach.
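For reference, the chunked variant I mean looks roughly like this (again a sketch with placeholder prompt and model; with a chunk size of 1 it becomes the file-by-file variant, and you can see why the prompt overhead multiplies):

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()
CHUNK_SIZE = 10  # 1 would be the file-by-file variant


def generate_tables(files: list[Path], prompt: str) -> str:
    """Send the files in chunks of CHUNK_SIZE and concatenate the partial results."""
    results = []
    for i in range(0, len(files), CHUNK_SIZE):
        chunk = files[i : i + CHUNK_SIZE]
        joined = "\n\n".join(
            f"// File: {p}\n{p.read_text(encoding='utf-8')}" for p in chunk
        )
        # The prompt / background knowledge is re-sent for every chunk,
        # which is where the extra token cost comes from.
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": joined},
            ],
        )
        results.append(response.choices[0].message.content)
    # Naive concatenation; merging table headers etc. would need extra handling.
    return "\n\n".join(results)
```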
But I wonder whether there are some prompt-engineering tricks I could use instead, so that all files can be processed in a single request. Does anybody have interesting suggestions or experiences there?
Thanks so much!