Background: Coming from a low-code / no-code perspective, I would say that with prompt engineering (roleplay prompting in particular) LLMs can make anyone better at pretty much anything - with the exception of code. Is this a bug or a feature?
Chances are that almost any problem or task (with the exception of things like surgery or flying a plane) can be done better and faster with the help of an LLM. One area where this does not yet seem to be the case is code - even though, on paper, LLMs should be fairly good at it.
Dev types will release something “no-code” or “low-code” that is effectively a waste of time because of the learning curve required to use it (think BotPress - no hate, they mean well). GPTs look like an attempt to work around this, but GPTs are limited to their training data, or training plus RAG, and for large bodies of information the cost of a 20-turn RAG conversation can be eye-watering.
A different workaround would be to build my own embeddings, which brings me to product documentation (let’s take OpenAI and BotPress as examples - again, no hate, they are simply my most urgent needs).
My workflow:
- Go to the OpenAI or BotPress docs and scrape them
- Embed the scraped documentation
- Get an LLM to write code as a Q&A exercise grounded in that documentation rather than in free-form model output (see the sketch below)
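For concreteness, here is roughly what I have in mind - a minimal sketch, not anything battle-tested, assuming the openai Python SDK v1.x plus requests, beautifulsoup4 and numpy, with placeholder doc URLs and model names swapped in by me:

```python
# Minimal sketch of the scrape -> embed -> grounded Q&A loop.
# Assumptions: openai Python SDK v1.x, requests, beautifulsoup4 and numpy installed;
# DOC_URLS is a placeholder list of documentation pages you are allowed to scrape.
import numpy as np
import requests
from bs4 import BeautifulSoup
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
DOC_URLS = ["https://example.com/docs/page-1"]  # placeholder, not real docs URLs

def scrape(url: str) -> str:
    """Fetch a page and strip it down to its visible text."""
    html = requests.get(url, timeout=30).text
    return BeautifulSoup(html, "html.parser").get_text(separator="\n", strip=True)

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of text chunks with the OpenAI embeddings endpoint."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

# 1. Scrape and chunk the documentation (naive fixed-size character chunks).
chunks: list[str] = []
for url in DOC_URLS:
    text = scrape(url)
    chunks += [text[i:i + 1500] for i in range(0, len(text), 1500)]

# 2. Embed every chunk once and keep the vectors in memory.
doc_vectors = embed(chunks)

# 3. Answer a question grounded in the most similar chunks (cosine similarity).
question = "How do I create an embedding?"
q_vec = embed([question])[0]
scores = doc_vectors @ q_vec / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
)
context = "\n\n".join(chunks[i] for i in np.argsort(scores)[-3:])

answer = client.chat.completions.create(
    model="gpt-4-1106-preview",  # assumption: any recent chat model would do here
    messages=[
        {"role": "system", "content": "Answer only from the provided documentation excerpts."},
        {"role": "user", "content": f"Documentation:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```

The grounding step itself is trivial; the part that keeps failing for me is getting the documentation text in the first place.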
Simple, right?
Well, nope. Three hours later I have nothing. The scripts to scrape OpenAI’s documentation didn’t work. I eventually opted to copy and paste into MS Word and go from there, but even that didn’t work as intended. Part of the copy-and-paste saga led me to https://platform.openai.com/docs/tutorials/web-qa-embeddings.
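As an aside, my best guess at why the scraping scripts fall over (a guess, not something I have confirmed): many documentation sites render their content client-side, so a plain HTTP fetch only returns a JavaScript shell. A quick check along these lines, again assuming requests and beautifulsoup4, shows whether that is what is happening:

```python
# Diagnostic sketch: if the visible text is tiny compared to the raw HTML,
# the page is probably rendered client-side and plain requests scraping
# will never see the actual documentation content.
import requests
from bs4 import BeautifulSoup

url = "https://platform.openai.com/docs/introduction"  # example page, adjust as needed
html = requests.get(url, timeout=30).text
text = BeautifulSoup(html, "html.parser").get_text(strip=True)
print(f"raw HTML: {len(html):,} chars, visible text: {len(text):,} chars")
# If the visible text is only a few hundred characters, a headless browser
# (e.g. Playwright) is needed instead of plain requests.
```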
What I don’t understand is this: assuming the scripts in that tutorial actually work, why do GPT-4, GPT-3.5 and Claude all fail to either
a) provide a working script derived from the documentation (which should conceptually be part of the models’ training data), or
b) just say… dude… refer to the URL?
This is not a training-data cut-off issue, as a comparison between GPT-3.5 and the most recent GPT-4 preview shows (see attached image).
This is a serious issue for the no-code/low-code crowd, because it seems to happen with every code-related task I have tried. I stopped working with Langchain because at the time GPT couldn’t give me a coherent explanation of what a ‘blob’ is. In the end I resorted to copy-pasting Langchain libraries into .py files just to get things to work, and eventually stopped tinkering with Langchain altogether.
In conclusion, the question remains: why do LLMs seem to be blind to product documentation, and is this limitation intentional? If so, what is the reasoning behind it?