I'd like to ask for your advice on how to optimise token usage for API calls.
I am using a local vector DB to store the PDF contents, and I load the most similar content retrieved via embeddings into the prompt when calling the ChatGPT API. However, I realised there is a token limit on the question + retrieved content + response combined.
I wonder how you all structure the knowledge base content sent to the API to improve the precision of what gets included.
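One common approach is to pack retrieved chunks into a fixed token budget, in relevance order, and stop before the budget is exceeded. Here is a minimal sketch; `approx_tokens` is a hypothetical helper that uses a crude word count as a proxy, since in practice you would count with the model's real tokenizer (e.g. tiktoken's `cl100k_base` encoding):

```python
def approx_tokens(text: str) -> int:
    # Crude proxy: ~1 token per whitespace-separated word.
    # Real tokenizers (e.g. tiktoken) will give different counts.
    return len(text.split())

def pack_context(chunks: list[str], budget: int) -> list[str]:
    """Take chunks in relevance order until the token budget is spent."""
    packed, used = [], 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            break  # next chunk would overflow the budget; stop here
        packed.append(chunk)
        used += cost
    return packed

chunks = ["most relevant passage here", "second passage", "third passage"]
print(pack_context(chunks, budget=6))  # keeps the first two (4 + 2 tokens)
```

Stopping at the first chunk that overflows (rather than skipping it and trying later, smaller chunks) keeps the packed context in strict relevance order, which is usually what you want for answer quality.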
Don’t forget chat history. Not needed for all applications, I guess.
Prompt layout frameworks like Promptrix can help quite a bit
I suspect many people just roll their own.
That sounds like a bottleneck for building a good system on OpenAI.
Perhaps one way is to identify the types of prompts that cause errors because the token limit is exceeded, then revise those prompts so the fetched results come in slightly under the limit. You could also upgrade to the newer models that support 32k tokens. This worked for me in a case where I had no control over the database; it might not be applicable in your situation, but I thought I'd share just in case.
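To stay "slightly under" the limit, the usual arithmetic is: context budget = model window − tokens reserved for the response − tokens in the fixed prompt parts. A quick sketch (the specific numbers below are illustrative assumptions, not values from this thread):

```python
MODEL_LIMIT = 32_768          # e.g. a 32k-context model
RESERVED_RESPONSE = 1_024     # room left for the completion (max_tokens)
SYSTEM_AND_QUESTION = 500     # measured size of your fixed prompt parts

# Whatever remains is the ceiling for retrieved context.
context_budget = MODEL_LIMIT - RESERVED_RESPONSE - SYSTEM_AND_QUESTION
print(context_budget)  # 31244
```

Reserving response tokens up front matters because the window is shared: if the prompt fills the entire 32k, the completion gets cut off or the call errors out.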
Thank you sir.
I am using the turbo-32k now. :)