Hi, I want to use LLaVA with a very long prompt (over 5000 tokens).
I'm currently using GPT-4 with long prompts and it works very well.
So I want to replace GPT-4 with LLaVA, an open-source VLM, but I can't find a way to give it such a long prompt.
I would consider researching LangChain. Their framework currently leads on this front.
The “retrieval” mechanism is likely RAG (retrieval-augmented generation), which has existed since well before Assistants. That means there's plenty of good documentation out there to get you started.
Then, it would be a matter of picking your own DB to store the vector embeddings and going from there.
The OpenAI Cookbook has some good examples with different databases. Supabase might be a good one to start with.
This forum is also full of people asking RAG questions, so you may be able to search existing topics here for answers to more specific questions as well.