I am trying to figure out the best way to load a long text document (think a 60-page lease or a medical paper) and then ask questions about it. Is this fine-tuning? It seems fine-tuning would only work if I had sample responses.
Every scenario I try runs out of tokens.
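For what it's worth, the usual workaround for the token limit is to split the document into overlapping chunks and process them one at a time. A minimal sketch in Python (the character budget is a rough rule of thumb, roughly 4 characters per token for English, not an API guarantee):

```python
def chunk_text(text, max_chars=4000, overlap=200):
    """Split a long document into overlapping chunks.

    max_chars=4000 roughly approximates 1000 tokens (a rule of
    thumb: one token is about four characters of English text).
    The overlap keeps sentences from being cut cleanly in half
    between two chunks.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so chunks share some context
    return chunks

# Example: a 10,000-character document becomes three chunks.
parts = chunk_text("x" * 10000)
print([len(p) for p in parts])  # [4000, 4000, 2400]
```

Each chunk can then be summarized (or searched) separately, and the per-chunk results combined in a final call.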
No, fine-tuning is the worst option for this.
Please use the search; this question gets asked ad nauseam. (There are even free tools to do this.)
So, search embeddings? When I get a result from an embedding, it is a vector. How do I interpret the results?
I would like to query (for instance) “please summarize the above document in bullet points”.
It does this very well with short docs in the Playground and also in ChatGPT, but I can't figure out how to get larger bodies of text into it.
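On interpreting the embedding results: you don't read the vector's numbers directly; you compare vectors to each other. Texts with similar meaning get vectors that point in similar directions, and the standard measure for that is cosine similarity. A minimal sketch in plain Python (the toy 3-d vectors are purely illustrative; real embedding vectors have on the order of a thousand dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means the
    vectors point in exactly the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings of a question and
# two document chunks (hypothetical values for illustration).
query = [0.1, 0.9, 0.2]
chunk_about_rent = [0.1, 0.8, 0.3]
chunk_about_parking = [0.9, 0.1, 0.1]

print(cosine_similarity(query, chunk_about_rent))     # high, about 0.99
print(cosine_similarity(query, chunk_about_parking))  # low, about 0.24
```

So the workflow is: embed the question, embed every chunk, and keep the chunks whose similarity to the question is highest; those are the ones you paste into the prompt.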
I have a pipeline written down that creates embeddings for subdocuments, uses semantic search to find the relevant subdocument, and then uses
text-davinci-003 to rephrase the subdocument for a specific audience. Does that seem to be helpful for your use-case?
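That three-step pipeline can be sketched roughly like this. Note the `embed` function below is only a stand-in for a real embeddings API call (and its tiny bag-of-words vocabulary is purely illustrative), so the example runs offline; step 3 is shown as a comment because it would require an API call:

```python
import math
import re

def embed(text):
    """Placeholder for a real embeddings API call. Here: a
    unit-normalised bag-of-words vector over a tiny, made-up
    vocabulary, just so the sketch runs without network access."""
    vocab = ["rent", "pay", "month", "parking", "manager", "tenant"]
    words = re.findall(r"[a-z]+", text.lower())
    vec = [float(sum(w.startswith(v) for w in words)) for v in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    # Vectors are already unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def most_relevant(question, subdocs):
    """Step 2: semantic search, i.e. rank subdocuments by how
    similar their embedding is to the question's embedding."""
    q = embed(question)
    scored = [(cosine(q, embed(d)), d) for d in subdocs]
    return max(scored)[1]

subdocs = [
    "The tenant shall pay rent of $1,200 on the first of each month.",
    "Parking spaces are assigned by the building manager.",
]
best = most_relevant("How much is the monthly rent payment?", subdocs)

# Step 3: send `best` to a completion model (the thread mentions
# text-davinci-003) with a prompt such as:
#   f"Rephrase the following for a layperson:\n\n{best}"
```

Swapping the placeholder `embed` for a real embeddings endpoint is the only change needed to make this a working retrieval step.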
Could be. Would love to see what you put together…
I have replied in a private chat message with details of a workflow you can use.
We have also resolved breaking documents down for fine-tuning, the hallucination issue, providing accurate citations, and video/HTML in responses. It is a mix of embedding and fine-tuning.
We are not quite ready for a public announcement yet and are providing limited beta access on a case-by-case basis (depending on the use case).
I will share more with the community in the coming days.
Did you try chatdochub.com? It seems there is no limit on PDF size there. Try it out.