Guys, I want to build a custom Q&A platform. I want GPT-3 to answer users’ questions ONLY from books that I feed to it. I know that fine-tuning works by providing a JSON file with example prompts/results. In my case I’m confused about how to do it, because I don’t have any specific prompts. I just want GPT-3 to use the books as a DB to answer from. If it can’t find an answer, it should reply with ‘sorry I don’t know’. Can anybody point me in the right direction?
You should probably consider using embeddings and not fine-tuning for this type of application.
Cheers!
This tutorial is exactly what you need: OpenAI API. The author uses a web scraper to get the content, but you can skip that part.
The SEO Pub made a video that can give you a starting point.
In the video, he basically tells ChatGPT about his idea of a tool and then asks it to write the code for him.
He uses this template: “I want to code a tool for a webpage that will allow visitors to _____________.
Can you write the code for this?”
I used it to make ChatGPT develop some simple tools, like a password generator and so on.
In your case, you can try this out: “I want to code an app that will allow visitors to ask questions and the OpenAI API would answer them.
Can you write the code for this?”
I’ve never tried to develop an app with ChatGPT; it would be interesting to see the outcome.
Keep us updated.
Good luck.
I recommend starting with the docs, then using ChatGPT to answer additional questions and discover new possibilities. Hallucinations are also a problem when you ask about OpenAI’s own capabilities, so it is worth knowing roughly what the documentation says in order to spot answers that are obviously untrue.
Guys, thank you so much, but I’m not using ChatGPT at all; I’m talking about the API. Here’s my ideal result:
- I upload a book or books to my OpenAI account
- Then I create an API integration with Davinci
- User inputs a question
- I send this question as a prompt to Davinci and add an instruction to ONLY search for the right answer in the files I provided
- Davinci will return the right answer
That said, what confuses me is how files are supposed to be uploaded to OpenAI.
From what the API docs say, it doesn’t quite work the way I thought it would… They state that I have to upload an EXAMPLE file with possible prompts and ideal results so that Davinci can learn my desired outcome. But that is not what I need… hence the confusion.
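To make the confusion concrete, here is a minimal sketch of the kind of file the legacy GPT-3 fine-tuning docs asked for, assuming the prompt/completion JSONL format of that era (newer chat models use a different, message-based format). The questions are made up and `to_finetune_jsonl` is a hypothetical helper. Each line shows the model an example of a desired answer; nothing in the file gives the model a book it can look answers up in.

```python
import json
from typing import List, Tuple

def to_finetune_jsonl(pairs: List[Tuple[str, str]]) -> str:
    """Serialize (prompt, completion) pairs as JSONL: one JSON object per line."""
    return "".join(
        json.dumps({"prompt": prompt, "completion": completion}) + "\n"
        for prompt, completion in pairs
    )

# Made-up examples: each line demonstrates a desired answer, it is not a document to search.
examples = [
    ("How many legs does a spider have?", "Eight."),
    ("What colour is the sky on a clear day?", "Blue."),
]
print(to_finetune_jsonl(examples), end="")
```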
I was suggesting ChatGPT as a way to discuss the integration possibilities, not to solve your problem. I am aware that you want to use the OpenAI API.
I strongly encourage you to read this tutorial: https://platform.openai.com/docs/tutorials/web-qa-embeddings.
“I upload a book or books to OpenAI account”
The only way to upload content to OpenAI is through files for fine-tuning, but I don’t think that fits your use case, especially since you need only correct answers.
Instead, use the Embeddings API to create vectors for your content; these vectors measure the relatedness of text strings. Then store the vectors on your side in a vector database of your choice.
This lets you match the question to the relevant content. Once you know which fragment of your knowledge base probably contains the answer, you send it to OpenAI along with the question.
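A minimal sketch of the embedding step in Python, assuming the official `openai` package (v1.x) with `OPENAI_API_KEY` set, and `text-embedding-3-small` as the model (an assumption; use whichever embedding model you have access to). `build_index`, the batch size of 100, and the in-memory list are hypothetical stand-ins for whatever vector database you choose.

```python
from typing import Callable, List, Tuple

Vector = List[float]

def openai_embed(texts: List[str], model: str = "text-embedding-3-small") -> List[Vector]:
    """Embed a batch of strings with the OpenAI Embeddings API (openai>=1.0 SDK).

    Assumes OPENAI_API_KEY is set; the model name is an assumption.
    """
    from openai import OpenAI  # lazy import: the rest of the sketch runs without the SDK
    client = OpenAI()
    response = client.embeddings.create(model=model, input=texts)
    # Sort by index so the vectors line up with the input order.
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]

def build_index(chunks: List[str],
                embed: Callable[[List[str]], List[Vector]],
                batch_size: int = 100) -> List[Tuple[str, Vector]]:
    """Embed text chunks in batches and keep (chunk, vector) pairs.

    The returned list stands in for a vector database you host yourself.
    """
    index: List[Tuple[str, Vector]] = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        vectors = embed(batch)
        index.extend(zip(batch, vectors))
    return index
```

Usage would be something like `index = build_index(paragraphs, openai_embed)`, where `paragraphs` is your book already split into chunks. Passing the embedding function in keeps the indexing logic testable without network calls.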
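A sketch of the “send it along with the question” part, using a hypothetical `build_prompt` helper. It also covers the original ‘sorry I don’t know’ requirement by telling the model to use a fixed fallback reply. This is an instruction, not a guarantee: it makes answers from outside the excerpts less likely but does not rule them out. The character budget is a crude, hypothetical stand-in for counting tokens.

```python
from typing import List

FALLBACK = "Sorry, I don't know."

def build_prompt(question: str, context_chunks: List[str], max_chars: int = 6000) -> str:
    """Assemble a prompt asking the model to answer only from the given excerpts.

    Chunks are added in order until the (hypothetical) character budget is reached.
    """
    context = ""
    for chunk in context_chunks:
        piece = chunk.strip() + "\n---\n"
        if len(context) + len(piece) > max_chars:
            break  # stop before exceeding the budget
        context += piece
    return (
        "Answer the question using ONLY the book excerpts below. "
        f'If the answer is not in the excerpts, reply exactly "{FALLBACK}"\n\n'
        f"Excerpts:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )
```

The returned string goes to whichever completion or chat endpoint you use, with the best-matching fragments from the similarity search passed in as `context_chunks`.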
Yes, that is not the best approach.
You should use Embeddings as a number of us have advised.
This involves breaking your book into appropriately sized chunks (paragraphs could be a good starting point), storing each chunk in a DB table row, then calling the OpenAI API to get an embedding vector for each “paragraph” (for example) and storing that vector in the same row as the text.
Then you can convert your question to a vector the same way, do some basic math (dot product, etc.) between your “question vector” and all the “paragraph vectors” in the DB, and rank the results.
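A minimal sketch of that storage layout, using Python’s built-in `sqlite3` as the DB table and storing each vector as JSON text next to its paragraph. The blank-line paragraph split, the 40-character minimum, and the table and function names are all assumptions for illustration; `embed` is any function that maps a list of strings to a list of vectors, e.g. a thin wrapper around the Embeddings API.

```python
import json
import sqlite3
from typing import Callable, List

def split_into_paragraphs(book_text: str, min_chars: int = 40) -> List[str]:
    """Split on blank lines and drop very short fragments such as headings (arbitrary threshold)."""
    paragraphs = [p.strip() for p in book_text.split("\n\n")]
    return [p for p in paragraphs if len(p) >= min_chars]

def store_chunks(conn: sqlite3.Connection, book: str, paragraphs: List[str],
                 embed: Callable[[List[str]], List[List[float]]]) -> None:
    """Store each paragraph and its embedding vector (as JSON text) in the same row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id INTEGER PRIMARY KEY, book TEXT, text TEXT, embedding TEXT)"
    )
    vectors = embed(paragraphs)
    conn.executemany(
        "INSERT INTO chunks (book, text, embedding) VALUES (?, ?, ?)",
        [(book, p, json.dumps(v)) for p, v in zip(paragraphs, vectors)],
    )
    conn.commit()
```

Keeping the text and its vector in one row means a single query returns both the score input and the passage you later send to the model.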
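The ranking step could look like this in plain Python (hypothetical helper names). It uses cosine similarity; OpenAI’s embeddings guide says their embeddings are normalized to length 1, in which case a plain dot product gives the same scores. Scanning every row is fine for a few books; for much larger collections a vector database or NumPy would be the usual next step.

```python
import math
from typing import List, Sequence, Tuple

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; equals the dot product when both vectors have length 1."""
    norm_a, norm_b = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot(a, b) / (norm_a * norm_b)

def rank_chunks(question_vec: Sequence[float],
                rows: List[Tuple[str, Sequence[float]]],
                top_k: int = 3) -> List[Tuple[float, str]]:
    """Score every stored (text, vector) row against the question vector, best first."""
    scored = [(cosine(question_vec, vec), text) for text, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

If even the best score falls below a threshold you choose by experimenting (there is no standard value), you could skip the model call and reply ‘sorry I don’t know’ directly.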
It’s basically a semantic search and retrieval application, not a chatbot application.
Hope this helps.