I need help buildning a chatbot on my own data


Im buildning a chat bot that should answer question about articles.
So im finetuning the api with all the articles?
How should my fine tune json look like? I want the bot to only answer questions from data in the articles nothing else.

1 Like

Chat over your own data is best handled with embeddings and a vector database. Fine-tuning is better used for changing the structure or way that GPT response.

See the Embeddings Guide and cookbooks for how to get started.

There are a lot of articles/documents in my database.
I dont want to send the whole database everytime someone ask a question about any articles?

You don’t send for every request. First you go through and create chunks of your articles and create embeddings for each chunk. Those are stored in your database or in a vector database (with metadata of what article they belong to).

On user query, you create embedding of the query, run similarity search on your vector database (this is built-in function on any common vector database), and add the resulting chunks to your GPT prompt.

If an article is ever updated, you generate new embeddings and replace them in the vector database.

A vector database isn’t strictly required, you could store the embeddings in the database you currently use and write your own code to do the similarity search, or see if your database has a vector extension/plugin.


Hi Pontus,
Novaphil is exactly right. Read through the links he posted above and you will be on your way. Once you have the embeddings down and understand them, if your bot isn’t replying in the ‘manner’ you want, like the wrong tone or you want more on-point responses, then look to fine-tuning with the questions and answers from your documents.

You will have to send all the questions and responses back to the API each time but just for the current conversation. There are ways to handle that too if it gets to large.

Good luck and have fun with it.