Training with blank prompts

Has anyone tried training with blank prompts (where all the info is in the content tag)?

I have blocks of technical information about a person, company, or product (and general content from our web pages). I want to train GPT to answer questions related to the website content.

I have considered splitting the text in half, putting one half in the prompt and the other half in the content.
I have also considered giving no prompt and putting everything in the content.
I have also considered making up prompts like “raymonds profile”, “product a”, etc.
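
To make the three options concrete, here is a minimal sketch of how each could be written out as fine-tuning records. It assumes a prompt/completion style JSONL file, where the `completion` field stands in for what I called "content" above; the function names and the sample text are made up for illustration.

```python
import json

def make_records(title, text, style):
    """Build fine-tuning records for one block of website text.

    The "prompt"/"completion" field names are an assumption about the
    training file format, not something confirmed in this thread.
    """
    if style == "blank":
        # Option 2: empty prompt, everything in the completion.
        return [{"prompt": "", "completion": " " + text}]
    if style == "split":
        # Option 1: first half of the words as prompt, second half as completion.
        words = text.split()
        mid = len(words) // 2
        return [{"prompt": " ".join(words[:mid]),
                 "completion": " " + " ".join(words[mid:])}]
    if style == "title":
        # Option 3: a made-up prompt such as "product a".
        return [{"prompt": title, "completion": " " + text}]
    raise ValueError(f"unknown style: {style}")

def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

if __name__ == "__main__":
    sample = "Product A is a hypothetical widget sold on our site"
    for style in ("blank", "split", "title"):
        print(to_jsonl(make_records("product a", sample, style)))
```

Writing a small file for each style would make it easy to train on a little data per option and compare the results.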

In simple terms, I want to find a way to train GPT on the content contained within our website.

Thanks

I haven’t tried it myself, but I’m told that it can work.

You might want to try both (all) methods with just a little bit of data to see which one works best.

It will still hallucinate and not always stay on topic, though, even with fine-tuning.

Are you creating a chatbot for your website?

PS - welcome to the forum!

Check this: New and Improved Embedding Model

You can use the Weaviate vector database to store your info.

You can then semantically query and generate content using Davinci 3.
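
Here is a rough sketch of what "semantically query" means in practice: embed the question, embed each stored text, and rank by cosine similarity. The `toy_embed` function below is only a word-count stand-in so the example runs without any API key; a real setup would use vectors from an embedding model, stored in a vector database. All names are hypothetical.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in embedding: a bag-of-words count vector.

    This is NOT a real embedding model; it only illustrates the flow.
    """
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, docs):
    """Return (score, doc) pairs sorted from most to least similar."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

if __name__ == "__main__":
    docs = [
        "Raymond is the founder of the company",
        "Product A ships with a two year warranty",
        "Our office is in Sydney",
    ]
    for score, doc in rank("what warranty does product a have", docs):
        print(f"{score:.2f}  {doc}")
```

The top-ranked texts are then pasted into the prompt for the generation model.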

Thanks for the link. I discovered that after posting and have headed in that direction.

The whole concept of embedding was confusing at first, but now that I have the hang of it, I can see the real power it offers.

Game changer for AI powered apps!

Hi @raymonddavey

The whole concept of embedding was confusing at first, but now that I have the hang of it, I can see the real power it offers.

We also write about this topic (working with embeddings) on our blog. Any tips or suggestions for people new to embeddings? E.g., what would you have liked to read?

Thanks

Hi, I’d like to jump in on the discussion, if you don’t mind, as I plan to use Weaviate in a product.

On embeddings, simple tutorials would be good.
The first thing almost everyone who comes to this forum wants to do is fine-tune the models to include a knowledge base of their choice and have the AI model use it.
Eventually, everyone learns that in most cases embeddings are the way to go.

One common use case is to create embeddings from a large dataset without losing context. In other words, when the end user sends a request, the response should draw on the whole dataset rather than a single embedding.

For example, take a novel of 250 pages. Each page would be an embedding, and some embeddings wouldn’t have anything in common, but when an end user makes a request, all the relevant embeddings should be combined to formulate the correct response with GPT-3.
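
To illustrate the novel example, here is a minimal sketch of the "combine" step: given pages that already have similarity scores for the request, keep the best few that fit a size budget and assemble them into one prompt. The function name, the prompt wording, and the character budget are all assumptions for illustration.

```python
def build_prompt(question, scored_pages, top_k=3, max_chars=2000):
    """Assemble a prompt from the highest-scoring pages.

    scored_pages: list of (similarity_score, page_text) pairs, where the
    scores are assumed to come from an earlier embedding search.
    """
    best = sorted(scored_pages, key=lambda p: p[0], reverse=True)[:top_k]
    context, used = [], 0
    for _score, text in best:
        # Skip any page that would push the context over the budget.
        if used + len(text) > max_chars:
            continue
        context.append(text)
        used += len(text)
    return (
        "Answer using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(context) +
        f"\n\nQuestion: {question}\nAnswer:"
    )

if __name__ == "__main__":
    pages = [
        (0.1, "page about weather"),
        (0.9, "page about the hero"),
        (0.5, "page about the villain"),
    ]
    print(build_prompt("Who is the hero?", pages, top_k=2))
```

This way several unrelated pages can contribute to one answer, as long as each of them is relevant to the request.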

Another thing is the auth part. During development I’m using the sandbox without auth implemented because there is no sensitive information.
I tried to implement auth, but it was unusually complex. Maybe you have a walkthrough for people who are not familiar with Weaviate’s auth integration.

Thanks @georgei – that’s super helpful and we will def keep this in mind.

I have:

  1. Not losing context with OpenAI embeddings on a large dataset;
  2. Authentication content.

Would you mind making yourself known to me on the Weaviate Slack? We would love to get your help on the above.

I’ll reach out in the next few days, thanks.

If you do not want to lose any information, one suggestion is to experiment with how much the information in the vectors should overlap.
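
A minimal sketch of overlapping chunks, assuming the text is split on whitespace and each chunk is embedded separately. The `size` and `overlap` values are the knobs to experiment with; the function name is hypothetical.

```python
def chunk_with_overlap(words, size, overlap):
    """Split a list of words into chunks of `size` words,
    where consecutive chunks share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        # Stop once a chunk has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks

if __name__ == "__main__":
    text = "one two three four five six seven eight nine ten"
    for chunk in chunk_with_overlap(text.split(), size=4, overlap=2):
        print(" ".join(chunk))
```

A larger overlap means a sentence that straddles a chunk boundary is more likely to appear whole in at least one chunk, at the cost of storing more vectors.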