Training with website

I have a website with a lot of information, and I would like GPT to search it before answering something specific. Is that possible?


Yes, this can be done very well.

Use this extension and then append `site:domain.com` to your prompt.

Or do the same thing with Perplexity.com

https://www.perplexity.ai/?s=u&uuid=dba9a968-22aa-4b05-973a-3addb5826878


But would GPT look for the answer on the website first and then, if it doesn't find it, search elsewhere (in the data it already has from training)?

```json
"model": "text-davinci-003",
"prompt": "site: terra.com.br qual a noticia mais importante hoje?"
```

?

I believe you misunderstood. For example: with the OpenAI GPT API, the user sends a question, and I want to search a website/PDF first and answer from what is on the website/PDF; only if I don't find it there should it fall back to searching its standard knowledge base.

You can do this with embeddings, take a look at the API docs.

But I didn't understand: how can I use these embeddings with my model?

You must use the Embeddings API to index the information you want to use as a data source, then do the same for the search query. You compute the cosine similarity between the query embedding and those of your data source; the closest matches are your search results. These can then be included in a prompt to the Completions API for a final, elaborated response.
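The flow described above can be sketched in Python. Everything below is hypothetical: the page names and 3-dimensional vectors stand in for real embeddings, which in practice come from the Embeddings API (e.g. `openai.Embedding.create(...)`) and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" standing in for real Embeddings API output,
# one per indexed page of the website.
docs = {
    "pricing page": [0.9, 0.1, 0.0],
    "contact page": [0.1, 0.9, 0.2],
    "about page":   [0.2, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # embedding of the user's question

# Rank every document by similarity to the query; best match first.
ranked = sorted(docs, key=lambda name: cosine_similarity(query_vec, docs[name]),
                reverse=True)
print(ranked[0])  # → pricing page
```

The text of the top-ranked page is what you would paste into the Completions prompt as context for the final answer.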

I simplified it a lot, but that's more or less it…


Damn, thanks for the reply. I still don't understand, though; do you happen to have an example? From the documentation, I couldn't work out how to fetch and query this data whenever the user sends a question in the prompt.

You could ask ChatGPT for instructions/examples in the language you need :wink:

OK… here ya go… here is how I do this in my lab for testing and system engineering eval:

  1. I have a DB table with the text I want to search stored with other params in each row. I populate it with completion text, but of course you can fill your DB however you like.

  2. When I commit the text to the DB, I call the embeddings API, get a vector back, and store the vector in the same table row. Now the row holds both the text and its embedding.

  3. When I search, I choose the ranking method (normally dot_product, because it equals the cosine similarity function for unit-length vectors), and use the API to get the vector for the search term.

  4. Then I pull every vector from the DB (along with the id of the row) and run the dot_product of the search term vector with each text vector and store the results in an array.

  5. Then I sort the array (descending, based on the correlation method used) and there you have it :slight_smile:
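The five steps above can be sketched with a plain list standing in for the DB table. The row texts and vectors here are made up; real embeddings come from the API and are returned already unit-length, which is why dot_product matches cosine similarity.

```python
import math

def normalize(v):
    """Scale a vector to unit length (real API embeddings arrive this way)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

# Steps 1-2: each "row" stores the text plus its embedding vector.
table = [
    {"id": 1, "text": "Caves form by dissolution of limestone.",
     "vec": normalize([0.9, 0.1, 0.3])},
    {"id": 2, "text": "Our opening hours are 9am to 5pm.",
     "vec": normalize([0.1, 0.8, 0.2])},
]

# Step 3: get the vector for the search term (hypothetical values here).
query_vec = normalize([0.8, 0.2, 0.4])

# Step 4: dot product of the query vector against every row's vector.
results = [(dot_product(query_vec, row["vec"]), row["id"], row["text"])
           for row in table]

# Step 5: sort descending; the best-correlated rows come first.
results.sort(reverse=True)
print(results[0][2])  # → Caves form by dissolution of limestone.
```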

Example: Set up Search

Example: Top Vector Search Results

HTH

Of course, if I add a completion for:

What is a cave system?

… and run the vector search again, we get:

So, I think it's easy to understand now, right @regulador261?


Does this work? OpenAI API