Teaching GPT the information it will be working on


I have been reading the new Assistants API and comparing it to the ChatCompletion. I was wondering what would be best if you wanted to teach GPT something before asking it questions, like giving it a research paper before asking it anything. Would Assistants API be more fit for the job than ChatCompletion, or rather, in what cases one might have a edge over the other ?
The maximum character length for the instructions key on the Assistants API is 32768, whereas gpt-4-1106-preview has a 128k token limit - in which case you would probably give it the paper information within system role message before writing other prompts.

Any thoughts ?

The way an API model works is that you only “teach” by sending all the relevant tokens to the AI at the same time a query is sent, which then allows it to form a response.

The AI does not learn from your interactions with it.

The assistants API allows one to upload a text file such as your distilled paper, and then if it is too large for the model context, it will be chunked and the AI input will be filled with the most semantically-similar parts to the user input part of a query.

OpenAI’s retrieval system is not limited to only the best quality chunks, but instead it fills the available context length at your expense, and also doesn’t have transparency about the improvements one could program themselves to ensure quality vector database retrieval.

1 Like

I have heard about vector databases and them being used to work with GPT. Do you know a good place where I can learn more about this ? Thank you

I’ll send you over to Microsoft’s Azure, which, along with them offering OpenAI AI services, includes better documentation.

OpenAI’s older cookbook goes right into terse python code and third-party products:

1 Like

Interesting. Would embeddings work on code as well ? Like if I were to give GPT 100 line of code and then to ask it questions, would embeddings have trouble finding similarity between code and written language ?

You can put 100 lines of code into the AI context and just ask. GPT-4 used to be capable of understanding more than that when total comprehension was required to make improvements that affected multiple parts of code.

It is when you need to take an entire GitHub repository - like say, OpenAI’s python library - and get parts retrieved that then you could explore embeddings.

However the model uses “semantic search” to generally find topical similarity. Differentiate if you are talking baseball or orangutans, Tokyo or Kyoto. Getting back the exact code that answers your natural language question would be quite a challenge.

For the example you provided with a GitHub library, for best responses you’ll probably have to give to the model the whole code (assuming it isn’t too large for the context) at the beginning of the conversation for it to work well ?

The idea of embeddings being used as augmentation is that some information can indeed be automatically placed at the beginning of conversation.

The data is chunked into pieces. That can be done in an organized manner or haphazardly.

An artificial “assistant” role message can be inserted, and we can indicate to the AI, “here’s some documentation I found relevant to the next user question: (blah)”

So if the entire code library is too large to reasonably fit (or to pay for in each question), and I don’t want to paste every time into my chat box, embeddings can be a case where a user query “I’m coding chat completion, and am using the ChatCompletion method, using the OpenAI python library’s chat method and httpx dependencies” might get you the 1/3 of the code that is actually most useful.

Can the overall structure of how the pieces interplay be seen in random or even per-file retrieval though?

Yeah, was thinking the same, since the overall structure of the pieces does matter.

Had a funny thought when thinking about what you said on semantic search. This is going to be more expensive and it will take longer, but might work as an idea. You could use 2 models to ask questions on a GitHub repository, you give the whole code to the first model without asking to respond with anything long, so that you stay within the limit (assuming you enter the 128k limit there’s on GPT-4 turbo). When there’s a question, you ask it to provide back the pieces of code that are directly linked to the question. And finally you provide the second model with the relevant pieces of code and with the question.

Don’t know exactly how long that might take or how expensive it can be though.