Ways to deal with prompts larger than model's context length

I am working on an interesting use case where we are trying to get a private GPT-4 instance to write small scripts in our in-house domain-specific language. The language is fairly simple, and even the most complex scripts rarely exceed 100 lines of code. In my first iteration, I experimented with a simple approach: I included a user guide in the system prompt and asked GPT-4 to write a straightforward script. The results were promising, so we would like to explore this further. However, the first challenge I am facing is the size of our documentation, which comes to roughly 900k GPT-4 tokens. I have looked at options like chunking and creating summaries of each page of the docs, but neither produced better results.

I was thinking I could get GPT-4 to first produce a pseudo-script that indicates which specific parts of the documentation it would require to write the final script; then, in step 2, I can provide the pseudo-script along with the needed subset of the documentation to complete the job. This looks like something I could build with an agent framework. Am I on the right path here? Or is there a better, simpler, more GPT-native way to solve this problem?
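For reference, the two-step flow I have in mind could be sketched roughly like this. Everything here is illustrative (the prompt wording, the `SECTIONS:` marker convention, and the helper names are my own placeholders, not a tested implementation):

```python
# Rough sketch of the two-step flow: step 1 asks the model for a
# pseudo-script plus the doc sections it needs; step 2 builds the
# follow-up prompt from that subset. All names/prompts are illustrative.

PSEUDO_PROMPT = (
    "You are writing a script in our in-house DSL.\n"
    "First write a pseudo-script for the task below, then list the\n"
    "documentation sections you would need, one per line, after a\n"
    "line that says 'SECTIONS:'.\n"
    "Task: {task}"
)

def parse_sections(model_reply: str) -> list[str]:
    """Pull the requested section names out of the step-1 reply."""
    lines = model_reply.splitlines()
    try:
        start = next(i for i, l in enumerate(lines) if l.strip() == "SECTIONS:")
    except StopIteration:
        return []
    return [l.strip() for l in lines[start + 1:] if l.strip()]

def build_step2_prompt(pseudo_script: str, sections: list[str],
                       docs: dict[str, str]) -> str:
    """Assemble the step-2 prompt from the pseudo-script and doc subset."""
    subset = "\n\n".join(docs[s] for s in sections if s in docs)
    return (
        "Relevant documentation:\n" + subset +
        "\n\nPseudo-script:\n" + pseudo_script +
        "\n\nNow write the final script."
    )
```

The two model calls themselves would sit around these helpers; the point is only that step 2's context contains the pseudo-script plus the requested subset, not the full 900k tokens.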

Hi @suhas.chatekar,

You can start experimenting with this using the Assistants API.

Get started quickly by using the Assistants Playground.

Just name the Assistant, add instructions, enable file search, and upload a text file with the code docs to its knowledge base.
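The same setup can also be scripted with the Python SDK. This is an untested sketch: the file path, names, instructions, and model string are placeholders, and it assumes the `openai` package plus an `OPENAI_API_KEY` in the environment.

```python
# Hedged sketch: create a vector store, upload the docs file, and
# create an assistant with file_search pointed at that store.
# Names, paths, and the model string are placeholders.

def create_docs_assistant(docs_path: str):
    from openai import OpenAI  # imported here so the sketch loads without the SDK

    client = OpenAI()

    # 1. Create a vector store and upload the documentation file to it.
    store = client.beta.vector_stores.create(name="DSL docs")
    with open(docs_path, "rb") as f:
        client.beta.vector_stores.files.upload_and_poll(
            vector_store_id=store.id, file=f
        )

    # 2. Create the assistant with file_search enabled, attached to the store.
    return client.beta.assistants.create(
        name="DSL script writer",
        instructions="Write scripts in our in-house DSL using the attached docs.",
        model="gpt-4-turbo",
        tools=[{"type": "file_search"}],
        tool_resources={"file_search": {"vector_store_ids": [store.id]}},
    )
```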


Thanks for this.

Am I right that this is using “tool calling” under the hood? If so, that is what something like LangGraph would be utilising as well. If I go down the agent-framework path, I was thinking of using LangGraph, mainly because the rest of what we have built is around the LangChain ecosystem.

Yes, file_search is itself a tool that the assistant can call.

Here’s how it works under the hood:

The file_search tool implements several retrieval best practices out of the box to help you extract the right data from your files and augment the model’s responses. The file_search tool:

  • Rewrites user queries to optimize them for search.
  • Breaks down complex user queries into multiple searches it can run in parallel.
  • Runs both keyword and semantic searches across both assistant and thread vector stores.
  • Reranks search results to pick the most relevant ones before generating the final response.
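To make those steps concrete, here is a toy, stdlib-only stand-in for that pipeline. The real `file_search` tool is a managed black box, so the scoring below is a deliberately crude substitute for its keyword/semantic search and reranking, not OpenAI's implementation:

```python
# Toy illustration of: run multiple (rewritten) sub-queries, score
# chunks, merge, rerank, keep the best. Not the real file_search logic.

def keyword_score(query: str, chunk: str) -> float:
    """Fraction of query words that appear in the chunk (crude stand-in)."""
    words = query.lower().split()
    return sum(w in chunk.lower() for w in words) / len(words)

def search(queries: list[str], chunks: list[str], top_k: int = 2) -> list[str]:
    """Score every chunk against every sub-query, then rerank."""
    scores = {c: 0.0 for c in chunks}
    for q in queries:  # the real tool runs these searches in parallel
        for c in chunks:
            scores[c] = max(scores[c], keyword_score(q, c))
    ranked = sorted(chunks, key=lambda c: scores[c], reverse=True)
    return ranked[:top_k]
```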

By default, the file_search tool uses the following settings but these can be configured to suit your needs:

  • Chunk size: 800 tokens
  • Chunk overlap: 400 tokens
  • Embedding model: text-embedding-3-large at 256 dimensions
  • Maximum number of chunks added to context: 20 (could be fewer)
    Source: Docs
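As an example of overriding those defaults, the vector-store file endpoints accept a `chunking_strategy` parameter, and the `file_search` tool definition accepts `max_num_results`. The sketch below follows the documented parameter shapes but is untested; the function name and the 400/200 values are just illustrative choices:

```python
# Hedged sketch: attach a file to an existing vector store with a
# custom chunking strategy instead of the 800/400 defaults.

def add_docs_with_custom_chunking(vector_store_id: str, docs_path: str):
    from openai import OpenAI  # imported here so the sketch loads without the SDK

    client = OpenAI()
    with open(docs_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="assistants")

    return client.beta.vector_stores.files.create(
        vector_store_id=vector_store_id,
        file_id=uploaded.id,
        # Halve the default chunk size and overlap (illustrative values)
        chunking_strategy={
            "type": "static",
            "static": {
                "max_chunk_size_tokens": 400,
                "chunk_overlap_tokens": 200,
            },
        },
    )

# Similarly, the number of retrieved chunks can be capped on the tool:
# tools=[{"type": "file_search", "file_search": {"max_num_results": 8}}]
```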