Ways to deal with prompts larger than model's context length

I am working on an interesting use case where we are trying to get a private GPT-4 instance to write small scripts in our in-house domain-specific language. The language is fairly simple, and even the most complex scripts rarely exceed 100 lines of code. In my first iteration, I experimented with a simple approach: I included a user guide in the system prompt and asked GPT-4 to write a straightforward script. The results were promising, so we would like to explore this further. However, the first challenge I am facing is the size of our documentation, which comes to roughly 900k GPT-4 tokens. I have looked at options like chunking and creating summaries of each page of the docs, but neither produced better results.

I was thinking I could get GPT-4 to first produce a pseudo-script that indicates which specific parts of the documentation it would require to write the final script; then, in step 2, I can provide the pseudo-script along with the needed subset of the documentation to complete the job. This looks like something I could build with an agent framework. Am I on the right path here? Or is there a better, simpler, more GPT-native way to solve this problem?
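For reference, the two-step flow I have in mind could be sketched roughly like this. Everything here is illustrative (the prompt wording, the `SECTIONS:` marker convention, and the helper names are my own placeholders, not a tested implementation):

```python
# Rough sketch of the two-step flow: step 1 asks the model for a
# pseudo-script plus the doc sections it needs; step 2 builds the
# follow-up prompt from that subset. All names/prompts are illustrative.

PSEUDO_PROMPT = (
    "You are writing a script in our in-house DSL.\n"
    "First write a pseudo-script for the task below, then list the\n"
    "documentation sections you would need, one per line, after a\n"
    "line that says 'SECTIONS:'.\n"
    "Task: {task}"
)

def parse_sections(model_reply: str) -> list[str]:
    """Pull the requested section names out of the step-1 reply."""
    lines = model_reply.splitlines()
    try:
        start = next(i for i, l in enumerate(lines) if l.strip() == "SECTIONS:")
    except StopIteration:
        return []
    return [l.strip() for l in lines[start + 1:] if l.strip()]

def build_step2_prompt(pseudo_script: str, sections: list[str],
                       docs: dict[str, str]) -> str:
    """Assemble the step-2 prompt from the pseudo-script and doc subset."""
    subset = "\n\n".join(docs[s] for s in sections if s in docs)
    return (
        "Relevant documentation:\n" + subset +
        "\n\nPseudo-script:\n" + pseudo_script +
        "\n\nNow write the final script."
    )
```

The two model calls themselves would sit around these helpers; the point is only that step 2's context contains the pseudo-script plus the requested subset, not the full 900k tokens.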

Hi @suhas.chatekar,

You can start experimenting with this using the Assistants API.

Get started quickly by using the Assistants Playground.

Just name the Assistant, add instructions, enable file search, and upload a text file with the code docs to its knowledge base.
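The same setup can also be scripted with the Python SDK. This is an untested sketch: the file path, names, instructions, and model string are placeholders, and it assumes the `openai` package plus an `OPENAI_API_KEY` in the environment.

```python
# Hedged sketch: create a vector store, upload the docs file, and
# create an assistant with file_search pointed at that store.
# Names, paths, and the model string are placeholders.

def create_docs_assistant(docs_path: str):
    from openai import OpenAI  # imported here so the sketch loads without the SDK

    client = OpenAI()

    # 1. Create a vector store and upload the documentation file to it.
    store = client.beta.vector_stores.create(name="DSL docs")
    with open(docs_path, "rb") as f:
        client.beta.vector_stores.files.upload_and_poll(
            vector_store_id=store.id, file=f
        )

    # 2. Create the assistant with file_search enabled, attached to the store.
    return client.beta.assistants.create(
        name="DSL script writer",
        instructions="Write scripts in our in-house DSL using the attached docs.",
        model="gpt-4-turbo",
        tools=[{"type": "file_search"}],
        tool_resources={"file_search": {"vector_store_ids": [store.id]}},
    )
```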


Thanks for this.

Am I right that this is using “tool calling” under the hood? If so, that is what something like LangGraph would be utilising as well. If I go down the agent-framework path, I was thinking of using LangGraph, mainly because the rest of what we have built is around the LangChain ecosystem.

Yes, file_search is itself a tool that the assistant can call.

Here’s how it works under the hood:

The file_search tool implements several retrieval best practices out of the box to help you extract the right data from your files and augment the model’s responses. The file_search tool:

  • Rewrites user queries to optimize them for search.
  • Breaks down complex user queries into multiple searches it can run in parallel.
  • Runs both keyword and semantic searches across both assistant and thread vector stores.
  • Reranks search results to pick the most relevant ones before generating the final response.
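To make those steps concrete, here is a toy, stdlib-only stand-in for that pipeline. The real `file_search` tool is a managed black box, so the scoring below is a deliberately crude substitute for its keyword/semantic search and reranking, not OpenAI's implementation:

```python
# Toy illustration of: run multiple (rewritten) sub-queries, score
# chunks, merge, rerank, keep the best. Not the real file_search logic.

def keyword_score(query: str, chunk: str) -> float:
    """Fraction of query words that appear in the chunk (crude stand-in)."""
    words = query.lower().split()
    return sum(w in chunk.lower() for w in words) / len(words)

def search(queries: list[str], chunks: list[str], top_k: int = 2) -> list[str]:
    """Score every chunk against every sub-query, then rerank."""
    scores = {c: 0.0 for c in chunks}
    for q in queries:  # the real tool runs these searches in parallel
        for c in chunks:
            scores[c] = max(scores[c], keyword_score(q, c))
    ranked = sorted(chunks, key=lambda c: scores[c], reverse=True)
    return ranked[:top_k]
```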

By default, the file_search tool uses the following settings but these can be configured to suit your needs:

  • Chunk size: 800 tokens
  • Chunk overlap: 400 tokens
  • Embedding model: text-embedding-3-large at 256 dimensions
  • Maximum number of chunks added to context: 20 (could be fewer)
    Source: Docs
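As an example of overriding those defaults, the vector-store file endpoints accept a `chunking_strategy` parameter, and the `file_search` tool definition accepts `max_num_results`. The sketch below follows the documented parameter shapes but is untested; the function name and the 400/200 values are just illustrative choices:

```python
# Hedged sketch: attach a file to an existing vector store with a
# custom chunking strategy instead of the 800/400 defaults.

def add_docs_with_custom_chunking(vector_store_id: str, docs_path: str):
    from openai import OpenAI  # imported here so the sketch loads without the SDK

    client = OpenAI()
    with open(docs_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="assistants")

    return client.beta.vector_stores.files.create(
        vector_store_id=vector_store_id,
        file_id=uploaded.id,
        # Halve the default chunk size and overlap (illustrative values)
        chunking_strategy={
            "type": "static",
            "static": {
                "max_chunk_size_tokens": 400,
                "chunk_overlap_tokens": 200,
            },
        },
    )

# Similarly, the number of retrieved chunks can be capped on the tool:
# tools=[{"type": "file_search", "file_search": {"max_num_results": 8}}]
```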