I am working on an interesting use case where we are trying to get a private GPT-4 instance to write small scripts in our in-house domain-specific language. The language is fairly simple, and even the most complex scripts rarely exceed 100 lines of code. In my first iteration, I experimented with a simple approach: I included a user guide in the system prompt and asked GPT-4 to write a straightforward script for me. The results were promising, so we would like to explore this further. However, the first challenge I am facing is the length of our documentation, which is in the region of 900k GPT-4 tokens. I have looked at options like chunking and creating summaries of each page of the docs; however, neither produced better results.
I was thinking I could get GPT-4 to first come up with a pseudo-script that identifies which specific parts of the documentation it would require to write the final script; then, in step 2, I can provide the pseudo-script along with the needed subset of the documentation to complete the job. This looks like something I could build using an agent framework. Am I on the right path here? Or is there a better, simpler, GPT-native way to solve this problem?
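For concreteness, here is roughly the two-step flow I have in mind, as a minimal sketch against the OpenAI chat completions API. The prompt wording, `doc_index.txt`, and the `SECTIONS:` convention are placeholders of mine, not a tested implementation:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical one-line-per-section table of contents of the DSL docs.
DOC_INDEX = open("doc_index.txt").read()


def plan_script(task: str) -> str:
    """Step 1: ask for a pseudo-script plus the doc sections it needs."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "You write scripts in our in-house DSL. Given the table of "
                "contents below, draft a pseudo-script for the task and end "
                "your reply with 'SECTIONS: <comma-separated ids>' naming "
                "the doc sections you need to finish it.\n\n" + DOC_INDEX)},
            {"role": "user", "content": task},
        ],
    )
    return resp.choices[0].message.content


def write_script(task: str, pseudo: str, sections_text: str) -> str:
    """Step 2: finish the script with only the requested doc subset."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "You write scripts in our in-house DSL. Relevant "
                "documentation:\n\n" + sections_text)},
            {"role": "user", "content": f"Task: {task}\n\nPseudo-script:\n{pseudo}"},
        ],
    )
    return resp.choices[0].message.content
```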
Am I right that this is using “tool calling” under the hood? If so, that is what something like LangGraph would be utilising as well. If I go down the agent framework path, I was thinking of using LangGraph, mainly because the rest of what we have built is around the LangChain ecosystem.
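To make the question concrete, what I mean by tool calling is exposing the retrieval step as a function the model can request; a sketch, where `fetch_doc_section` and its schema are names I made up rather than any existing API:

```python
from openai import OpenAI

client = OpenAI()

# A function tool the model can call to pull in one doc section on demand.
tools = [{
    "type": "function",
    "function": {
        "name": "fetch_doc_section",
        "description": "Return the documentation for one section of the DSL user guide.",
        "parameters": {
            "type": "object",
            "properties": {
                "section_id": {
                    "type": "string",
                    "description": "An ID from the table of contents.",
                },
            },
            "required": ["section_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a script that validates an order."}],
    tools=tools,
)

# If the model decides it needs docs, the message carries tool calls for us
# to execute and feed back in a follow-up request.
print(response.choices[0].message.tool_calls)
```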
Yes, file_search is itself a tool that the assistant can call.
Here’s how it works under the hood:
The file_search tool implements several retrieval best practices out of the box to help you extract the right data from your files and augment the model’s responses. The file_search tool:

- Rewrites user queries to optimize them for search.
- Breaks down complex user queries into multiple searches it can run in parallel.
- Runs both keyword and semantic searches across both assistant and thread vector stores.
- Reranks search results to pick the most relevant ones before generating the final response.
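For context, attaching it looks roughly like this; a minimal sketch against the Assistants v2 endpoints of the openai Python SDK (the store name, file path, model, and instructions string are placeholders, and depending on SDK version the vector store methods may live outside the `beta` namespace):

```python
from openai import OpenAI

client = OpenAI()

# Create a vector store and upload the DSL docs into it.
vector_store = client.beta.vector_stores.create(name="dsl-docs")
with open("dsl_user_guide.md", "rb") as f:
    client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=vector_store.id, files=[f]
    )

# Give an assistant the file_search tool, backed by that store.
assistant = client.beta.assistants.create(
    model="gpt-4o",
    instructions="You write scripts in our in-house DSL.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```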
By default, the file_search tool uses the following settings, but these can be configured to suit your needs:

- Chunk size: 800 tokens
- Chunk overlap: 400 tokens
- Embedding model: text-embedding-3-large at 256 dimensions
- Maximum number of chunks added to context: 20 (could be fewer)
– Source: Docs
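If those defaults don't suit your corpus, the chunking is set per file when it is added to a vector store, and the context budget via `max_num_results` on the tool itself; a sketch with placeholder IDs (as far as I can tell, the embedding model itself is not configurable):

```python
from openai import OpenAI

client = OpenAI()

# Override chunk size and overlap when attaching a file to a vector store.
client.beta.vector_stores.files.create(
    vector_store_id="vs_123",   # placeholder vector store ID
    file_id="file_123",         # a file previously uploaded with purpose="assistants"
    chunking_strategy={
        "type": "static",
        "static": {"max_chunk_size_tokens": 800, "chunk_overlap_tokens": 400},
    },
)

# Cap how many chunks file_search may add to the model's context.
client.beta.assistants.update(
    "asst_123",                 # placeholder assistant ID
    tools=[{"type": "file_search", "file_search": {"max_num_results": 8}}],
)
```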