Detecting When a PDF Should Use File Search vs Full Context

It seems that when we submit PDFs to ChatGPT, it has two modes:

  1. If the PDF is very large, it gets chunked and the LLM receives a `file_search` tool for retrieval.
  2. If the PDF is small, the entire content gets stuffed into the message context.

I want to achieve the same result using the Responses API. However, it is hard for us to estimate how large (how many tokens) a PDF is, because that would require extracting the text and the images ourselves.

Has anyone implemented this? Any recommendations?

https://platform.openai.com/docs/guides/pdf-files

Note: the Supported models section needs updating.

I read that before posting, obviously.

You can try using the input token count endpoint.

It will return something like:
`InputTokenCountResponse(input_tokens=402183, object='response.input_tokens')`

It will give you an estimate of how many input tokens will be consumed for the given prompt.

Then you can decide whether to put the PDF into a vector store (or handle it some other way) if the count is above a certain threshold.
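
For what it's worth, here is a minimal sketch of that flow in Python. Treat the `client.responses.input_tokens.count(...)` accessor, the model name, the 100k threshold, and the assumption that the count endpoint accepts file inputs as things to verify against your SDK version and model's context window:

```python
from openai import OpenAI

client = OpenAI()

# Assumed threshold -- tune it to your model's context window.
TOKEN_THRESHOLD = 100_000

# Upload the PDF once; the same file id works for counting, inline input,
# and vector store indexing.
pdf = client.files.create(file=open("report.pdf", "rb"), purpose="user_data")

pdf_input = [{
    "role": "user",
    "content": [
        {"type": "input_file", "file_id": pdf.id},
        {"type": "input_text", "text": "Summarize this document."},
    ],
}]

# Dry-run token count; no response is created.
count = client.responses.input_tokens.count(model="gpt-4.1", input=pdf_input)

if count.input_tokens > TOKEN_THRESHOLD:
    # Too large for full context: index the PDF in a vector store and let
    # the model retrieve chunks via the file_search tool.
    store = client.vector_stores.create(name="pdf-store")
    client.vector_stores.files.create_and_poll(
        vector_store_id=store.id, file_id=pdf.id
    )
    response = client.responses.create(
        model="gpt-4.1",
        input="Summarize this document.",
        tools=[{"type": "file_search", "vector_store_ids": [store.id]}],
    )
else:
    # Small enough: send the whole PDF inline.
    response = client.responses.create(model="gpt-4.1", input=pdf_input)

print(response.output_text)
```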


That seems like a promising path, but I have a couple of questions about how it works:

If I pass the `conversation` argument, the input is added to the conversation, so I can no longer decide not to add the PDF to it?

If I don't pass the `conversation` argument, does that endpoint charge me for the input tokens, so I'd pay again when I do decide to send the PDF? Or is that endpoint free?

It is free, as far as I know (as long as there is no abuse).

It won't affect the conversation; it is essentially a request simulation. The parameters are basically the same as for a response request, and you can measure other types of input too.
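
As a tiny illustration (the model name and the `input_tokens.count` accessor are the same assumptions as in the sketch above, and I'm assuming the count endpoint mirrors `responses.create` parameters as described):

```python
from openai import OpenAI

client = OpenAI()

# Build the request once; the count endpoint takes essentially the same
# parameters as responses.create.
params = dict(
    model="gpt-4.1",
    input="How many tokens is this sentence?",
)

# Simulation only: nothing is added to any conversation.
count = client.responses.input_tokens.count(**params)
print(count.input_tokens)

# If the size is acceptable, reuse the exact same params for the real call.
response = client.responses.create(**params)
print(response.output_text)
```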


Awesome. Will use this. Thanks!
