[Responses API] File Upload in Query vs ChatGPT

I have been trying out the new Responses API and wanted to include multiple PDF files in the conversation like this:

# Start the conversation with a plain text query
chat_starter = client.responses.create(
    model="gpt-4o-mini",
    instructions=prompt_instructions,
    input=initial_query,
    temperature=0.7
)

# Keep the response id so the next turn can continue the conversation
starter_id = chat_starter.id

# Upload the PDF so it can be referenced by file_id in a later message
initial_file_response = client.files.create(
    file=open(file_path, 'rb'), purpose="user_data")
initial_file_id = initial_file_response.id

# Next turn: attach the uploaded PDF alongside the text query
initial_response = client.responses.create(
    model="gpt-4o-mini",
    instructions=prompt_instructions,
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_file",
                    "file_id": initial_file_id,
                },
                {
                    "type": "input_text",
                    "text": initial_query,
                },
            ]
        }
    ],
    previous_response_id=starter_id
)

I chained multiple responses this way, attaching a file in each turn. However, once I add more files (four, to be exact), I hit an error stating:

BadRequestError: Error code: 400 - {'error': {'message': 'The total token count of all files exceeds the maximum limit for this model. We can only stuff the first 4 files.', 'type': 'invalid_request_error', 'param': 'input', 'code': 'context_length_exceeded'}}

So ideally I would reduce the token count, but is there another way to go about this? In ChatGPT I can upload far more files and ask questions about each one in the same conversation. How do I reproduce that behaviour in the API?


Try RAG via File Search instead?

https://platform.openai.com/docs/guides/tools-file-search
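
Roughly like this, as a minimal sketch: it assumes the current openai Python SDK, where vector stores live at client.vector_stores (older releases use client.beta.vector_stores), and pdf_paths stands in for your list of PDF paths:

# Index all PDFs into one vector store, then let file_search retrieve chunks
vector_store = client.vector_stores.create(name="pdf-knowledge-base")

for file_path in pdf_paths:
    client.vector_stores.files.upload_and_poll(
        vector_store_id=vector_store.id,
        file=open(file_path, "rb"),
    )

response = client.responses.create(
    model="gpt-4o-mini",
    input="What does the second report conclude?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store.id],
    }],
)
print(response.output_text)

The model then pulls in only the chunks relevant to each question instead of every page of every file, so the conversation is no longer bounded by the total size of the uploads.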

If I use RAG and turn the input files into a vector store, can the model still understand the tables and graphs in the uploaded PDFs? My understanding is that the vector store is built only from the text in the file.


That won't work. The script is corrupted: it has curly quotes and a comment written in the middle of it.

The script also does not address the central issue of this five-month-old topic: sending more PDF file context than the AI model's context length allows.

"input_file" content type in a user message is not really an “upload”. It is direct placement of complete extracted text and extracted page images into AI model context. The OP put more PDF in than can be placed in the model 1M token context window length.

ChatGPT succeeds by NOT doing that: it only builds a vector store for tool search, which cannot give full, immediate context and understanding of a single PDF file, since search returns conflated chunks. And no, it cannot understand tables if the text cannot be extracted programmatically, though Code Interpreter with a ChatGPT reasoning model may have the AI making dozens of calls to extract some content itself with code and vision.
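
The API-side counterpart, if you want to attempt the same trick, is the code_interpreter tool with the file mounted into its container. A hedged sketch only; the tool and container shapes here follow the current Responses API docs and may change:

response = client.responses.create(
    model="gpt-4o-mini",  # the thread's model; any tool-capable model should work
    input="Extract the tables from the attached PDF as CSV and summarize them.",
    tools=[{
        "type": "code_interpreter",
        # Mount the previously uploaded file into the tool's sandbox container
        "container": {"type": "auto", "file_ids": [initial_file_id]},
    }],
)
print(response.output_text)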