Send files to completion api

alexxx · December 29, 2023, 5:50pm

Hello,
I am trying to send files to the chat completion api but having a hard time finding a way to do so. I have seen some suggestions to use langchain but I would like to do it natively with the openai sdk.
Any tips on how to do that?
Thank you

alexxx · December 29, 2023, 6:00pm

Trying to do something along these lines

from openai import OpenAI

client = OpenAI()
file = client.files.create(file=open("file.pdf", "rb"), purpose="fine-tune")

completion = client.chat.completions.create(
    model="gpt-4-1106",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that can read PDFs."},
        {"role": "user", "content": f"Extract the text from the 3rd page from {file.id}"},
    ],
)
print(completion.choices[0].message)

Someone mentioned the following code, which works, but I want to understand what is happening under the hood:

from langchain.document_loaders import PyPDFLoader
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

#Locate your PDF here.
pdf="<YOUR_PDF_GOES_HERE>"
#Load the PDF
loader = PyPDFLoader(pdf)
documents = loader.load()

api_key = "sk-?????"
llm = OpenAI(openai_api_key=api_key)
chain = load_qa_chain(llm,verbose=True)
question = input("Enter your question here : ")
response = chain.run(input_documents=documents, question=question)
print(response)

cyzgab · December 30, 2023, 4:29am

To do it natively, you’d need to use OpenAI’s Assistants API instead of the Chat Completions API. See here.

The files can be used by tools such as Code Interpreter or Retrieval.

Code Interpreter can parse data from files. This is useful when you want to provide a large volume of data to the Assistant or allow your users to upload their own files for analysis.

Similar to Code Interpreter, files can be passed at the Assistant-level or individual Message-level.

In the langchain example you shared, it’s doing Retrieval Augmented Generation, which is similar to what the Retrieval tool in the Assistants API does. In langchain, the contents of the PDF file are parsed out and that text is added to the prompt. In the Assistants API, this is handled for you.
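
As a rough sketch of what "that text is added to the prompt" can look like with only the openai SDK, the snippet below builds a Chat Completions message list from page text that has already been extracted. The helper names `build_messages` and `ask`, the page-marker format, and the system prompt are illustrative assumptions, not langchain's actual internals; extracting the text from the PDF is left outside the sketch.

```python
def build_messages(page_texts, question):
    """Stuff extracted page text into a chat prompt (hypothetical helper)."""
    # Label each page so the model can refer to page numbers.
    context = "\n\n".join(
        f"[page {number}]\n{text}" for number, text in enumerate(page_texts, start=1)
    )
    return [
        {"role": "system", "content": "Answer using only the document text provided."},
        {"role": "user", "content": f"Document:\n{context}\n\nQuestion: {question}"},
    ]


def ask(page_texts, question, model):
    """Send the stuffed prompt to Chat Completions (needs an API key and network)."""
    from openai import OpenAI  # imported lazily so build_messages works offline

    client = OpenAI()
    completion = client.chat.completions.create(
        model=model, messages=build_messages(page_texts, question)
    )
    return completion.choices[0].message.content
```

The model never sees the file itself here, only the text placed in the user message, which is why the document has to fit in the model's context window.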

Stawsh · February 11, 2024, 9:42pm

Alexx, did you ever find a way? I want to do, I think, the same thing. I want to ask some OpenAI API to summarize the content of a report that I have in a PDF/DOCX/etc. file. After reading about its ability to take a file, I had started down the Assistants API path. But after looking into it, I see that this leads to Retrieval, which gives the assistant more area-specific “knowledge” on which to base its answers … NOT what I want to do. I want a summary of the content of my file. I am now looking at OpenAI Completions, but so far I find that it asks me to, e.g., parse the file and pass in paragraphs.

Stawsh · February 12, 2024, 1:03am

Frustrated with not finding an OpenAI API by which I could get a summary of a file, I decided to try the Playground - Assistants. I uploaded a file, which required the gpt-3.5-turbo-1106 model. I then said: “I need a summary of the report contained in file ”. It ran for quite a while, going through Create a thread, Run the thread, Run queued, Run in_progress and then Run expired, 5894 tokens (5832 in, 62 out). I see no explanation for the expiry, but in the “Response” I do see “tool_calls” of “type”: “retrieval”. And I don’t think that “retrieval” does what I need, i.e., summarize my report.

jens_a · February 26, 2024, 12:13am

Retrieval is what you want, per the docs:

Once a file is uploaded and passed to the Assistant, OpenAI will automatically chunk your documents, index and store the embeddings, and implement vector search to retrieve relevant content to answer user queries.

Assistants tools - OpenAI API

If it’s expiring, have you tried testing with a smaller document first? It could be that your document is large enough that computing embeddings for it, or otherwise handling it, takes a while.
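
To make the quoted pipeline (chunk, embed, vector search) more concrete, here is a toy sketch. It is not OpenAI's implementation: the word-count "embedding" is only a stand-in for a real embedding model, and all function names are made up for illustration.

```python
import math
from collections import Counter


def chunk_words(text, size=50):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(chunks, query, top_k=2):
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), query_vec), reverse=True)
    return ranked[:top_k]
```

Only the top-ranked chunks reach the model, which is why a request such as "summarize the whole report" can behave differently from a targeted question: the search step decides which parts of the document are seen at all.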

anushka · June 10, 2024, 12:25pm

Hi, what’s the difference between using the Chat Completions API and the Assistants API when uploading a PDF file and playing around with it?
What is the go-to option?

Obey-the-State · June 27, 2024, 12:52am

The “go to option” is to use the “assistant” API to interrogate documents.

The first time, I think you should:

1. Go to the ‘playground’: https://platform.openai.com/playground/chat?models=gpt-4o
2. Choose “assistants” (the second one), with ‘create’ in the top left-hand corner.
3. Click “create” and add system instructions (plus any other settings).
4. Click “+ files” and upload your file. It will take a while to be processed (a few minutes).
5. Try a few queries in the playground to make sure it is working the way you think it should.

Using your assistant ID and your file ID you can now interrogate your document through the Assistant API (not the chat completions API). There are many examples of how to interrogate a document through the Assistant API on these here internets.
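
A minimal sketch of that last step, assuming the Assistants API v2 beta endpoints in the openai Python SDK 1.x and the `file_search` tool type (the v2 name for what this thread calls retrieval). The helper names are hypothetical, the IDs come from your own account, and the beta surface may change, so treat this as a starting point rather than the definitive call sequence.

```python
def build_thread_message(question, file_id):
    """Build a user message that attaches an uploaded file for file search."""
    return {
        "role": "user",
        "content": question,
        "attachments": [{"file_id": file_id, "tools": [{"type": "file_search"}]}],
    }


def ask_assistant(assistant_id, file_id, question):
    """Run one question against an existing assistant (needs an API key and network)."""
    from openai import OpenAI  # imported lazily so the helper above works offline

    client = OpenAI()
    thread = client.beta.threads.create(
        messages=[build_thread_message(question, file_id)]
    )
    # Start the run and block until it reaches a terminal status.
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=assistant_id
    )
    if run.status != "completed":
        raise RuntimeError(f"Run ended with status {run.status!r}")
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    # Messages are listed newest first by default, so index 0 is the reply.
    return messages.data[0].content[0].text.value
```

Checking `run.status` matters here because, as reported earlier in the thread, a run can end as expired instead of completed.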

You can later upload documents, modify assistants and create new assistants through the Assistants API.

Note that the Assistants API can be “expensive” (not really, but by comparison). The back-end retrieval of information from your document can use 15,000 tokens.

What you are wanting to do - interrogate Docs through the chat completions API - will, I think, happen … somehow, some hack, only because it will be 5% of the cost.

Obey-the-State · June 27, 2024, 1:10am

Another three good options are:

1. Fine-tune a model based on your document’s data. Time consuming but a great solution, and great learning. Then use chat completions on your fine-tuned model.
2. Send (appropriate) chunks of (extracted text from) your document in the 128k (or however large) message space allowed by your model of choice, say gpt-4o. Large messages will start to get ‘pricey’ (say $0.03 !) but it is a solution. You might need a separate, possibly simple, AI call to determine what is the ‘appropriate’ parts of your document for your current question/discussion.
3. Retrieval Augmented Generation (RAG). See: How to parse PDF docs for RAG | OpenAI Cookbook. This uses the chat.completions API and encodings.
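
For the second option (sending chunks of extracted text), one possible way to prepare the chunks is sketched below. It packs paragraphs greedily into a character budget; using characters as a crude proxy for tokens is an assumption made for simplicity, and `pack_paragraphs` is a hypothetical helper, not part of any SDK.

```python
def pack_paragraphs(text, max_chars):
    """Group paragraphs into chunks of at most max_chars characters each.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue  # skip empty paragraphs left by extra blank lines
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # The budget would be exceeded: close this chunk and start a new one.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then go into its own chat completions request, or only the chunks judged relevant to the current question can be sent.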

cyzgab · March 16, 2025, 3:59pm

ICYMI, we can now send native PDFs to the API.

They’ll be processed as images & text (via OCR?).

https://platform.openai.com/docs/guides/pdf-files?api-mode=chat

from openai import OpenAI
client = OpenAI()

file = client.files.create(
    file=open("draconomicon.pdf", "rb"),
    purpose="user_data"
)

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {
                        "file_id": file.id,
                    }
                },
                {
                    "type": "text",
                    "text": "What is the first dragon in the book?",
                },
            ]
        }
    ]
)

print(completion.choices[0].message.content)
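
As far as I can tell from the linked guide, the PDF can also be sent inline as base64 instead of being uploaded first. The sketch below builds that content part; the `filename` and `file_data` field names and the data-URL form reflect my reading of the guide and should be checked against it, and `pdf_file_part` is a hypothetical helper name.

```python
import base64


def pdf_file_part(filename, pdf_bytes):
    """Build a Chat Completions content part carrying a PDF inline as base64."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "type": "file",
        "file": {
            "filename": filename,
            # Assumed format: a data URL with the PDF MIME type.
            "file_data": f"data:application/pdf;base64,{encoded}",
        },
    }
```

The returned dictionary would take the place of the "type": "file" entry in the content list above, with the text part left unchanged.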
