Send files to completion api

Hello,
I am trying to send files to the Chat Completions API but am having a hard time finding a way to do so. I have seen some suggestions to use LangChain, but I would like to do it natively with the OpenAI SDK.
Any tips on how to do that?
Thank you

I'm trying to do something along these lines:

from openai import OpenAI

client = OpenAI()
# Upload the PDF, then refer to it by file ID in the prompt
file = client.files.create(file=open("file.pdf", "rb"), purpose="fine-tune")
completion = client.chat.completions.create(
    model="gpt-4-1106",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that can read PDFs."},
        {"role": "user", "content": f"Extract the text from the 3rd page from {file.id}"},
    ],
)
print(completion.choices[0].message)

Someone mentioned the following code, which works, but I want to understand what is happening under the hood:

from langchain.document_loaders import PyPDFLoader
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

#Locate your PDF here.
pdf="<YOUR_PDF_GOES_HERE>"
#Load the PDF
loader = PyPDFLoader(pdf)
documents = loader.load()

api_key = "sk-?????"
llm = OpenAI(openai_api_key=api_key)
chain = load_qa_chain(llm,verbose=True)
question = input("Enter your question here : ")
response = chain.run(input_documents=documents, question=question)
print(response) 

To do it natively, you’d need to use OpenAI’s Assistants API instead of the Chat Completions API. See here.

The files can be used by tools such as Code Interpreter or Retrieval.

Code Interpreter can parse data from files. This is useful when you want to provide a large volume of data to the Assistant or allow your users to upload their own files for analysis.

Similar to Code Interpreter, files can be passed at the Assistant-level or individual Message-level.

In the LangChain example you shared, it’s doing Retrieval Augmented Generation, which is similar to what the Retrieval tool does in the Assistants API. In LangChain, the contents of the PDF files are parsed out and that text is added to the prompt. In the Assistants API, this is handled for you.
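
To make the "text is added to the prompt" part concrete, here is a minimal sketch of doing the same thing by hand with the plain OpenAI SDK. It assumes the third-party pypdf package for text extraction and uses gpt-4o only as an example model; build_messages and ask_pdf are hypothetical helper names, not part of any library.

```python
def build_messages(page_texts, question):
    """Place the extracted PDF text directly in the prompt."""
    document = "\n\n".join(
        f"[Page {number}]\n{text.strip()}"
        for number, text in enumerate(page_texts, start=1)
    )
    return [
        {"role": "system", "content": "Answer using only the document provided."},
        {"role": "user", "content": f"Document:\n{document}\n\nQuestion: {question}"},
    ]


def ask_pdf(pdf_path, question, model="gpt-4o"):
    """Extract the text locally, then send it in a normal chat completion."""
    # Imported here so build_messages can be used without these packages.
    from openai import OpenAI    # pip install openai
    from pypdf import PdfReader  # pip install pypdf (assumed extractor)

    page_texts = [page.extract_text() or "" for page in PdfReader(pdf_path).pages]
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    completion = client.chat.completions.create(
        model=model,
        messages=build_messages(page_texts, question),
    )
    return completion.choices[0].message.content
```

Something like ask_pdf("file.pdf", "What does page 3 say?") would then replace passing a file ID in the prompt. This only works while the whole document fits in the model's context window.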

Alexx, did you ever find a way? I want to do, I think, the same thing. I want to ask some OpenAI API to summarize the content of a report that I have in a PDF/DOCX/etc. file. After reading about its ability to take a file, I had started down the Assistants API path. But after looking into it, I see that this leads to Retrieval, which gives the assistant more area-specific “knowledge” on which to base its answers … NOT what I want to do. I want a summary of the content of my file. I am now looking at OpenAI Completions, but so far, I find it to be asking me to, e.g., parse the file and pass in paragraphs.
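
For reference, the "parse the file and pass in paragraphs" route can be sketched roughly as below: summarize each chunk, then summarize the partial summaries. This is only an illustration under assumptions. The function names are hypothetical, the text is assumed to be already extracted from the file, the 8000-character chunk size is arbitrary, and gpt-4o is just an example model.

```python
def split_paragraphs(text, max_chars=8000):
    """Group paragraphs into chunks of roughly max_chars characters.

    A single paragraph longer than max_chars is kept whole.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if current and len(current) + len(paragraph) + 2 > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks


def summarize_document(text, complete, max_chars=8000):
    """Summarize each chunk, then combine the partial summaries."""
    partials = [
        complete(f"Summarize this part of a report:\n\n{chunk}")
        for chunk in split_paragraphs(text, max_chars)
    ]
    if len(partials) <= 1:
        return partials[0] if partials else ""
    joined = "\n\n".join(partials)
    return complete(f"Combine these partial summaries into one summary:\n\n{joined}")


def openai_complete(prompt, model="gpt-4o"):
    """One possible `complete` callable, using the Chat Completions API."""
    from openai import OpenAI  # imported lazily; pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    completion = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return completion.choices[0].message.content
```

Calling summarize_document(report_text, openai_complete) would make one request per chunk plus one to combine them.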

Frustrated with not finding an OpenAI API by which I could get a summary of a file, I decided to try the Playground - Assistants. I uploaded a file, which required the gpt-3.5-turbo-1106 model. I then had to say: “I need a summary of the report contained in file ”. It ran for quite a while, going through Create a thread, Run the thread, Run queued, Run in_progress and then Run expired, at 5894 tokens (5832 in, 62 out). I see no explanation for the expiry, but in the “Response” I do see “tool_calls” of “type”: “retrieval”. AND I don’t think that “retrieval” does what I need, i.e., summarize my report.

Retrieval is what you want, per the docs:

Once a file is uploaded and passed to the Assistant, OpenAI will automatically chunk your documents, index and store the embeddings, and implement vector search to retrieve relevant content to answer user queries.

Assistants tools - OpenAI API
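
To give a feel for what "chunk, index and store the embeddings, and implement vector search" means, here is a toy sketch of that pipeline. It is not how OpenAI implements it. The bag-of-words toy_embed function is a stand-in for a real embedding model, and all names and sizes are made up for illustration.

```python
import math
from collections import Counter


def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character windows."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    stop = max(len(text) - overlap, 1)
    return [text[start:start + size] for start in range(0, stop, step)]


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Store each chunk next to its embedding."""
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


def search(index, query, k=2):
    """Return the k chunks most similar to the query."""
    query_vector = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

The chunks returned by search are what would be placed in the prompt. Note that this kind of lookup only surfaces the passages closest to the query, not the whole document.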

If it’s expiring, have you tried testing with a smaller document first? It could be that, because of its size, your document is taking a while to get embeddings for or otherwise handle.

Hi, what’s the difference between using the Chat Completions API and the Assistants API when uploading a PDF file and playing around with it?
What is the go-to option?

The “go to option” is to use the “assistant” API to interrogate documents.

The first time I think you should:

  • go to the ‘playground’ https://platform.openai.com/playground/chat?models=gpt-4o
  • choose “assistants” (the second one), which has ‘create’ in the top left-hand corner
  • click “create” and add system instructions (plus any other settings)
  • click “+ files” and upload your file. It will take a while to be processed (a few minutes)
  • try a few queries in the playground to make sure it is working like you think it should.

Using your assistant ID and your file ID you can now interrogate your document through the Assistant API (not the chat completions API). There are many examples of how to interrogate a document through the Assistant API on these here internets.
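
As one rough example of that, the sketch below asks a single question of an assistant that already has the file attached (as set up in the playground steps above). It uses the beta threads/runs calls from the openai Python SDK v1; ask_assistant is a hypothetical helper name, and the client is passed in so the function can be exercised without network access.

```python
import time


def ask_assistant(client, assistant_id, question, poll_seconds=1.0):
    """Ask an existing assistant one question and return its text reply."""
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=question
    )
    run = client.beta.threads.runs.create(
        thread_id=thread.id, assistant_id=assistant_id
    )
    # Poll until the run leaves the queued / in_progress states.
    while run.status in ("queued", "in_progress"):
        time.sleep(poll_seconds)
        run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    if run.status != "completed":
        # Covers outcomes such as "expired" or "failed".
        raise RuntimeError(f"Run ended with status {run.status!r}")
    # Messages are listed newest first, so the first one is the reply.
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    return messages.data[0].content[0].text.value


# Usage (placeholder ID, requires OPENAI_API_KEY):
#   from openai import OpenAI
#   print(ask_assistant(OpenAI(), "asst_...", "Summarize the report."))
```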

You can later upload documents, modify assistants and create new assistants through the Assistants API.

Note that the Assistants API can be “expensive” (not really, but comparatively). The back-end retrieval of information from your document can use 15,000 tokens.

What you are wanting to do - interrogate Docs through the chat completions API - will, I think, happen … somehow, some hack, only because it will be 5% of the cost.

Another three good options are:

  1. fine-tune a model based on your document’s data. Time consuming but a great solution, and great learning. Then use chat completions on your fine-tuned model.
  2. send (appropriate) chunks of (extracted text from) your document in the 128k (or however large) message space allowed by your model of choice, say gpt-4o. Large messages will start to get ‘pricey’ (say $0.03!) but it is a solution. You might need a separate, possibly simple, AI call to determine which parts of your document are ‘appropriate’ for your current question/discussion.
  3. Retrieval Augmented Generation (RAG). See: How to parse PDF docs for RAG | OpenAI Cookbook. This uses the chat.completions API and embeddings.
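
For option 2, one simple way to decide which chunks go into the message is to score them and keep the best ones that fit a token budget. The sketch below is only an illustration. It assumes the relevance scores come from elsewhere (for example the separate AI call mentioned above), and it estimates tokens with a crude 4-characters-per-token rule of thumb rather than a real tokenizer.

```python
def estimate_tokens(text):
    """Very rough estimate, assuming about 4 characters per token."""
    return max(1, len(text) // 4)


def pack_chunks(scored_chunks, budget_tokens):
    """Keep the highest-scoring chunks that fit the budget, in document order.

    scored_chunks is a list of (score, chunk_text) pairs in document order.
    """
    chosen = []
    used = 0
    ranked = sorted(enumerate(scored_chunks), key=lambda item: item[1][0], reverse=True)
    for position, (score, chunk) in ranked:
        cost = estimate_tokens(chunk)
        if used + cost <= budget_tokens:
            chosen.append((position, chunk))
            used += cost
    # Restore the original order so the excerpt still reads naturally.
    return [chunk for _, chunk in sorted(chosen)]
```

The selected chunks can then be joined and sent as the user message in an ordinary chat completion.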