Assistant API can't read my PDF. How come?

Hey,

Somehow the Legacy Assistant API cannot read my PDF, which is very strange.

I keep getting an error.

These are my steps:

My PDF is just a normal PDF file. I don't know why I get this error.

This is the response:

“value”:"The file contains binary data which couldn’t be decoded as text, indicating it might be a non-text-based file, such as an Excel or PDF document. I’ll try to read it as an Excel file next to see if this works

indicating it might be a non-text-based file, such as an Excel or PDF document

The answer is right there. For whatever reason it’s not able to read your PDF. It looks like you’re using Code Interpreter to try and open it?

PDFs are usually NOT text-based.

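How you see the PDF is not how a computer sees it. Under the hood it is a binary container, often with compressed streams, so reading it as plain text fails.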
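To make that concrete, here is a minimal sketch of why a plain-text read fails. The `sample` bytes are a made-up stand-in for a real file, and the helper names are hypothetical. Checking only the first bytes for `%PDF-` is a simplification that works for typical files.

```python
PDF_MAGIC = b"%PDF-"  # typical PDF files begin with this header


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF header."""
    return data.startswith(PDF_MAGIC)


def can_decode_as_text(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Stand-in for a real file: a PDF header followed by the kind of
# high bytes that real PDFs carry in comments and compressed streams.
sample = (
    b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"
    b"1 0 obj\n<< /Length 4 >>\nstream\n\x9c\xff\x00\x81\nendstream\n"
)

print("Looks like a PDF:", looks_like_pdf(sample))
print("Decodes as text:", can_decode_as_text(sample))
```

This is roughly what Code Interpreter runs into when it opens the file as text: the header is readable, the rest is not.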

If you want it to read a PDF, you can use the Vision API.

https://platform.openai.com/docs/guides/vision

Or you can upload it and use it with Retrieval:

https://platform.openai.com/docs/actions/data-retrieval

You can also convert it to Markdown so the bot can read it better (best option).
(This is the first website I saw for PDF → Markdown)
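If you would rather do the conversion locally, here is a rough sketch. It assumes the third-party `pypdf` package is installed (not something mentioned above), and both function names are hypothetical. It only produces simple Markdown with one heading per page, and it returns empty pages for scanned PDFs that have no text layer.

```python
def pages_to_markdown(title: str, pages: list[str]) -> str:
    """Join per-page text into a simple Markdown document."""
    parts = [f"# {title}"]
    for number, text in enumerate(pages, start=1):
        # Drop blank lines and stray whitespace left over from extraction.
        cleaned = "\n".join(
            line.strip() for line in text.splitlines() if line.strip()
        )
        parts.append(f"## Page {number}\n\n{cleaned}")
    return "\n\n".join(parts) + "\n"


def extract_pages(pdf_path: str) -> list[str]:
    """Extract text from each page (assumes pypdf is installed)."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(pdf_path)
    # extract_text() can come back empty for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]
```

Usage would look like `pages_to_markdown("My report", extract_pages("report.pdf"))`, where `report.pdf` is a placeholder path. You can then paste the result into the prompt or upload it as a text file.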

Yes, I am using Code Interpreter to open it. Maybe that's the problem.

{
  "instructions": "You are an AI Business Auditor",
  "name": "AI Auditor",
  "tools": [
    {
      "type": "code_interpreter"
    }
  ],
  "model": "gpt-4-turbo"
}

Maybe I should test it out with other options like:
code_interpreter, retrieval, or function
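For reference, a sketch of what the same config might look like with `retrieval` instead of `code_interpreter`. It assumes the legacy (v1) request shape, where a `file_ids` list attaches already-uploaded files to the assistant. The file ID is a placeholder and `build_assistant_payload` is a hypothetical helper, so check both against the current API reference.

```python
import json


def build_assistant_payload(file_ids: list[str]) -> dict:
    """Build the assistant config with the retrieval tool enabled."""
    return {
        "instructions": "You are an AI Business Auditor",
        "name": "AI Auditor",
        "tools": [{"type": "retrieval"}],
        "model": "gpt-4-turbo",
        # Assumption: legacy v1 field for files uploaded beforehand.
        "file_ids": file_ids,
    }


# "file-abc123" is a placeholder, not a real file ID.
payload = build_assistant_payload(["file-abc123"])
print(json.dumps(payload, indent=2))
```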

Btw the link regarding “data retrieval” seems complex.

I just want 2-3 simple PDFs, each at most 5 pages long, to be added as context to my GPT assistant.

Do you mean giving it a PDF, like when you build a custom GPT with the database?