Adding a PDF to the Assistants API input

I am creating an assistant to do information extraction from PDFs.
I tried the assistant UI playground on the OpenAI platform and it worked pretty well.
Now I am trying to do the same using the OpenAI Python SDK v1.2.

Here’s my code:


import time

from openai import OpenAI

client = OpenAI()

with open('instruction.txt') as f:
    instructions = f.read()

assistant = client.beta.assistants.create(
    name="assistant",
    instructions=instructions,
    tools=[{"type": "code_interpreter"}],
    model="gpt-4-1106-preview"
)

file = client.files.create(
    file=open("file1.pdf", "rb"),
    purpose='assistants'
)
page_num = 1
thread = client.beta.threads.create()

message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=f"extract for page {page_num}, (print all page text)",
    file_ids=[file.id]
)

run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
    instructions="If there's no page, return a 'END' in the json response."
)

start_time = time.time()
while run.status != "completed":
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )
    if run.status in ["failed", "cancelled", "expired", "requires_action"]:
        print(f"run failed: {run.last_error}")
        break
    time.sleep(1)  # avoid hammering the API while polling

end_time = time.time()

messages = client.beta.threads.messages.list(
    thread_id=thread.id
)

print(messages)
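Incidentally, printing the whole messages object is hard to read. A small helper makes it easier to pull out just the newest assistant reply — this is a sketch that assumes the v1.x SDK response shape, where `messages.data` is ordered newest-first and text content lives at `message.content[0].text.value`:

```python
def latest_assistant_text(messages):
    """Return the text of the newest assistant message, or None if there isn't one."""
    for msg in messages.data:  # messages.data is ordered newest-first
        if msg.role == "assistant":
            return msg.content[0].text.value
    return None
```

With the thread above, `print(latest_assistant_text(messages))` would print only the model's reply instead of the full ThreadMessage dump.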

But in the output, I receive the following:

[ThreadMessage(id='msg_dHNi9zzDWoSDfBbFIC6wfHlp', assistant_id='asst_86coHdMlTjRdeYYjABp61x1s', content=[MessageContentText(text=Text(annotations=[], value="To assist you further, could you please provide more details about the uploaded file? Specifically, it would be helpful to know the type of file you've uploaded (e.g., PDF, Word document, text file, etc.) and what content you're expecting to extract from page 1."), type='text')], created_at=1699994179, file_ids=[], metadata={}, object='thread.message', role='assistant', run_id='run_5z7ScQh4f8xBZjylwqLP7CcR', thread_id='thread_hXk19MpSMNZwvpGY8GH9LQZE'), ThreadMessage(id='msg_tkdWWQvIDt8KMgUk1YJdMhCv', assistant_id=None, content=[MessageContentText(text=Text(annotations=[], value='extract for page 1, (print all page text)'), type='text')], created_at=1699994178, file_ids=['file-xCZr5vMxS1jc6vm4trWIbR5y'], metadata={}, object='thread.message', role='user', run_id=None, thread_id='thread_hXk19MpSMNZwvpGY8GH9LQZE')]

So the response I got from the model was:

To assist you further, could you please provide more details about the uploaded file?
Specifically, it would be helpful to know the type of file you've uploaded (e.g., PDF, Word document, text file, etc.) and what content you're expecting to extract from page 1.

When I tried the same thing in the playground, I did not receive any message like this.

What changes do I need to make to the above code so that it reads the file and performs OCR, as code_interpreter usually does in the playground?

Specifying that it's a PDF in the message content worked.
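For reference, this is the kind of wording change that resolved it — explicitly naming the file type in the message content. The exact phrasing below is my own, not an official requirement:

```python
page_num = 1  # as in the original snippet

# Stating that the attachment is a PDF lets code_interpreter pick the right parser,
# since the uploaded file's extension is not preserved.
content = (
    f"The attached file is a PDF. Extract the text of page {page_num} "
    "(print all page text)."
)
```

This `content` string then replaces the one passed to `client.beta.threads.messages.create`.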

This is pretty ridiculous: OpenAI erases file-type suffixes when uploading files, and so requires developers to manually specify the file type when asking the assistant to read a file.
I hope OpenAI fixes this problem in the next version.