Files API issue when sending PDF files to extract info

We're trying to use the Files API from JavaScript to extract some information from PDF files.
Here's the code we use to extract the info:

async analyzeCOI(files: { file: any; fileName: string }[], prompt: string) {
  const uploadedFiles: any[] = [];
  try {
    // 1. Upload files to OpenAI
    for (const file of files) {
      const uploaded = await this.client.files.create({
        file: file.file,
        // purpose: "assistants",
        purpose: "user_data",
      });
      uploadedFiles.push(uploaded);
    }
    const fileIds = uploadedFiles.map((f) => f.id);

    // 2. Build content array: all files + prompt
    // const filePromises = files.map(async (file) => {
    //   const base64 = await fileToBase64(file.file);
    //   return {
    //     type: "file",
    //     file: {
    //       filename: file.fileName,
    //       file_data: `${base64}`,
    //     },
    //   };
    // });

    const content = [
      // ...(await Promise.all(filePromises)),
      ...fileIds.map((id) => ({
        type: "file",
        file: {
          file_id: id,
        },
      })),
      {
        type: "text",
        text: prompt,
      },
    ];

    // 3. Call chat/completions API
    // wait 3s before the completion call to give the upload time to settle
    await new Promise((resolve) => setTimeout(resolve, 3000));
    const completion = await this.client.chat.completions.create({
      // model: "gpt-4o",
      // model: "gpt-4.1",
      model: "gpt-4.5-preview",
      messages: [
        {
          role: "user",
          content,
          file_ids: fileIds,
        },
      ],
      // max_tokens: 2048,
    });

    // 4. Parse result
    const responseText = completion.choices?.[0]?.message?.content || "";

    // console.log("responseText: ", responseText);
    // 5. Try to extract insuranceData from AI result
    let insuranceData = [];
    const jsonMatch = responseText.match(/```json\n([\s\S]*?)\n```/);
    // console.log("jsonMatch: ", jsonMatch);

    if (jsonMatch?.[1]) {
      // console.log("jsonMatch[1]: ", jsonMatch[1]);
      insuranceData = JSON.parse(jsonMatch[1]);
    } else {
      const match = responseText.match(/\[[\s\S]*\]|\{[\s\S]*\}/);
      insuranceData = JSON.parse(match?.[0] || responseText);
    }

    return insuranceData;
  } finally {
    // 6. Clean up: delete uploaded files from OpenAI storage
    console.log("deleting files: ", uploadedFiles);
    // for (const uploaded of uploadedFiles) {
    //   try {
    //     await this.client.files.delete(uploaded.id);
    //   } catch (e) {
    //     console.error("Error deleting file: ", e);
    //   }
    // }
  }
}

Sometimes we receive a response like this:

Request URL: v1/chat/completions
Request Method: POST

{
  "id": "chatcmpl-BeZ5uEV8tuim1I6BksjS26VfWYzku",
  "object": "chat.completion",
  "created": 1749007722,
  "model": "gpt-4.5-preview-2025-02-27",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Please provide the document you would like me to analyze. Once I receive the document, I will analyze and return the extracted insurance details in the specified JSON format.",
        "refusal": null,
        "annotations": []
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 503,
    "completion_tokens": 32,
    "total_tokens": 535,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": null
}

    "content": "Please provide the document you would like me to analyze. Once I receive the document, I will analyze and return the extracted insurance details in the specified JSON format.", is very puzzled to us. Clearly we have waited for 3sec to complete uploading the file attached.  It doesn't always happen but happen often enough.

We've been struggling with this issue for a while. Does anyone have suggestions for working around it? Any input is greatly appreciated.

This has been an ongoing issue with the PDF file service.

100 tests

Token usage shows files are not supplied to the AI


I ran a slow loop of 100 calls using two small vision-only PDF files, uploading them fresh each time and then sleeping before asking about them via chat completions, reversing the order of the uploaded input files on each iteration. (Using base64 instead of file_id exhibits another, immediate problem: only the last PDF is seen.)

I used both a random user parameter and a time() nonce in the user message to break caching.
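For readers following the original post's stack, a rough TypeScript equivalent of that loop is below (the output above is from a Python run; file names, model, and prompt here are placeholders):

import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI();
const promptUsages: number[] = [];

for (let i = 0; i < 100; i++) {
  // Upload both test PDFs fresh each iteration, reversing their order every other run.
  const names = i % 2 === 0 ? ["a.pdf", "b.pdf"] : ["b.pdf", "a.pdf"];
  const ids: string[] = [];
  for (const name of names) {
    const uploaded = await client.files.create({
      file: fs.createReadStream(name),
      purpose: "user_data",
    });
    ids.push(uploaded.id);
  }
  await new Promise((r) => setTimeout(r, 3000)); // sleep between upload and completion

  const completion = await client.chat.completions.create({
    model: "gpt-4.1-nano",
    user: Math.random().toString(36).slice(2), // random user parameter to break caching
    messages: [
      {
        role: "user",
        content: [
          ...ids.map((id) => ({ type: "file" as const, file: { file_id: id } })),
          // time-based nonce in the message text, also to break caching
          { type: "text" as const, text: `nonce:${Date.now()} Briefly describe both PDFs.` },
        ],
      },
    ],
  });
  promptUsages.push(completion.usage?.prompt_tokens ?? 0);
}

console.log(promptUsages);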

The input prompt usage collected for these tells the tale of the unreliability of the endpoint:

>>>prompt_usages
[1874, 1874, 62, 1874, 1874, 62, 1874, 62, 1874, 1874, 62, 62, 1874, 62, 1874, 1874, 62, 62, 1874, 1874, 62, 1874, 1874, 62, 62, 1874, 1874, 1874, 62, 1874, 1874, 62, 62, 1874, 62, 62, 1874, 1874, 62, 62, 62, 1874, 1874, 62, 62, 62, 62, 62, 62, 62, 62, 1874, 62, 62, 62, 1874, 1874, 62, 62, 1874, 62, 1874, 62, 62, 1874, 1874, 62, 1874, 1874, 62, 1874, 62, 1874, 62, 62, 1874, 1874, 1874, 62, 1874, 1874, 62, 1874, 62, 62, 1874, 62, 62, 62, 62, 1874, 1874, 1874, 1874, 1874, 62, 1874, 62, 62, 1874]

>>>print(usage_counts)
Counter({62: 51, 1874: 49})

50% of the PDF calls to chat completions (gpt-4.1-nano in this case, but the same is seen on all other models) FAIL to provide the AI with the PDF contents.
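Until this is fixed, one thing that may help (not an official workaround): since a dropped PDF shows up as an abnormally small prompt_tokens count, you can check usage after each call and retry. A minimal sketch, assuming the openai Node SDK; the helper name and the 500-token threshold are made up and need tuning against your own prompt-only vs. prompt-plus-file sizes:

import OpenAI from "openai";

// Hypothetical helper: retry when usage.prompt_tokens is too small to have
// included the PDF content. Pick minPromptTokens between the prompt-only
// size (62 above) and the prompt-plus-file size (1874 above) seen for your prompts.
async function completeWithPdfCheck(
  client: OpenAI,
  model: string,
  messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[],
  minPromptTokens = 500,
  maxAttempts = 3,
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const completion = await client.chat.completions.create({ model, messages });
    const promptTokens = completion.usage?.prompt_tokens ?? 0;
    if (promptTokens >= minPromptTokens) {
      return completion; // prompt size suggests the file content was included
    }
    console.warn(`attempt ${attempt}: prompt_tokens=${promptTokens}; PDF likely dropped, retrying`);
  }
  throw new Error("PDF content never reached the model after retries");
}

In the original post's code, something like this would wrap the this.client.chat.completions.create call in step 3.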

The PDF input_file feature is broken

This topic has been flagged to OpenAI:

I've started testing the Responses API with PDFs and I'm not seeing the same issue there. Perhaps it's too early to tell, but I've run around a dozen PDFs through it and I'm not seeing the same intermittent "No PDF file included" responses there.
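For anyone who wants to try that route, here is a minimal sketch of an equivalent call through the Responses API, assuming a recent openai Node SDK that exposes client.responses; the model name, file name, and prompt are placeholders:

import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI();

// Upload the PDF, then reference it from an input_file content part.
const uploaded = await client.files.create({
  file: fs.createReadStream("certificate.pdf"),
  purpose: "user_data",
});

const response = await client.responses.create({
  model: "gpt-4.1",
  input: [
    {
      role: "user",
      content: [
        { type: "input_file", file_id: uploaded.id },
        { type: "input_text", text: "Extract the insurance details as JSON." },
      ],
    },
  ],
});

console.log(response.output_text);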