Vision fine-tuning training-file preprocessing fails with HTTP 500 "Infrastructure Issues"

Hello,

I am consistently running into a backend infrastructure error when trying to run supervised vision fine-tuning jobs. I need help to identify the root cause, as the user-facing logs do not provide any actionable details.

The Problem

When the job attempts to preprocess my training file, it runs for about 60–80 minutes and then completely fails with an HTTP 500 error.

  • Small training files: Preprocess successfully.
  • Validation files: Preprocess successfully ).

Large training files: Consistently fail during preprocessing.

The API error tells me to check the Logs tab, but the UI Logs/Events tab only says: “File Preprocessing failed for file training file,” with no actual error trace.

My Ask

Could someone from the backend team look up the Job ID below and check the internal logs to tell me why this large file is failing to preprocess and how to overcome it?

Failing Job Information

Job ID: ftjob-e0c116095e0a4da5a9a8ec5ccd80c4fd

Training File ID: file-323036aa3f0f4c019c00d5498416abbc

Validation File ID: file-f8a5247df8604db28ede600d41871fd9

Hyperparameters: n_epochs=3, batch_size=8, learning_rate_multiplier=2.0, seed=42, suffix: sampled-62

  • Timeline: Preprocessing started at epoch 1780648527 and failed at 1780651980 (~57 minutes).

Exact API Error Response:
status: failed

error: code=‘500’,
message=‘File preprocessing failed. For more details, please check the Logs tab.’,

param=‘Infrastructure Issues’