High latency with a fine-tuned 4o-mini model

Hey folks. I am working on a fine-tuning problem where we generate JSON files that follow a specific internal structure. Previously, we used a RAG-based approach and got a result in single-digit seconds. With the fine-tuned model, latency has gone up to over one minute. Have others experienced this?

For what it’s worth, this was a test run. The output was poor quality, but the number of output tokens was consistent with the targets.
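
One way to narrow this down is to time both paths on the same prompt and normalize by output tokens, since the token counts were consistent. Below is a minimal sketch, not the setup actually used here. It assumes a hypothetical `generate` callable that wraps whichever model call is being tested and returns the text plus its output token count. `fake_generate` is only a stand-in so the snippet runs without any API access.

```python
import time
from typing import Callable, Dict, Tuple


def time_generation(
    generate: Callable[[str], Tuple[str, int]], prompt: str
) -> Dict[str, float]:
    """Time a single generation call and normalize by output tokens.

    `generate` is a hypothetical wrapper around the model call under test;
    it must return (generated_text, output_token_count).
    """
    start = time.perf_counter()
    _text, output_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return {
        "seconds": elapsed,
        "output_tokens": output_tokens,
        # Guard against a zero-token response.
        "seconds_per_token": elapsed / output_tokens if output_tokens else float("nan"),
    }


def fake_generate(prompt: str) -> Tuple[str, int]:
    """Stand-in for a real model call (assumption: replace with your own wrapper)."""
    time.sleep(0.01)
    return '{"ok": true}', 5


if __name__ == "__main__":
    print(time_generation(fake_generate, "example prompt"))
```

If seconds per token is similar for both paths, the difference is probably in how many tokens are produced or in queueing before generation starts. If it is much higher for the fine-tuned model, the slowdown is in generation itself.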