Optimizing Response Time for Multilingual JSON Output with GPT-4o-Mini

I’m facing performance issues with generating a multilingual structured output using the GPT-4o-Mini model. Here are the specifics:

  • Prompt: Dynamic, up to 200 words and 1,400 characters.
  • Output: JSON schema in 2 languages, with 3 main keys. Two of these keys hold arrays of objects, each with 5 keys.
  • Response Time: Currently takes 10-12 seconds, which is too long for acceptable user experience.

Settings:

  • Model: gpt-4o-mini
  • Temperature: 0.9
  • Frequency Penalty: 0
  • Presence Penalty: 0.6

Question:

How can I optimize or reduce the response time while maintaining consistency across the multilingual output in a single request?