I’m facing performance issues with generating a multilingual structured output using the GPT-4o-Mini model. Here are the specifics:
- Prompt: Dynamic, up to 200 words and 1,400 characters.
- Output: JSON schema in 2 languages, with 3 main keys. Two of these keys hold arrays of objects, each with 5 keys.
- Response Time: Currently takes 10-12 seconds, which is too long for acceptable user experience.
Settings:
- Model:
gpt-4o-mini
- Temperature: 0.9
- Frequency Penalty: 0
- Presence Penalty: 0.6
Question:
How can I optimize or reduce the response time while maintaining consistency across the multilingual output in a single request?