I’m integrating OpenAI’s GPT-5 model with Salesforce Apex using the chat/completions API. The integration works, but responses are very slow — usually around 25–30 seconds for a single query.
A few details:
Using Named Credential for the API callout.
Request payload includes model = "gpt-5", plus system + user messages.
When max_completion_tokens = 2000, I get responses but they are slow.
If I lower max_completion_tokens to 1000, I sometimes get no response at all (empty choices).
Example: asking a medical question takes ~30 seconds end-to-end.
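For reference, the callout is roughly the following (a simplified sketch; the Named Credential name "OpenAI", the system prompt text, and the sample question are placeholders for my actual values):

```apex
// Minimal sketch of the chat/completions callout via a Named Credential.
// Assumes a Named Credential called "OpenAI" pointing at https://api.openai.com
// that injects the Authorization header.
String userQuestion = 'Example medical question goes here';

HttpRequest req = new HttpRequest();
req.setEndpoint('callout:OpenAI/v1/chat/completions');
req.setMethod('POST');
req.setHeader('Content-Type', 'application/json');
req.setTimeout(120000); // Apex callouts default to 10s; slow responses need a higher timeout

Map<String, Object> payload = new Map<String, Object>{
    'model' => 'gpt-5',
    'messages' => new List<Object>{
        new Map<String, Object>{ 'role' => 'system', 'content' => 'You are a helpful assistant.' },
        new Map<String, Object>{ 'role' => 'user', 'content' => userQuestion }
    },
    'max_completion_tokens' => 2000
};
req.setBody(JSON.serialize(payload));

HttpResponse res = new Http().send(req);
System.debug(res.getStatusCode() + ' ' + res.getBody());
```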
In my experience, GPT-5 is simply very slow and unstable. Also, if you allow it, GPT-5 selects what it considers the 'best' model for the task, so you might not even be waiting on GPT-5 itself but on another model. I'd recommend specifying the right model for your application from the start; check the documentation for more details on the different models.
What are your reasoning_effort and verbosity parameters set to? These settings can dramatically affect response time. Note also that for the newer models the system prompt has been replaced with instructions (developer messages).
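In the Chat Completions body these go in as top-level fields; a rough sketch, reusing the payload map from the callout above and assuming the lowest settings are acceptable for your use case:

```apex
// Sketch: favor speed over depth by lowering reasoning effort and verbosity.
// The values "minimal" and "low" are examples, not recommendations for medical content.
payload.put('reasoning_effort', 'minimal'); // less hidden reasoning work, faster responses
payload.put('verbosity', 'low');            // shorter visible answers
req.setBody(JSON.serialize(payload));
```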
If I lower max_completion_tokens to 1000, I sometimes get no response at all
That is expected. With a reasoning model like GPT-5, max_completion_tokens covers the hidden reasoning tokens as well as the visible answer, so a low limit can be consumed entirely by reasoning before any visible content is produced.
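You can confirm this from the raw response by checking for empty message content together with finish_reason = "length" (a rough sketch, assuming res is the HttpResponse from the callout above and the standard Chat Completions response shape):

```apex
// Sketch: detect the case where the token budget was used up by reasoning.
Map<String, Object> body = (Map<String, Object>) JSON.deserializeUntyped(res.getBody());
List<Object> choices = (List<Object>) body.get('choices');
if (choices != null && !choices.isEmpty()) {
    Map<String, Object> choice = (Map<String, Object>) choices[0];
    Map<String, Object> message = (Map<String, Object>) choice.get('message');
    String content = (String) message.get('content');
    if (String.isBlank(content) && 'length' == (String) choice.get('finish_reason')) {
        System.debug('Budget exhausted by reasoning tokens: raise max_completion_tokens or lower reasoning_effort');
    }
}
```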
Example: asking a medical question takes ~30 seconds end-to-end.
The response time varies based on the complexity of the question, reasoning effort, and verbosity.
Salesforce Apex code runs on Salesforce servers, correct? So there's an extra layer of latency on top of the OpenAI call, right? Are there other layers within the Apex classes that add to the total time? One way to check is to time the callout itself, as in the sketch below.
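A rough sketch of separating the OpenAI round trip from the rest of the Apex transaction (assumes req is the HttpRequest built earlier; everything else in the transaction shows up in the debug log timings):

```apex
// Sketch: measure how much of the total transaction time is the callout itself.
Long t0 = System.currentTimeMillis();
HttpResponse res = new Http().send(req);   // the actual OpenAI round trip
Long calloutMs = System.currentTimeMillis() - t0;

// ... response parsing, DML, trigger logic, etc. happen here ...

System.debug('OpenAI callout took ' + calloutMs + ' ms of the total transaction');
```

If calloutMs accounts for nearly all of the 25 to 30 seconds, the latency is on the model side rather than in the Apex layers.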