I’m experiencing a consistent latency issue with the OpenAI API: many requests take 20–30+ seconds to complete, even after several optimizations. I’m using the model to extract entities with an ontology-based function-calling setup, and my input text averages around 30,000 characters (~8–9k tokens).

I’ve tried switching models (including gpt-4.1-mini), simplified the function schema, and verified with a Stopwatch in .NET that the delay happens inside the OpenAI API call, not in my application. Even when the output is small, response times frequently exceed 20 seconds, which feels unusually high and inconsistent.

I suspect the slowdown may be caused by internal queueing related to TPM/RPM limits, concurrency restrictions, or deployment capacity, but I’m not sure how these contribute or how to diagnose them. I have access to the OpenAI dashboard, but I’m unclear which metrics (rate limits, autoscaling behavior, concurrency settings, instance count, etc.) directly affect this type of latency, or how to interpret the breakdown between queue time and compute time.

I would appreciate guidance on whether this latency is expected for large ontology-based extraction prompts, how to determine whether my requests are being queued, and which configuration changes (adjusting rate limits, enabling autoscaling, modifying deployment settings, or switching model versions) could improve consistency and reduce response times.
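In case it helps others reproduce the measurement: one way I’ve seen to separate server-side time from network overhead is to compare wall-clock time against the `openai-processing-ms` response header, which (as far as I know) reports how long the API spent handling the request. Below is a minimal Python sketch of that breakdown; the header name is the only assumption, and the numbers in the example are hypothetical, not from my logs.

```python
import time


def latency_breakdown(total_seconds: float, headers: dict) -> dict:
    """Split wall-clock latency into server processing time vs. everything
    else (network, TLS, client overhead, and any queueing not reported
    by the server), using the openai-processing-ms response header."""
    processing_s = int(headers.get("openai-processing-ms", 0)) / 1000.0
    return {
        "total_s": round(total_seconds, 3),
        "server_processing_s": round(processing_s, 3),
        "other_s": round(max(total_seconds - processing_s, 0.0), 3),
    }


# Hypothetical example: a 24 s call where the server reported
# 21.5 s of processing time -> only ~2.5 s is outside the API.
start = time.monotonic()
# ... make the chat.completions request here and capture its headers ...
fake_headers = {"openai-processing-ms": "21500"}  # placeholder values
print(latency_breakdown(24.0, fake_headers))
```

If `server_processing_s` dominates, the time is genuinely being spent (or queued) on the API side rather than in the client or network, which would at least narrow down where to look in the dashboard.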