OpenAI thread run API incredibly slow

I am using the OpenAI thread run API, and it takes anywhere from 20-45 seconds to get a response. In the Playground, when I run the same prompt against an assistant with an identical config, it takes at most 5 seconds to generate the full response.

Assistant config:

{
  model: 'gpt-4-turbo',
  temperature: 0.8,
  top_p: 0.2,
}

Thread run method snippet:

  const run = await openai.beta.threads.runs.createAndPoll(thread.id, { assistant_id: assistant.id, max_prompt_tokens: maxPromptTokens })

  if (run.last_error) {
    throw new Error(run.last_error.message)
  }

  const messages = await openai.beta.threads.messages.list(thread.id, { run_id: run.id })
  const tokenUsageStats = run.usage

  const latestMessage = messages.data.pop()

  if (latestMessage.content[0].type === 'text') {
    const { text } = latestMessage.content[0]
    const { annotations } = text
    const citations = []

    let index = 0
    for (let annotation of annotations) {
      if (ignoreCitations) {
        text.value = text.value.replace(annotation.text, '')
      } else {
        const { file_citation } = annotation
        if (file_citation) {
          const citedFile = await openai.files.retrieve(file_citation.file_id)
          citations.push(`[${index}]${citedFile.filename}`)
        }
        index++
      }
    }

    return { response: text.value, tokenUsageStats, threadRun: run }
  }

I have also tried other models like gpt-3.5-turbo and gpt-4o, but it still takes that long.

Will interacting with the API always take this long, even though running the same thing through the Playground takes significantly less time?

traceroute

might be your friend…

Also, maybe it would be good to configure the SDK's HTTP agent with keepAlive: true to fight connection-setup latency?


@srijans

could you try connecting this way:

const axios = require('axios');
const https = require('https');

const agent = new https.Agent({
  keepAlive: true
});

const instance = axios.create({
  httpsAgent: agent
});

const betaHeaders = {
  'Authorization': `Bearer YOUR_API_KEY`,
  'Content-Type': 'application/json',
  'OpenAI-Beta': 'assistants=v2' // the Assistants endpoints require this beta header
};

// Create the run on the existing thread
let run = (await instance.post(`https://api.openai.com/v1/threads/${thread.id}/runs`, {
  assistant_id: assistant.id,
  max_prompt_tokens: maxPromptTokens
}, { headers: betaHeaders })).data;

// Poll until the run reaches a terminal status -- creating a run only
// queues it; the SDK's createAndPoll does this loop for you
while (run.status === 'queued' || run.status === 'in_progress') {
  await new Promise(resolve => setTimeout(resolve, 1000));
  run = (await instance.get(`https://api.openai.com/v1/threads/${thread.id}/runs/${run.id}`, {
    headers: betaHeaders
  })).data;
}

if (run.last_error) {
  throw new Error(run.last_error.message);
}

const messagesResponse = await instance.get(`https://api.openai.com/v1/threads/${thread.id}/messages`, {
  params: { run_id: run.id },
  headers: betaHeaders
});

const messages = messagesResponse.data;
const tokenUsageStats = run.usage;

const latestMessage = messages.data.pop();

if (latestMessage.content[0].type === 'text') {
  const { text } = latestMessage.content[0];
  const { annotations } = text;
  const citations = [];

  let index = 0;
  for (const annotation of annotations) {
    if (ignoreCitations) {
      text.value = text.value.replace(annotation.text, '');
    } else {
      const { file_citation } = annotation;
      if (file_citation) {
        const citedFileResponse = await instance.get(`https://api.openai.com/v1/files/${file_citation.file_id}`, {
          headers: { 'Authorization': `Bearer YOUR_API_KEY` }
        });
        const citedFile = citedFileResponse.data;
        citations.push(`[${index}]${citedFile.filename}`);
      }
      index++;
    }
  }

  return { response: text.value, tokenUsageStats, threadRun: run };
}

Where are you located, btw? I've found that connecting from my local machine in Germany is exceptionally slower (>10 seconds) than from a server located in the USA.

Maybe you can try spinning up a cloud instance over there and testing again.

Using a VPN might also be an option. It could be something your government is doing, e.g. they might want to know what you are doing with AI – am I getting too paranoid?
