Questions take over a minute each to process with my Apps Script API connection; how can I dramatically increase speed?

Hi there.

I come to you because I am running out of options to make my OpenAI integration project successful. I have a Google Sheet that contains a long set of questions, which I need the model to answer based on documentation I pass it at the start.

I am using Google Apps Script (a JavaScript-based language) to set up the API connection, and it works fine except that Apps Script has a maximum execution time of 30 minutes, and OpenAI takes over a minute to answer each question, so I have not received even half the answers I need by the time execution times out.

This is my current script (a set of functions called from GenAI_answers()), where I first create a vector store with the necessary documentation, then open a thread, and then run it with my assistant, passing each question and retrieving the answer. Note that I am already using GPT-3.5 Turbo to increase processing speed, but even that is not enough.

What are some optimization steps I could take in order to make it faster?
Thank you so much!!

Script:

function GenAI_answers() { 

  const ui = SpreadsheetApp.getUi(); // Get a handle on the spreadsheet UI
  SpreadsheetApp.getActiveSpreadsheet().toast("Running...", "Status", -1); // Show a status toast that stays up while the script runs
  
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Catalog_Project'); // Get working sheet
  const lastRow = sheet.getLastRow(); // Get last row of the working sheet
  
  const docUrl = Browser.inputBox("Enter the URL of the documentation folder"); // Build input box asking for the documentation folder URL
  const folderId = extractFolderId(docUrl); // Extract folder ID from the screening files folder url
  
  const OPENAI_API_KEY = 'xxx';  // Set the OpenAI API key
  const ASSISTANT_ID = 'asst_xxx';  // Set the OpenAI Assistant ID
  const headers = { // Set headers enabling API access
    'Authorization': 'Bearer ' + OPENAI_API_KEY, // Authorization to access API
    'Content-Type': 'application/json', // Format of the content we are passing/retrieving from the API (.json file)
    'OpenAI-Beta': 'assistants=v2' // Assistants API version header (v1 was deprecated)
  };

  // Step 1: Upload PDFs to an OpenAI Vector Store
  const vectorStoreId = uploadPdfsToOpenAIwithRetry(folderId, headers); // Create and capture ID of the Vector Store (to store files in a vector that we then pass to the AI assistant when initiating the conversation/thread)
  
  // Step 2: Create an empty thread with the attached vector store
  const threadId = createOpenAIThread(headers, vectorStoreId); // Create and capture ID of the conversation/thread with the AI Assistant
  if (!threadId) {
    ui.alert("Failed to create OpenAI thread."); // Notify if the function fails
    return;
  }

  // Set Column X header (where OpenAI answers will go)
  sheet.getRange(x, xx).setValue("OpenAI Answers"); 

  // Get all questions from Column Y at once
  const questions = sheet.getRange(x, xx, lastRow - x, x).getValues();

  // Prepare an array to store answers
  let answers = [];

  // Iterate over the questions to get answers for those that are not blank
  for (let i = 0; i < questions.length; i++) {
    const question = questions[i][0];
    if (question) {
      Logger.log(`Processing question: ${question}`);
      const answer = getOpenAIAnswer(headers, threadId, question);
      Logger.log(`Received answer: ${answer}`);
      answers.push([answer || "No response"]);
    } else {
      // If the row is blank, just push a blank entry
      answers.push([""]);
    }
  }

  // Now write all answers at once into the answers column (starting at row x)
  sheet.getRange(x, xx, answers.length, x).setValues(answers);

  SpreadsheetApp.getActiveSpreadsheet().toast("Finished!", "Status", 3); // Change the status of the small window when the script finishes running and close it
  SpreadsheetApp.getUi().alert("The script has completed successfully!"); // Notify when the script finishes running
}

// Helper functions called from the main function
// Extract Folder ID from Google Drive URL
function extractFolderId(url) {
  const match = url.match(/[-\w]{25,}/); // Regular expression that captures the Drive folder ID
  return match ? match[0] : null;
}

// Create Vector Store and upload PDFs to OpenAI (no retry; superseded by uploadPdfsToOpenAIwithRetry below)
function uploadPdfsToOpenAI(folderId, headers) {
  const folder = DriveApp.getFolderById(folderId);
  const files = folder.getFilesByType(MimeType.PDF);
  const uploadedFileIds = [];

  while (files.hasNext()) {
    const file = files.next();
    const blob = file.getBlob();
    
    const response = UrlFetchApp.fetch("https://api.openai.com/v1/files", {
      'method': 'post',
      'headers': {
        'Authorization': 'Bearer sk-proj-xxx'
        // 'Content-Type' is intentionally left out (and the shared headers object is not used here):
        // UrlFetchApp builds the multipart/form-data body and boundary automatically when the payload
        // contains a blob, and the application/json Content-Type from the shared headers would break it
      },
      'payload': {
        'purpose': 'assistants',
        'file': blob.getAs('application/pdf')
      },
      'muteHttpExceptions': true
    });

    Logger.log("File Upload Response: " + response.getContentText());

    if (response.getResponseCode() === 200) {
      const fileData = JSON.parse(response.getContentText());
      uploadedFileIds.push(fileData.id);
    }
  }

  if (uploadedFileIds.length === 0) {
    throw new Error("No files were uploaded successfully.");
  }

  Logger.log("Uploaded File IDs: " + uploadedFileIds);

  // Step 2: Create a vector store with the uploaded files
  const vectorStorePayload = {
    file_ids: uploadedFileIds
  };

  const vectorStoreResponse = UrlFetchApp.fetch("https://api.openai.com/v1/vector_stores", {
    method: "post",
    headers: headers,
    payload: JSON.stringify(vectorStorePayload),
    muteHttpExceptions: true
  });

  if (vectorStoreResponse.getResponseCode() !== 200) {
    throw new Error("Failed to create vector store: " + vectorStoreResponse.getContentText());
  }

  const vectorStoreData = JSON.parse(vectorStoreResponse.getContentText());
  const vectorStoreId = vectorStoreData.id;
  Logger.log("Created Vector Store ID: " + vectorStoreId);

  return vectorStoreId; // Return vector store ID
}

// Create OpenAI Thread
function createOpenAIThread(headers, vectorStoreId) {
  const threadParams = {
    messages: [],  // Empty, as documents will be used for future Q&A
    tool_resources: {
      file_search: {
        vector_store_ids: [vectorStoreId] // Attach vector store
      }
    }
  };

  Logger.log("Thread creation payload: " + JSON.stringify(threadParams));

  const response = UrlFetchApp.fetch("https://api.openai.com/v1/threads", {
    method: "post",
    headers: headers,
    payload: JSON.stringify(threadParams),
    muteHttpExceptions: true
  });

  Logger.log("Thread Creation Response: " + response.getContentText());

  if (response.getResponseCode() !== 200) {
    Logger.log("Error Creating Thread: " + response.getContentText());
    return null;
  }

  return JSON.parse(response.getContentText()).id;
}

// Get Answer from OpenAI API
function getOpenAIAnswer(headers, threadId, question) {
  //Logger.log(`Asking question: ${question}`);

  // Ensure no active run is blocking (the API rejects a new run while another run on this thread is still queued or in progress)
  let backoffTime = 5000; // start with 5 seconds
  while (true) {
    const activeRunResponse = UrlFetchApp.fetch(`https://api.openai.com/v1/threads/${threadId}/runs`, {
      method: "get",
      headers: headers,
      muteHttpExceptions: true
    });

    if (activeRunResponse.getResponseCode() !== 200) {
      Logger.log("Error checking active runs: " + activeRunResponse.getContentText());
      return "Error checking active runs.";
    }

    const activeRuns = JSON.parse(activeRunResponse.getContentText());
    if (!activeRuns.data.some(run => run.status === "queued" || run.status === "in_progress")) {
      break; // No active runs, safe to proceed
    }

    Logger.log("Existing run detected. Waiting...");
    Utilities.sleep(backoffTime);
    backoffTime = Math.min(backoffTime * 1.5, 30000); // Increase delay exponentially, max 30s (poll with exponential backoff)
  }


  // Now prepare question to thread (the payload you are sending the API)
  const messagePayload = {
    'role': 'user', // You take the user role
    'content': [{ 'type': 'text', 'text': `Question: ${question}` }]
  };

  // Send the message (question) to the OpenAI thread
  const messageQuestion = UrlFetchApp.fetch(`https://api.openai.com/v1/threads/${threadId}/messages`, {
    'method': 'post',
    'headers': headers,
    'payload': JSON.stringify(messagePayload),
    'muteHttpExceptions': true
  });

  //Logger.log("Message Response: " + messageQuestion.getContentText());

  // Check if response was successful (ResponseCode is 200)
  if (messageQuestion.getResponseCode() !== 200) { 
    return `Error sending question: ${messageQuestion.getContentText()}`;
  }

  // Run the thread with the assistant so it answers the last message
  const runResponse = UrlFetchApp.fetch(`https://api.openai.com/v1/threads/${threadId}/runs`, {
    'method': 'post',
    'headers': headers,
    'payload': JSON.stringify({
      assistant_id: 'asst_xxx', // Assistant ID
      tools: [{ type: "file_search" }],  // Ensures assistant uses uploaded files
      stream: false // Non-streaming mode
    }),
    'muteHttpExceptions': true
  });

  //Logger.log("Run Response: " + runResponse.getContentText());

  // Check if the response was successfully run
  if (runResponse.getResponseCode() !== 200) { 
    return `Error starting assistant run: ${runResponse.getContentText()}`;
  }

  const runData = JSON.parse(runResponse.getContentText());
  const runId = runData.id;
  let status = runData.status;
  let startTime = new Date().getTime();

  // Poll until the run leaves the queued/in_progress states (a run that ends as failed, cancelled, or expired also exits this loop and falls through to the message fetch below)
  while (status === 'queued' || status === 'in_progress') {
    if (new Date().getTime() - startTime > 120000) return "Timeout waiting for response.";

    const checkResponse = UrlFetchApp.fetch(`https://api.openai.com/v1/threads/${threadId}/runs/${runId}`, {
      'method': 'get',
      'headers': headers,
      'muteHttpExceptions': true
    });

    const checkData = JSON.parse(checkResponse.getContentText());
    status = checkData.status;
    //Logger.log("Run Status: " + status);

    // Exponential backoff
    Utilities.sleep(Math.min(2000 * Math.pow(2, Math.floor((new Date().getTime() - startTime) / 10000)), 30000));
  }

  // Fetch the assistant's final response once completed
  const messagesResponse = UrlFetchApp.fetch(`https://api.openai.com/v1/threads/${threadId}/messages`, {
    'method': 'get',
    'headers': headers,
    'muteHttpExceptions': true
  });

  if (messagesResponse.getResponseCode() === 200) {
    const messagesData = JSON.parse(messagesResponse.getContentText());
    const assistantMessage = messagesData.data.find(m => m.role === 'assistant' && m.content);
    if (assistantMessage) {
      const contentItem = assistantMessage.content.find(c => c.type === 'text');
      if (contentItem && contentItem.text && contentItem.text.value) {
        return contentItem.text.value.trim();
      }
    }
  }

  return "Failed to get response.";
}

// Upload PDFs to OpenAI (with retry) and create a vector store
function uploadPdfsToOpenAIwithRetry(folderId, headers) {
  const folder = DriveApp.getFolderById(folderId);
  const files = folder.getFilesByType(MimeType.PDF);
  const uploadedFileIds = [];

  while (files.hasNext()) {
    const file = files.next();
    const blob = file.getBlob();
    
    let fileId = uploadFileWithRetry(blob, headers);
    if (fileId) {
      uploadedFileIds.push(fileId);
    }
  }

  if (uploadedFileIds.length === 0) {
    throw new Error("No files were uploaded successfully.");
  }

  Logger.log("Uploaded File IDs: " + uploadedFileIds);

  // Step 2: Create a vector store with the uploaded files
  const vectorStorePayload = {
    file_ids: uploadedFileIds
  };

  const vectorStoreResponse = UrlFetchApp.fetch("https://api.openai.com/v1/vector_stores", {
    method: "post",
    headers: headers,
    payload: JSON.stringify(vectorStorePayload),
    muteHttpExceptions: true
  });

  if (vectorStoreResponse.getResponseCode() !== 200) {
    throw new Error("Failed to create vector store: " + vectorStoreResponse.getContentText());
  }

  const vectorStoreData = JSON.parse(vectorStoreResponse.getContentText());
  const vectorStoreId = vectorStoreData.id;
  Logger.log("Created Vector Store ID: " + vectorStoreId);

  return vectorStoreId; // Return vector store ID
}

// Function to upload a single file with retry logic
function uploadFileWithRetry(blob, headers) {
  const url = "https://api.openai.com/v1/files";
  let maxRetries = 5;
  let baseDelay = 1000; // Start with 1 second
  let attempt = 0;

  while (attempt < maxRetries) {
    try {
      const options = {
        'method': 'post',
        'headers': {
          'Authorization': 'Bearer sk-proj-xxx'
          // 'Content-Type' again intentionally omitted so UrlFetchApp can build the multipart body itself
        },
        'payload': {
          'purpose': 'assistants',
          'file': blob.getAs('application/pdf')
        },
        'muteHttpExceptions': true
      };

      let response = UrlFetchApp.fetch(url, options);
      let responseText = response.getContentText();
      
      Logger.log("File Upload Response: " + responseText);

      let jsonResponse = JSON.parse(responseText);
      if (response.getResponseCode() === 200 && jsonResponse.id) {
        return jsonResponse.id; // Successfully uploaded, return file ID
      } else {
        Logger.log("Error: " + jsonResponse.error?.message);
      }
      
    } catch (error) {
      Logger.log("Attempt " + (attempt + 1) + " failed: " + error.toString());
    }

    attempt++;
    if (attempt < maxRetries) {
      let waitTime = baseDelay * Math.pow(2, attempt) + Math.random() * 500; // Exponential backoff with jitter
      Logger.log("Retrying in " + waitTime + " ms...");
      Utilities.sleep(waitTime);
    }
  }

  Logger.log("File upload failed after " + maxRetries + " attempts.");
  return null;
}

Here are some key points to consider for better performance:

  • Batch Processing
    • Group questions into smaller batches (5-10)
    • Process batches in parallel
    • Reduce API calls
  • Time Management
    • Implement time-based triggers to handle timeouts (see the sketch after this list)
    • Break execution into multiple runs
    • Track progress in ScriptProperties
  • API Optimization
    • Reduce polling frequency
    • Use smaller chunk sizes for vectors
    • Implement exponential backoff with lower intervals
  • Error Handling
    • Save progress after each batch
    • Implement recovery mechanism
    • Log failures for retry
  • Alternative Execution
    • Split into multiple scheduled triggers
    • Process 50 questions per run
    • Auto-schedule next batch
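
To make the time-management and auto-scheduling points concrete, here is a minimal sketch of a resumable run. Everything in it is illustrative rather than taken from your script (processBatch, answerQuestion, the ROW_KEY property, the column numbers): it persists a row pointer in ScriptProperties, stops well before the execution limit, writes each answer immediately so progress survives a failure, and schedules a one-off trigger to continue:

// Minimal resumable-run sketch; all names and column numbers are illustrative.
const ROW_KEY = 'nextRow';
const MAX_RUNTIME_MS = 25 * 60 * 1000; // stop well before the 30-minute limit

function answerQuestion(question) {
  // Placeholder: call your existing getOpenAIAnswer(headers, threadId, question) here,
  // rebuilding headers and reading the stored thread ID from ScriptProperties.
  return 'TODO';
}

function processBatch() {
  const props = PropertiesService.getScriptProperties();
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Catalog_Project');
  const lastRow = sheet.getLastRow();
  let row = Number(props.getProperty(ROW_KEY)) || 2; // resume where the last run stopped
  const start = Date.now();

  while (row <= lastRow && Date.now() - start < MAX_RUNTIME_MS) {
    const question = sheet.getRange(row, 1).getValue(); // column 1 = questions (placeholder)
    if (question) {
      // Write each answer immediately so progress survives a timeout or crash
      sheet.getRange(row, 2).setValue(answerQuestion(question)); // column 2 = answers (placeholder)
    }
    row++;
  }

  props.setProperty(ROW_KEY, String(row)); // save progress after the batch

  if (row <= lastRow) {
    // Auto-schedule the next run; remember to clean up fired triggers
    // (ScriptApp.getProjectTriggers / deleteTrigger) so they don't accumulate.
    ScriptApp.newTrigger('processBatch').timeBased().after(60 * 1000).create();
  }
}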

Cheers. :hugs:

The Assistants platform often has performance problems. The backend can queue the AI inference calls for unknown reasons, or perhaps deliberately to distribute load.

You are also recreating vector stores repeatedly, which requires document extraction, chunking, and embedding before anything lands in the vector database. You should save that time by reusing existing assets.

Chat Completions, by contrast, starts streaming responses in under a second.

The only thing you would need is to replace the managed vector stores with your own RAG vector DB. You can keep your own per-document embeddings so you never have to run extraction or additional calls again; you just reuse the per-document vectors you already have.

Then you can think about whether you need to run this remotely at all, and how to run it in parallel in a way that can be resumed.
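
To give a sense of what that looks like from Apps Script, here is a minimal sketch, assuming you have already retrieved the relevant documentation text yourself (the context argument below) and that gpt-3.5-turbo is an acceptable model. UrlFetchApp.fetchAll issues the requests concurrently, so a batch costs roughly the latency of its slowest question rather than the sum of all of them:

// Minimal sketch: parallel Chat Completions calls from Apps Script.
function askInParallel(questions, context, apiKey) {
  // One Chat Completions request per question; fetchAll sends them concurrently
  const requests = questions.map(q => ({
    url: 'https://api.openai.com/v1/chat/completions',
    method: 'post',
    headers: { 'Authorization': 'Bearer ' + apiKey },
    contentType: 'application/json',
    payload: JSON.stringify({
      model: 'gpt-3.5-turbo',
      messages: [
        { role: 'system', content: 'Answer using only this documentation:\n' + context },
        { role: 'user', content: q }
      ]
    }),
    muteHttpExceptions: true
  }));

  const responses = UrlFetchApp.fetchAll(requests);
  return responses.map(r => {
    if (r.getResponseCode() !== 200) return 'Error: ' + r.getContentText();
    return JSON.parse(r.getContentText()).choices[0].message.content.trim();
  });
}

Note the whole context is resent with every request here, which costs tokens; that is the trade-off for skipping the managed vector store.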

Thank you. Apologies if this is a dumb question, as I don't have a developer background, but how would batching work here? I have read the documentation for the Batch API, and it seems to indicate it is “ideal for processing jobs that don’t require immediate responses”, with “a clear 24-hour turnaround time”, but isn’t that the opposite of what I am trying to achieve (faster response times)?

Hi @helloworld123 and welcome to the community!

No, the Batch API will not help you here since, as you say, it is best-effort, with job execution completing within 24 hours in the worst case.

There are no dumb questions; it is great that you are trying to code your own solution in the first place, and I commend you for that! :innocent:

You’re completely right, and as @platypus pointed out, the batch API would be significantly slower.

What I meant, however, was that you could group, for example, 10 questions into one request and get an answer to all of them (see the sketch below).
This could lead to less accurate results, but you would get 10 answers with just one request, reducing the number of requests you have to make in the first place.
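
As a minimal sketch of that idea (the "ANSWER <n>:" format is just a convention I am assuming here; any delimiter the model follows reliably will do), you could number the questions in one message and split the numbered reply back into one answer per question:

function buildGroupedPrompt(questions) {
  // Number the questions so the model can label each answer unambiguously
  const numbered = questions.map((q, i) => (i + 1) + '. ' + q).join('\n');
  return 'Answer each question below using the attached documentation. ' +
         'Reply with exactly one line per question, formatted as "ANSWER <n>: <answer>".\n\n' +
         numbered;
}

function splitGroupedReply(reply, count) {
  const answers = new Array(count).fill('No response');
  const re = /^ANSWER\s+(\d+):\s*(.*)$/gm; // matches lines like "ANSWER 3: ..."
  let m;
  while ((m = re.exec(reply)) !== null) {
    const idx = Number(m[1]) - 1; // answer numbers are 1-based
    if (idx >= 0 && idx < count) answers[idx] = m[2].trim();
  }
  return answers; // each answer lands at its question's index, ready for setValues
}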

I hope this clears that point up a bit! :hugs:

Thank you! I see, I guess that could work if I am able to have the assistant separate its answers consistently, in a way that lets me scrape them one by one and place each in the right cell. Given my questions cover different themes, it might be sensible to group them by theme too. I will report back if it works.

Thank you for the insightful answer. Unfortunately, using an external RAG vector DB likely exceeds my skills and knowledge, as well as the scope of the project, by quite a bit.
