Optimizing Assistant costs for my business's WhatsApp users

I am building an OpenAI Assistant that answers questions from my business's users through WhatsApp. I am using gpt-4-turbo, which is supposed to cost less than gpt-4, but I am seeing fairly high costs even though I am still only testing, and I don't know why.

My setup: I manage the assistant solely by its ID, I keep an independent thread per user (saved in Firebase, fetched if it exists or created if not), and I have a waiting queue so only one message reaches the chatbot at a time.

How can I reduce these costs? Do the runs have to be closed? From what I understand, a thread and a run close automatically, but I don't know whether closing them manually saves costs. If it does, I would like an example of how to close them once the chatbot has responded. I already use the fewest functions possible for the bot to work correctly.

The backend now has a way to view threads as well. If you enable it, you can see each thread and its cost (tokens in and out). I find that very helpful.

Do you mean the "view all threads" option? Where can that be found?

  1. Reduce the size of instructions
  2. Reduce the file sizes
  3. Reduce the function calling parameters
  4. Downgrade to GPT-3.5
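On point 4, note that you don't necessarily have to downgrade the whole assistant: run creation in the Assistants API accepts a `model` override, so you could route only some traffic to a cheaper model. A minimal sketch, assuming a hypothetical helper name:

```javascript
// Sketch: override the assistant's model per run so cheaper traffic can
// use GPT-3.5 while keeping the same assistant. buildRunParams is a
// hypothetical helper; the `model` override on run creation is real.
function buildRunParams(assistantId, useCheapModel) {
  const params = { assistant_id: assistantId };
  if (useCheapModel) {
    params.model = "gpt-3.5-turbo"; // per-run model override
  }
  return params;
}

// usage (inside an async function):
// const run = await openai.beta.threads.runs.create(
//   threadId,
//   buildRunParams("asst_...", true)
// );
```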

You don't "manually close" threads. I believe there is a misconception here about how the Assistants framework works.

I also operate a WhatsApp chatbot. One thing that works for me (if I understand you correctly) is:

This is a great solution. People in chat apps tend to send batches of messages like

“Hmmm…idk”
“I guess maybe this”
“Or that”

So great move. I have done the same (waiting a specified period and then grouping it all together).
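For reference, the grouping can be as simple as a per-user buffer with a quiet-period timer. This is a rough sketch; the flush delay and handler signature are assumptions about your setup:

```javascript
// Buffer incoming messages per user and flush them as one combined
// message after a quiet period (3 s by default).
function createBatcher(handle, delayMs = 3000) {
  const buffers = new Map(); // phoneNumber -> { texts, timer }
  return function push(phoneNumber, text) {
    let entry = buffers.get(phoneNumber);
    if (!entry) {
      entry = { texts: [], timer: null };
      buffers.set(phoneNumber, entry);
    }
    entry.texts.push(text);
    clearTimeout(entry.timer); // restart the quiet-period timer
    entry.timer = setTimeout(() => {
      buffers.delete(phoneNumber);
      handle(phoneNumber, entry.texts.join("\n")); // one grouped message
    }, delayMs);
  };
}

// usage: const push = createBatcher((phone, text) => bot(text, thread, phone));
```

This way a burst of three short messages costs one run instead of three.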

Other than that, and what has already been mentioned, there's not much left to do.

Would it be possible for you to help me check whether my code is well structured? This is the part where the questions arrive after the queue. I also found another problem: if in one chat I write "hello, my name is Steve", then in another chat with a totally different thread, the bot sometimes already knows my name ("Steve"), even though it was said in another chat with another thread.


const openai = new OpenAI({
  apiKey: process.env.APIOPENAI,
});

async function getAssistantResponse(userQuestion, thread, phoneNumber) {
  global.scrapeCourseDetails = scrapeCourseDetails;
  global.getThemes = getThemes;
  global.searchCourses = searchCourses;
  global.relatedCourse = relatedCourse;
  global.priceCourse = priceCourse;
  global.flagUserAsCallRequired = flagUserAsCallRequired;
  global.flagKanbanInterest = flagKanbanInterest;
  global.flagKanbanDiscussion = flagKanbanDiscussion;
  global.flagKanbanDesicion = flagKanbanDesicion;
  global.conversionCurrency = conversionCurrency;
  global.requiresIntervention = requiresIntervention;

  await openai.beta.threads.messages.create(thread, {
    role: "user",
    content: userQuestion,
  });

  const run = await openai.beta.threads.runs.create(thread, {
    assistant_id: "assistant id",
  });

  let runStatus = await openai.beta.threads.runs.retrieve(thread, run.id);

  console.log("ASSISTANT ID: " + run.assistant_id);

  while (runStatus.status !== "completed") {
    await new Promise((resolve) => setTimeout(resolve, 2000));
    runStatus = await openai.beta.threads.runs.retrieve(thread, run.id);

    while (runStatus.status === "in_progress") {
      console.log("Waiting for response");
      await new Promise((resolve) => setTimeout(resolve, 2000));
      runStatus = await openai.beta.threads.runs.retrieve(thread, run.id);
    }

    if (runStatus.status === "requires_action") {
      console.log(
        "Function required: " +
          JSON.stringify(
            runStatus.required_action.submit_tool_outputs.tool_calls[0].function
              .name
          )
      );

      const toolCalls =
        runStatus.required_action.submit_tool_outputs.tool_calls;
      const toolOutputs = [];

      for (const toolCall of toolCalls) {
        const functionName = toolCall.function.name;

        const args = JSON.parse(toolCall.function.arguments);
        // If the function is flagKanbanInterest, pass it the phoneNumber
        if (
          functionName === "flagKanbanInterest" ||
          "flagKanbanDiscussion" ||
          "flagUserAsCallRequired" ||
          "requiresIntervention"
        ) {
          args.phoneNumber = phoneNumber;
        }

        const output = await global[functionName].apply(null, [args]);

        toolOutputs.push({
          tool_call_id: toolCall.id,
          output: output,
        });
      }

      await openai.beta.threads.runs.submitToolOutputs(thread, run.id, {
        tool_outputs: toolOutputs,
      });
      continue;
    }
  }

  const messages = await openai.beta.threads.messages.list(thread);
  const lastMessageForRun = messages.data
    .filter(
      (message) => message.run_id === run.id && message.role === "assistant"
    )
    .pop();

  return lastMessageForRun.content[0].text.value;
}

async function bot(question, thread, phoneNumber) {
  try {
    const userQuestion = question;
    const response = await getAssistantResponse(
      userQuestion,
      thread,
      phoneNumber
    );
    return {
      response: response,
      availableBot: true,
    };
  } catch (error) {
    console.error(error);
  }
}
// bot test
module.exports = bot;


What are you using to host this?

A huge red flag for me is the global variables. Depending on what host you’re using this could leave behind artifacts and cause a huge pile-up of strange issues.

Ideally you want your function to not have side-effects caused by other functions/variables.

Based on your code it doesn't really seem to be the issue, but since I cannot see how `question` is created, I just figured that's where I would start.

For troubleshooting you can go to the playground and view the thread to see where the Assistant learned the name “Steve”.

https://platform.openai.com/playground?assistant={assistant_id}&mode=assistant&thread={thread_id}


Go to Settings → Organization, and under Threads make sure to pick the right option.

Then they will show in the main side bar where Assistants are also shown!

How do you create your threads? That would be my guess as to where this 'knowledge' leak happens: using the same thread twice. Once you start looking at the threads in the backend, I think you will solve that problem easily.
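To make sure a thread is never shared between users, the Firebase lookup can be written as a strict get-or-create keyed by phone number. A sketch with the store and thread creation injected, since I can't see your Firebase code (`store` and `createThread` are stand-ins for your wrapper and for `openai.beta.threads.create()`):

```javascript
// One thread per user: look up the thread id by phone number and only
// create a new thread when none exists. `store` stands in for the
// Firebase wrapper and `createThread` for openai.beta.threads.create();
// both are assumptions about your setup.
async function getOrCreateThread(store, phoneNumber, createThread) {
  const existing = await store.get(phoneNumber);
  if (existing) return existing; // same user -> same thread, never shared
  const thread = await createThread();
  await store.set(phoneNumber, thread.id);
  return thread.id;
}
```

If two users can ever map to the same key (e.g. a normalized vs. raw phone number), that would explain "Steve" leaking between chats.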

Oh and your function call check should be

if (functionName === "flagKanbanInterest" || functionName === "flagKanbanDiscussion" || functionName === "flagUserAsCallRequired" || functionName === "requiresIntervention")

(But you could also drop the whole check and simply always inject the phone number.)
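Either way, pulling that logic into a small helper keeps the tool-call loop readable. A sketch using the function names from the original code:

```javascript
// Functions that need the caller's phone number injected (names taken
// from the original code).
const NEEDS_PHONE = new Set([
  "flagKanbanInterest",
  "flagKanbanDiscussion",
  "flagUserAsCallRequired",
  "requiresIntervention",
]);

// Parse the tool-call arguments and inject the phone number only for
// the functions that need it.
function prepareArgs(functionName, rawArgs, phoneNumber) {
  const args = JSON.parse(rawArgs);
  if (NEEDS_PHONE.has(functionName)) {
    args.phoneNumber = phoneNumber;
  }
  return args;
}

// usage inside the loop:
// const args = prepareArgs(toolCall.function.name,
//                          toolCall.function.arguments, phoneNumber);
```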

Thanks for sharing.

Although it says that it's available in the API, I cannot see any documentation on it.

What code did you use?

This is in platform.openai.com, where you have the Assistants as well. Once you go to Settings and enable thread visibility, you can see them from the main sidebar menu.


Yeah, I see that; I was just wondering if you knew how to call it via the API. Thanks for showing it, though. I did not know.

I don't think there is a 'list threads' option at the moment. You can only retrieve a known thread. BUT, now that you can do it in the backend, I'm sure it won't be long before it shows up in the API?