I have a question. I’m using GPT-4 as the primary LLM in my chat application. We’re using the Assistants API.
Once the Assistant has decided to make a tool call, but before we submit the results of that call, I would like to swap to a faster model like GPT-3.5 to summarize the retrieved data.
The reason is that the Assistants API is fairly slow, and I suspect that using a faster model for this step might speed it up.
Does anyone know whether this sort of thing is even possible, or have any other tricks for speeding up the Assistants API?
I don’t see how the additional AI inference would speed things up. Submitting more input context to the model does not meaningfully delay the beginning of generation. You can benchmark gpt-4 on the Chat Completions endpoint and see how long it takes to produce the first and final token with small and large inputs (you can send irrelevant text and mark it as irrelevant to minimize how much it alters the output).
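If you want to measure that yourself, here’s a minimal sketch, assuming the openai v1.x Python SDK and an OPENAI_API_KEY in the environment; the prompt text and padding size are just placeholders:

```python
# Rough latency benchmark: streams a completion and times the first and
# final token, with and without a block of padded "irrelevant" input.
import time
from openai import OpenAI

client = OpenAI()

def benchmark(model: str, padding: str = "") -> tuple[float, float]:
    """Return (seconds to first token, seconds to final token)."""
    content = "Reply with a one-sentence fact about the moon."
    if padding:
        content += "\n\nIgnore the following irrelevant text:\n" + padding
    start = time.perf_counter()
    first = None
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no content (role header, finish marker).
        if first is None and chunk.choices and chunk.choices[0].delta.content:
            first = time.perf_counter() - start
    return first, time.perf_counter() - start

print("small input:", benchmark("gpt-4"))
print("large input:", benchmark("gpt-4", padding="lorem ipsum " * 1500))
```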
Transforming or amending the data you send to the AI as knowledge or as a tool return is certainly possible. A tool return value is just natural language placed in a “function”-role message, which the AI reads as the response to the query it sent. Just be aware that a model with lower cognitive ability may lose some of the finer points of the language that may be important for answering.
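To make the mechanics concrete, here’s a sketch of the swap you describe, assuming the openai v1.x Python SDK; `condense` and `submit_condensed` are hypothetical helper names, and the IDs are assumed to come from your existing run-polling loop:

```python
# Sketch: compress a raw tool result with a faster model, then submit the
# condensed text as the tool output for the waiting Assistants run.
from openai import OpenAI

client = OpenAI()

def condense(raw_result: str) -> str:
    """Compress the raw tool output with a faster, cheaper model."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Summarize the following tool output, preserving "
                        "all facts, numbers, names, and caveats."},
            {"role": "user", "content": raw_result},
        ],
    )
    return resp.choices[0].message.content

def submit_condensed(thread_id: str, run_id: str,
                     tool_call_id: str, raw_result: str):
    # The Assistant only ever sees the condensed text, not the raw result.
    return client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread_id,
        run_id=run_id,
        tool_outputs=[{"tool_call_id": tool_call_id,
                       "output": condense(raw_result)}],
    )
```

Note that the condensing call adds its own round trip, so per the point above it is more useful for trimming a very large tool result than as a raw speedup.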