N chat completion choices with follow-up replies

I want to get the following chats:

user: [large first prompt]
assistant: [first response]
user: [second short follow-up prompt, static, doesn’t depend on the contents of the first response]
assistant: [second response]

I also want to have N choices for the first response, but, let’s say, only a single choice (N=1) for the second response.

Currently, there is no way to achieve this without re-processing the large first prompt, which is a waste.
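
To make the waste concrete, here is a minimal sketch of the only way to get this today, written against the pre-1.0 `openai` Python client (it assumes `OPENAI_API_KEY` is set; the model name, N, and prompts are placeholders). The large prompt is sent, and re-processed, N+1 times in total:

```python
import openai

LARGE_PROMPT = "[large first prompt]"          # placeholder
FOLLOW_UP = "[second short follow-up prompt]"  # placeholder; static
N = 4                                          # arbitrary choice count

# First call: N choices for the first response; the large prompt is
# processed once here.
first = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": LARGE_PROMPT}],
    n=N,
)

# Follow-up: one call per choice, and every call re-sends (and the model
# re-processes) the large prompt, even though the new user turn is tiny.
second_responses = []
for choice in first.choices:
    second = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": LARGE_PROMPT},
            {"role": "assistant", "content": choice.message.content},
            {"role": "user", "content": FOLLOW_UP},
        ],
    )
    second_responses.append(second.choices[0].message.content)
```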

The way to think of this is that each API call is stateless: OpenAI does not keep track of your particular API calls in any computationally meaningful way. Large language models need to be given the entire conversation history each time they are called.

I understand, of course, that this is not implemented in the current API; I am just pointing out that this would be a nice feature to have. Also, implementing this feature doesn't require the API to become stateful: it just needs a way to send more complex requests, in which I specify the follow-up prompt beforehand.
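
For illustration, such a request could look something like the sketch below. Everything here is hypothetical, in particular the `follow_up` field, which exists in no real API; it is only meant to show the shape of the feature.

```python
# Hypothetical request body; the "follow_up" field does not exist in any
# real API and is shown only to illustrate the proposed feature.
request = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "user", "content": "[large first prompt]"},
    ],
    "n": 4,  # N choices for the first response
    "follow_up": {
        "message": {"role": "user", "content": "[second short follow-up prompt]"},
        "n": 1,  # a single second response per first-response choice
    },
}
# A server receiving this could process the large prompt once, branch into
# N first responses, append the static follow-up to each branch, and return
# everything from one stateless request.
```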

You do not appear to know how the models work.

What you are describing is not possible with the current architecture.

The models process the entire context all at once; that is the only way they work.

@elmstedt of course this is possible, see e.g. github.com/guidance-ai/guidance#guidance-acceleration-notebook
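
For context, a guidance program interleaves fixed prompt text with generation points, so the follow-up turn is written down before the first response is generated. A rough sketch, assuming the guidance 0.0.x templated chat syntax from around the time of this thread (the model name is a placeholder):

```python
import guidance

# Placeholder model; note that guidance's acceleration trick itself only
# applies to locally hosted models, not to the OpenAI backend.
guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")

# One program, two generation points: the follow-up user turn is declared
# up front, before the first assistant response has been generated.
program = guidance("""
{{#user}}[large first prompt]{{/user}}
{{#assistant}}{{gen 'first'}}{{/assistant}}
{{#user}}[second short follow-up prompt]{{/user}}
{{#assistant}}{{gen 'second'}}{{/assistant}}
""")

result = program()
print(result["first"], result["second"])
```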

First, thank you for linking that project. It looks like an interesting project, so I'm reposting it here as a clickable link for others: github.com/guidance-ai/guidance#guidance-acceleration-notebook

Having said that, while there are some interesting things in there, I’m not sure it does exactly what you are proposing.

But the project supports the OpenAI models, so if you are convinced it does work the way you think, I would encourage you to experiment with it and report back with examples showing equivalent results with fewer tokens used.

@elmstedt sorry for my repeated low-context responses, which have led to misunderstanding. I could have done better.

This is not implemented in the OpenAI API at the moment. The whole point of my post on this forum board is to suggest that it would be a nice thing for OpenAI to implement.

I meant that the Transformer architecture doesn't preclude such interleaving of prompts and LLM generation. Guidance acceleration currently supports only open-source models because, evidently, no cloud API provider (neither OpenAI nor Anthropic) implements the necessary API. This doesn't require making the API itself stateful; it would stay stateless and just become more complex.
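
To spell out the architectural point: a server that hosts the model can run the large prompt through it once, keep the attention key/value (KV) cache, and decode every branch from that cached prefix, which is the kind of reuse guidance acceleration does locally. A minimal sketch, assuming a Hugging Face transformers causal LM (gpt2 as a stand-in, naive sampling, toy lengths), not how a production server would be written:

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")    # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def extend(token_ids, past):
    """Feed only the new tokens, reusing the cached prefix for attention."""
    with torch.no_grad():
        out = model(token_ids, past_key_values=past, use_cache=True)
    return out.logits, out.past_key_values

def decode(logits, past, max_new_tokens=20):
    """Sample a continuation from a cached prefix, never re-reading it."""
    ids = []
    for _ in range(max_new_tokens):
        probs = torch.softmax(logits[:, -1], dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids.append(next_id)
        logits, past = extend(next_id, past)
    return torch.cat(ids, dim=-1), past

# 1. Process the large prompt exactly once and keep its KV cache.
prompt = tokenizer("[large first prompt]", return_tensors="pt").input_ids
logits, prefix = extend(prompt, None)

# 2. N first responses, each branching off a copy of the same cached prefix
#    (copied because newer transformers caches are mutated in place).
branches = [decode(logits, copy.deepcopy(prefix)) for _ in range(4)]

# 3. Append the static follow-up to one branch's cache and generate the
#    single second response; the large prompt is still never re-read.
first_ids, branch_cache = branches[0]
follow = tokenizer("[second short follow-up prompt]", return_tensors="pt").input_ids
logits2, cache2 = extend(follow, branch_cache)
second_ids, _ = decode(logits2, cache2)
```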

The guidance developers explicitly say that API providers should implement this in this comment: github.com/guidance-ai/guidance/issues/115#issuecomment-1563378295