Modifying Mid-Response Generated Output in GPT-3.5 / GPT-4

max11rsl · May 17, 2023, 2:34pm

I am currently facing a challenge with GPT-3.5 and GPT-4 models. As part of my use-case, I need to modify a part of the response that the model is generating while it is in the process of generating it. The goal is to have the model continue generating the remainder of its response, taking into account the modifications I’ve applied.

However, I am encountering an issue. When I intervene in the middle of the response generation, make a change to the last generated token for example, and then request the model to continue, it often restarts the sentence from the beginning, ignoring the change I made. Here’s an example to illustrate the issue:

Initial Interaction:

User: “Hi, What’s 3^1.4?”
GPT-4 (temperature=0): “3 raised to the power of 1.4 is”

At this point, I stop the model, and modify the last token "is " with "should ". I then call the model to continue its response.

Continued Interaction:

GPT-4: “3 raised to the power of 1.4 is approximately …”

As you can see, instead of continuing from where it was left off, the model begins the sentence again. While there are instances when the model does continue from where it was interrupted, it seems to be inconsistent. My hypothesis is that the model is more likely to restart the sentence when the token I’ve modified or inserted is not a likely token to occur in the given context (which would have caused bad performance anyway and OpenAI tuned it that way?).

My question is: Is there a way to compel the model to continue its response after the modifications have been made, regardless of the likelihood of injected tokens? I tried to play with frequency and presence penalties and while it improves the situation in some cases, even at their maximum values, they don’t seem to work reliably so I’m probably missing a point on how the output gets generated. Any insights or suggestions would be greatly appreciated.

One of the use cases is when you want to inject an output of an API call to the model’s output while making the user experience as realtime as possible (and you want the model to also be aware of the injected data). For example:

GPT-4: "The weather today is {{ get_weather() }} " → “24 degrees…” - the {{ get_weather() }} needs to get modified into “24” in realtime as the model will continue using that data in its “same” response).

Thank you for your assistance in advance.

sps · May 17, 2023, 3:04pm

Hi @max11rsl

Welcome to the community.

what code are you doing this with?

max11rsl · May 17, 2023, 3:09pm

I tried doing it both in the playground and in python. Ultimately I’m doing the parsing through python.

firtina · May 17, 2023, 3:12pm

You’ll need to combine it with an instruction.

“Continue this sentence, without repeating it”, and so on.

I have, in my own code, this, to try to “continue” after a length has been reached, for another reason:

{role: 'assistant', content: 'content from before'},
{role: 'assistant', content: `Oops! Looks like my last message was cut off.

I will continue from where I left off in my response after my last line as if nothing happened, ensuring I will not repeat anything, here is the rest of my response:`

max11rsl · May 17, 2023, 3:21pm

Works quite well so far ! Many thanks @firtina.

bruce.dambrosio · May 17, 2023, 4:04pm

Have you seen the recently announced microsoft ‘Guidance’? Claims to do this and much more. Works with both GPT and HG Transformers. Uses the Azure API for gpt, but since they have released the source, I imagine you could modify to use OpenAI api.
(Available on github)

firtina · May 17, 2023, 4:18pm

That sounds quite interesting @bruce.dambrosio , can you please share a link to the repo?

bruce.dambrosio · May 17, 2023, 4:23pm

max11rsl · May 17, 2023, 6:52pm

Wow ! I’ve been reading this for the past 2 hours. There is quite a depth to it and I’m glad you shared this as I was about to get started with pretty much a simpler version of the same project. This is super interesting.

bruce.dambrosio · May 17, 2023, 6:58pm

One thing I found frustrating so far (I haven’t tried converting the gpt interface yet) is that when trying to test it with a the Transformer option, it seems to take forever to load the model (even after it has been downloaded and cached). It gets there eventually, but wow. I’m planning to next try converting the gpt api to use openAI instead of Azure, and/or converting the Transformer interface to use a fastchat server I have running.
I’ll post progress here.

max11rsl · May 17, 2023, 7:17pm

Sounds good, I’ll be exploring the functionality as well in the meantime. Curious if it’s in any way possible to achieve guidance acceleration type of caching behaviour from OpenAI api because with GPT4, the speed optimizations to me are extremely important.

bruce.dambrosio · May 17, 2023, 7:37pm

my error, it looks like openai is directly supported, with this caveat

“When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.”

Topic		Replies	Views
Custom chatbot says that it's developed by OpenAI API gpt-4	33	1984	April 2, 2024
MS Guidance - real-time interactive prompts. Anyone else exploring this? Prompting api	3	2457	May 23, 2023
How to clip "bubble wrap" from the end of responses? Prompting	18	1318	March 22, 2023
How to get responses without the added "chat" when converting from davinci-003 to ChatGPT API gpt-3.5-turbo API	10	2726	March 6, 2023
What to do when fine-tuning is not working? API	21	7868	December 24, 2023

Modifying Mid-Response Generated Output in GPT-3.5 / GPT-4

Related topics