Handling Multiple Responses from the API in different ways

I want to generate two responses to the same prompt using the n hyperparameter, but I want to stop the second response early using the stop hyperparameter. This is part of a quality-check procedure. Is there a way to stop one response early but not the other? I assume not, but am hopeful about the token savings…

If you use the stream=true parameter, you can close the connection at any time, which stops generation of the remaining output.

https://platform.openai.com/docs/api-reference/streaming

Good idea, thanks! These are parallel calls used to populate a data frame automatically by extracting text. So could I stream response 1 and check each chunk of response 2 as it comes in for my stop phrase? That is my assumption, anyway. How would I end the stream?

You simply close the connection, via a .close method, or by disconnecting the socket if you are using a non-SDK solution.
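As a sketch of the pattern, here is the close-early idea with a plain Python generator standing in for the API stream object (the real stream exposes a similar iterate-and-close interface):

```python
def fake_stream():
    # Stand-in for the API stream: yields text pieces one at a time.
    for piece in ["Hello", ", ", "world", "!", " extra", " tokens"]:
        yield piece

stream = fake_stream()
collected = ""
for piece in stream:
    collected += piece
    if "world" in collected:
        # Stop consuming; with the real API this drops the HTTP connection,
        # so no further tokens are generated or billed.
        stream.close()
        break
```

After the loop, nothing past the matching piece was consumed.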

First some clarification so we are on the same page:

  • “hyperparameters” refers to learning settings used when fine-tuning (retraining) a model

I know it’s fun to say, but the API just uses parameters, or even json key:value pairs when you get down to what is sent.

With runtime API calls, the stop parameter can be set to any string at which you want to terminate the AI generation. For example, if you only want one line or paragraph, you could use "\n" as the stop string, and generation will be cut off at that point.

Stop sequences are strings, not tokens, which makes them easier to specify; the string you specify is not included at the end of the output sent to you.
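To illustrate the semantics locally (this is a sketch mimicking the API's behavior, not an API call): output is truncated at the first occurrence of the stop string, and the stop string itself is not returned.

```python
def apply_stop(text, stop):
    # Mimic the API's stop behavior: cut the text at the first occurrence
    # of the stop string; the stop string itself is not included.
    idx = text.find(stop)
    return text if idx == -1 else text[:idx]

full = "First line of the answer\nSecond line you never see"
print(apply_stop(full, "\n"))  # only the first line survives
```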

So in the AI language, you’d have to figure out what in the response is going to be a repeatable part of the generation you can identify.


However, there is no setting a “stop” to work on only one of n>1 generations.

So monitoring the chunks sounds like a good idea. However, the content you receive arrives in small token-sized pieces, so for longer stop sequences you'll need to build up the response as it streams and scan over its tail to see whether the phrase has appeared (and since more text may arrive after the end of your string, you can't just check only the end of each chunk).
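A minimal sketch of that buffering approach, with plain strings standing in for streamed chunks. Note the stop phrase here arrives split across chunk boundaries, which is exactly why you scan the accumulated response rather than individual chunks:

```python
def stream_until_phrase(chunks, stop_phrase):
    # Accumulate streamed text and stop once the phrase appears anywhere,
    # trimming the phrase and anything generated after it.
    response = ""
    for chunk in chunks:
        response += chunk
        idx = response.find(stop_phrase)
        if idx != -1:
            # With a real stream you would also call stream.close() here.
            return response[:idx]
    return response

# "END" is split across chunks, with extra text after it:
chunks = ["some te", "xt EN", "D plus trailing tokens"]
print(stream_until_phrase(chunks, "END"))
```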

(Also note: the point of n is the different random sampling of tokens you can get. Set top_p=0.0001 and also set a seed if you want the generations the same, or keep temperature near its default if variety is what you are after.)

Thanks for all this. But how do I differentiate the two streams? Here is my code, which works for early stopping when n=1. What I am unclear about is how to separate out the chunks belonging to response 1 and response 2 so I can handle them differently. Is this even possible?

    messages = [
        {"role": "system", "content": sys_prompt},
        {"role": "user", "content": user_prompt}
    ]

    response = ""

    stream = openai.ChatCompletion.create(
        model=engine_pass,
        temperature=0.3,
        messages=messages,
        max_tokens=1500,
        n=1,
        stream=True
    )

    try:
        for chunk in stream:
            content = chunk["choices"][0].get("delta", {}).get("content", "")
            response += content

            if "Insert_Stop_Phrase_Here" in response:
                print("Specific phrase found, stopping stream.")
                break
    except Exception as e:
        print(f"An error occurred while processing this row: {e}")
    finally:
        stream.close()

    return response, sys_prompt

As a thought… it looks like the chunks come back sequentially: n1, n2, n1, n2… How reliable is this pattern? Can I use it to assign alternating chunks to response 1 and response 2, monitoring response 2 for my stop phrase but not response 1?
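Alternatively, rather than relying on strict alternation, I could route each chunk by the index field its choice carries. A sketch with hand-built chunks shaped the way I understand the streamed chunks to look (the exact chunk format is my assumption):

```python
def demux_by_index(chunks, stop_phrase):
    # Route each chunk to its response via choices[0]["index"] instead of
    # assuming a strict n1, n2, n1, n2 ordering.
    responses = {0: "", 1: ""}
    for chunk in chunks:
        choice = chunk["choices"][0]
        idx = choice["index"]
        responses[idx] += choice.get("delta", {}).get("content", "")
        # Monitor only the second response (index 1) for the stop phrase:
        if idx == 1 and stop_phrase in responses[1]:
            # With the real stream, call stream.close() here as well.
            break
    return responses

fake_chunks = [
    {"choices": [{"index": 0, "delta": {"content": "A1 "}}]},
    {"choices": [{"index": 1, "delta": {"content": "B1 STOP"}}]},
    {"choices": [{"index": 0, "delta": {"content": "A2"}}]},
]
result = demux_by_index(fake_chunks, "STOP")
```

Here response 0 keeps streaming untouched while response 1 is cut off at the phrase.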
