from flask import Response, jsonify, request
import json
import time

# `app`, `client` (the OpenAI client), `assistant_id`, and the local
# `functions` module are defined elsewhere in the application.

@app.route('/gpt/chat', methods=['POST'])
def chat():
    data = request.json
    thread_id = data.get('thread_id')
    announcement_id = data.get('announcement_id')
    message = data.get('message')
    if not thread_id:
        return jsonify({"error": "thread_id is missing"}), 400

    client.beta.threads.messages.create(thread_id=thread_id, role="user", content=message)

    def generate():
        stream = client.beta.threads.runs.create(thread_id=thread_id,
                                                 assistant_id=assistant_id,
                                                 stream=True)
        complete_message = ''
        for event in stream:
            if event.event == 'thread.message.delta':
                message_delta = event.data.delta
                for part in message_delta.content:
                    if part.type == 'text':
                        complete_message += part.text.value
                        yield complete_message
                        complete_message = ''
            elif event.event == 'thread.run.requires_action':
                tool_call_id = event.data.required_action.submit_tool_outputs.tool_calls[0].id
                output = functions.information_from_pdf_server(announcement_id)
                tool_stream = client.beta.threads.runs.submit_tool_outputs(
                    thread_id=thread_id,
                    run_id=event.data.id,
                    stream=True,
                    tool_outputs=[{
                        "tool_call_id": tool_call_id,
                        "output": json.dumps(output)
                    }])
                # Renamed from `event` to avoid shadowing the outer loop variable.
                for tool_event in tool_stream:
                    if tool_event.event == 'thread.message.delta':
                        message_delta = tool_event.data.delta
                        for part in message_delta.content:
                            if part.type == 'text':
                                complete_message += part.text.value
                                yield complete_message
                                complete_message = ''
                    time.sleep(1)  # Delay for better streaming

    response = Response(generate(), content_type='text/event-stream')
    response.headers['X-Accel-Buffering'] = 'no'
    return response
-Using Model:
gpt-3.5-turbo-1106
-Code Explanation:
I'm using the Assistants API.
When the user writes and sends a message, I add it to the thread and start a run.
The run has two paths. First, when the model can respond without a tool call, I receive 'thread.message.delta' events and forward them to the client via SSE. Second, when the run reaches 'thread.run.requires_action', I call 'submit_tool_outputs' and stream the reply the same way.
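The delta-forwarding step described above can be sketched in isolation. The event classes below are hypothetical stand-ins for the Assistants API stream objects (same attribute shape, no real API calls), just to show how the text fragments are pulled out and yielded:

```python
from dataclasses import dataclass

# Hypothetical mock objects mirroring the shape of Assistants API stream events.
@dataclass
class TextValue:
    value: str

@dataclass
class Part:
    type: str
    text: TextValue

@dataclass
class Delta:
    content: list

@dataclass
class Data:
    delta: Delta

@dataclass
class Event:
    event: str
    data: Data

def extract_text(events):
    """Yield each text fragment from 'thread.message.delta' events as it arrives."""
    for event in events:
        if event.event == 'thread.message.delta':
            for part in event.data.delta.content:
                if part.type == 'text':
                    yield part.text.value

events = [
    Event('thread.message.delta', Data(Delta([Part('text', TextValue('Hel'))]))),
    Event('thread.message.delta', Data(Delta([Part('text', TextValue('lo'))]))),
]
print(''.join(extract_text(events)))  # -> Hello
```

Feeding this generator to a Flask Response is what produces the SSE stream in the route above.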
-Main problem:
The code itself works, but the response speed from the GPT API is too slow.
On average it takes more than 20 seconds to complete an answer, and sometimes as long as a minute.
When I don't use streaming, the answer arrives in 10 to 20 seconds.
But when I enable the stream option, the responses are far slower.
Is there any option or solution to fix this?
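One way I could narrow this down, assuming the slowdown comes either from the model producing deltas slowly or from buffering between the server and the client, is to wrap the stream in a pass-through generator that logs the gap between events. `timed_events` is a hypothetical helper, not part of any API:

```python
import time

def timed_events(events):
    # Pass each event through unchanged while printing how long it took to
    # arrive, to see whether the deltas themselves come slowly from the API.
    last = time.monotonic()
    for event in events:
        now = time.monotonic()
        print(f"{getattr(event, 'event', event)}: +{now - last:.2f}s")
        last = now
        yield event

# Usage inside generate():
#     for event in timed_events(stream):
#         ...
```

If the logged gaps are small but the client still sees slow output, the delay is on the delivery side rather than in the API.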