from flask import Response, jsonify, request
import json
import time

# `app`, `client` (the OpenAI client), `assistant_id`, and the local
# `functions` module are defined elsewhere in the application.

@app.route('/gpt/chat', methods=['POST'])
def chat():
    data = request.json
    thread_id = data.get('thread_id')
    announcement_id = data.get('announcement_id')
    message = data.get('message')
    if not thread_id:
        return jsonify({"error": "thread_id is missing"}), 400

    client.beta.threads.messages.create(thread_id=thread_id, role="user", content=message)

    def generate():
        stream = client.beta.threads.runs.create(thread_id=thread_id,
                                                 assistant_id=assistant_id,
                                                 stream=True)
        complete_message = ''
        for event in stream:
            if event.event == 'thread.message.delta':
                message_delta = event.data.delta
                for part in message_delta.content:
                    if part.type == 'text':
                        complete_message += part.text.value
                        yield complete_message
                        complete_message = ''
            elif event.event == 'thread.run.requires_action':
                tool_call_id = event.data.required_action.submit_tool_outputs.tool_calls[0].id
                output = functions.information_from_pdf_server(announcement_id)
                tool_stream = client.beta.threads.runs.submit_tool_outputs(
                    thread_id=thread_id,
                    run_id=event.data.id,
                    stream=True,
                    tool_outputs=[{
                        "tool_call_id": tool_call_id,
                        "output": json.dumps(output)
                    }])
                # Renamed from `event` to avoid shadowing the outer loop variable.
                for tool_event in tool_stream:
                    if tool_event.event == 'thread.message.delta':
                        message_delta = tool_event.data.delta
                        for part in message_delta.content:
                            if part.type == 'text':
                                complete_message += part.text.value
                                yield complete_message
                                complete_message = ''
                    time.sleep(1)  # Delay for better streaming

    response = Response(generate(), content_type='text/event-stream')
    response.headers['X-Accel-Buffering'] = 'no'
    return response
-Using Model:
gpt-3.5-turbo-1106
-Code Explanation:
I'm using the Assistants API.
When the user writes and sends a message, I add it to the thread and start a run.
The run has two paths. First, when the model can respond without a tool call, I receive 'thread.message.delta' events and forward them to the client via SSE. Second, when the run reaches 'thread.run.requires_action', I call 'submit_tool_outputs' and stream the reply the same way.
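The delta-forwarding step described above can be sketched in isolation. The event classes below are hypothetical stand-ins for the Assistants API stream objects (same attribute shape, no real API calls), just to show how the text fragments are pulled out and yielded:

```python
from dataclasses import dataclass

# Hypothetical mock objects mirroring the shape of Assistants API stream events.
@dataclass
class TextValue:
    value: str

@dataclass
class Part:
    type: str
    text: TextValue

@dataclass
class Delta:
    content: list

@dataclass
class Data:
    delta: Delta

@dataclass
class Event:
    event: str
    data: Data

def extract_text(events):
    """Yield each text fragment from 'thread.message.delta' events as it arrives."""
    for event in events:
        if event.event == 'thread.message.delta':
            for part in event.data.delta.content:
                if part.type == 'text':
                    yield part.text.value

events = [
    Event('thread.message.delta', Data(Delta([Part('text', TextValue('Hel'))]))),
    Event('thread.message.delta', Data(Delta([Part('text', TextValue('lo'))]))),
]
print(''.join(extract_text(events)))  # -> Hello
```

Feeding this generator to a Flask Response is what produces the SSE stream in the route above.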
-Main problem:
The code itself works, but the response speed from the GPT API is too slow.
On average it takes more than 20 seconds to complete an answer, and sometimes as long as a minute.
When I don't use streaming, the answer arrives in 10 to 20 seconds.
But when I enable the stream option, the responses are far slower.
Is there any option or solution to fix this?
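One way I could narrow this down, assuming the slowdown comes either from the model producing deltas slowly or from buffering between the server and the client, is to wrap the stream in a pass-through generator that logs the gap between events. `timed_events` is a hypothetical helper, not part of any API:

```python
import time

def timed_events(events):
    # Pass each event through unchanged while printing how long it took to
    # arrive, to see whether the deltas themselves come slowly from the API.
    last = time.monotonic()
    for event in events:
        now = time.monotonic()
        print(f"{getattr(event, 'event', event)}: +{now - last:.2f}s")
        last = now
        yield event

# Usage inside generate():
#     for event in timed_events(stream):
#         ...
```

If the logged gaps are small but the client still sees slow output, the delay is on the delivery side rather than in the API.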