Hello everyone,
Previously, I asked on this forum how to write code that uses the OpenAI Assistant API for batch summarization and integration. Thanks to **David Smith**’s advice, I obtained a basic script template.
After reviewing some additional tutorials and making modifications, I still have a few issues.
- Since the code already saves the assistant_id, later runs can reuse the template available on OpenAI Assistants, which makes it convenient to modify and update.
- However, I’m unclear about the purpose of saving the thread_id. During the first run of the program, my intention is for the assistant to treat all inputs as one continuous context. But if I reuse the same thread_id in later runs, does that mean OpenAI has stored my previous messages and the assistant still remembers them? If so, I’m not sure where they are stored.
- I later split the interview content into a list of segments to feed to the assistant in batches. I am not sure my code is correct: because of the loop structure, it could process the same content more than once or even loop forever, which would cause a spike in charges.
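One way to guard against sending the same segment twice across runs is to record a fingerprint of each segment once it has been summarized, and to skip any segment whose fingerprint is already on file. The sketch below is only an illustration under that assumption: the file name `processed_segments.json` and the helpers `segment_key`, `load_done`, `mark_done`, and `pending_segments` are hypothetical names, not part of the OpenAI library.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical checkpoint file; any writable path works.
CHECKPOINT = Path("processed_segments.json")


def segment_key(segment: str) -> str:
    """Return a stable fingerprint of the segment text."""
    return hashlib.sha256(segment.encode("utf-8")).hexdigest()


def load_done(path: Path = CHECKPOINT) -> set:
    """Load the fingerprints of segments that were already processed."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def mark_done(segment: str, done: set, path: Path = CHECKPOINT) -> None:
    """Record a segment as processed and persist the whole set to disk."""
    done.add(segment_key(segment))
    path.write_text(json.dumps(sorted(done)), encoding="utf-8")


def pending_segments(segments: list, done: set) -> list:
    """Return only the segments that have not been processed yet."""
    return [s for s in segments if segment_key(s) not in done]
```

In the main loop, this would mean iterating over `pending_segments(segments, done)` and calling `mark_done(segment, done)` only after the run for that segment has finished successfully, so an interrupted program can be restarted without paying for the same segment again.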
I hope everyone can offer me some advice. If my code turns out to be sound, I hope it can serve as a starting point for those who haven’t used the API yet.
Task: My workflow is planned as follows: I start by entering a system prompt that explains the task to the assistant and outlines the content I will input subsequently. Then, I provide the assistant with pre-segmented text fragments from the interview for focused summarization.
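The pre-segmentation step could look roughly like the following. It is a sketch that assumes the transcript is a plain-text string in which every question line starts with `Q:` and every answer line with `A:`; the function name `split_into_segments` and the `max_chars` limit are hypothetical choices for illustration, not something the API prescribes.

```python
import re


def split_into_segments(transcript: str, max_chars: int = 6000) -> list:
    """Group whole Q/A exchanges into segments of at most max_chars characters.

    An exchange that is longer than max_chars on its own is kept intact
    as a single oversized segment rather than being cut mid-answer.
    """
    # Split right before every line that starts with "Q:", so each piece
    # holds one question together with the answer lines that follow it.
    pieces = re.split(r"^(?=Q:)", transcript, flags=re.MULTILINE)
    exchanges = [p.strip() for p in pieces if p.strip()]

    segments = []
    current = ""
    for exchange in exchanges:
        candidate = f"{current}\n{exchange}" if current else exchange
        if current and len(candidate) > max_chars:
            # Adding this exchange would exceed the limit: close the segment.
            segments.append(current)
            current = exchange
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

Splitting only at question boundaries keeps each answer next to the question it belongs to, which seems safer for summarization than cutting at a fixed character count.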
---------------my code-----------------------------
import time

from openai import OpenAI

system_prompt = """
Due to the large size of the original interview recordings, which cannot be processed all at once, I will gradually provide segmented content, each segment of the conversation marked with 'Q:' for questions from the host and 'A:' for answers from the guest.
Based on these interview segments, please objectively write a narrative summary paragraph. Key points to consider when writing the summary include:
"""
# API connection
with open('key.txt', 'r') as f:
    OPENAI_API_KEY = f.readline().strip()
client = OpenAI(api_key=OPENAI_API_KEY)
# Load or create an assistant
try:
    with open('assistant_id.txt', 'r') as f:
        assistant_id = f.readline().strip()
    print("Using existing assistant ID:", assistant_id)
except FileNotFoundError:
    assistant = client.beta.assistants.create(
        name="Conversation Analysis",
        instructions="You are a professional assistant whose task is to summarize patient interview records.",
        model="gpt-3.5-turbo-0125",  # or gpt-4-turbo-preview
    )
    assistant_id = assistant.id
    with open('assistant_id.txt', 'w') as f:
        f.write(assistant_id)
    print("Created new assistant ID:", assistant_id)
# Load or create a conversation thread
# If I reuse this thread next time, will the content from the previous run still be there?
try:
    with open('thread_id.txt', 'r') as f:
        thread_id = f.readline().strip()
    print("Using existing thread ID:", thread_id)
except FileNotFoundError:
    thread = client.beta.threads.create()
    thread_id = thread.id
    with open('thread_id.txt', 'w') as f:
        f.write(thread_id)
    print("Created new thread ID:", thread_id)
# Initially send a system prompt to the conversation thread
# (this runs on every execution, so a reused thread receives the prompt again)
client.beta.threads.messages.create(
    thread_id=thread_id,
    role="user",
    content=system_prompt
)
# Loop through each text segment
# (`segments` is the list of pre-split interview fragments prepared beforehand)
countm = 0
for segment in segments:
    countm += 1
    # Send message to the conversation thread
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=segment  # Send current paragraph text
    )
    # Run assistant on current segment
    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id
    )
    # Check the run result
    # From this point onwards, I'm not sure whether what I've written is correct.
    # When I print the result I do get output, but its format is complex and variable.
    # I'm not sure whether everything stays in the same context, whether my
    # system_prompt needs to be re-sent with each segment for the AI to remember it,
    # or whether I should use gpt-4 (which gpt-4 model is the best right now?
    # There are many options in the assistant menu).
    while True:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id,
            run_id=run.id
        )
        if run.status == "completed":
            print(f"Run completed. Segment {countm}")
            break
        elif run.status in ("failed", "cancelled", "expired"):
            # Without this branch, a cancelled or expired run would keep the loop spinning forever
            print(f"Run ended with status {run.status}:", run.last_error)
            break
        time.sleep(2)
    if run.status != "completed":
        # No reply was produced, so data[0] would be the segment just sent
        continue
    # Messages are listed newest first, so data[0] is the assistant's latest reply
    messages = client.beta.threads.messages.list(
        thread_id=thread_id
    )
    message = messages.data[0].content[0].text.value
    print(message)
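A polling loop like the one above can also be given an upper bound on waiting time, and the reply can be selected by role and run ID instead of by position in the list. The following is a minimal sketch of that idea, not a definitive implementation. `wait_for_run` and `reply_text_for_run` are hypothetical helper names; they rely only on the `status`, `role`, `run_id`, and `content[...].text.value` attributes used elsewhere in this script, and the set of terminal statuses is an assumption that should be checked against the current API reference.

```python
import time

# Statuses after which a run will not change any more (assumed list).
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "incomplete"}


def wait_for_run(fetch_run, timeout_s=300.0, poll_interval_s=2.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_run() until the run is terminal or timeout_s has elapsed.

    fetch_run is any zero-argument callable returning an object with a
    .status attribute. sleep and clock can be swapped out for testing.
    """
    deadline = clock() + timeout_s
    while True:
        run = fetch_run()
        if run.status in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"Run still '{run.status}' after {timeout_s} s")
        sleep(poll_interval_s)


def reply_text_for_run(messages, run_id):
    """Return the text of the assistant message produced by run_id, or None."""
    for message in messages:
        if message.role != "assistant" or message.run_id != run_id:
            continue
        # Keep only text parts; a message may also carry non-text content.
        parts = [part.text.value for part in message.content if part.type == "text"]
        return "\n".join(parts)
    return None
```

In the script, `fetch_run` would be something like `lambda: client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)`, and `messages` would be the `data` list returned by `client.beta.threads.messages.list(thread_id=thread_id)`. A `None` result then signals that the run produced no reply, rather than silently printing the segment that was just sent.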