Hello! I’m a bit new to this, so I’m sorry if the question is quite stupid.
I’m creating an assistant in Colab with the goal of retrieving data from a user’s PDF and returning it in JSON format. I’ve pieced together a few tutorials, but I’m getting lost in what’s supposed to be happening. Here’s my code:
from openai import OpenAI
client = OpenAI(api_key=key)
# Upload a file with an "assistants" purpose
file = client.files.create(
    file=open(file_path, "rb"),
    purpose='assistants'
)
# Create an assistant with the retrieval tool (the file ID is passed on the message below, not here)
assistant = client.beta.assistants.create(
    instructions=("You are an assistant in a task of retrieving table data from PDF files. "
                  "Important: always use the response tool to respond to the user. "
                  "Return the responce in JSON format. User may specify what the structure should be. "
                  "Never add any other text to the response."),
    model="gpt-3.5-turbo-1106",
    tools=[{"type": "retrieval"}],
)
thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Extract a JSON list of people from here. Each entry must be a person and it has to include country name, role name (head of state, head of government, foreign minister), name of the person, title of the person, date of appointment for each person mentioned",
    file_ids=[file.id]
)
message_content = message.content[0].text
annotations = message_content.annotations
citations = []
for index, annotation in enumerate(annotations):
    # Replace the text with a footnote
    message_content.value = message_content.value.replace(annotation.text, f' [{index}]')
    # Gather citations based on annotation attributes
    if (file_citation := getattr(annotation, 'file_citation', None)):
        cited_file = client.files.retrieve(file_citation.file_id)
        citations.append(f'[{index}] {file_citation.quote} from {cited_file.filename}')
    elif (file_path := getattr(annotation, 'file_path', None)):
        cited_file = client.files.retrieve(file_path.file_id)
        citations.append(f'[{index}] Click <here> to download {cited_file.filename}')
        # Note: File download functionality not implemented above for brevity
# Add footnotes to the end of the message before displaying to user
message_content.value += '\n' + '\n'.join(citations)
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)
result = client.beta.threads.runs.retrieve(
    thread_id=thread.id,
    run_id=run.id
)
print(result)
and it prints:
Run(id='run_EEzSVXtWHEpm5j8Yb7qTWYU0', assistant_id='asst_tjew0PUAutPVgElWCzDpRjBD', cancelled_at=None, completed_at=1705567223, created_at=1705567194, expires_at=None, failed_at=None, file_ids=[], instructions='You are an assistant in a task of retrieving table data from PDF files. Important: always use the response tool to respond to the user. Return the responce in JSON format. User may specify what the structure should be. Never add any other text to the response.', last_error=None, metadata={}, model='gpt-3.5-turbo-1106', object='thread.run', required_action=None, started_at=1705567195, status='completed', thread_id='thread_Lt7vNX4o4RrnFdrdX7AApQ31', tools=[ToolAssistantToolsRetrieval(type='retrieval')])
I believe the run was successful and yielded a result, but I can’t see how to get it. Also, the run shows an empty file_ids=[]. Does that mean no file was produced, or that my file upload didn’t happen?
Thank you!
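For anyone landing here with the same question: the Run object only reports status, and the assistant's reply is stored as a new message on the thread. Below is a minimal sketch of reading it back, assuming the same beta Assistants client as in the code above. The helper names wait_for_run and latest_assistant_text, and the poll_seconds and timeout_seconds parameters, are hypothetical and not part of the openai library. As far as I can tell, file_ids on the run reflects files attached to the assistant itself, so an empty list there would not by itself mean the upload failed, since the file here was attached to the message.

```python
import time

# Statuses after which the run will not change any more.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def wait_for_run(client, thread_id, run_id, poll_seconds=1.0, timeout_seconds=120.0):
    """Poll the run until it reaches a terminal status, then return it.

    Hypothetical helper: `client` is assumed to be an OpenAI client exposing
    client.beta.threads.runs.retrieve, as used in the code above.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {run.status!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)


def latest_assistant_text(client, thread_id):
    """Return the text of the newest assistant message in the thread, or None."""
    messages = client.beta.threads.messages.list(thread_id=thread_id)
    # The list is assumed to come back newest first (the API default ordering).
    for message in messages.data:
        if message.role != "assistant":
            continue
        parts = [part.text.value for part in message.content if part.type == "text"]
        return "\n".join(parts)
    return None


# Usage with the objects from the code above (requires a real API key):
# run = wait_for_run(client, thread.id, run.id)
# if run.status == "completed":
#     print(latest_assistant_text(client, thread.id))
```

The annotation and citation loop from the original code would then be applied to this assistant message rather than to the user message that was just created.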