Hey @Taci
I’ve recently been struggling with the same issue. Here is the best I have been able to get out of the annotations so far:
Annotation 1:
End Index: 269
Start Index: 254
Text: 【12:14†Dracula】
Type: file_citation
File Citation:
File ID: file-zwjNvQHztLT7cp4271lplM9L
Annotation 2:
End Index: 378
Start Index: 363
Text: 【12:14†Dracula】
Type: file_citation
File Citation:
File ID: file-zwjNvQHztLT7cp4271lplM9L
To accomplish this, I used this code:
# Extract the message content and annotations
message_text_object = message.content[0]
message_text_content = message_text_object.text.value  # Access the value attribute for the actual text
annotations = message_text_object.text.annotations  # Access annotations directly

# Print the annotations in a cleaner format
for index, annotation in enumerate(annotations):
    print(f"Annotation {index + 1}:")
    print(f" End Index: {annotation.end_index}")
    print(f" Start Index: {annotation.start_index}")
    print(f" Text: {annotation.text}")
    print(f" Type: {annotation.type}")
    if hasattr(annotation, 'file_citation'):
        file_citation = annotation.file_citation
        print(" File Citation:")
        print(f" File ID: {file_citation.file_id}")
    print("")  # Add a blank line for readability
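In case it is useful, here is a small offline sketch of the same loop written as a helper that returns plain dictionaries instead of printing. It uses `SimpleNamespace` stand-ins rather than real API objects, and the helper name `summarize_annotations` is made up for illustration. It also checks for `file_path`, which as far as I know is the other annotation type the Assistants API can return.

```python
from types import SimpleNamespace


def summarize_annotations(annotations):
    """Return one dict per annotation with its span, marker text, type and file ID."""
    summaries = []
    for number, annotation in enumerate(annotations, start=1):
        # The file ID sits on a sub-object named after the annotation type
        # (file_citation or file_path); fall back to None if neither exists.
        source = getattr(annotation, "file_citation", None) or getattr(annotation, "file_path", None)
        summaries.append({
            "number": number,
            "start_index": annotation.start_index,
            "end_index": annotation.end_index,
            "text": annotation.text,
            "type": annotation.type,
            "file_id": getattr(source, "file_id", None),
        })
    return summaries


# Stand-in for the first annotation printed above (not a real API object)
sample = SimpleNamespace(
    start_index=254,
    end_index=269,
    text="【12:14†Dracula】",
    type="file_citation",
    file_citation=SimpleNamespace(file_id="file-zwjNvQHztLT7cp4271lplM9L"),
)

for summary in summarize_annotations([sample]):
    print(summary)
```

With a real message you would pass `message.content[0].text.annotations` instead of the stand-in list.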
To format the message text together with its annotations, I used this code:
# Assumes `client` is an OpenAI client and `thread` is an existing thread
# Retrieve the message object (replace this part with your actual message retrieval code)
message = client.beta.threads.messages.retrieve(
    thread_id=thread.id,
    message_id=client.beta.threads.messages.list(thread_id=thread.id, order="desc").data[0].id
)

# Extract the message content and annotations
message_text_object = message.content[0]
message_text_content = message_text_object.text.value  # Access the value attribute for the actual text
annotations = message_text_object.text.annotations  # Access annotations directly

# Create a list to store the details of each annotation
annotated_citations = []

# Iterate over the annotations, retrieve file names, and store the details
for index, annotation in enumerate(annotations):
    if not hasattr(annotation, 'file_citation'):
        continue  # Skip annotations that are not file citations
    annotation_number = index + 1
    # Retrieve the file name using the file ID
    file_info = client.files.retrieve(annotation.file_citation.file_id)
    file_name = file_info.filename
    annotation_details = {
        "number": annotation_number,
        "text": f"[{annotation_number}]",
        "file_name": file_name,
        "start_index": annotation.start_index,
        "end_index": annotation.end_index,
    }
    annotated_citations.append(annotation_details)

# Replace the inline citations in the message text with numbered identifiers.
# Replacing by position (last citation first) keeps the earlier indices valid and
# keeps citations apart when they share the same text, e.g. 【12:14†Dracula】.
for details in sorted(annotated_citations, key=lambda d: d["start_index"], reverse=True):
    message_text_content = (
        message_text_content[:details["start_index"]]
        + details["text"]
        + message_text_content[details["end_index"]:]
    )

# Print the message text with the annotations including file name and character positions
print("Message Text with Annotations:")
print(message_text_content)
print("\nAnnotations:")
for annotation in annotated_citations:
    print(f"Annotation {annotation['number']}:")
    print(f" File Name: {annotation['file_name']}")
    print(f" Character Positions: {annotation['start_index']} - {annotation['end_index']}")
    print("")  # Add a blank line for readability
This is the result I get:
Message Text with Annotations:
Based on the search results, here are the main characters in “Dracula” along with the locations where they are first introduced:
- Jonathan Harker: Introduced in the first chapter as he travels to Transylvania to meet Count Dracula.
- Count Dracula: Introduced when Jonathan Harker arrives at his castle.
- Mina Murray (later Mina Harker): Introduced through a letter concerning Jonathan’s condition.
- Lucy Westenra: Mentioned early in the text when discussing letters and diary entries.
- Abraham Van Helsing: He is first introduced through discussions and letters as a scholar and doctor who is called upon to help Lucy.
- John Seward: Introduced as he describes his interactions with Renfield.
- Arthur Holmwood: Introduced early in the novel as a suitor of Lucy Westenra.
- Quincey Morris: Another of Lucy Westenra’s suitors, introduced early in the story.
- Renfield: Introduced through Dr. John Seward’s diary entries as a patient in his insane asylum.
These citations come from various parts of the text, reflecting the locations where these characters are first mentioned or introduced.
Annotations:
Annotation 1:
File Name: Dracula.pdf
Character Positions: 254 - 269
Annotation 2:
File Name: Dracula.pdf
Character Positions: 363 - 378
Annotation 3:
File Name: Dracula.pdf
Character Positions: 496 - 511
Annotation 4:
File Name: Dracula.pdf
Character Positions: 621 - 636
Annotation 5:
File Name: Dracula.pdf
Character Positions: 793 - 808
Annotation 6:
File Name: Dracula.pdf
Character Positions: 904 - 918
Annotation 7:
File Name: Dracula.pdf
Character Positions: 1019 - 1033
Annotation 8:
File Name: Dracula.pdf
Character Positions: 1140 - 1154
Annotation 9:
File Name: Dracula.pdf
Character Positions: 1274 - 1289
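One caveat: several annotations can carry the same marker text (annotations 1 and 2 at the top both read 【12:14†Dracula】), so replacing by position is safer than replacing by text. Below is a minimal offline sketch of that idea, with stand-in objects so it runs without an API call. The function name `number_citations`, the `lookup_name` callback and the sample text are all made up for illustration; in real use `lookup_name` could wrap `client.files.retrieve(file_id).filename`.

```python
from types import SimpleNamespace


def number_citations(text, annotations, lookup_name):
    """Swap each citation marker for [n] by position and collect footnotes."""
    name_cache = {}  # file_id -> file name, so each file is looked up only once
    footnotes = []
    replacements = []
    for number, annotation in enumerate(annotations, start=1):
        file_id = annotation.file_citation.file_id
        if file_id not in name_cache:
            name_cache[file_id] = lookup_name(file_id)
        footnotes.append(f"[{number}] {name_cache[file_id]}")
        replacements.append((annotation.start_index, annotation.end_index, f"[{number}]"))
    # Apply replacements from the end of the text backwards so that the
    # start/end indices of the earlier markers are still valid.
    for start, end, label in sorted(replacements, reverse=True):
        text = text[:start] + label + text[end:]
    return text, footnotes


# Made-up sample: two citations that share the same marker text
marker = "【12:14†Dracula】"
sample_text = f"Harker travels east{marker}. He meets the Count{marker}."
first = sample_text.index(marker)
second = sample_text.index(marker, first + 1)
sample_annotations = [
    SimpleNamespace(start_index=pos, end_index=pos + len(marker), text=marker,
                    file_citation=SimpleNamespace(file_id="file-example"))
    for pos in (first, second)
]

lookups = []  # records how often the (fake) file lookup is called


def fake_lookup(file_id):
    lookups.append(file_id)
    return "Dracula.pdf"


new_text, notes = number_citations(sample_text, sample_annotations, fake_lookup)
print(new_text)
print("\n".join(notes))
```

The cache is there because all nine annotations above point at the same file, so one lookup per unique file ID is enough.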
Hope this helps.