Assistants API: replacing model generated substrings with annotations (No/lowcode Implementation)

Taci · January 10, 2024, 6:20am

(This question is within the context of Zapier/nocode tools. I’ve also asked on those forums without success.)

ChatGPT with retrieval turned on often returns annotations that are illegible, e.g. 【13†source】.

According to OpenAI’s docs, the annotations can be used to convert these markers into legible text. To do that, they recommend this Python code snippet:

# Retrieve the message object
message = client.beta.threads.messages.retrieve(
    thread_id="…",
    message_id="…"
)

# Extract the message content
message_content = message.content[0].text
annotations = message_content.annotations
citations = []

# Iterate over the annotations and add footnotes
for index, annotation in enumerate(annotations):
    # Replace the text with a footnote
    message_content.value = message_content.value.replace(annotation.text, f' [{index}]')

    # Gather citations based on annotation attributes
    if (file_citation := getattr(annotation, 'file_citation', None)):
        cited_file = client.files.retrieve(file_citation.file_id)
        citations.append(f'[{index}] {file_citation.quote} from {cited_file.filename}')
    elif (file_path := getattr(annotation, 'file_path', None)):
        cited_file = client.files.retrieve(file_path.file_id)
        citations.append(f'[{index}] Click <here> to download {cited_file.filename}')
        # Note: File download functionality not implemented above for brevity

# Add footnotes to the end of the message before displaying to user
message_content.value += '\n' + '\n'.join(citations)

Now, I’m trying to implement this primarily in Zapier, and I’m not a coder (though ChatGPT helps me with any coding work I might come across). My question is: is there a way to use the above information (taken from the documentation here) to convert the illegible annotations to readable text annotations in our responses and do this within Zapier or other no/lowcode interfaces?
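One possible low-code route, offered only as a minimal sketch: if the step only has access to the response text and not the annotation objects, the markers can be matched by their 【…】 brackets and either renumbered or removed. The function names `renumber_markers` and `strip_markers` are hypothetical, and the sketch assumes every marker is wrapped in 【 and 】 as in the example above. It uses only the Python standard library, so it may fit a code step in a no-code tool. In Code by Zapier, for instance, the text would typically arrive through `input_data` and be returned through `output`, but that wiring is an assumption and is left out here.

```python
import re

# Matches one marker such as 【13†source】 (assumed format: anything between 【 and 】)
MARKER = re.compile(r"【[^】]*】")


def renumber_markers(text):
    """Replace each marker with [1], [2], ... in order of appearance."""
    counter = 0

    def repl(match):
        nonlocal counter
        counter += 1
        return f"[{counter}]"

    return MARKER.sub(repl, text)


def strip_markers(text):
    """Remove the markers entirely."""
    return MARKER.sub("", text)


sample = "See the intro【13†source】 and chapter two【4:0†Dracula】."
print(renumber_markers(sample))
print(strip_markers(sample))
```

This approach drops the link between a marker and its source file, so it is only suitable when readable text matters more than the citations themselves.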

suspicious_cow · June 26, 2024, 6:18am

Hey @Taci

I’ve recently been struggling with the same issue. Here is the best I have been able to get out of the annotations so far:
Annotation 1:
End Index: 269
Start Index: 254
Text: 【12:14†Dracula】
Type: file_citation
File Citation:
File ID: file-zwjNvQHztLT7cp4271lplM9L

Annotation 2:
End Index: 378
Start Index: 363
Text: 【12:14†Dracula】
Type: file_citation
File Citation:
File ID: file-zwjNvQHztLT7cp4271lplM9L

To accomplish this, I used this code:

# Extract the message content and annotations
message_text_object = message.content[0]
message_text_content = message_text_object.text.value  # Access the value attribute for the actual text
annotations = message_text_object.text.annotations  # Access annotations directly

# Print the annotations in a cleaner format
for index, annotation in enumerate(annotations):
    print(f"Annotation {index + 1}:")
    print(f"  End Index: {annotation.end_index}")
    print(f"  Start Index: {annotation.start_index}")
    print(f"  Text: {annotation.text}")
    print(f"  Type: {annotation.type}")
    if hasattr(annotation, 'file_citation'):
        file_citation = annotation.file_citation
        print(f"  File Citation:")
        print(f"    File ID: {file_citation.file_id}")
    print("")  # Add a blank line for readability
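For handing the annotations to another tool rather than printing them, the same fields could be collected into plain dictionaries and serialized as JSON. This is a sketch under assumptions: `annotation_to_dict` is a hypothetical helper, and the `SimpleNamespace` object below is only a stand-in that mimics the fields shown in the output above, not a real API object.

```python
import json
from types import SimpleNamespace


def annotation_to_dict(annotation):
    """Collect the fields printed above into a JSON-friendly dict."""
    record = {
        "type": annotation.type,
        "text": annotation.text,
        "start_index": annotation.start_index,
        "end_index": annotation.end_index,
    }
    # file_path annotations are expected not to carry file_citation,
    # so look it up defensively instead of assuming it exists
    file_citation = getattr(annotation, "file_citation", None)
    if file_citation is not None:
        record["file_id"] = file_citation.file_id
    return record


# Stand-in object built from the first annotation in the output above
example = SimpleNamespace(
    type="file_citation",
    text="【12:14†Dracula】",
    start_index=254,
    end_index=269,
    file_citation=SimpleNamespace(file_id="file-zwjNvQHztLT7cp4271lplM9L"),
)
print(json.dumps(annotation_to_dict(example), ensure_ascii=False))
```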

For formatting the annotations with output, I used this code:

# Retrieve the message object (replace this part with your actual message retrieval code)
message = client.beta.threads.messages.retrieve(
    thread_id=thread.id,
    message_id=client.beta.threads.messages.list(thread_id=thread.id, order="desc").data[0].id
)

# Extract the message content and annotations
message_text_object = message.content[0]
message_text_content = message_text_object.text.value  # Access the value attribute for the actual text
annotations = message_text_object.text.annotations  # Access annotations directly

# Create a list to store annotations with a dictionary for citation replacement
annotated_citations = []
citation_replacements = {}

# Iterate over the annotations, retrieve file names, and store the details
for index, annotation in enumerate(annotations):
    annotation_number = index + 1

    # Retrieve the file name using the file ID
    file_info = client.files.retrieve(annotation.file_citation.file_id)
    file_name = file_info.filename

    annotation_details = {
        "number": annotation_number,
        "text": f"[{annotation_number}]",
        "file_name": file_name,
        "start_index": annotation.start_index,
        "end_index": annotation.end_index,
    }
    annotated_citations.append(annotation_details)
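    # Note: annotations whose marker text is identical share one dictionary key,
    # so the later number overwrites the earlier one
    citation_replacements[annotation.text] = f"[{annotation_number}]"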

# Replace the inline citations in the message text with numbered identifiers
for original_text, replacement_text in citation_replacements.items():
    message_text_content = message_text_content.replace(original_text, replacement_text)

# Print the message text with the annotations including file name and character positions
print("Message Text with Annotations:")
print(message_text_content)
print("\nAnnotations:")
for annotation in annotated_citations:
    print(f"Annotation {annotation['number']}:")
    print(f"  File Name: {annotation['file_name']}")
    print(f"  Character Positions: {annotation['start_index']} - {annotation['end_index']}")
    print("")  # Add a blank line for readability

And get this result:
Message Text with Annotations:
Based on the search results, here are the main characters in “Dracula” along with the locations where they are first introduced:

Jonathan Harker: Introduced in the first chapter as he travels to Transylvania to meet Count Dracula.
- Citation: [2]
Count Dracula: Introduced when Jonathan Harker arrives at his castle.
- Citation: [2]
Mina Murray (later Mina Harker): Introduced through a letter concerning Jonathan’s condition.
- Citation: [4]
Lucy Westenra: Mentioned early in the text when discussing letters and diary entries.
- Citation: [4]
Abraham Van Helsing: He is first introduced through discussions and letters as a scholar and doctor who is called upon to help Lucy.
- Citation: [5]
John Seward: Introduced as he describes his interactions with Renfield.
- Citation: [6]
Arthur Holmwood: Introduced early in the novel as a suitor of Lucy Westenra.
- Citation: [7]
Quincey Morris: Another of Lucy Westenra’s suitors, introduced early in the story.
- Citation: [8]
Renfield: Introduced through Dr. John Seward’s diary entries as a patient in his insane asylum.
- Citation: [9]

These citations come from various parts of the text, reflecting the locations where these characters are first mentioned or introduced.

Annotations:
Annotation 1:
File Name: Dracula.pdf
Character Positions: 254 - 269

Annotation 2:
File Name: Dracula.pdf
Character Positions: 363 - 378

Annotation 3:
File Name: Dracula.pdf
Character Positions: 496 - 511

Annotation 4:
File Name: Dracula.pdf
Character Positions: 621 - 636

Annotation 5:
File Name: Dracula.pdf
Character Positions: 793 - 808

Annotation 6:
File Name: Dracula.pdf
Character Positions: 904 - 918

Annotation 7:
File Name: Dracula.pdf
Character Positions: 1019 - 1033

Annotation 8:
File Name: Dracula.pdf
Character Positions: 1140 - 1154

Annotation 9:
File Name: Dracula.pdf
Character Positions: 1274 - 1289
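One thing to note in the result above: the first two citations both show [2] and the next two both show [4]. That is most likely because annotations 1 and 2 have the same marker text (【12:14†Dracula】), so a replacement keyed on the text cannot tell them apart. A possible alternative, sketched below, is to replace each marker by its character positions instead, working from the end of the text backwards so earlier offsets stay valid. It assumes `start_index` and `end_index` are offsets into the same string as the message text, with the end exclusive (254 to 269 spans the 15 characters of the marker, which is consistent with that). `footnote_message`, `fake_lookup`, and the file ID `file-abc` are hypothetical. The file name lookup is passed in as a function and cached, so a file cited many times is only retrieved once.

```python
from types import SimpleNamespace


def footnote_message(text, annotations, get_filename):
    """Replace each marker by position and build a footnote list.

    get_filename is supplied by the caller and maps a file ID to a name,
    e.g. lambda file_id: client.files.retrieve(file_id).filename
    """
    ordered = sorted(annotations, key=lambda a: a.start_index)
    filename_cache = {}  # file ID -> file name, so each file is looked up once
    footnotes = []
    for number, ann in enumerate(ordered, start=1):
        citation = getattr(ann, "file_citation", None)
        if citation is None:
            continue  # no file citation on this annotation: no footnote line
        if citation.file_id not in filename_cache:
            filename_cache[citation.file_id] = get_filename(citation.file_id)
        footnotes.append(f"[{number}] {filename_cache[citation.file_id]}")
    # Work backwards so the offsets of earlier markers stay valid
    for number, ann in reversed(list(enumerate(ordered, start=1))):
        text = text[:ann.start_index] + f"[{number}]" + text[ann.end_index:]
    return text, footnotes


# Demo with stand-in objects: two annotations sharing the same marker text
marker = "【12:14†Dracula】"
text = f"Jonathan Harker {marker} Count Dracula {marker}"
first = text.find(marker)
second = text.find(marker, first + 1)
cite = SimpleNamespace(file_id="file-abc")  # hypothetical file ID
annotations = [
    SimpleNamespace(text=marker, start_index=first,
                    end_index=first + len(marker), file_citation=cite),
    SimpleNamespace(text=marker, start_index=second,
                    end_index=second + len(marker), file_citation=cite),
]

calls = []


def fake_lookup(file_id):
    # Stand-in for the real file lookup; records how often it is called
    calls.append(file_id)
    return "Dracula.pdf"


new_text, footnotes = footnote_message(text, annotations, fake_lookup)
print(new_text)
print("\n".join(footnotes))
```

With a real client, the lookup argument would be something like the lambda shown in the docstring. I have not run this against a live thread.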

Hope this helps.
