Remove 【35†source】 from Assistant Response

Hi,

When using retrieval on Assistants, my answers are returned with a source annotation, e.g. 【35†source】… is it possible to receive the response without such annotations? I’ve tried prompting it not to send them back, but it doesn’t seem to work.

TIA

2 Likes

Same problem here… I have added in the prompt not to cite anything. Sometimes the output is good, sometimes not.

Here is OpenAI’s code example for dealing with annotations, adding them as footnotes:

https://platform.openai.com/docs/assistants/how-it-works/message-annotations

If you want them gone, you can just eliminate the contents of the response message within the Japanese corner brackets with a regex or pattern match in your code.

Here’s some quickie code for an idea.

import re

message_text = ("This is a sample text【9†source】 with source "
"tags using annotations【10†source】and some other content.")

# Use re.sub() to remove the matched source
# tags and their contents
pattern = r'【\d+†source】'
cleaned_text = re.sub(pattern, '', message_text)

print(cleaned_text)

You might need to close up extra spaces, depending on where they appear, if your renderer doesn’t already collapse extra white space.
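For example, a minimal follow-up on the snippet above (reusing its cleaned_text variable) that collapses runs of spaces left behind where a tag was removed:

# Collapse any doubled-up spaces left where a source tag was stripped out
cleaned_text = re.sub(r' {2,}', ' ', cleaned_text)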

9 Likes

Thanks for the help! Much appreciated!

Wow, seriously, thank you, it helped me, blessings

1 Like

For newbies, where and how do you place this code to delete the annotations?

1 Like

How are you using Assistants? I imagine some no-code tool?

You put the text transformation anywhere after you’ve received your response :thinking:
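If you’re in Python, a rough sketch of where it goes (thread_id is a placeholder for whatever thread your code already has, and the pattern covers both tag shapes seen in this topic):

import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
pattern = r'【\d+(?::\d+)?†source】'

# After your run has completed, fetch the latest message in the thread
messages = client.beta.threads.messages.list(thread_id=thread_id)
raw_text = messages.data[0].content[0].text.value

# Strip the citation tags before showing the text to the user
cleaned_text = re.sub(pattern, '', raw_text)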

1 Like

Hi @_j,
thanks for your answer. It worked in the first conversation, but when I continue the conversation it can’t replace the annotation. What should I do?
I’m using Gradio for the UI.

Any ideas how to do it with a streamed response?

I tried it in the stream:

with ... as stream:
    for text in stream.text_deltas:
        cleaned_text = re.sub(pattern, '', text)
        yield cleaned_text

And in the event handler, but it doesn’t work yet.

I have an idea of how you could do it.

Use a FIFO buffer of about 10 chunks.
On every new chunk, run a stripper on the contents of the buffer as a single string; a regex match tells you the slice to be chopped out.
Under normal operation, maintain the buffer as a string, but pop out the original chunk positions by tracking the length of each addition.

Something similar is needed for your own limited markdown handler if you aren’t rewriting the display, such as when you receive a markdown code-block start on its own line and then have to track that the display is currently writing code (although markdown really needs a spec-compliant library that can handle AI mistakes, since output can contain HTML).

A longer buffer can correspond to the amount of output you can hold back if you are concurrently sending the total response to moderations for particularly untrusted users.

Otherwise, you can strip them from the tail of your display text if you have control of the UI object. Doing it up properly means rewriting the entire message with links once it is finalized.
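A minimal sketch of that rolling-buffer idea in plain Python (the chunk source and the 24-character hold length are assumptions; the hold just needs to be longer than a citation tag so a partially-received tag always stays in the held-back tail):

import re

pattern = re.compile(r'【\d+(?::\d+)?†source】')

def strip_citations(chunks, hold=24):
    # Accumulate chunks, strip complete citation tags, and only emit text
    # that is old enough that a partially-received tag cannot be inside it.
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        buffer = pattern.sub('', buffer)
        if len(buffer) > hold:
            yield buffer[:-hold]
            buffer = buffer[-hold:]
    # Flush whatever remains once the stream ends.
    yield pattern.sub('', buffer)

# Usage with a hypothetical generator of streamed text deltas:
# for piece in strip_citations(stream.text_deltas):
#     print(piece, end="", flush=True)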

Replace 【 with <span class="source"> and 】 with </span>? :thinking:

CSS: span.source {display: none}

You can do that without buffering

Alt:

Good ol’ FSM

let mode = 'default'

on.chunk:
    for char in chunk:
        switch mode:
            case 'default':
                if char === '【': mode = 'source'; continue
                channel.send(char)
            case 'source':
                if char === '】': mode = 'default'
1 Like

You cannot send the un-replaced contents of the annotation to the user; it’s just nonsense. You discount that there are clients that won’t apply your CSS. That’s how the web works.

It also completely overlooks that 隅付き括弧 (corner brackets) are common Japanese typographic symbols in everyday use, and that the “source” text and the numbers inside provide the context needed to replace them only in this specific usage.

I’m just on my phone, so every line of code is a pain, but you can certainly do more involved stuff, such as buffering the brace content in the FSM and translating it into something else.

With the HTML approach you could also consider using an auto-incremented id instead and resolving the source as it becomes available.

The Chinese/Japanese etc. case might indeed be an issue. If that’s part of your target audience, you might want to consider expanding the FSM to match the full brace-dagger pattern.

Or, as you suggest, you just buffer the output by 10-15 characters (fairly straightforward with ReactiveX), but that potentially increases your TTFT (time to first token).

Of course, you could also consider buffering dynamically: use the FSM to determine the current buffer length.

All tradeoffs, I guess.

——

Found a pretty clever solution using partial regex matching (if the language doesn’t support it natively).

Consider withholding output while this matches :thinking:
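A small Python sketch of that withholding idea (Python’s re has no partial-match mode, so this approximates it by treating an unclosed 【 at the end of the buffer as a citation possibly still in flight; the names are illustrative):

import re

CITATION = re.compile(r'【\d+(?::\d+)?†source】')
PARTIAL = re.compile(r'【[^】]*$')  # an opening bracket not yet closed

def flush(buffer):
    # Remove any complete citation tags, then split the text into a part
    # that is safe to display and a tail that might still become a citation.
    buffer = CITATION.sub('', buffer)
    match = PARTIAL.search(buffer)
    if match:
        return buffer[:match.start()], buffer[match.start():]
    return buffer, ''

# Per delta: append it to the held-back tail, call flush(), display the
# safe part, and carry the returned tail into the next delta.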

All interesting ideas.

Another stream-based idea not in a cookbook is to replace them with an <a href> to an in-page anchor point not yet produced, perhaps incrementing since the start of the session.

Then you can place footnotes at the end once you have the full response. Bing AI Chat used to have a similar interface.
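A hedged sketch of that idea in Python (the anchor naming and HTML shape are just one possible choice):

import re
from itertools import count

CITATION = re.compile(r'【\d+(?::\d+)?†source】')

def link_citations(text, counter=count(1)):
    # Replace each citation tag with a numbered link to a footnote anchor
    # that gets rendered once the full response is available. The counter
    # default persists across calls, so numbering keeps incrementing for
    # the whole session, as suggested above.
    def to_link(_match):
        n = next(counter)
        return f'<a href="#footnote-{n}">[{n}]</a>'
    return CITATION.sub(to_link, text)

print(link_citations("Answer text【12:1†source】 with a citation【12:2†source】."))
# Answer text<a href="#footnote-1">[1]</a> with a citation<a href="#footnote-2">[2]</a>.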

1 Like

Thanks guys for supporting this case. I’m trying your ideas but no success yet. The stream-based idea with an href to an in-page index point could be great! :wink:

I would recommend doing something like this in your code

Make sure to use .*? instead of hard-coding “source” in the regex, because when the response is in a different language the word “source” changes.

message.content.replace(/\【.*?】/g, "")

You can find my full code here: OpenAssistantGPT/components/chat-message.tsx at main · marcolivierbouch/OpenAssistantGPT · GitHub

For me the above did not work, but changing it slightly to this worked:

pattern = r'【\d+:\d+†source】'
cleaned_text = re.sub(pattern, '', msg.text.value)

Thanks, it’s working for me too!

1 Like

Actually, it gives citations if it answers from the uploaded documents.

With a streamed response, you can do this by customizing the event handler:

from typing_extensions import override
from openai import AssistantEventHandler
from openai.types.beta.threads import FileCitationDeltaAnnotation

class EventHandler(AssistantEventHandler):
    ...
    @override
    def on_text_delta(self, delta, snapshot):
        # Skip deltas that carry a file-citation annotation,
        # i.e. the 【n:m†source】 text, so it is never printed.
        if delta.annotations and type(delta.annotations[0]) == FileCitationDeltaAnnotation:
            return
        print(delta.value, end="", flush=True)
    ...