When using retrieval on Assistants my answers are returned with a source annotation, eg: 【35†source】… is it possible for the response to be received without such annotation? I’ve tried prompting it not to send it back but it doesn’t seem to work?
If you want them gone, you can strip everything inside the Japanese brackets from the response message with a regex or pattern match in your code.
Here’s some quickie code for an idea.
import re
message_text = ("This is a sample text【9†source】 with source "
"tags using annotations【10†source】and some other content.")
# Use re.sub() to remove the matched source
# tags and their contents
pattern = r'【\d+†source】'
cleaned_text = re.sub(pattern, '', message_text)
print(cleaned_text)
You might need to close up extra spaces depending on where the tags appear, if your renderer doesn’t already collapse extra whitespace.
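As a rough sketch of closing up those spaces: the helper below strips the tags with the same pattern as above, then collapses doubled spaces and removes a stray space left in front of punctuation. The name strip_sources is made up for illustration, and the whitespace rules are an assumption about plain prose; they would also squash indentation, so you would not want them on code blocks.

```python
import re

SOURCE_TAG = re.compile(r"【\d+†source】")

def strip_sources(text: str) -> str:
    """Remove source tags, then tidy the whitespace they leave behind."""
    text = SOURCE_TAG.sub("", text)
    # Collapse runs of spaces/tabs left where a tag sat between two spaces.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Drop a space stranded in front of punctuation.
    text = re.sub(r"[ \t]+([.,;:!?])", r"\1", text)
    return text

print(strip_sources("Paris is the capital 【9†source】. It is large【10†source】 ."))
```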
Hi @_j ,
Thanks for your answer. It worked in the first conversation, but when I continue the conversation the tags are no longer replaced. What should I do?
I’m using gradio for the UI.
Use a FIFO buffer of about 10 chunks.
On every new chunk, run a stripper over the contents of the buffer as a single string. A regex alone isn’t enough: it needs some logic around it to tell you which slice to chop out.
Under normal operation, maintain the buffer as a string, and pop out the original chunks by position, tracking the length of every addition.
Something similar is needed for your own limited markdown handler if you aren’t rewriting the display: for example, when a markdown code block start arrives on its own line, you then track that the display is currently writing code. (Markdown really needs a spec-compliant library that can handle AI mistakes when the output can be HTML.)
A longer buffer can also correspond to an amount of output you can block if you are concurrently sending the total response to moderations for particularly untrusted users.
Otherwise, you can strip the tags from the tail of your display text if you have control of the UI object. The proper way is to rewrite the entire text with links once it is finalized.
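Here is a minimal sketch of the buffering idea for a streamed response. It is a simplified variant, not the 10-chunk FIFO described above: it only holds text back from an unclosed 【 onward, and releases it once the bracket closes or grows past an assumed maximum tag length. The class name SourceStripper and the MAX_HOLD value are assumptions for illustration; the pattern covers both tag forms seen in this thread.

```python
import re

# Matches both 【9†source】 and the 【4:0†source】 style.
SOURCE_TAG = re.compile(r"【\d+(?::\d+)?†source】")
MAX_HOLD = 24  # assumed upper bound on the length of a single tag


class SourceStripper:
    """Strip source tags from a stream of text chunks."""

    def __init__(self):
        self.buf = ""  # text held back because a tag may still be arriving

    def feed(self, chunk: str) -> str:
        """Add a chunk and return the text that is safe to display now."""
        self.buf = SOURCE_TAG.sub("", self.buf + chunk)
        start = self.buf.rfind("【")
        unclosed = start != -1 and "】" not in self.buf[start:]
        if unclosed and len(self.buf) - start <= MAX_HOLD:
            # Possibly a partial tag: emit what precedes it, keep the rest.
            out, self.buf = self.buf[:start], self.buf[start:]
        else:
            out, self.buf = self.buf, ""
        return out

    def flush(self) -> str:
        """Call at end of stream to release anything still held."""
        out, self.buf = self.buf, ""
        return out


stripper = SourceStripper()
chunks = ["Hello", "【", "9", "†sou", "rce】", " world", "【注", "意】"]
shown = "".join(stripper.feed(c) for c in chunks) + stripper.flush()
print(shown)
```

Ordinary bracketed text such as 【注意】 passes through untouched, at the cost of being delayed until its closing bracket arrives.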
Replace 【 with <span class="source"> and 】 with </span>?
CSS: span.source {display: none}
You can do that without buffering
Alt:
Good ol’ FSM
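let mode = 'default';
function onChunk(chunk) {
  for (const char of chunk) {
    switch (mode) {
      case 'default':
        // An opening bracket starts an annotation: stop forwarding.
        if (char === '【') { mode = 'source'; continue; }
        channel.send(char);
        break;
      case 'source':
        // Swallow everything up to and including the closing bracket.
        if (char === '】') mode = 'default';
        break;
    }
  }
}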
You cannot send the un-replaced contents of the annotation. It’s just nonsense. You discount that there are those who don’t use your CSS. That’s how the web works.
It also completely overlooks that 隅付き括弧 are common Japanese typographic symbols in use, and the context of “source” inside and the numbers is needed to replace them in only specific usage.
I’m just on my phone so every line of code is a pain, but you can certainly do more involved stuff, such as buffering the brace content in the FSM and translate it into something else.
With the html you could also consider using an auto incremented id instead and resolve the source as it becomes available.
The Chinese/Japanese etc. case might indeed be an issue. If that’s part of your target audience, you might want to consider expanding the FSM to match the brace-dagger pattern.
Or, as you suggest, you just buffer the output by 10-15 characters. That is fairly straightforward with ReactiveX, but it potentially increases your TTFT (time to first token).
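A sketch of what that expanded FSM might look like, in Python: it buffers whatever sits between 【 and 】, drops it only when it looks like a source tag (digits, dagger, "source"), and re-emits it otherwise. The class name BracketFSM and the send callback are made up for illustration, and the tag shape is an assumption based on the examples in this thread.

```python
import re

# Body of a source tag, without the brackets: "9†source" or "4:0†source".
TAG_BODY = re.compile(r"\d+(?::\d+)?†source")


class BracketFSM:
    def __init__(self, send):
        self.send = send       # callback that receives text to display
        self.mode = "default"
        self.held = ""         # characters seen since the opening bracket

    def on_chunk(self, chunk: str) -> None:
        for ch in chunk:
            if self.mode == "default":
                if ch == "【":
                    self.mode, self.held = "bracket", ""
                else:
                    self.send(ch)
            elif ch == "】":
                # Only drop the bracket if its content is a source tag.
                if not TAG_BODY.fullmatch(self.held):
                    self.send("【" + self.held + "】")
                self.mode = "default"
            else:
                self.held += ch

    def close(self) -> None:
        """At end of stream, release an unfinished bracket as plain text."""
        if self.mode == "bracket":
            self.send("【" + self.held)
            self.mode, self.held = "default", ""


out = []
fsm = BracketFSM(out.append)
for chunk in ["見出し【重要】 text", "【12", "†source】 end"]:
    fsm.on_chunk(chunk)
fsm.close()
print("".join(out))
```

One limitation of this sketch: a 【 that never closes holds back everything after it until close() is called, so a real version would probably want a length cap on the held text.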
Of course you could also consider to buffer dynamically: use the FSM to determine the current buffer length.
All tradeoffs I guess
——
Found a pretty clever solution using partial regex (if the language doesn’t support it)
Another stream-based idea not in a cookbook is to replace them with an A HREF to an in-page index point that has not been produced yet, perhaps numbered incrementally since the start of the session.
Then you can place footnotes at the end after having the full response. Bing AI Chat used to have a similar interface.
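A rough sketch of the footnote idea on a finished message: each tag becomes an in-page link, numbered in order of first appearance, and the footnote block is rendered afterwards. The function names, the src-N anchor ids and the footnote wording are all made up for illustration; in practice the footnote text would come from whatever source information you have for each tag.

```python
import re

# Captures the tag id, e.g. "9" or "4:0".
SOURCE_TAG = re.compile(r"【(\d+(?::\d+)?)†source】")


def link_sources(text: str, start: int = 1):
    """Replace tags with in-page links; return (new_text, {tag_id: number})."""
    notes = {}

    def repl(match):
        tag_id = match.group(1)
        # Reuse the number if this tag id was already seen.
        n = notes.setdefault(tag_id, start + len(notes))
        return f'<a href="#src-{n}">[{n}]</a>'

    return SOURCE_TAG.sub(repl, text), notes


def render_footnotes(notes: dict) -> str:
    """Placeholder footnotes; real ones would show the source details."""
    return "\n".join(
        f'<p id="src-{n}">[{n}] source {tag_id}</p>' for tag_id, n in notes.items()
    )


body, notes = link_sources("Cats purr【9†source】 and dogs bark【10†source】.")
print(body)
print(render_footnotes(notes))
```

Passing a larger start value on later messages keeps the numbering going across the session.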
Thanks, guys, for supporting this case. I’m trying your ideas but have had no success yet. The stream-based idea with an href to an in-page index point could be great!
For me the above did not work, but changing it slightly to this worked:
pattern = r'【\d+:\d+†source】'
cleaned_text = re.sub(pattern, '', msg.text.value)
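Since both tag shapes show up in this thread, one pattern with an optional colon part should cover either, for example 【9†source】 and 【4:0†source】. This is only a sketch based on those two shapes; the helper name remove_source_tags is made up, and other tag formats may exist.

```python
import re

# The (?::\d+)? group makes the ":<digits>" part optional.
SOURCE_TAG = re.compile(r"【\d+(?::\d+)?†source】")


def remove_source_tags(text: str) -> str:
    """Remove source tags in either form, leaving other brackets alone."""
    return SOURCE_TAG.sub("", text)


print(remove_source_tags("First【9†source】 and second【4:0†source】 answer."))
```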