Hi, weird citation artifacts still appear when using gpt 5.4 mini with file search.
Here you fixed a similar issue for 5.5:
Thanks for flagging, @dragonek.
I’ve raised this.
Can you supply a snippet of the calling code, please?
This is a model generation behavior, not an API code symptom.
The AI is writing its citation wrong: inside the container that the API backend would parse out, it keeps repeating instead of closing the container.
It seems this could be solved once and for all with a grammar (constrained decoding). After the citation sequence up to the middle separator, the only permitted logit would be “turn”, then one of 1000 number logits (or fewer number-token options if you bring the application level into it), then “file”, then another number, and finally the grammar would dictate the end token. That would create the internal annotation and make errors nearly impossible.
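A minimal sketch of that grammar as a per-token check. It assumes, purely for illustration, that each marker, “filecite”, “turn”, “file” and each number arrive as single tokens; a real tokenizer and a real constrained-decoding engine would differ, and `GRAMMAR` and `allowed` are hypothetical names, not an OpenAI API.

```python
import re

OPEN, SEP, CLOSE = "\ue200", "\ue202", "\ue201"
NUMBER = re.compile(r"\d+")

# Hypothetical grammar: one slot per token position of a single citation.
# A slot is either a fixed literal token or the NUMBER pattern.
GRAMMAR = [OPEN, "filecite", SEP, "turn", NUMBER, "file", NUMBER, CLOSE]

def allowed(tokens: list[str], candidate: str) -> bool:
    """Would the grammar let `candidate` follow the citation prefix `tokens`?

    In a decoder this would act as the logit mask: every candidate for
    which this returns False is given zero probability.
    """
    pos = len(tokens)
    if pos >= len(GRAMMAR):
        return False  # citation already closed; nothing more may go inside it
    slot = GRAMMAR[pos]
    if isinstance(slot, str):
        return candidate == slot
    return slot.fullmatch(candidate) is not None

prefix = [OPEN, "filecite", SEP, "turn", "2", "file", "13"]
print(allowed(prefix, SEP))    # False: a second separator is ruled out
print(allowed(prefix, CLOSE))  # True: only the closing marker may follow
```

With the mask in place, the malformed `\ue202turn2file11` continuation simply has no legal path after the second number.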
This happens with gpt 5.5 too. With the upcoming deprecation, we are seriously thinking about switching providers, as the RAG answers will be plagued by nonsense, and with the annotations empty I can’t fill in the identification of the cited files.
It also seems to me that this OpenAI fault, where the model does not close the container with \ue201 after a single “file{number}” but instead restarts with \ue202 and goes into repeats, could be fixed in parsing.
A valid citation (Python code point escaping):
\ue200filecite\ue202turn2file13\ue201
Bad model behavior, a malformed citation that is not recognized:
\ue200filecite\ue202turn2file13\ue202turn2file11\ue201
The second citation “file#” in the bad case is usually hallucinated, because the AI continued in a loop that forces writing something plausible.
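Since the extra reference is usually hallucinated, a backend may want to detect and log malformed containers instead of only stripping them. A sketch with a hypothetical helper `inspect_citations` (an illustrative name, not part of any API) that reports the first ref and any extra, suspect refs in each container:

```python
import re

# Any citation container holding one OR MORE "\ue202turn<N>file<N>" segments
CONTAINER = re.compile(r"\ue200filecite((?:\ue202turn\d+file\d+)+)\ue201")

def inspect_citations(text: str) -> list[tuple[str, list[str]]]:
    """Return (first_ref, suspect_refs) for each citation container in text.

    suspect_refs is non-empty only for the malformed form, where the model
    wrote another separator instead of the closing marker.
    """
    found = []
    for m in CONTAINER.finditer(text):
        # "\ue202turn2file13\ue202turn2file11" -> ["", "turn2file13", "turn2file11"]
        refs = m.group(1).split("\ue202")[1:]
        found.append((refs[0], refs[1:]))
    return found

good = "\ue200filecite\ue202turn2file13\ue201"
bad = "\ue200filecite\ue202turn2file13\ue202turn2file11\ue201"
print(inspect_citations(good))  # [('turn2file13', [])]
print(inspect_citations(bad))   # [('turn2file13', ['turn2file11'])]
```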
A backend parsing metacode:
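```python
import re
# One well-formed citation: U+E200 "filecite" U+E202 <ref> U+E201
FILECITE = re.compile(r"\ue200filecite\ue202(?P<ref>turn\d+file\d+)\ue201")
text = "See the manual.\ue200filecite\ue202turn2file13\ue201"  # example model output
refs = []
# Strip each citation, collecting its ref (append() returns None, so "" replaces it)
text = FILECITE.sub(lambda m: refs.append(m["ref"]) or "", text)
```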
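Robust parsing, which still consumes a malformed citation but discards the additional repeats of the mid-separator and “turn” output that appear where the closing marker token should have been produced:
```python
import re
# Accepts the well-formed citation and the malformed one with repeated
# separator + "turn…file…" segments; only the first ref is kept
FILECITE = re.compile(
    r"\ue200filecite\ue202"
    r"(?P<ref>turn\d+file\d+)"    # first ref, kept
    r"(?:\ue202turn\d+file\d+)*"  # extra repeats, matched and discarded
    r"\ue201"
)
text = ("A\ue200filecite\ue202turn2file13\ue201"
        " B\ue200filecite\ue202turn2file13\ue202turn2file11\ue201")
refs = []
text = FILECITE.sub(lambda m: refs.append(m["ref"]) or "", text)
# text == "A B", refs == ["turn2file13", "turn2file13"]
```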
Until a fix is actually delivered, developer instructions could add further reinforcement: correctly written citation examples repeated over and over, plus a prohibition on the \ue202 star…star pattern, to encourage the model to output the rather obscure closing Unicode character it needs.
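A sketch of that reinforcement, assuming the developer instructions are assembled in Python before the request is sent. The `citation` helper and the wording of `DEVELOPER_NOTE` are hypothetical, and whether repeated examples actually reduce the malformed output is untested here.

```python
OPEN, SEP, CLOSE = "\ue200", "\ue202", "\ue201"

def citation(turn: int, file: int) -> str:
    """Build one well-formed citation: exactly one ref, then the closing marker."""
    return f"{OPEN}filecite{SEP}turn{turn}file{file}{CLOSE}"

# Several correct examples, so the closing marker is well represented
examples = "\n".join(citation(t, f) for t, f in [(0, 1), (1, 4), (2, 13)])

DEVELOPER_NOTE = (
    "When citing a file, write exactly one reference per citation and close it "
    "immediately with the closing marker, as in these examples:\n"
    f"{examples}\n"
    "Never put a second separator and another turn/file reference inside the "
    "same citation; write a separate, closed citation instead."
)
```

Building the examples in code keeps the private-use characters exact, which is easy to get wrong when pasting them by hand.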