Are citation URLs tokenized similarly to typical language?

fgreco · July 9, 2024, 6:08pm

Historically, getting an LLM to give you authentic citations has been a problem. With a RAG system, you can certainly return a list of citations related to a completion (accurately mapping each citation to the specific completion chunk it supports, footnote-style, is another story). I did notice that Anthropic’s Claude 3.5 is actually pretty good at returning authentic citations.

Just curious whether hallucinated citations are due to the way URLs are tokenized. When a URL is tokenized, is it broken up like any other text, handled atomically, or handled some other way?
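To make the "broken up" case concrete, here is a toy sketch of what I mean. It is a greedy longest-match splitter over a small made-up vocabulary, not the tokenizer of any real model; `TOY_VOCAB`, `toy_tokenize`, and the example URL are all hypothetical names I picked for illustration.

```python
# Toy vocabulary of subword pieces. These are invented for illustration
# and are NOT taken from any real model's tokenizer.
TOY_VOCAB = {"https", "://", "example", ".", "com", "/", "papers", "llm", "-"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split; anything not in the vocab falls back to single characters."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at i, shrinking until a vocab entry matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocab entry starts here: emit one character as its own token.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical citation URL, used only as sample input.
url = "https://example.com/papers/2024/llm-citations"
pieces = toy_tokenize(url)
print(len(pieces), pieces)
```

If real tokenizers behave anything like this, a URL is a long run of small pieces rather than one atomic unit, and the rarer parts (IDs, slugs, dates) fragment the most. That is the case I am asking about.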

Appreciate any insights.
