Historically, getting an LLM to return authentic citations has been a problem. With a RAG system you can certainly attach a list of citations related to a completion (accurately footnoting each chunk of the completion to its source is another story). I have also noticed that Anthropic’s Claude 3.5 is actually pretty good at returning authentic citations.
Just curious whether hallucinated citations are due to the tokenization of URLs. When a URL is tokenized, is it broken up the same way as other text, is it handled atomically, or is it handled some other way?
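To make the question concrete, here is a toy sketch of what I mean by a URL being "broken up". It is only an illustration: `HYPOTHETICAL_VOCAB`, `toy_tokenize`, and the example URL are all made up, and the greedy longest-match rule is a simplification, not the merge-based BPE that real tokenizers typically use. I am not claiming any actual model splits URLs this way.

```python
# Toy illustration only: this vocabulary is invented and does not
# reflect any real model's tokenizer.
HYPOTHETICAL_VOCAB = {"https", "://", "www", ".", "example", "com", "/", "papers", "-"}


def toy_tokenize(text, vocab):
    """Greedy longest-match split; unmatched characters become single-char tokens."""
    tokens = []
    i = 0
    max_len = max(len(v) for v in vocab)
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocab entry matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # Nothing in the vocab matches here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


url = "https://www.example.com/papers/2023-17"  # made-up URL
tokens = toy_tokenize(url, HYPOTHETICAL_VOCAB)
print(tokens)
print(len(tokens), "tokens for", len(url), "characters")
```

If real tokenizers behave anything like this, the familiar parts of a URL would be a few large tokens while the path and ID would be many small ones, and a single wrong small token would yield a plausible-looking but nonexistent link. That is the failure mode I am wondering about.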
Appreciate any insights.