Are citation URLs tokenized similarly to typical language?

Historically, getting an LLM to give you authentic citations has been a problem. With a RAG system you can certainly return a list of citations related to a completion (accurately attaching those citations as footnotes to specific chunks of the completion is another story). I have also noticed that Anthropic’s Claude 3.5 is actually pretty good at returning authentic citations.

Just curious: are hallucinated citations caused by the way URLs are tokenized? When a URL is tokenized, is it broken up like any other text, handled atomically as a single token, or handled some other way?
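For a rough picture of the "broken up like other text" case, here is a minimal sketch of greedy longest-match subword segmentation. It is a simplification: real LLM tokenizers (for example BPE-based ones) use learned merge rules and vocabularies of tens of thousands of pieces, and `TOY_VOCAB`, `tokenize`, and the example URL are all hypothetical, made up for illustration. The point it shows is that a URL is not one atomic token: common fragments like `https` and `://` become single pieces, while an identifier such as `2401.01234` falls apart into many small tokens that the model has to generate one at a time.

```python
# Hypothetical toy vocabulary of common URL fragments (not any real model's vocabulary).
TOY_VOCAB = {
    "https", "://", "www", ".", "com", "org", "/", "example", "abs", "-", "?", "=",
}


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedily split text into the longest vocabulary pieces, falling back to single characters."""
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until it is in the vocabulary;
        # a single character is always accepted as a fallback token.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in vocab or length == 1:
                tokens.append(candidate)
                i += length
                break
    return tokens


url = "https://example.org/abs/2401.01234"  # hypothetical citation URL
tokens = tokenize(url, TOY_VOCAB)
print(tokens)
print(len(tokens), "tokens")
```

Under these assumptions, getting any one of the digit tokens wrong still produces a well-formed URL that simply points somewhere that does not exist, which is one way a citation can look plausible but be fabricated.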

Appreciate any insights.