I’m building a custom GPT for my website, but it keeps auto-generating/guessing URLs that don’t exist in our sitemap or directory, leading to 4xx/5xx errors.
I want it to return only verified URLs and data strictly from our website. How can I restrict it to use only approved site content and prevent URL hallucination?
Hey @aws.bu, yeah, that’s frustrating. By default, the model pattern-matches plausible-looking URLs rather than retrieving real ones, so it will keep inventing links unless you force it to pull from an approved source.
Two things that usually help:
Upload an allowlist as Knowledge: Add your sitemap (or a curated plain-text list of valid URLs) as a Knowledge file, and include a clear instruction such as: “Never invent URLs. Only return URLs that appear in Knowledge. If no exact match is found, ask the user for clarification.” (There’s a sketch for generating that file from your sitemap just after this list.)
Tighten the system instructions: Be explicit that generating or guessing URLs is never allowed, even when a URL looks correct. The more direct and absolute the wording, the more reliably the model complies. (An example instruction block follows the sketch below.)
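If your sitemap is large, a small script keeps the Knowledge file current. Here’s a minimal sketch using only the Python standard library; the sitemap URL and the `allowed_urls.txt` output name are placeholders, so swap in your own:

```python
# Minimal sketch: flatten a standard sitemap.xml into a plain-text allowlist
# (one URL per line) that you can upload as a Knowledge file.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"  # placeholder -- use your site
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def fetch_urls(sitemap_url: str) -> list[str]:
    """Return every <loc> URL in the sitemap, recursing into sitemap indexes."""
    with urllib.request.urlopen(sitemap_url) as resp:
        root = ET.fromstring(resp.read())
    urls = []
    # A sitemap index nests further sitemaps; a plain sitemap lists pages.
    for loc in root.findall(".//sm:sitemap/sm:loc", NS):
        urls.extend(fetch_urls(loc.text.strip()))
    for loc in root.findall(".//sm:url/sm:loc", NS):
        urls.append(loc.text.strip())
    return urls

if __name__ == "__main__":
    urls = sorted(set(fetch_urls(SITEMAP_URL)))
    with open("allowed_urls.txt", "w") as f:
        f.write("\n".join(urls))
    print(f"Wrote {len(urls)} URLs to allowed_urls.txt")
```

Re-run it whenever the sitemap changes and re-upload the file, so the allowlist never drifts from what’s actually live.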
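For the instructions themselves, something like the following is a reasonable starting point. It assumes the allowlist was uploaded as `allowed_urls.txt`, so adjust the filename to match yours:

```
You may only cite URLs that appear verbatim in allowed_urls.txt.
Never construct, guess, or modify a URL, even if it looks correct.
If the user asks for a page with no exact match in allowed_urls.txt,
say you cannot find a verified link and ask them to clarify,
instead of producing a URL.
```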
Together, these two steps significantly reduce URL hallucination in practice. If you still see invalid links being generated after that, share an example and we can dig into what’s slipping through.