I’m seeing what looks like irrelevant source contamination, including commercial/ad-like pages, during a Deep Research task that is supposed to be restricted to a Google Drive archive.
Task type:
- Read a connected Google Drive folder
- Read the manifest first
- Read multipart shard files in sequence
- Extract claims/dates/quotes from the archive only
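For concreteness, here is a minimal sketch of the read order the task expects: the manifest is read first and fully determines the shard sequence. The filenames and manifest format are my own illustrative assumptions, not the real archive layout.

```python
# Sketch of the intended manifest-first read plan.
# Manifest schema and filenames are hypothetical.

def plan_reads(manifest: dict) -> list[str]:
    """Return shard filenames in exactly the order the manifest declares."""
    return [entry["file"] for entry in manifest["shards"]]

manifest = {
    "shards": [
        {"file": "archive_part_01.txt"},
        {"file": "archive_part_02.txt"},
        {"file": "archive_part_03.txt"},
    ]
}

print(plan_reads(manifest))
# → ['archive_part_01.txt', 'archive_part_02.txt', 'archive_part_03.txt']
```

Nothing in this plan requires any retrieval outside the connected folder.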
Expected behavior:
- Stay inside the specified Google Drive folder and named files
- Use direct file reads after the manifest identifies the sequence
- Avoid unrelated public web retrieval unless explicitly requested
Observed behavior:
The Sources panel included unrelated external sites (itemized in the counts below). These were not relevant to the task and appeared alongside Google Drive / Google developer documentation retrieval. In the sample I captured, there were at least 23 clearly irrelevant external sources, plus a large amount of Google support/API noise.
Why this is a problem:
- It adds irrelevant citations/sources to a source-restricted archive task
- It reduces confidence that the model is actually reading the intended archive files
- It wastes activity budget on unrelated retrieval
- It makes it difficult to audit what was truly read versus what was merely surfaced
Suggestion:
Please add a stricter source restriction mode for Deep Research, especially for connector-based archive ingestion tasks. A useful mode would:
- Restrict retrieval to specified connector sources only
- Disallow public web search unless explicitly enabled
- Log the actual access level per file:
  - listed only
  - metadata only
  - partial text access
  - full text access
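To show what I mean by the last two points, here is a small sketch of a connector allowlist plus a per-file access-level log. The allowed hostnames and filenames are assumptions for illustration only, not a claim about how Deep Research is implemented internally.

```python
from enum import Enum
from urllib.parse import urlparse

class Access(Enum):
    """The four access levels proposed above."""
    LISTED = "listed only"
    METADATA = "metadata only"
    PARTIAL = "partial text access"
    FULL = "full text access"

# Hypothetical connector-only allowlist (hostnames are assumptions).
ALLOWED_HOSTS = {"drive.google.com", "www.googleapis.com"}

def is_allowed(url: str) -> bool:
    """True only for hosts on the connector allowlist; everything else is refused."""
    return urlparse(url).hostname in ALLOWED_HOSTS

access_log: list[tuple[str, str]] = []

def record_access(filename: str, level: Access) -> None:
    """Append one auditable per-file access-level entry."""
    access_log.append((filename, level.value))

record_access("archive_part_01.txt", Access.FULL)
record_access("archive_part_02.txt", Access.METADATA)
print(is_allowed("https://drive.google.com/file/x"))  # → True
print(is_allowed("https://www.bestbuy.com/site/y"))   # → False
```

With a log like this, a user could verify that every shard reached "full text access" instead of guessing from the Sources panel.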
Counts from the captured source list:
Commercial product pages
• support.hp.com: 3
• hp.com: 1
• dell.com: 3
• bestbuy.com: 3
• Subtotal: 10
Random off-task pages
• apexnc.org: 2
• sanbernardino.gov: 1
• elcajon.gov: 1
• redfin.com: 1
• wunderground.com: 1
• mathway.com: 5
• coinbase.com: 2
• Subtotal: 13
Combined external sources
• Total: 23
Repeated duplicate examples
• Dell XPS 17 9710 System BIOS: 3
• Apex, NC - Official Website: 2
• Mathway pages: 5
• Coinbase calculator pages: 2
This happened twice, so it does not appear to be random.
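For anyone comparing notes: the per-host tally above can be reproduced from a captured source-URL list with a quick script. The URLs below are illustrative stand-ins for the actual Sources panel entries, with the same host frequencies I observed.

```python
from collections import Counter
from urllib.parse import urlparse

# Stand-in URLs matching the observed per-host frequencies.
captured = (
    ["https://support.hp.com/a"] * 3
    + ["https://hp.com/b"]
    + ["https://dell.com/c"] * 3
    + ["https://bestbuy.com/d"] * 3
    + ["https://apexnc.org/e"] * 2
    + ["https://sanbernardino.gov/f"]
    + ["https://elcajon.gov/g"]
    + ["https://redfin.com/h"]
    + ["https://wunderground.com/i"]
    + ["https://mathway.com/j"] * 5
    + ["https://coinbase.com/k"] * 2
)

counts = Counter(urlparse(u).hostname for u in captured)
print(sum(counts.values()))   # → 23
print(counts["mathway.com"])  # → 5
```

Running this against your own captured list makes it easy to check whether you are seeing the same pattern.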
Has anyone else seen Deep Research pull unrelated commercial or random public-web sources during a task that should have remained restricted to a connected document archive?