Issues with consuming text from a webpage

tadpolehop · January 10, 2025, 5:43pm

I provide a webpage link to chat/completion api and ask targeted questions based on the tags/paragraph headers (which are in bullet format) in the hyperlink. What I observe is that the response inherently misses on 1-2 bullet points from the list. Based on the response it feels like gpt is making the best guess and not actually capturing the actual content. Does anyone have experience in this area. ?
I m using gpt-3.5-turbo.

phyde1001 · January 10, 2025, 5:52pm

Hi,

Welcome to the forum.

Indeed this is your issue, the llm is not collecting the URL you requested.

Also even supplying it the HTML webpage would potentially also return the wrong results too.

The result is ‘non-deterministic’ meaning that when you send the same question you will often receive a different result.

LLMs make a predictive next step decision based on probability.

To do this task 100% accurately you would do this task in logical code.

sps · January 10, 2025, 7:08pm

Welcome to the dev forum @tadpolehop

sergeliatko · January 10, 2025, 7:21pm

I ended up setting up a tool with a call to jina.ai reader endpoint, as a quick hack initially… Then it’s kind of ok to run on the minor projects I have, when it doesn’t work, I still can do custom code or brightdata proxy’s.

Topic		Replies	Views
ChatGPT's API returns worse web search results than it's web UI and it can't explain to me why API chatgpt , api , web-browsing , web-search	2	547	April 22, 2025
[Need help] Issue with the API's results API	1	419	February 24, 2024
GPT-4o issue with YouTube link parsing API gpt-4	1	335	July 29, 2024
Reading and analyzing webpages content by GPT-4 API API gpt-4 , plugin-development , api , chatgpt-plugin	1	7677	January 22, 2024
Gpt-4o web browsing capability where? API api	18	18416	June 28, 2024

Issues with consuming text from a webpage

Related topics