Building an Autonomous Agent that can actually answer questions using the web

I’m creating this post as a bit of a diary for building an autonomous agent that can answer questions using the web. I’m building this agent using TypeScript and my AlphaWave library, and if you want to follow along code-wise, you can find the sample here.

So in an effort to accelerate my development, I’ve spent the last day or so adding support for LangChain.js to AlphaWave… It’s actually two-way support. You can pretty much use any LangChain component in an AlphaWave project, and I package AlphaWave itself up as a new model that can be used in LangChain. So if you want to add validating and self-repairing models to your LangChain project, you can now do that (for JS at least).
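To make the LangChain-facing direction concrete, here’s a rough sketch of what wrapping an AlphaWave as a LangChain model can look like. The LLM base class and _call override follow LangChain.js conventions at the time of writing (the API has shifted between versions), and the completePrompt call and response shape on the AlphaWave side are placeholders, not the library’s actual API:

// Rough sketch only: exposes an AlphaWave instance as a LangChain LLM.
import { LLM } from "langchain/llms/base";

class AlphaWaveLLM extends LLM {
  constructor(private wave: any /* AlphaWave instance */) {
    super({});
  }

  _llmType(): string {
    return "alphawave";
  }

  async _call(prompt: string): Promise<string> {
    // AlphaWave validates and repairs the response before returning it,
    // so the LangChain side only ever sees output that passed validation.
    const response = await this.wave.completePrompt(prompt); // placeholder call
    if (response.status !== "success") {
      throw new Error(`AlphaWave completion failed: ${response.status}`);
    }
    return typeof response.message === "string"
      ? response.message
      : response.message.content;
  }
}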

I also created Command wrappers for all the most interesting LangChain tools so that I could use them from AlphaWave-based agents. One nice benefit is that I was able to add JSON Schema validation to all of LangChain’s tools, which makes it impossible for a tool to be called with missing parameters. So with all that done… On to my actual goal: building an agent that can use these new tools to answer the user’s questions using stuff from the web.
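To give a feel for what those wrappers do, here’s a hedged sketch of the general shape: validate the tool’s input against a JSON Schema before the call ever reaches the tool. The Command interface below is illustrative rather than AlphaWave’s actual type; the Ajv calls are the real Ajv API, and the import path for Tool matches the LangChain.js layout at the time of writing:

import Ajv from "ajv";
import { Tool } from "langchain/tools";

// Illustrative command shape, not AlphaWave's real interface.
interface Command {
  name: string;
  schema: object; // JSON Schema describing the expected input
  execute(input: Record<string, any>): Promise<string>;
}

function wrapTool(tool: Tool, schema: object): Command {
  const ajv = new Ajv();
  const validate = ajv.compile(schema);
  return {
    name: tool.name,
    schema,
    async execute(input) {
      // Reject the call before it reaches the tool if required
      // parameters are missing or have the wrong type.
      if (!validate(input)) {
        throw new Error(`Invalid input for ${tool.name}: ${ajv.errorsText(validate.errors)}`);
      }
      // Assumes a single "input" string parameter, as most LangChain
      // tools of this era take.
      return tool.call(input.input as string);
    },
  };
}

In practice the validation error would be fed back to the model as repair feedback rather than thrown, but the gist is the same: a tool call with missing parameters never executes.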

tl;dr: we’re not there yet. I was asking a simple question, “number of a plumber near Seattle”, and I sort of got it to work (I’ll break that down below), but I had to completely rewrite both the Serp and WebBrowser tools to even get it to sort of work… I have a lot of ideas for how to get it there, but we’re not there yet.

So let me start with a screenshot of my last run of “search-agent” (this is gpt-3.5-turbo by the way):

We can see that it starts off pretty well. It asks what I’m looking for, then uses the bingSearch command to find some pages to read, then starts using the webBrowser command to read the links on those pages (these are the two tools I had to completely rewrite), and eventually it finds a phone number on the Better Business Bureau website, which already has me suspicious. So I follow up asking for the name of that plumber and it does what I would have hoped… It looks to be using the webBrowser command to ask a follow-up question of the BBB site, and sure enough, the phone number was hallucinated. I could turn on more detailed logging to see what’s happening here, but I can already tell you that the top level agent essentially felt like it was running out of options to answer the question, so it made something up… I could go into much greater depth here, but let me just give some initial observations and the follow-up approaches I’m planning to try:

Observations:

  • If you give an agent both a Serp and a webBrowser command, it can’t always decide which one it wants to use. They’re too similar. It will often pick the search tool, but it will occasionally jump straight to the browser. Including both is a dicey proposition.
  • When it calls the Serp tool and gets back a list of links, it’s somewhat 50/50 as to whether it will do the work to go through all the links… I’ve already seen that asking these things to do anything in a mechanical loop is dicey. Arguably, GPT-3.5 has a penchant for stopping tasks short, but you should generally avoid asking these models to run in mechanical loops.
  • If the top level agent sees the answer, it will convey that answer truthfully. But as the task progresses, there’s an ever-growing risk that it will just make something up. I tried updating the top level agent’s prompt to better ground it in the truth of the data being returned from the commands, but that stopped it from looping :frowning:

On the plus side… AlphaWave’s automatic repair logic is working flawlessly. Every run I did had 1–2 bad responses, and AlphaWave repaired every single one. I did notice that the agent would sometimes change its mind about which command it wanted to run next when a repair occurred, but it seemed to always change its mind in a more desirable direction. It makes me wonder whether a lot of the response errors the model makes are just because it’s undecided as to how it should respond or, more likely, because it goes down a response branch it starts to regret.

Next Steps for Search Agent:

  • I’m planning to essentially combine the Serp and WebBrowser commands into a single command that can both search the web and browse the results (see the sketch after this list). This will do two things: it will remove some of the confusion for the top level agent as to which tool to choose, and it will remove the need for the agent to run a mechanical loop, since the combined command can drive that loop in code.
  • The other issue is that the WebBrowser tool doesn’t go deep enough, so I have a bunch of ideas for how to get it to browse a site looking for answers more like a human does.
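Here’s the sketch mentioned above: an illustrative take on the combined command, where plain code drives the loop over search results so the model never has to. bingSearch and readPage are stand-ins for the real tool calls, not actual implementations:

// Stand-ins for the real search and page-reading calls.
declare function bingSearch(query: string): Promise<string[]>;
declare function readPage(url: string, question: string): Promise<string | undefined>;

async function searchAndBrowse(query: string, question: string): Promise<string> {
  const results = await bingSearch(query); // list of result URLs
  for (const url of results) {
    // The model only gets asked the question it's good at ("does this
    // page answer the question?"); the iteration itself stays in code.
    const answer = await readPage(url, question);
    if (answer) {
      return answer;
    }
  }
  return "No answer found in the search results.";
}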

I’ll report back to this thread as I make progress… I feel like this is solvable, but it’s all in how you break the task apart for the agent. It’s going to take a blend of code for the parts that code is good at (loops and such) and LLM magic for the part the model is good at: reasoning.


Looking good. Thanks for sharing so much. I haven’t had a chance to take AlphaWave for a spin personally, but it sounds very useful.


Yeah, the more I think about it, the more I wonder if the errors the model makes in its output aren’t just it second-guessing its choices.

If you think about it, as humans we can stop mid-sentence and take back our words to a degree. The model can’t do that. Once it picks a token, it’s stuck with that decision.


So I’m honestly not trying to hype AlphaWave as a thing that you should use, but the feedback technique it employs is working so well that if you are expecting any sort of structured output from the model, you should probably be using AlphaWave or something that employs similar techniques. If all you want back is text, then you’re good, but if you need it to return JSON or XML or some other semi-structured output, I’d say it’s essential. It simply works. I’ve had 2 bad responses in going on 1000+ model calls, and both of those were issues I fixed.
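For anyone who wants the gist without reading the library, the pattern is roughly this (a generic sketch, not AlphaWave’s actual code): validate the model’s output, and on failure feed the validation error back to the model as a follow-up message so it can repair its own response:

// completeChat stands in for whatever chat-completion call you're using.
declare function completeChat(messages: { role: string; content: string }[]): Promise<string>;

async function getValidatedResponse(
  messages: { role: string; content: string }[],
  validate: (text: string) => string | undefined, // error message, or undefined if valid
  maxRepairs = 3
): Promise<string> {
  for (let attempt = 0; attempt <= maxRepairs; attempt++) {
    const response = await completeChat(messages);
    const error = validate(response);
    if (!error) {
      return response;
    }
    // Show the model its own bad output plus the validation error so it
    // can fix the response on the next turn.
    messages = [
      ...messages,
      { role: "assistant", content: response },
      { role: "user", content: `Your response was invalid: ${error}. Try again.` },
    ];
  }
  throw new Error("Model failed to produce a valid response after repairs.");
}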

Progress! It can actually answer questions now and can even answer follow-up questions:

I removed the WebBrowser command from the top level agent and gave it to the BingSearch command to use. The BingSearch command now has a new feature called deep_search (code link) where you can instruct it to answer questions by reading each search result page:


I also added a max_search_time param which lets you control how long the BingSearch command is allowed to keep looking.
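Both parameters are easy to honor in plain code. Here’s a minimal sketch of what the deep search loop can look like with a time budget; readPage is a stand-in for the WebBrowser call, not the real implementation:

declare function readPage(url: string, question: string): Promise<string | undefined>;

async function deepSearch(urls: string[], question: string, maxSearchTime: number) {
  const deadline = Date.now() + maxSearchTime;
  for (const url of urls) {
    if (Date.now() > deadline) {
      break; // time budget exhausted; stop looking
    }
    const answer = await readPage(url, question);
    if (answer) {
      return answer;
    }
  }
  return undefined;
}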

Next step will be to add a deep_search option to the WebBrowser command itself. This will let it look for relevant links that it can drill into as it reads the page. Again, my goal is to make this agent look for information like a human does. We all know that Yelp probably had a great answer, but you would need to click on the top link to find the phone number I was asking for.

Another detail… The BingSearch command has to make a model call to evaluate whether the WebBrowser command returned an answer to the user’s question or whether it should keep looking. I ended up having to use an INSTRUCT prompt for that, which looks something like this:

page url: https://www.yelp.com/search?cflt=plumbing&find_loc=Seattle%2C+WA
page text:
The text mentions "The Best 10 Plumbing near me in Seattle, Washington" and provides a link to a Yelp search for plumbing services in Seattle. Therefore, the answer is: There is a Yelp search for plumbing services in Seattle, but no specific plumber is mentioned.

user question:
number of a plumber near Seattle

steps:
- Think about whether the page text answers the users question
- If not we should continue looking at other search results pages. 
- Return your response using this JSON {"continueSearch": <true|false>}

Do steps 1, 2, and 3 and show your work for each step.

And the model will generate a response that looks like this:

Step 1: 
The page text does not provide the number of a specific plumber near Seattle. It only mentions a Yelp search for plumbing services in Seattle.

Step 2:
We can try searching for specific plumbers in Seattle using search engines or directories such as Google Maps, Yellow Pages, or Angie's List.

Step 3:
{"continueSearch": true}

There’s actually no reason the WebBrowser command shouldn’t be able to fold this reasoning into its own processing, so I’m going to work on returning a more structured response from the WebBrowser command when I rewrite it. This will save the BingSearch command from having to make an additional model call for each page.
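One possible shape for that structured response (illustrative, not the library’s actual type): the page reader returns its answer and its keep-looking decision directly, so the caller no longer needs a separate model call per page:

interface WebBrowserResult {
  pageUrl: string;
  answer?: string;          // present when the page answered the question
  continueSearch: boolean;  // true when the caller should keep looking
}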

More progress… I completely rewrote the WebBrowserCommand and added a new WebPageSearchCommand that the WebBrowserCommand uses to search individual pages. The WebPageSearchCommand returns either an answer or a link to a page it thinks might have the answer, and the WebBrowserCommand will drill into that page. The problem is, GPT-3.5 is horrible at picking the page to search next, so I have that feature turned off for now.
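For illustration, the answer-or-link design might look something like this, with a depth cap to keep the drill-down from running away. searchPage is a stand-in for WebPageSearchCommand, and the union result type just mirrors the description above:

type PageSearchResult =
  | { type: "answer"; answer: string }
  | { type: "link"; url: string };

declare function searchPage(url: string, question: string): Promise<PageSearchResult | undefined>;

async function browseForAnswer(startUrl: string, question: string, maxDepth = 3) {
  let url: string | undefined = startUrl;
  for (let depth = 0; depth < maxDepth && url; depth++) {
    const result = await searchPage(url, question);
    if (!result) {
      return undefined; // nothing useful on this page
    }
    if (result.type === "answer") {
      return result.answer;
    }
    url = result.url; // follow the link the page search suggested
  }
  return undefined;
}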

Even without that feature the experience is improved. It’s faster because I eliminated a model call per page, and the model is returning better and more detailed answers across the board:

It’s not perfect though… Here’s a session where the model first tells me Taylor Swift has 9 #1 hits, but then, when I ask what they are, only comes up with 8 of them:

There are in fact 9 according to the source page:

But the issue is that it doesn’t go back to that source page with the follow-up question and has to hunt a bit before landing on what I’m assuming is an older result. I’ll work on that next. It has the link to the answer’s source page, but I don’t want to give it direct access to the WebBrowserCommand, so I’ll probably try adding a “site” parameter to the BingSearchCommand.
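The nice thing about a “site” parameter is that it maps directly onto the standard site: query operator, so it’s mostly string formatting. The parameter name comes from the plan above; the implementation and the example domain are guesses:

function buildQuery(query: string, site?: string): string {
  return site ? `site:${site} ${query}` : query;
}

// e.g. buildQuery("Taylor Swift number one hits", "example.com")
//   => "site:example.com Taylor Swift number one hits"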

Bonus Interaction:

I’m glad that GPT-4 is better at “maths” :slight_smile: This is gpt-3.5-turbo by the way…