I’ve been attempting to use WebResearchRetriever from LangChain in Python. I’m running a snippet that works for other people, but I keep getting this error:
RuntimeError: asyncio.run() cannot be called from a running event loop
I think the issue may be with my computer and not with the code itself, but here’s the code:
from langchain.retrievers.web_research import WebResearchRetriever
import os
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models.openai import ChatOpenAI
from langchain.utilities import GoogleSearchAPIWrapper
I’ve been getting that same SSL error while using WebResearchRetriever. From what I’ve read, the proper fix involves adjusting the local Python SSL certificate configuration, but nothing I’ve tried so far has worked.
In the meantime, I came up with a monkey-patch that modifies WebResearchRetriever._get_relevant_documents to disable SSL certificate verification. This isn’t ideal from a security point of view, but it works. The updated logic also checks for an empty docs list before adding to the vector database, which avoids a tuple exception when none of the documents could be decoded (for example, when they are all PDFs). I’ve marked my changes with comments containing my name, # (BabellDev); otherwise the code is the same as the current version on GitHub.
Add this function declaration somewhere before your existing code, and then call it before you call WebResearchRetriever.from_llm:
def patch_web_research_retriever():
    import logging
    from typing import List

    from langchain.callbacks.manager import CallbackManagerForRetrieverRun
    from langchain.document_loaders import AsyncHtmlLoader
    from langchain.document_transformers import Html2TextTransformer
    from langchain.retrievers.web_research import WebResearchRetriever
    from langchain.schema import Document

    logger = logging.getLogger(__name__)

    def _patched_get_relevant_documents(
        self,
        query: str,
        *,
        run_manager: CallbackManagerForRetrieverRun,
    ) -> List[Document]:
        # Get search questions
        logger.info("Generating questions for Google Search ...")
        result = self.llm_chain({"question": query})
        logger.info(f"Questions for Google Search (raw): {result}")
        questions = getattr(result["text"], "lines", [])
        logger.info(f"Questions for Google Search: {questions}")

        # Get urls
        logger.info("Searching for relevant urls...")
        urls_to_look = []
        for query in questions:
            # Google search
            search_results = self.search_tool(query, self.num_search_results)
            logger.info("Searching for relevant urls...")
            logger.info(f"Search results: {search_results}")
            for res in search_results:
                if res.get("link", None):
                    urls_to_look.append(res["link"])

        # Relevant urls
        urls = set(urls_to_look)

        # Check for any new urls that we have not processed
        new_urls = list(urls.difference(self.url_database))
        logger.info(f"New URLs to load: {new_urls}")

        # Load, split, and add new urls to vectorstore
        if new_urls:
            # (BabellDev) changed verify_ssl to False
            loader = AsyncHtmlLoader(new_urls, verify_ssl=False)
            html2text = Html2TextTransformer()
            logger.info("Indexing new urls...")
            docs = loader.load()
            docs = list(html2text.transform_documents(docs))
            docs = self.text_splitter.split_documents(docs)
            # (BabellDev) do not add if docs is empty (avoid tuple error)
            if docs is not None and len(docs) > 0:
                self.vectorstore.add_documents(docs)
            self.url_database.extend(new_urls)

        # Search for relevant splits
        logger.info("Grabbing most relevant splits from urls...")
        docs = []
        for query in questions:
            docs.extend(self.vectorstore.similarity_search(query))

        # Get unique docs
        unique_documents_dict = {
            (doc.page_content, tuple(sorted(doc.metadata.items()))): doc
            for doc in docs
        }
        unique_documents = list(unique_documents_dict.values())
        return unique_documents

    WebResearchRetriever._get_relevant_documents = _patched_get_relevant_documents
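For example, here is a minimal end-to-end sketch built from the imports in your question (the Chroma directory and query string are just placeholders, and you’ll also need GOOGLE_API_KEY, GOOGLE_CSE_ID, and OPENAI_API_KEY set in your environment):

from langchain.chat_models.openai import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.web_research import WebResearchRetriever
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.vectorstores import Chroma

# Apply the patch before building the retriever, so the class
# attribute is already replaced when from_llm constructs it.
patch_web_research_retriever()

llm = ChatOpenAI(temperature=0)
vectorstore = Chroma(
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db",  # placeholder path
)
search = GoogleSearchAPIWrapper()

retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore,
    llm=llm,
    search=search,
)
docs = retriever.get_relevant_documents("How do LLM agents work?")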
If you don’t like monkey-patching, you could instead derive your own class from WebResearchRetriever and override the _get_relevant_documents method, as sketched below.
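A rough sketch of that approach, assuming you’ve moved the patched body above to module scope as _patched_get_relevant_documents (the class name here is just a placeholder):

from typing import List

from langchain.callbacks.manager import CallbackManagerForRetrieverRun
from langchain.retrievers.web_research import WebResearchRetriever
from langchain.schema import Document

class NoSSLVerifyWebResearchRetriever(WebResearchRetriever):
    def _get_relevant_documents(
        self,
        query: str,
        *,
        run_manager: CallbackManagerForRetrieverRun,
    ) -> List[Document]:
        # Reuse the patched body (verify_ssl=False loader plus the
        # empty-docs guard); everything else is inherited unchanged.
        return _patched_get_relevant_documents(self, query, run_manager=run_manager)

Then call NoSSLVerifyWebResearchRetriever.from_llm(...) wherever you currently call WebResearchRetriever.from_llm(...).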
FYI, regarding the original question: it looks like this has been fixed in the latest version of LangChain. If you take a look at load() in async_html, you’ll see it now handles an already-running event loop:
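From memory, the pattern looks roughly like this (paraphrased, so check the current source on GitHub for the exact code):

import asyncio
from concurrent.futures import ThreadPoolExecutor

def load(self):
    try:
        # Raises RuntimeError if no event loop is currently running.
        asyncio.get_running_loop()
        # A loop is already running (e.g. inside Jupyter), so calling
        # asyncio.run() here would raise; instead, run the coroutine in
        # a fresh loop on a worker thread.
        with ThreadPoolExecutor(max_workers=1) as executor:
            results = executor.submit(
                lambda: asyncio.run(self.fetch_all(self.web_paths))
            ).result()
    except RuntimeError:
        # No running loop, so asyncio.run() is safe.
        results = asyncio.run(self.fetch_all(self.web_paths))
    # ...then build Documents from results as before

That’s why the RuntimeError from your traceback no longer occurs on recent versions: asyncio.run() is only called when no loop is already running.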