Hi!
I’m hoping I can get some help with a problem.
I’ve managed to build a chat engine using RAG with a simple directory reader & a PG Vector Store.
When asking questions, in a back and forth way (chat engine style), there’s a very strange but consistent behavior.
When I send a first message, I get an answer from OpenAI, but when I send a second message, I run into connection errors:
INFO: Loading index from storage...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO: Finished loading index from storage
INFO:llama_index.core.chat_engine.condense_plus_context:Condensed question: <condensed_question>
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
/.venv/lib/python3.11/site-packages/vecs/collection.py:502: UserWarning: Query does not have a covering index for IndexMeasure.cosine_distance. See Collection.create_index
warnings.warn(
INFO: 127.0.0.1:59430 - "POST /api/chat/ HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 127.0.0.1:59442 - "POST /api/chat HTTP/1.1" 307 Temporary Redirect
INFO: Loading index from storage...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO: Finished loading index from storage
INFO:openai._base_client:Retrying request to /chat/completions in 0.928694 seconds
INFO:openai._base_client:Retrying request to /chat/completions in 1.522838 seconds
INFO:openai._base_client:Retrying request to /chat/completions in 3.389680 seconds
ERROR:root:Error in chat generation: Connection error.
INFO: 127.0.0.1:59442 - "POST /api/chat/ HTTP/1.1" 500 Internal Server Error
I set up my chat engine the following way:
def get_chat_engine():
    model = os.getenv("MODEL")
    llm = OpenAI(model, temperature=0)
    memory = ChatMemoryBuffer.from_defaults(token_limit=10000)
    return get_index().as_chat_engine(
        similarity_top_k=3,
        memory=memory,
        chat_mode="condense_plus_context",
        llm=llm,
        verbose=False,
    )
With get_index() defined the following way:
def get_index():
    # check if storage already exists
    if not os.path.exists(STORAGE_DIR):
        raise Exception(
            "StorageContext is empty - call 'python app/engine/generate.py' to generate the storage first"
        )
    logger = logging.getLogger("uvicorn")
    # load the existing index
    vector_store = get_vector_store()
    logger.info(f"Loading index from {STORAGE_DIR}...")
    storage_context = StorageContext.from_defaults(
        persist_dir=STORAGE_DIR, vector_store=vector_store
    )
    index = VectorStoreIndex.from_documents(
        documents=get_documents(), storage_context=storage_context
    )
    logger.info(f"Finished loading index from {STORAGE_DIR}")
    return index
And I’m calling the OpenAI API in streaming mode:
@retry(stop=stop_after_attempt(5), wait=wait_fixed(3))
async def call_openai_api(
    chat_engine: BaseChatEngine, message: _Message, messages: List[_Message]
):
    try:
        response = await chat_engine.astream_chat(message, messages)
        return response
    except Exception as e:
        print(f"Error in API call: {e}")
        raise
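For context, the tenacity decorator above retries the call up to five times with a fixed three-second wait between attempts. Here is a stdlib-only sketch of that same retry pattern (the names retry_async and flaky are hypothetical, not from my app), in case it helps clarify the behavior I expect:

```python
import asyncio

async def retry_async(coro_factory, attempts=5, wait=3.0):
    # Re-run coro_factory up to `attempts` times, sleeping `wait` seconds
    # between failures -- mirroring stop_after_attempt(5) / wait_fixed(3).
    for attempt in range(1, attempts + 1):
        try:
            return await coro_factory()
        except Exception as e:
            print(f"Error in API call (attempt {attempt}): {e}")
            if attempt == attempts:
                raise
            await asyncio.sleep(wait)

# Example with a hypothetical flaky call that fails twice, then succeeds:
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Connection error.")
    return "ok"

result = asyncio.run(retry_async(flaky, attempts=5, wait=0.0))
print(result, calls["n"])  # -> ok 3
```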
MODEL=gpt-3.5-turbo-0125
It’s been very consistent and systematic, and I don’t understand why it happens. A short-term workaround is to reboot the server, but that’s definitely not sustainable…
Would anyone know why?
EDIT:
Adding request IDs:
— First communication —
- Successful embedding requests:
- req_f5453dc74ec0972731cd922c6548a00d
- req_b600dce8f70d7e5e0919cfab235bbb9b
- Successful completion request:
- req_4f7441a7793c496a4dd9bdfb8b62a9fe
— Second communication —
- Successful embedding request:
- req_ae6ff57945d69de09caf0a2d1a05d062
- Failed completion request:
- no request id