Langchain - Improve prompt latency with map_reduce

Hi everyone,

Hope you are well.

I am looking for some advice on a small use case I have. I am building a small application that combines RAG with GPT-4 prompts.

With the RAG part, I retrieve documents that are usually 2 to 10 pages long, then pass them as context in my prompt to answer a specific question.

Given the size of the context, I am using chain_type="map_reduce" to process the whole context progressively and execute my prompt with GPT-4 (I am using LangChain).

def gpt4_query_mapreduce(prompt, input_documents):
    from langchain.chat_models import ChatOpenAI
    from langchain.chains.question_answering import load_qa_chain
    import time
    time.sleep(3)  # Pause to avoid overloading the OpenAI API

    # Initialise the LLM access (API key redacted)
    llm = ChatOpenAI(openai_api_key="xxxxx", model="gpt-4")
    # Initialise the QA chain. map_reduce answers the question over each
    # document separately, then combines the intermediate answers.
    chain = load_qa_chain(llm, chain_type="map_reduce", verbose=True)
    # Submit the question and the retrieved documents to GPT-4
    answer = chain({"input_documents": input_documents, "question": prompt},
                   return_only_outputs=True)
    return answer
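On the parallelism question: LangChain's map_reduce document chain can run its per-document map calls concurrently if you go through the async API (e.g. `await chain.acall(...)` instead of the synchronous call) rather than issuing them one after another. Independent of LangChain, the pattern is just asyncio fan-out/fan-in; in this illustrative sketch, `map_call` is a stand-in for one per-chunk LLM request, with a sleep simulating API latency:

```python
import asyncio
import time

async def map_call(chunk: str) -> str:
    # Stand-in for one per-chunk LLM call; the sleep simulates API latency.
    await asyncio.sleep(0.1)
    return f"summary of {chunk}"

async def map_reduce(chunks: list[str]) -> str:
    # Map phase: all per-chunk calls run concurrently instead of sequentially.
    summaries = await asyncio.gather(*(map_call(c) for c in chunks))
    # Reduce phase: one final combination over the intermediate answers.
    return " | ".join(summaries)

chunks = [f"chunk{i}" for i in range(10)]
start = time.perf_counter()
result = asyncio.run(map_reduce(chunks))
elapsed = time.perf_counter() - start
# Ten concurrent 0.1 s calls finish together, far faster than ten in a row.
print(f"{elapsed:.2f}s")
```

With ten documents, the sequential version would take roughly ten times the per-call latency; the concurrent version takes roughly one. The same shape applies when the map calls are real GPT-4 requests, subject to your rate limits.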

Here is the deal: processing is extremely slow, sometimes taking more than 15 minutes to execute a single prompt. I understand that a large context adds latency and that GPT-4 is not the fastest model, but this execution time still feels excessive.

Therefore, I was wondering: is there any method to speed up the process? Is there a way to run map_reduce in parallel? Are my RAG documents simply too large? Anything else?
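On the document size point: feeding whole 2-10 page documents through the map step means every map call carries a large prompt. Splitting each document into smaller overlapping chunks (LangChain has splitters for this) and passing only the most relevant chunks usually cuts both latency and cost. A minimal, library-free sketch of the chunking idea (`split_text` and its parameters are illustrative, not a LangChain API):

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    # Slide a window over the text so each chunk shares `overlap`
    # characters with the previous one, preserving context at the seams.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 2500  # a ~1-page document for illustration
chunks = split_text(doc)
print(len(chunks))  # 3 chunks of at most 1000 characters each
```

Each map call then sees at most `chunk_size` characters instead of a full document, so individual GPT-4 requests return much faster, at the price of more (smaller) calls in the map phase.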

Thank you in advance for your help