Langchain - Improve prompt latency with map_reduce

Hi everyone,

Hope you are well.

I am looking for some advice on a small use case I have. I am building a small application that combines RAG with GPT-4 prompts.

With the RAG part, I retrieve documents that are usually 2 to 10 pages long, then pass them as context in my prompt to answer a specific question.

Given the size of the context, I am using chain_type="map_reduce" to process the whole context progressively and execute my prompt with GPT-4 (I am using LangChain).

def gpt4_query_mapreduce(prompt, input_documents):
    from langchain.chat_models import ChatOpenAI
    from langchain.chains.question_answering import load_qa_chain
    import time
    time.sleep(3)  # Pause to avoid overloading the OpenAI API

    # Initialise the LLM access (API key redacted)
    llm = ChatOpenAI(openai_api_key="xxxxx", model="gpt-4")
    # Initialise the QA chain. map_reduce answers the question over each
    # document separately, then combines the intermediate answers.
    chain = load_qa_chain(llm, chain_type="map_reduce", verbose=True)
    # Submit the question and the retrieved documents to GPT-4
    answer = chain({"input_documents": input_documents, "question": prompt},
                   return_only_outputs=True)
    return answer
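On the parallelism question: LangChain's map_reduce document chain can run its per-document map calls concurrently if you go through the async API (e.g. `await chain.acall(...)` instead of the synchronous call) rather than issuing them one after another. Independent of LangChain, the pattern is just asyncio fan-out/fan-in; in this illustrative sketch, `map_call` is a stand-in for one per-chunk LLM request, with a sleep simulating API latency:

```python
import asyncio
import time

async def map_call(chunk: str) -> str:
    # Stand-in for one per-chunk LLM call; the sleep simulates API latency.
    await asyncio.sleep(0.1)
    return f"summary of {chunk}"

async def map_reduce(chunks: list[str]) -> str:
    # Map phase: all per-chunk calls run concurrently instead of sequentially.
    summaries = await asyncio.gather(*(map_call(c) for c in chunks))
    # Reduce phase: one final combination over the intermediate answers.
    return " | ".join(summaries)

chunks = [f"chunk{i}" for i in range(10)]
start = time.perf_counter()
result = asyncio.run(map_reduce(chunks))
elapsed = time.perf_counter() - start
# Ten concurrent 0.1 s calls finish together, far faster than ten in a row.
print(f"{elapsed:.2f}s")
```

With ten documents, the sequential version would take roughly ten times the per-call latency; the concurrent version takes roughly one. The same shape applies when the map calls are real GPT-4 requests, subject to your rate limits.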

Here is the deal: processing is extremely slow, sometimes taking more than 15 minutes to execute a single prompt. I understand that a large context adds latency and that GPT-4 is not the fastest model, but this execution time still feels excessive.

Therefore, I was wondering: is there any method to speed up the process? Is there a way to run map_reduce in parallel? Are my RAG documents simply too large? Anything else?
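On the document size point: feeding whole 2-10 page documents through the map step means every map call carries a large prompt. Splitting each document into smaller overlapping chunks (LangChain has splitters for this) and passing only the most relevant chunks usually cuts both latency and cost. A minimal, library-free sketch of the chunking idea (`split_text` and its parameters are illustrative, not a LangChain API):

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    # Slide a window over the text so each chunk shares `overlap`
    # characters with the previous one, preserving context at the seams.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 2500  # a ~1-page document for illustration
chunks = split_text(doc)
print(len(chunks))  # 3 chunks of at most 1000 characters each
```

Each map call then sees at most `chunk_size` characters instead of a full document, so individual GPT-4 requests return much faster, at the price of more (smaller) calls in the map phase.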

Thank you in advance for your help