ChatGPT-4 gives very poor and incomplete responses

I can confirm that the results have become worse; I have felt that way for some time. Now, however, I have taken the time to compare a recent example (code refactoring).

Background

This is a feature from my project “Chat with Documents”, where a human asks a question and GPT answers based on a collection of documents. Since I was processing a lot of medical literature, I thought it would be useful to add translation functionality. I started by integrating “argostranslate”, but I was not completely satisfied with the results, so I integrated DeepL. Since DeepL offers only limited free credits and I didn’t want to sign up for yet another subscription, I also added GPT as a translation option.
The “translation_mode” setting was introduced because of the not-quite-satisfying argostranslate results: my thinking was that for a short question fed to the embeddings and the LLM, translation quality matters less than it does for the answer.

The languages of the human and the machine are defined manually. The machine language was added to stay flexible (language-specific embeddings, language-specific LLMs).

The following “translation_mode” values exist:

  • “no translation”: neither the question nor the answer is translated
  • “human2machine”: the human’s question is translated into the language of the machine
  • “machine2human”: the machine’s answer is translated into the language of the human
  • “twoway”: both the human’s question and the machine’s answer are translated

As you can see, this is code that has grown over time, and the whole project needs refactoring. To save time, I wanted to get this done quickly with GPT-4, but with this result I have little confidence in the correctness.

Settings

  • Temperature: 0
  • Maximum length: 2048
  • Top P: 1
  • Frequency penalty: 0
  • Presence penalty: 0

Prompt

Refactor the following python code by making it more readable and by using best practices and document the code. Don’t show import statements.

def chat_with_docs(question, previous, chain, human_language, computer_language, translation_mode, translation_model, name_mapping):
  if translation_model == "argostranslate":
    # "no translation", "human2machine", "machine2human", "twoway"
    if translation_mode == "no translation":
      answer = chain({"question": question})
    elif translation_mode == "human2machine":
      answer = chain({"question": translate(question, human_language, computer_language)})
    elif translation_mode == "machine2human":
      answer = chain({"question": question})
      answer["answer"] = translate(answer["answer"], computer_language, human_language)
    elif translation_mode == "twoway":
      answer = chain({"question": translate(question, human_language, computer_language)})
      answer["answer"] = translate(answer["answer"], computer_language, human_language)
  elif translation_model == "deepl":
    human_language = str(human_language).upper()
    computer_language = str(computer_language).upper()
    if human_language == "EN":
       human_language = "EN-US"
    if computer_language == "EN":
      computer_language = "EN-US"
    translator = deepl.Translator(os.environ["DEEPL_API_KEY"])
    if translation_mode == "no translation":
      answer = chain({"question": question})
    elif translation_mode == "human2machine":
      answer = chain({"question": str(translator.translate_text(question, target_lang=computer_language))})
    elif translation_mode == "machine2human":
      answer = chain({"question": question})
      answer["answer"] = str(translator.translate_text(answer["answer"], target_lang=human_language))
    elif translation_mode == "twoway":
      answer = chain({"question": str(translator.translate_text(question, target_lang=computer_language))})
      answer["answer"] = str(translator.translate_text(answer["answer"], target_lang=human_language))
  else:
     # "no translation", "human2machine", "machine2human", "twoway"
    if translation_mode == "no translation":
      answer = chain({"question": question})
    elif translation_mode == "human2machine":
      answer = chain({"question": openai_translate(question, human_language, computer_language)})
    elif translation_mode == "machine2human":
      answer = chain({"question": question})
      answer["answer"] = openai_translate(answer["answer"], computer_language, human_language)
    elif translation_mode == "twoway":
      answer = chain({"question": openai_translate(question, human_language, computer_language)})
      answer["answer"] = openai_translate(answer["answer"], computer_language, human_language)
    
  the_type = type(answer["sources"])

  if answer["sources"] != "":
    sources = extract_patterns(answer["sources"], "_id_")

    sources_translated = []
    sources_counter = 1
    for source in sources:
      print(source)
      source = "_id_" + source + "_id_"
      if name_mapping != "":
        uuid_dict = json.loads(name_mapping)
        if source in uuid_dict:
          the_source = uuid_dict[source]
          print(the_source)
          sources_translated.append(str(sources_counter) + ") " + the_source)
          sources_counter += 1
        else:
          print(str(source) + " not in name_mapping")
      else:
          print("no name mapping provided")
    
    if len(sources_translated) > 0:
      sources_translated = "\n".join(sources_translated)
      print(sources_translated)
      answer["sources"] = sources_translated 

  return "Question: " + question + "\nAnswer:\n" + answer["answer"] + "\nsources: " + answer["sources"] + "\n\n" + previous

Attention:
“translation_mode” can be one of the following: [“no translation”, “human2machine”, “machine2human”, “twoway”]
So there might be no translation necessary.

Current gpt-4 answer

def chat_with_docs(question, previous, chain, human_language, computer_language, translation_mode, translation_model, name_mapping):
    def translate_question(question, mode, human_lang, computer_lang, translator=None):
        if mode == "no translation":
            return question
        elif mode == "human2machine":
            return translate(question, human_lang, computer_lang, translator)
        else:
            return None

    def translate_answer(answer, mode, human_lang, computer_lang, translator=None):
        if mode == "no translation" or mode == "human2machine":
            return answer
        elif mode == "machine2human" or mode == "twoway":
            return translate(answer, computer_lang, human_lang, translator)
        else:
            return None

    def translate(text, source_lang, target_lang, translator=None):
        if translator:
            return str(translator.translate_text(text, target_lang=target_lang))
        else:
            return openai_translate(text, source_lang, target_lang)

    def get_translator(model, human_lang, computer_lang):
        if model == "deepl":
            human_lang = human_lang.upper()
            computer_lang = computer_lang.upper()
            if human_lang == "EN":
                human_lang = "EN-US"
            if computer_lang == "EN":
                computer_lang = "EN-US"
            return deepl.Translator(os.environ["DEEPL_API_KEY"])
        else:
            return None

    def format_sources(sources, name_mapping):
        if not sources:
            return ""

        extracted_sources = extract_patterns(sources, "_id_")
        translated_sources = []
        counter = 1

        if name_mapping:
            uuid_dict = json.loads(name_mapping)

        for source in extracted_sources:
            source = "_id_" + source + "_id_"
            if name_mapping and source in uuid_dict:
                translated_sources.append(f"{counter}) {uuid_dict[source]}")
                counter += 1

        return "\n".join(translated_sources)

    translator = get_translator(translation_model, human_language, computer_language)
    translated_question = translate_question(question, translation_mode, human_language, computer_language, translator)
    answer = chain({"question": translated_question})
    answer["answer"] = translate_answer(answer["answer"], translation_mode, human_language, computer_language, translator)
    answer["sources"] = format_sources(answer["sources"], name_mapping)

    return f"Question: {question}\nAnswer:\n{answer['answer']}\nsources: {answer['sources']}\n\n{previous}"

In this refactored code, I have broken down the original function into smaller, more manageable functions. Each function has a specific purpose, making the code more readable and easier to maintain. The main function, chat_with_docs, now calls these smaller functions to perform the necessary tasks. This approach follows the best practices of modular programming and improves the overall readability of the code.

gpt-4-0314 answer

def chat_with_docs(question, previous, chain, human_language, computer_language, translation_mode, translation_model, name_mapping):
    def translate_text(text, source_lang, target_lang):
        if translation_model == "argostranslate":
            return translate(text, source_lang, target_lang)
        elif translation_model == "deepl":
            source_lang = str(source_lang).upper()
            target_lang = str(target_lang).upper()
            if source_lang == "EN":
                source_lang = "EN-US"
            if target_lang == "EN":
                target_lang = "EN-US"
            translator = deepl.Translator(os.environ["DEEPL_API_KEY"])
            return str(translator.translate_text(text, target_lang=target_lang))
        else:
            return openai_translate(text, source_lang, target_lang)

    def process_question(question, translation_mode):
        if translation_mode == "no translation":
            return question
        elif translation_mode == "human2machine":
            return translate_text(question, human_language, computer_language)
        elif translation_mode == "machine2human":
            return question
        elif translation_mode == "twoway":
            return translate_text(question, human_language, computer_language)

    def process_answer(answer, translation_mode):
        if translation_mode in ["no translation", "human2machine"]:
            return answer
        elif translation_mode in ["machine2human", "twoway"]:
            return translate_text(answer, computer_language, human_language)

    def process_sources(sources, name_mapping):
        if sources == "":
            return ""

        extracted_sources = extract_patterns(sources, "_id_")
        translated_sources = []
        counter = 1

        for source in extracted_sources:
            source = "_id_" + source + "_id_"
            if name_mapping != "":
                uuid_dict = json.loads(name_mapping)
                if source in uuid_dict:
                    translated_sources.append(f"{counter}) {uuid_dict[source]}")
                    counter += 1

        return "\n".join(translated_sources)

    processed_question = process_question(question, translation_mode)
    answer = chain({"question": processed_question})
    answer["answer"] = process_answer(answer["answer"], translation_mode)
    answer["sources"] = process_sources(answer["sources"], name_mapping)

    return f"Question: {question}\nAnswer:\n{answer['answer']}\nsources: {answer['sources']}\n\n{previous}"

In this refactored code, I have broken down the original function into smaller helper functions to make the code more readable and modular. The main function now calls these helper functions to process the question, answer, and sources based on the translation mode and model. This makes the code easier to understand and maintain.

One Comparison

gpt-4

    def translate_question(question, mode, human_lang, computer_lang, translator=None):
        if mode == "no translation":
            return question
        elif mode == "human2machine":
            return translate(question, human_lang, computer_lang, translator)
        else:
            return None

gpt-4-0314

    def process_question(question, translation_mode):
        if translation_mode == "no translation":
            return question
        elif translation_mode == "human2machine":
            return translate_text(question, human_language, computer_language)
        elif translation_mode == "machine2human":
            return question
        elif translation_mode == "twoway":
            return translate_text(question, human_language, computer_language)

The gpt-4-0314 answer is the desirable one: it covers all four modes correctly, whereas the gpt-4 version falls through to `return None` for the “machine2human” and “twoway” modes.
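The defect is easy to reproduce in isolation. With the actual translation stubbed out (the stub below is mine, not from either answer), the dispatch logic from the gpt-4 answer returns None for two of the four modes, which means `chain` would be called with `{"question": None}`:

```python
def translate(text, source_lang, target_lang, translator=None):
    # Stub standing in for argostranslate/DeepL/openai_translate.
    return f"[{source_lang}->{target_lang}] {text}"

def translate_question(question, mode, human_lang, computer_lang, translator=None):
    # Dispatch logic copied from the gpt-4 answer above.
    if mode == "no translation":
        return question
    elif mode == "human2machine":
        return translate(question, human_lang, computer_lang, translator)
    else:
        return None  # "machine2human" and "twoway" both end up here

print(translate_question("Wie wirkt Aspirin?", "machine2human", "de", "en"))  # None
print(translate_question("Wie wirkt Aspirin?", "twoway", "de", "en"))         # None
```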
