ChatGPT-4 gives very poor and incomplete responses

I can confirm that the results have become worse; I have felt that way for some time. Now, however, I have taken the time to compare a recent example (code refactoring).

Background

This is a feature from my project “Chat with Documents”, where a human asks a question and GPT answers based on a collection of documents. Since I was processing a lot of medical literature, I thought it would be useful to add translation functionality. I started by integrating “argostranslate”, but I was not completely satisfied with the results, so I integrated DeepL. Since DeepL offers only limited free credits and I didn’t want to sign up for yet another subscription, I also added GPT as a translation option.
The “translation_mode” setting was introduced because of the not-quite-satisfying argostranslate results: my thinking was that for a short question fed to the embeddings and the LLM, translation quality matters less than it does for the answer.

The languages of the human and the machine are defined manually. The machine language was added to stay flexible (language-specific embeddings, language-specific LLMs).

The following “translation_mode” values exist:

  • “no translation”: neither the question nor the answer is translated
  • “human2machine”: the human’s question is translated into the language of the machine
  • “machine2human”: the machine’s answer is translated into the language of the human
  • “twoway”: both the human’s question and the machine’s answer are translated

As you can see, this is code that has grown over time, and the whole project needs refactoring. To save time, I wanted to get this done quickly with GPT-4, but with this result I have little confidence in the correctness.

Settings

  • Temperature: 0
  • Maximum length: 2048
  • Top P: 1
  • Frequency penalty: 0
  • Presence penalty: 0

Prompt

Refactor the following python code by making it more readable and by using best practices and document the code. Don’t show import statements.

def chat_with_docs(question, previous, chain, human_language, computer_language, translation_mode, translation_model, name_mapping):
  if translation_model == "argostranslate":
    # "no translation", "human2machine", "machine2human", "twoway"
    if translation_mode == "no translation":
      answer = chain({"question": question})
    elif translation_mode == "human2machine":
      answer = chain({"question": translate(question, human_language, computer_language)})
    elif translation_mode == "machine2human":
      answer = chain({"question": question})
      answer["answer"] = translate(answer["answer"], computer_language, human_language)
    elif translation_mode == "twoway":
      answer = chain({"question": translate(question, human_language, computer_language)})
      answer["answer"] = translate(answer["answer"], computer_language, human_language)
  elif translation_model == "deepl":
    human_language = str(human_language).upper()
    computer_language = str(computer_language).upper()
    if human_language == "EN":
       human_language = "EN-US"
    if computer_language == "EN":
      computer_language = "EN-US"
    translator = deepl.Translator(os.environ["DEEPL_API_KEY"])
    if translation_mode == "no translation":
      answer = chain({"question": question})
    elif translation_mode == "human2machine":
      answer = chain({"question": str(translator.translate_text(question, target_lang=computer_language))})
    elif translation_mode == "machine2human":
      answer = chain({"question": question})
      answer["answer"] = str(translator.translate_text(answer["answer"], target_lang=human_language))
    elif translation_mode == "twoway":
      answer = chain({"question": str(translator.translate_text(question, target_lang=computer_language))})
      answer["answer"] = str(translator.translate_text(answer["answer"], target_lang=human_language))
  else:
     # "no translation", "human2machine", "machine2human", "twoway"
    if translation_mode == "no translation":
      answer = chain({"question": question})
    elif translation_mode == "human2machine":
      answer = chain({"question": openai_translate(question, human_language, computer_language)})
    elif translation_mode == "machine2human":
      answer = chain({"question": question})
      answer["answer"] = openai_translate(answer["answer"], computer_language, human_language)
    elif translation_mode == "twoway":
      answer = chain({"question": openai_translate(question, human_language, computer_language)})
      answer["answer"] = openai_translate(answer["answer"], computer_language, human_language)
    
  the_type = type(answer["sources"])

  if answer["sources"] != "":
    sources = extract_patterns(answer["sources"], "_id_")

    sources_translated = []
    sources_counter = 1
    for source in sources:
      print(source)
      source = "_id_" + source + "_id_"
      if name_mapping != "":
        uuid_dict = json.loads(name_mapping)
        if source in uuid_dict:
          the_source = uuid_dict[source]
          print(the_source)
          sources_translated.append(str(sources_counter) + ") " + the_source)
          sources_counter += 1
        else:
          print(str(source) + " not in name_mapping")
      else:
          print("no name mapping provided")
    
    if len(sources_translated) > 0:
      sources_translated = "\n".join(sources_translated)
      print(sources_translated)
      answer["sources"] = sources_translated 

  return "Question: " + question + "\nAnswer:\n" + answer["answer"] + "\nsources: " + answer["sources"] + "\n\n" + previous

Attention:
“translation_mode” can be one of the following: [“no translation”, “human2machine”, “machine2human”, “twoway”]
So there might be no translation necessary.

Current gpt-4 answer

def chat_with_docs(question, previous, chain, human_language, computer_language, translation_mode, translation_model, name_mapping):
    def translate_question(question, mode, human_lang, computer_lang, translator=None):
        if mode == "no translation":
            return question
        elif mode == "human2machine":
            return translate(question, human_lang, computer_lang, translator)
        else:
            return None

    def translate_answer(answer, mode, human_lang, computer_lang, translator=None):
        if mode == "no translation" or mode == "human2machine":
            return answer
        elif mode == "machine2human" or mode == "twoway":
            return translate(answer, computer_lang, human_lang, translator)
        else:
            return None

    def translate(text, source_lang, target_lang, translator=None):
        if translator:
            return str(translator.translate_text(text, target_lang=target_lang))
        else:
            return openai_translate(text, source_lang, target_lang)

    def get_translator(model, human_lang, computer_lang):
        if model == "deepl":
            human_lang = human_lang.upper()
            computer_lang = computer_lang.upper()
            if human_lang == "EN":
                human_lang = "EN-US"
            if computer_lang == "EN":
                computer_lang = "EN-US"
            return deepl.Translator(os.environ["DEEPL_API_KEY"])
        else:
            return None

    def format_sources(sources, name_mapping):
        if not sources:
            return ""

        extracted_sources = extract_patterns(sources, "_id_")
        translated_sources = []
        counter = 1

        if name_mapping:
            uuid_dict = json.loads(name_mapping)

        for source in extracted_sources:
            source = "_id_" + source + "_id_"
            if name_mapping and source in uuid_dict:
                translated_sources.append(f"{counter}) {uuid_dict[source]}")
                counter += 1

        return "\n".join(translated_sources)

    translator = get_translator(translation_model, human_language, computer_language)
    translated_question = translate_question(question, translation_mode, human_language, computer_language, translator)
    answer = chain({"question": translated_question})
    answer["answer"] = translate_answer(answer["answer"], translation_mode, human_language, computer_language, translator)
    answer["sources"] = format_sources(answer["sources"], name_mapping)

    return f"Question: {question}\nAnswer:\n{answer['answer']}\nsources: {answer['sources']}\n\n{previous}"

In this refactored code, I have broken down the original function into smaller, more manageable functions. Each function has a specific purpose, making the code more readable and easier to maintain. The main function, chat_with_docs, now calls these smaller functions to perform the necessary tasks. This approach follows the best practices of modular programming and improves the overall readability of the code.

gpt-4-0314 answer

def chat_with_docs(question, previous, chain, human_language, computer_language, translation_mode, translation_model, name_mapping):
    def translate_text(text, source_lang, target_lang):
        if translation_model == "argostranslate":
            return translate(text, source_lang, target_lang)
        elif translation_model == "deepl":
            source_lang = str(source_lang).upper()
            target_lang = str(target_lang).upper()
            if source_lang == "EN":
                source_lang = "EN-US"
            if target_lang == "EN":
                target_lang = "EN-US"
            translator = deepl.Translator(os.environ["DEEPL_API_KEY"])
            return str(translator.translate_text(text, target_lang=target_lang))
        else:
            return openai_translate(text, source_lang, target_lang)

    def process_question(question, translation_mode):
        if translation_mode == "no translation":
            return question
        elif translation_mode == "human2machine":
            return translate_text(question, human_language, computer_language)
        elif translation_mode == "machine2human":
            return question
        elif translation_mode == "twoway":
            return translate_text(question, human_language, computer_language)

    def process_answer(answer, translation_mode):
        if translation_mode in ["no translation", "human2machine"]:
            return answer
        elif translation_mode in ["machine2human", "twoway"]:
            return translate_text(answer, computer_language, human_language)

    def process_sources(sources, name_mapping):
        if sources == "":
            return ""

        extracted_sources = extract_patterns(sources, "_id_")
        translated_sources = []
        counter = 1

        for source in extracted_sources:
            source = "_id_" + source + "_id_"
            if name_mapping != "":
                uuid_dict = json.loads(name_mapping)
                if source in uuid_dict:
                    translated_sources.append(f"{counter}) {uuid_dict[source]}")
                    counter += 1

        return "\n".join(translated_sources)

    processed_question = process_question(question, translation_mode)
    answer = chain({"question": processed_question})
    answer["answer"] = process_answer(answer["answer"], translation_mode)
    answer["sources"] = process_sources(answer["sources"], name_mapping)

    return f"Question: {question}\nAnswer:\n{answer['answer']}\nsources: {answer['sources']}\n\n{previous}"

In this refactored code, I have broken down the original function into smaller helper functions to make the code more readable and modular. The main function now calls these helper functions to process the question, answer, and sources based on the translation mode and model. This makes the code easier to understand and maintain.

One Comparison

gpt-4

    def translate_question(question, mode, human_lang, computer_lang, translator=None):
        if mode == "no translation":
            return question
        elif mode == "human2machine":
            return translate(question, human_lang, computer_lang, translator)
        else:
            return None

gpt-4-0314

    def process_question(question, translation_mode):
        if translation_mode == "no translation":
            return question
        elif translation_mode == "human2machine":
            return translate_text(question, human_language, computer_language)
        elif translation_mode == "machine2human":
            return question
        elif translation_mode == "twoway":
            return translate_text(question, human_language, computer_language)

The gpt-4-0314 answer is the desirable one: it covers all four modes correctly, whereas the gpt-4 version falls through to `return None` for the “machine2human” and “twoway” modes.
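The defect is easy to reproduce in isolation. With the actual translation stubbed out (the stub below is mine, not from either answer), the dispatch logic from the gpt-4 answer returns None for two of the four modes, which means `chain` would be called with `{"question": None}`:

```python
def translate(text, source_lang, target_lang, translator=None):
    # Stub standing in for argostranslate/DeepL/openai_translate.
    return f"[{source_lang}->{target_lang}] {text}"

def translate_question(question, mode, human_lang, computer_lang, translator=None):
    # Dispatch logic copied from the gpt-4 answer above.
    if mode == "no translation":
        return question
    elif mode == "human2machine":
        return translate(question, human_lang, computer_lang, translator)
    else:
        return None  # "machine2human" and "twoway" both end up here

print(translate_question("Wie wirkt Aspirin?", "machine2human", "de", "en"))  # None
print(translate_question("Wie wirkt Aspirin?", "twoway", "de", "en"))         # None
```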
