I need help creating a prompt for GPT to complete a specific column on a datasheet

Hi,
I have a datasheet from a multiple-choice question bank. The sheet has a few columns, but the more important ones are ‘question’, ‘choices’, ‘correct answer’ and ‘explanation’. The ‘explanation’ column is empty and I need chatGPT to write it based on the correct answer given on the other column. There are roughly one thousand lines, so doing it manually would be a nightmare so I’d like to try to automate it. I’m happy to share the file if needed.

Thanks!

This is something that an AI can do, but not something that can be fully automated with the ChatBot that is ChatGPT on the web. You would have to paste the job in chunks that are the size of the AI comprehension and output length, each time starting anew with another prompt an another set of data.

It is something you can ask ChatGPT - about how to use the programmatic API to perform this task, where you pay for the actual data input and output. After warming up the AI to several tasks and generating several synthetic files…this is what the AI will complete when given complete specification, knowing what is in the realm of possible.


To create a fully operational solution as per your requirements, I’ve developed a comprehensive Python script. This script includes all the necessary functionalities:

  1. Reads the input CSV and splits the data into two separate files based on whether an explanation exists.
  2. Chunks the data needing explanations into individual CSV files, each with a maximum of 20 rows, including headers.
  3. Converts these chunks into markdown table format.
  4. Hypothetically sends this markdown to an AI function that completes the table with explanations.
  5. Converts the markdown with explanations back into CSV format, ensuring the column schema is preserved.
  6. Processes all chunks in this manner.
  7. Assembles the final CSV, combining rows that initially had explanations with those that have been processed by the AI.
import csv
import os
import pandas as pd

def read_csv(file_name):
    return pd.read_csv(file_name)

def write_csv(df, file_name):
    df.to_csv(file_name, index=False)

def split_data(df, explanation_col='Explanation'):
    has_explanation = df.dropna(subset=[explanation_col])
    needs_explanation = df[df[explanation_col].isna()]
    return has_explanation, needs_explanation

def chunk_data(df, chunk_size=20):
    return [df[i:i + chunk_size] for i in range(0, df.shape[0], chunk_size)]

def df_to_markdown(df):
    markdown = df.to_markdown(index=False)
    return markdown

def markdown_to_df(markdown, cols):
    from io import StringIO
    data = pd.read_csv(StringIO(markdown), sep="|").iloc[:, 1:-1]  # remove padding columns
    data.columns = data.columns.str.strip()  # strip whitespace from headers
    return data[cols]  # ensure original column order

def hypothetical_ai_function(markdown):
    # Placeholder for the actual AI function call
    # This function is supposed to update the markdown with explanations
    # For demonstration, we simply return the original markdown
    return markdown

def validate_and_merge_chunks(chunks, original_cols):
    validated_chunks = []
    for chunk in chunks:
        if set(chunk.columns) == set(original_cols):
            validated_chunks.append(chunk)
        else:
            print("Chunk validation failed due to mismatched columns")
    return pd.concat(validated_chunks, ignore_index=True)

def process_chunks(needs_explanation, original_cols):
    chunks = chunk_data(needs_explanation)
    processed_chunks = []
    for chunk in chunks:
        markdown = df_to_markdown(chunk)
        updated_markdown = hypothetical_ai_function(markdown)
        updated_df = markdown_to_df(updated_markdown, original_cols)
        processed_chunks.append(updated_df)
    return processed_chunks

def main():
    input_file = 'questions.csv'
    output_file = 'completed_questions.csv'
    has_explanation_file = 'has_explanation.csv'
    needs_explanation_file = 'needs_explanation.csv'

    df = read_csv(input_file)
    original_cols = df.columns.tolist()

    has_explanation, needs_explanation = split_data(df)
    write_csv(has_explanation, has_explanation_file)
    write_csv(needs_explanation, needs_explanation_file)

    processed_chunks = process_chunks(needs_explanation, original_cols)
    validated_needs_explanation = validate_and_merge_chunks(processed_chunks, original_cols)

    final_df = pd.concat([has_explanation, validated_needs_explanation], ignore_index=True)
    write_csv(final_df, output_file)

if __name__ == "__main__":
    main()

This script uses the Pandas library for efficient data manipulation, which you should install if not already available in your environment (pip install pandas). Remember to replace the hypothetical_ai_function with the actual AI function call and ensure that the AI function correctly updates the markdown with explanations.

The script ensures that the original column order is maintained throughout the process and validates the schema of the chunks after converting them back from markdown to CSV. If a chunk’s columns do not match the original, it’s excluded from the final assembly to prevent data corruption.

Finally, the script assembles the completed CSV by combining rows that initially had explanations with the newly processed rows, ensuring the integrity of the final dataset.


That kind of AI code needs debugging of course.

Then the next part of the function would be increasing the understanding that your AI has of the task. Markdown is more native and trained to recent AI than raw CSV files, which is why I convert. Here for example we can use as part of training and demonstrating for an AI our own markdown (which the forum formats into tables:

Topic Source Skill Level Question Choices Correct Answer Explanation
Math Book A Beginner What is 2 + 2? A. 3, B. 4, C. 5 B
Sci Web B Intermediate What gas do plants produce? A. CO2, B. O2, C. N2 B
Hist Book C Advanced Who discovered America? A. Columbus, B. Vespucci, C. Cook A
Geo Web D Beginner Capital of France? A. Rome, B. Paris, C. Berlin B
Eng Book E Intermediate Synonym of ‘Quick’? A. Fast, B. Slow, C. Moderate A
Math Web F Advanced Square root of 81? A. 7, B. 8, C. 9 C
Sci Book G Beginner Water’s freezing point? A. 0°C, B. 32°F, C. 100°C A
Hist Web H Intermediate First US president? A. Adams, B. Washington, C. Jefferson B
Geo Book I Advanced Longest river? A. Nile, B. Amazon, C. Yangtze B
Eng Web J Beginner Antonym of ‘Ancient’? A. Old, B. Modern, C. Traditional B
Math Book K Intermediate 5 * 5? A. 20, B. 25, C. 30 B
Sci Web L Advanced Heaviest planet? A. Jupiter, B. Mars, C. Earth A
Hist Book M Beginner City of the Titanic launch? A. London, B. Belfast, C. Liverpool B
Geo Web N Intermediate Largest desert? A. Sahara, B. Arctic, C. Gobi A
Eng Book O Advanced ‘Incredible’ means? A. Unbelievable, B. Normal, C. Weak A
Math Web P Beginner Half of 14? A. 5, B. 6, C. 7 C
Sci Book Q Intermediate Main component of the sun? A. Oxygen, B. Hydrogen, C. Nitrogen B
Hist Web R Advanced Invented the telephone? A. Edison, B. Bell, C. Tesla B
Geo Book S Beginner Smallest ocean? A. Indian, B. Arctic, C. Atlantic B
Eng Web T Intermediate ‘Ephemeral’ means? A. Eternal, B. Fleeting, C. Strong B
Math Book U Advanced 9^2 equals? A. 71, B. 81, C. 91 B
Sci Web V Beginner Humans breathe in? A. Oxygen, B. Carbon Dioxide, C. Nitrogen A
Hist Book W Intermediate Built the Great Wall? A. Ming Dynasty, B. Qing Dynasty, C. Han Dynasty A
Geo Web X Advanced Highest mountain? A. Everest, B. K2, C. Kilimanjaro A
Eng Book Y Beginner ‘Lucid’ means? A. Clear, B. Dark, C. Complicated A

And then even have it create the final form…

Topic Source Skill Level Question Choices Correct Answer Explanation
Math Book A Beginner What is 2 + 2? A. 3, B. 4, C. 5 B The sum of 2 and 2 is 4, which is basic arithmetic.
Sci Web B Intermediate What gas do plants produce? A. CO2, B. O2, C. N2 B During photosynthesis, plants take in CO2 and release O2 as a byproduct.
Hist Book C Advanced Who discovered America? A. Columbus, B. Vespucci, C. Cook A Columbus is credited with discovering America in 1492, though it was already inhabited.
Geo Web D Beginner Capital of France? A. Rome, B. Paris, C. Berlin B Paris is the capital and most populous city of France.
Eng Book E Intermediate Synonym of ‘Quick’? A. Fast, B. Slow, C. Moderate A ‘Fast’ is a synonym of ‘Quick’, both meaning rapid in movement or action.
Math Web F Advanced Square root of 81? A. 7, B. 8, C. 9 C The square root of 81 is 9, as 9 times 9 equals 81.
Sci Book G Beginner Water’s freezing point? A. 0°C, B. 32°F, C. 100°C A The freezing point of water is 0°C or 32°F, marking the transition from liquid to solid state.
Hist Web H Intermediate First US president? A. Adams, B. Washington, C. Jefferson B George Washington was the first President of the United States, serving from 1789 to 1797.
Geo Book I Advanced Longest river? A. Nile, B. Amazon, C. Yangtze B The Amazon River is considered the longest river in the world by some measurements.
Eng Web J Beginner Antonym of ‘Ancient’? A. Old, B. Modern, C. Traditional B ‘Modern’ is the opposite of ‘Ancient’, referring to something that is current or recent.
Math Book K Intermediate 5 * 5? A. 20, B. 25, C. 30 B Multiplying 5 by 5 gives 25, which is basic multiplication.
Sci Web L Advanced Heaviest planet? A. Jupiter, B. Mars, C. Earth A Jupiter is the heaviest planet in our solar system, with a mass more than two and a half times that of all other planets combined.
Hist Book M Beginner City of the Titanic launch? A. London, B. Belfast, C. Liverpool B The Titanic was built and launched from Belfast, Northern Ireland.
Geo Web N Intermediate Largest desert? A. Sahara, B. Arctic, C. Gobi A The Sahara is the largest hot desert in the world, covering much of North Africa.
Eng Book O Advanced ‘Incredible’ means? A. Unbelievable, B. Normal, C. Weak A ‘Incredible’ means something so extraordinary that it is hard to believe.
Math Web P Beginner Half of 14? A. 5, B. 6, C. 7 C Half of 14 is 7, which is a simple division.
Sci Book Q Intermediate Main component of the sun? A. Oxygen, B. Hydrogen, C. Nitrogen B The Sun is primarily composed of hydrogen, which undergoes nuclear fusion to produce energy.
Hist Web R Advanced Invented the telephone? A. Edison, B. Bell, C. Tesla B

| Alexander Graham Bell is credited with inventing the first practical telephone in 1876. |
| Geo | Book S | Beginner | Smallest ocean? | A. Indian, B. Arctic, C. Atlantic| B | The Arctic Ocean is the smallest of the world’s five major oceans. |
| Eng | Web T | Intermediate| ‘Ephemeral’ means? | A. Eternal, B. Fleeting, C. Strong| B| ‘Ephemeral’ means lasting for a very short time, highlighting transience. |

These explanations provide a brief rationale for why each answer is correct, fitting the educational context of a question bank.


That simulated data gives us several different methods to “prompt” the AI. We can just ask and get the data. We can demonstrate the successful completion of the task with a prior user input and how the AI fixes it up, so the AI learns better before starting.

So then: is this something that you want to paste in 50 different chats with what you’ve got in ChatGPT, or do you want to begin programming the API and buy credits for its use?

2 Likes