Forcing consistency of ChatGPT API in the acquisition and consolidation of TSV Files

I am trying to get TSV files from the ChatGPT API with the following prompt for instance: “I need a TSV file with the following columns…”

The tables I receive do not use a consistent syntax, which makes it hard to parse and consolidate them.

Is there a way to always get the same kind of TSV result, that is, to force ChatGPT to be consistent in the TSV syntax it returns?

Thanks for your help!


Just use 1 example of the output.

Well, on my side the results are not consistent… Sometimes I get commas, sometimes tabs, sometimes pipes, with or without spaces or other characters… Let me know if you tested it in a loop and always got the same syntax; that is not my case! :sweat: It’s as if ChatGPT has a free, flexible interpretation of what the TSV format is.
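Until the model output is consistent, one workaround is to normalize whatever comes back. A minimal sketch, assuming the only variants are the ones listed above (tabs, pipes, commas) and that you know how many columns to expect; the name normalize_to_tsv and the candidate list are illustrative, not part of any library:

```python
import csv

# Assumed set of delimiters, tried in order of preference.
CANDIDATE_DELIMITERS = ["\t", "|", ","]

def normalize_to_tsv(text, expected_columns):
    """Re-join a table with single tabs, or raise ValueError if no delimiter fits."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for delim in CANDIDATE_DELIMITERS:
        # Markdown-style tables wrap each line in pipes; drop the outer ones.
        cleaned = [ln.strip().strip("|") for ln in lines] if delim == "|" else lines
        rows = [[cell.strip() for cell in row]
                for row in csv.reader(cleaned, delimiter=delim)]
        # Drop Markdown separator rows such as ---|---|---
        rows = [r for r in rows if not all(set(c) <= set("-:") for c in r)]
        if rows and all(len(r) == expected_columns for r in rows):
            return "\n".join("\t".join(r) for r in rows)
    raise ValueError("no candidate delimiter yields the expected column count")
```

It accepts the first delimiter that gives every row the expected number of fields, so a response that mixes delimiters still fails loudly instead of being parsed wrongly.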

If you are too put off by past failures, you could always just ask for CSV, split by comma, strip leading and trailing whitespace, then convert to TSV.


I switched to the TSV format because I got the same kind of bad results with CSV… :sweat_smile: I think it is more a problem of interpretation and consistency than a problem of format… Maybe it is related to the dynamic, changing nature of ChatGPT’s learning and generation… Consistently formatted TSV output would be a useful feature, I think!

Try this

I will provide you with a list of comma separated columns. You will need to output them as a list of tab separated values.

Here is a valid example of the output
output: name\tsalary\tage

Continue from this point in pre-training the AI, and see if you don’t get better results:

ChatGPT with TSV pretraining
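For reference, the one-example prompt above can be assembled in code so that the example contains real tab characters rather than the two-character sequence backslash-t. This is only a sketch: build_tsv_prompt and its arguments are hypothetical names, and the wording simply reuses the prompt text from this post.

```python
def build_tsv_prompt(columns, example_rows):
    """Build a one-shot prompt whose example uses real tab characters."""
    header = "\t".join(columns)
    example = "\n".join("\t".join(row) for row in example_rows)
    return (
        "I will provide you with a list of comma separated columns. "
        "You will need to output them as a list of tab separated values.\n\n"
        "Here is a valid example of the output\n"
        f"output:\n{header}\n{example}\n\n"
        f"columns: {', '.join(columns)}\n"
        "output:\n"
    )

prompt = build_tsv_prompt(["name", "salary", "age"], [["John", "50000", "30"]])
```

Whether real tabs in the example actually improve consistency is something to test in a loop; it is not guaranteed.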

Following this idea, I built a Python function. The consistency problem is still not solved on my side. Please let me know if you have better results or if you have magical ideas! :slight_smile:

def ask_chatgpt_a_semantic_triplestore_about_topics(topics, api_key):
    import openai
    import pandas as pd
    from io import StringIO

    # The posted code never used api_key; setting it here is an assumption.
    openai.api_key = api_key

    # Function to generate a semantic triplestore in TSV format using GPT-3
    def generate_semantic_triplestore(topic):
        prompt = f"Tab-separated values (TSV) is a simple, text-based file format for storing tabular data. Records are separated by newlines, and values within a record are separated by tab characters. Build an extensive semantic triplestore of 3 columns (subject, predicate, object) about {topic} in TSV format. Subjects and objects should be noun phrases and predicates should be verb phrases."
        # The model and prompt arguments were missing from the posted call.
        # This targets the legacy Completions API (openai<1.0).
        response = openai.Completion.create(
            model="gpt-3.5-turbo-instruct",  # assumed; the post does not name a model
            prompt=prompt,
            max_tokens=150,  # Adjust max tokens as needed
        )
        return response.choices[0].text.strip()

    # Parse each answer as TSV (first row as header) and stack the results
    triplestore_df = pd.DataFrame()
    for t in topics:
        answer = generate_semantic_triplestore(t)
        temp_df = pd.read_csv(StringIO(answer), sep='\t', header=0, index_col=0)
        triplestore_df = pd.concat([triplestore_df, temp_df])
    return triplestore_df
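Since the output is not reliable, a validate-and-retry wrapper around the generation step may help before anything reaches pandas. A minimal sketch, where generate stands for any callable that takes a topic and returns text (for example the inner function above); is_valid_tsv and generate_with_retry are hypothetical helper names:

```python
def is_valid_tsv(text, n_columns=3):
    """True if every non-empty line has exactly n_columns tab-separated fields."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return bool(lines) and all(len(ln.split("\t")) == n_columns for ln in lines)

def generate_with_retry(generate, topic, max_attempts=3):
    """Call generate(topic) until the answer passes validation or attempts run out."""
    for _ in range(max_attempts):
        answer = generate(topic)
        if is_valid_tsv(answer):
            return answer
    raise ValueError(f"no valid TSV for {topic!r} after {max_attempts} attempts")
```

Each retry costs another API call, so keep max_attempts small.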

The challenge you will have, and what I tried to address with the pretraining in my shared chat, is that tab-separated value files are not strictly about inserting a single tab character in place of a comma and a space. They are also expected to show their columns aligned when viewed on a monospace display.

This is made difficult by tab tokens not representing a fixed number of spaces when rendered, and by tabs sometimes being merged with further tabs or other characters when encoded as tokens, and sometimes not. The AI is basically destined to fail at the task.
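The rendering half of that point is easy to see with Python's str.expandtabs, which pads each tab to the next tab stop. A small sketch; render_with_tabs and the sample rows are made up for illustration:

```python
def render_with_tabs(rows, tab_width):
    """Join each row with single tabs, then expand tabs the way a display would."""
    return ["\t".join(row).expandtabs(tab_width) for row in rows]

rows = [["Name", "Age", "Profession"], ["Bartholomew", "25", "Engineer"]]
for width in (4, 8):
    print(f"tab width {width}:")
    for line in render_with_tabs(rows, width):
        print(line)
```

With one tab per field, the second column starts at a different position in each row as soon as one cell is longer than the tab width, so a strictly tab-separated file does not look aligned.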

You might write a function “csv_to_tsv” that the AI can call and whose output it must repeat verbatim (or you just grab the function’s output directly). It would have two distinct uses: producing literally “tab-separated” output, or producing columns that line up when printed on a device with a particular tab-stop spacing.

I just asked a chatbot which Python facilities are available for the latter:
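A rough sketch of the callable-function idea, covering only the literal tab-separated use. The spec dictionary follows the general JSON Schema style used for function definitions, but its exact wrapper depends on the API version, so treat it as an assumption; handle_function_call is a hypothetical dispatcher on your side, not a library call.

```python
import csv
import json
from io import StringIO

def csv_to_tsv(csv_text):
    """Convert CSV text to TSV: one tab between fields, cells stripped."""
    reader = csv.reader(StringIO(csv_text), skipinitialspace=True)
    return "\n".join("\t".join(c.strip() for c in row) for row in reader if row)

# Assumed description of the function, to be passed to the model.
CSV_TO_TSV_SPEC = {
    "name": "csv_to_tsv",
    "description": "Convert comma separated text into tab separated text.",
    "parameters": {
        "type": "object",
        "properties": {"csv_text": {"type": "string"}},
        "required": ["csv_text"],
    },
}

def handle_function_call(name, arguments_json):
    """Run the function the model asked for and return its output as-is."""
    if name != CSV_TO_TSV_SPEC["name"]:
        raise ValueError(f"unknown function: {name}")
    args = json.loads(arguments_json)
    return csv_to_tsv(args["csv_text"])
```

Taking the TSV from handle_function_call directly, instead of asking the model to repeat it, removes the model from the formatting step altogether.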

Python’s built-in csv module can be used to create TSV files. Here’s how you can do it:

  1. To simply insert tabs instead of commas in tables, you can use the csv.writer object with the delimiter parameter set to '\t' (which represents a tab character). Here’s an example:
import csv

rows = [['Name', 'Age', 'Profession'], ['John', '25', 'Engineer'], ['Doe', '30', 'Doctor']]

with open('output.tsv', 'w', newline='') as f_output:
    tsv_output = csv.writer(f_output, delimiter='\t')
    tsv_output.writerows(rows)  # write every row, tab-delimited
  2. For the output format that considers the monospace width of text columns, Python does not provide a built-in way to do this. However, you can write a custom function that measures each cell and inserts the required number of tabs. Here’s a simple example:
def write_tsv_with_formatting(rows, filename, tab_width=4):
    # Width of each column: the longest cell, rounded up to the next tab stop
    col_widths = [(max(map(len, col)) // tab_width + 1) * tab_width
                  for col in zip(*rows)]
    with open(filename, 'w', newline='') as f_output:
        for row in rows:
            cells = []
            for val, width in zip(row, col_widths):
                # Tabs needed to get from the end of val to the next column
                n_tabs = -(-(width - len(val)) // tab_width)  # ceiling division
                cells.append(val + '\t' * n_tabs)
            f_output.write(''.join(cells).rstrip('\t') + '\n')

rows = [['Name', 'Age', 'Profession'], ['John', '25', 'Engineer'], ['Doe', '30', 'Doctor']]
write_tsv_with_formatting(rows, 'output.tsv')

In this example, we first find the longest cell in each column and round the column width up to the next tab stop. Then, for each cell, we append as many tabs as it takes to reach the start of the next column. The default tab_width of 4 assumes that a tab advances to the next multiple of 4 columns; many terminals and printers use 8 instead. Note that a file padded this way is meant for display: a strict TSV parser will read the extra tabs as empty fields.

Also, here is an AI answer on something I dredged up from the back of my brain while pondering output devices and obsolete formats, the kind you could punch onto cards and run to a chain printer:

Bot: blah blah…

Here are some examples of how to set tab stops on these printers:

  • On a dot matrix or daisy wheel printer, you might send a control code like ESC D n1 n2 n3 0, where ESC D is the command to set horizontal tab stops, and n1, n2, n3, etc., are the column positions for the tab stops.
  • On a laser or inkjet printer using PostScript, you might set tab stops by including a command like /tabs [n1 n2 n3] def in the PostScript code, where n1, n2, n3, etc., are the positions for the tab stops.
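To make the first bullet concrete, the ESC D sequence can be built as raw bytes. A sketch only: the limits used here (at most 32 stops, ascending column values from 1 to 255, terminated by a zero byte) are my recollection of Epson-style ESC/P printers and should be checked against the manual of the actual device. esc_d_tab_stops is a hypothetical helper name.

```python
def esc_d_tab_stops(columns):
    """Build an ESC D command that sets horizontal tab stops at the given columns."""
    columns = list(columns)
    if not 1 <= len(columns) <= 32:
        raise ValueError("expected between 1 and 32 tab stops")
    if any(not 1 <= c <= 255 for c in columns):
        raise ValueError("column positions must be between 1 and 255")
    if columns != sorted(set(columns)):
        raise ValueError("column positions must be strictly ascending")
    # ESC (0x1B), 'D', one byte per stop, then a terminating zero byte
    return b"\x1bD" + bytes(columns) + b"\x00"

command = esc_d_tab_stops([8, 16, 24])
```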

Following this discussion: ⛺️ Cheat Sheet: Mastering Temperature and Top_p in ChatGPT API (a few tips and tricks on controlling the creativity/deterministic output of prompt responses.)

I added these parameters:

        temperature = 0.2, 
        top_p = 0.1,