Error when uploading files using the CLI or when trying to use the API

mattrosine · November 9, 2023, 2:47pm

I’m not sure where to post this, so I’ll try here. I’m having problems with both the new file upload and fine-tuning interface and with doing the same tasks in Python through the API.

I am trying to use my dataset of copywriting to train a model.

I prepared it following the guidelines in the documentation and saved it as JSONL. Here is a small section of the dataset as an example:

{"messages": [
{"role": "system", "content": "As a specialist copywriter, write a detailed, well-researched B2B article about the given title."},
{"role": "user", "content": "The title is: Specialist Network Services and the Digital Transformation Revolution"},
{"role": "assistant", "content": "Whether they realised it or not, strict work-from-home mandates during the pandemic have forced most businesses into the early stages of digital transformation. This generally involved little more than ensuring line-of-business applications could be made available to employees remotely. Very little serious thought was given to the specialist network services required for optimal performance.\n\nPermanent changes ahead\nThe shift to homeworking has had a profound effect on the workforce. Many employees are now demanding more flexible conditions, spending at least some of the week away from the office. Some organisations may be resistant, but as demand for skilled workers continues to climb, ignoring the issue is no longer an option.\n\n"}
]}

When I try to fine-tune using the new interface, or when I try to upload a file, I get this error message:

There was an error uploading the file: Unexpected file format, expected either prompt/completion pairs or chat messages.

I’m trying to understand what’s going on. The file is in the correct JSONL format, and I followed the new data preparation guidelines for gpt-3.5-turbo, which specifically say the data should no longer be in prompt/completion pairs. This is the example data from the documentation:

{
  "messages": [
    { "role": "system", "content": "You are an assistant that occasionally misspells words" },
    { "role": "user", "content": "Tell me a story." },
    { "role": "assistant", "content": "One day a student went to schoool." }
  ]
}

So am I doing something wrong, or is this related to the outages?
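One way to narrow this down locally is to check the file line by line before uploading. The sketch below assumes the simple chat format shown in the documentation example (a top-level "messages" list whose entries have a "role" of system, user, or assistant and a string "content"); the function name `check_jsonl` and the role set are illustrative assumptions, not part of any OpenAI tooling. It flags lines that are not valid JSON on their own, which happens when an object is pretty-printed across several lines or when typographic quotes have replaced straight ones.

```python
import json

# Assumption: only the three roles from the documentation example are expected.
ALLOWED_ROLES = {"system", "user", "assistant"}


def check_jsonl(lines):
    """Return a list of (line_number, problem) tuples for chat-format JSONL."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # Curly quotes are a common cause of otherwise puzzling parse errors.
            has_curly = "\u201c" in line or "\u201d" in line
            hint = " (line contains typographic quotes)" if has_curly else ""
            problems.append((number, f"not valid JSON: {exc.msg}{hint}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' list"))
            continue
        for message in messages:
            if (
                not isinstance(message, dict)
                or message.get("role") not in ALLOWED_ROLES
                or not isinstance(message.get("content"), str)
            ):
                problems.append((number, "message lacks a known role or string content"))
                break
    return problems


# Usage: pass an open file (or any iterable of lines).
# with open("your_training_file.jsonl", encoding="utf-8") as f:
#     for number, problem in check_jsonl(f):
#         print(f"line {number}: {problem}")
```

An empty result only means each line parses and has the expected shape; it does not guarantee that the upload endpoint will accept the file.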

Similarly, when I try to do this through the API instead, I can’t even get past the token-counting step. This is the script I’m using (with my personal data removed, obviously):

from openai import OpenAI
import json

# Instantiate the OpenAI client with your API key
client = OpenAI(api_key='your_api_key_here')

# Function to count tokens in a JSONL file using the OpenAI API for chat models
def count_tokens(file_path):
    total_tokens = 0
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            data = json.loads(line)
            text = data['content']
            # Call the chat completions create method on the client instance
            response = client.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "system", "content": text}], # System role is used for the context setup
                max_tokens=0
            )
            total_tokens += response['usage']['total_tokens']
    return total_tokens

# Replace 'your_training_file.jsonl' and 'your_validation_file.jsonl' with the paths to your files
training_tokens = count_tokens('your_training_file.jsonl')
validation_tokens = count_tokens('your_validation_file.jsonl')

print(f"Total training tokens: {training_tokens}")
print(f"Total validation tokens: {validation_tokens}")

From this I get back:

AttributeError: 'OpenAI' object has no attribute 'ChatCompletion'
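This error is consistent with version 1.0 or later of the openai Python package, where the client exposes chat completions as `client.chat.completions.create(...)` and returns objects whose fields are read as attributes (`response.usage`) rather than dictionary keys. Separately, each line of the chat-format file has a top-level "messages" key, so `data['content']` would fail as well. A minimal sketch of the counting loop under those assumptions follows; the function name `count_prompt_tokens` is illustrative, `max_tokens=1` is used because it is not clear that 0 is accepted, and the prompt-token figure is only an approximation of what fine-tuning would bill.

```python
import json


def count_prompt_tokens(file_path, client, model="gpt-3.5-turbo"):
    """Sum the prompt tokens the API reports for each example in a JSONL file.

    `client` is expected to be an openai.OpenAI instance (openai >= 1.0).
    Note that this makes one billable API call per line.
    """
    total = 0
    with open(file_path, "r", encoding="utf-8") as file:
        for line in file:
            if not line.strip():
                continue  # skip blank lines
            # Each line is one JSON object with a top-level "messages" list.
            messages = json.loads(line)["messages"]
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                max_tokens=1,  # assumption: keep the generated output minimal
            )
            # Responses are objects in openai >= 1.0, so use attribute access.
            total += response.usage.prompt_tokens
    return total


# Usage (needs the openai package and a real API key):
# from openai import OpenAI
# client = OpenAI(api_key="your_api_key_here")
# print(count_prompt_tokens("your_training_file.jsonl", client))
```

Passing the client in as an argument keeps the function easy to try against a stub before spending any tokens.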

I asked GPT-4 for help, with all of the documentation uploaded so it could read it, and it doesn’t understand what’s going wrong either.

Can anyone please shed some light on this?
