Datafile for Fine-Tuning Chat Models

If the data is proven to work, we must assume the problem is in the code.
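
For reference, each line of the JSONL datafile for a chat model must be a standalone JSON object with a messages list of role/content entries. The NL-to-SQL contents below are invented purely to illustrate the shape, not taken from the actual file:

{"messages": [{"role": "system", "content": "You translate natural language into SQL."}, {"role": "user", "content": "List every city."}, {"role": "assistant", "content": "SELECT name FROM cities;"}]}
{"messages": [{"role": "system", "content": "You translate natural language into SQL."}, {"role": "user", "content": "How many cities have over a million people?"}, {"role": "assistant", "content": "SELECT COUNT(*) FROM cities WHERE population > 1000000;"}]}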

Model: gpt-4o-mini-2024-07-18 is recommended; it is currently free to train up to 2M tokens, and it will have a longer lifespan.

We should eliminate the openai library, which may be outdated on someone else's Python platform, and use the requests module along with the API key from an environment variable instead.
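
To actually prove the data works before blaming the code, the file can be sanity-checked locally with nothing but the standard library. This is only a minimal sketch of the basic chat format, not every rule the API enforces:

import json

def check_jsonl(path):
    # Every line must be a JSON object with a non-empty 'messages' list,
    # and every message needs 'role' and 'content' keys.
    with open(path, encoding='utf-8') as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            messages = record.get('messages')
            if not isinstance(messages, list) or not messages:
                raise ValueError(f"line {line_number}: missing or empty 'messages' list")
            for message in messages:
                if 'role' not in message or 'content' not in message:
                    raise ValueError(f"line {line_number}: message missing 'role' or 'content'")
    print("Basic JSONL chat format looks OK")

check_jsonl('/content/drive/My Drive/ABC Crew 2조/nl_sql_pairs.jsonl')

With the data confirmed, the full upload-and-monitor script is below.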


import os
import requests
import time

# File path to the training data
jsonl_file_path = '/content/drive/My Drive/ABC Crew 2조/nl_sql_pairs.jsonl'

# parameters
SUFFIX = "city"  # suffix to add to model name
MODEL = "gpt-4o-mini-2024-07-18"
EPOCHS = 1  # passes through file

# Get the API key from environment variable
api_key = os.getenv('OPENAI_API_KEY')
if not api_key:
    print("Error: OPENAI_API_KEY environment variable is not set.")
    exit(1)

# Define headers for authentication
headers = {
    "Authorization": f"Bearer {api_key}"
}


# Step 1: Upload the training file
def upload_file(file_path):
    upload_url = 'https://api.openai.com/v1/files'
    with open(file_path, 'rb') as f:
        files = {
            'file': (os.path.basename(file_path), f),
            'purpose': (None, 'fine-tune')
        }
        print("Uploading file...")
        response = requests.post(upload_url, headers=headers, files=files)
    if response.status_code != 200:
        print(f"Error uploading file: {response.status_code} {response.text}")
        exit(1)
    file_info = response.json()
    file_id = file_info['id']
    print(f"File uploaded successfully. File ID: {file_id}")
    return file_id

# Step 2: Initiate the fine-tuning job
def start_fine_tuning(file_id):
    fine_tune_url = 'https://api.openai.com/v1/fine_tuning/jobs'
    data = {
        "training_file": file_id,
        "model": MODEL,
        "suffix": SUFFIX,
        "hyperparameters": {
          "n_epochs": EPOCHS
        }
    }
    print("Starting fine-tuning job...")
    response = requests.post(fine_tune_url, headers=headers, json=data)
    if response.status_code != 200:
        print(f"Error starting fine-tuning job: {response.status_code} {response.text}")
        exit(1)
    job_info = response.json()
    job_id = job_info['id']
    print(f"Fine-tuning job started. Job ID: {job_id}")
    return job_id

# Step 3: Monitor the fine-tuning job status
def monitor_fine_tuning(job_id):
    job_status_url = f'https://api.openai.com/v1/fine_tuning/jobs/{job_id}'
    while True:
        response = requests.get(job_status_url, headers=headers)
        if response.status_code != 200:
            print(f"Error retrieving fine-tuning job status: {response.status_code} {response.text}")
            exit(1)
        job_status = response.json()
        status = job_status.get('status')
        print(f"Fine-tuning job status: {status}")
        if status in ['succeeded', 'failed', 'cancelled']:
            print("Fine-tuning job has completed.")
            if status == 'failed':
                error = job_status.get('error')
                if error:
                    print(f"Error details: {error}")
            break
        else:
            time.sleep(10)

# Main execution flow
if __name__ == '__main__':
    file_id = upload_file(jsonl_file_path)
    job_id = start_fine_tuning(file_id)
    monitor_fine_tuning(job_id)

Running this on the file just demonstrated produces output like the following:

Uploading file...
File uploaded successfully. File ID: file-FAaYf0NmwxOHi3cP4q5wXot4
Starting fine-tuning job...
Fine-tuning job started. Job ID: ftjob-qK7jmI2234jd3fxi7GsLB8t8
Fine-tuning job status: validating_files
Fine-tuning job status: running
Fine-tuning job status: running
Fine-tuning job status: running
Fine-tuning job status: running
Fine-tuning job status: running
Fine-tuning job status: running
Fine-tuning job status: succeeded
Fine-tuning job has completed.
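
Once the status reaches succeeded, the job object returned by the same endpoint carries the new model name in its fine_tuned_model field. Here is a minimal sketch of fetching that name and sending one test request with requests; the job ID is the one printed above, and the prompt is only a placeholder:

import os
import requests

api_key = os.getenv('OPENAI_API_KEY')
headers = {"Authorization": f"Bearer {api_key}"}

# Read the completed job to get the fine-tuned model name
job_id = "ftjob-qK7jmI2234jd3fxi7GsLB8t8"  # job ID printed by the script above
job = requests.get(f"https://api.openai.com/v1/fine_tuning/jobs/{job_id}", headers=headers).json()
model_name = job["fine_tuned_model"]  # something like ft:gpt-4o-mini-2024-07-18:org:city:xxxxxxxx
print(f"Fine-tuned model: {model_name}")

# Send a test chat completion to the fine-tuned model
payload = {
    "model": model_name,
    "messages": [
        {"role": "user", "content": "List every city."}  # placeholder prompt
    ],
}
response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
print(response.json()["choices"][0]["message"]["content"])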