Evaluate performance of fine-tuned GPT-4o mini on text data

Hi there,

I am currently fine-tuning GPT-4o mini via the OpenAI dashboard for a binary classification project. Once the model is ready, I plan to evaluate its performance on a test dataset of 1,000 samples. However, since the free tier is limited to 200 requests per day, I have restricted my evaluation to 200 texts.

I encountered the following error:

RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4o-mini in organization org-jg2v1DbkC2MArlIpxJtnPnze on requests per day (RPD): Limit 200, Used 200, Requested 1. Please try again in 7m12s. Visit https://platform.openai.com/account/rate-limits to learn more. You can increase your rate limit by adding a payment method to your account at https://platform.openai.com/account/billing.', 'type': 'requests', 'param': None, 'code': 'rate_limit_exceeded'}}

I have attempted to use the delayed_completion function as described in the OpenAI Cookbook, but unfortunately, it did not resolve the issue.

Also, as recommended, I set max_tokens to 1.

Is there any way to manage the rate limits effectively using free requests without increasing the limit? Any suggestions or alternatives would be greatly appreciated.

Thank you!

Here is my code:

import time

from openai import OpenAI
from sklearn.metrics import precision_recall_fscore_support

# Assumes OPENAI_API_KEY is set in the environment; model_id (the fine-tuned
# model name), texts, and true_labels are defined earlier in my notebook.
client = OpenAI()

# Calculate the delay based on your rate limit
rate_limit_per_minute = 20
delay = 60.0 / rate_limit_per_minute

# Define a function that adds a delay to a Completion API call
def delayed_completion(delay_in_seconds: float = 1, **kwargs):
    """Delay a completion by a specified amount of time."""
    # Sleep for the delay
    time.sleep(delay_in_seconds)
    # Call the Completion API and return the result
    return client.chat.completions.create(**kwargs)

# Function to classify a text using the fine-tuned OpenAI model
def classify_text(text):
    response = delayed_completion(
        delay_in_seconds=delay,
        model=model_id,
        messages=[
            {"role": "system", "content": "Your task is to analyze the text and determine if it contains elements of propaganda. Based on the instructions, analyze the following 'text' and predict whether it contains the use of any propaganda technique. Return only predicted label. ['true', 'false']."},
            {"role": "user", "content": text}
        ],
        temperature=0,
        max_tokens=1
    )
    # Extract the prediction from the response
    prediction = response.choices[0].message.content
    return 1 if prediction.strip() == "true" else 0



# Collect predictions
predictions = [classify_text(text) for text in texts]



# Compute Precision, Recall, F1 for Macro and Micro averages
precision_macro, recall_macro, f1_macro, _ = precision_recall_fscore_support(true_labels, predictions, average='macro')
precision_micro, recall_micro, f1_micro, _ = precision_recall_fscore_support(true_labels, predictions, average='micro')

# Display the results
print(f"Macro-F1: {f1_macro:.4f}")
print(f"Micro-F1: {f1_micro:.4f}")

I appreciate any help!

Could anyone give me advice, please?

Hi!
I suggest using the OpenAI evals framework since it already has automated back-off and retry mechanisms implemented.

Your use case is relatively simple, considering you want to test a binary classifier.

There’s also a cookbook example to help you with basic eval templates and to get started quickly.

While the eval is running, you’ll have some time to focus on other tasks due to rate limits, but overall, it’s a straightforward solution.
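
Under the hood, "back-off and retry" simply means waiting an exponentially increasing amount of time whenever a request hits a 429 and then retrying. A minimal standalone sketch of the same pattern, using the tenacity package the way the Cookbook does (the wait and stop values here are just illustrative):

import openai
from openai import OpenAI
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_random_exponential,
)

client = OpenAI()

# Retry with exponential back-off whenever the API raises a 429 rate-limit error.
@retry(
    retry=retry_if_exception_type(openai.RateLimitError),
    wait=wait_random_exponential(min=1, max=60),
    stop=stop_after_attempt(6),
)
def completion_with_backoff(**kwargs):
    return client.chat.completions.create(**kwargs)

One caveat: back-off only helps with per-minute limits. Once a requests-per-day quota like yours is exhausted, retrying cannot get more requests through until the daily window resets (or you add a payment method).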

Hi @vb

I’m new to using OpenAI models, so I apologize for asking what might seem like basic questions. I’m just starting my learning journey and am eager to learn more. I appreciate your time in helping me!

Here’s a summary of what I’ve done so far:

  1. I used the Basic Eval Templates since my problem is deterministic.
  2. I created the eval dataset by converting my test set into the JSONL format expected by the evals framework (a sample line is shown after this list).
  3. I created the eval registry entry by writing a YAML file and registered the eval by adding it as /evals/<eval_name>.yaml under the registry folder (default path). The file path is: /usr/local/lib/python3.10/dist-packages/evals/registry/evals/binaryClassificationEval.yaml.
  4. I did the same for my test data and ensured it is located in the data folder: /evals/registry/data/classificationEval/LLMBinaryTestEval.jsonl.
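
For step 2, if I have followed the basic match format correctly, each line of the JSONL looks roughly like this (the text here is just a placeholder), with ideal holding the expected label:

{"input": [{"role": "system", "content": "Analyze the text and return only the predicted label: 'true' or 'false'."}, {"role": "user", "content": "some text from my test set"}], "ideal": "false"}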

Here is my YAML file:

classificationEval:
  id: classificationEval.dev.v0
  description: Eval for binary classification problems using several metrics.
  disclaimer: Problems are solved using fine tuning. Evaluation is currently done through exact Match.
  metrics: [accuracy]

classificationEval.dev.v0:
  class: evals.elsuite.basic.match:Match
  args:
    samples_jsonl: classificationEval/LLMBinaryTestEval.jsonl
    eval_type: classify

However, when I try to run the eval with the command:

!oaieval eval 'classificationEval'

I receive the following error:

[2024-09-06 08:18:54,418] [registry.py:271] Loading registry from /usr/local/lib/python3.10/dist-packages/evals/registry/evals
[2024-09-06 08:18:55,328] [registry.py:271] Loading registry from /root/.evals/evals
Traceback (most recent call last):
  File "/usr/local/bin/oaieval", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/evals/cli/oaieval.py", line 304, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/evals/cli/oaieval.py", line 131, in run
    eval_spec = registry.get_eval(args.eval)
  File "/usr/local/lib/python3.10/dist-packages/evals/registry.py", line 211, in get_eval
    return self._dereference(name, self._evals, "eval", EvalSpec)
  File "/usr/local/lib/python3.10/dist-packages/evals/registry.py", line 177, in _dereference
    alias = get_alias()
  File "/usr/local/lib/python3.10/dist-packages/evals/registry.py", line 169, in get_alias
    if isinstance(d[name], str):
KeyError: 'classificationEval.dev.v0'

To resolve this issue, I have tried the following steps:

  1. Verified the YAML file structure and confirmed it is correct.
  2. Checked file and directory permissions, which are also fine.

Could you provide any guidance on how to resolve this issue?

Thank you!


Hi @lubna.henaki,

It looks like you’ve made good progress!
You can check the difference in naming when trying to execute:

!oaieval eval 'classificationEval'

and when registering the eval:

classificationEval.dev.v0

This is where I would start looking.
In the meantime, you can also look at the other evals included in the package to compare their implementations with yours.
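
The general shape, as far as I can tell from the bundled evals, is that the short name is only an alias whose id must also exist as its own top-level key carrying the class and args; the KeyError in your traceback points at that second lookup. With placeholder names:

my-eval:
  id: my-eval.dev.v0
  description: ...
  metrics: [accuracy]

my-eval.dev.v0:
  class: evals.elsuite.basic.match:Match
  args:
    samples_jsonl: my_eval/samples.jsonl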

I hope this helps for now.
If you need me to, I can take a look later today.


Hi @vb

I resolved the previous issue related to classificationEval.dev.v0. It was a relatively minor issue, but I learned a lot from it :grinning:

However, I am now encountering a new problem: ValueError: Could not find CompletionFn/Solver in the registry with ID eval.

To address this, I registered my completion function and created a YAML file with the following content:

classification/gpt-4o-mini:
  class: evals.completion_fns.basic:BasicCompletionFn
  args:
    completion_fn: ft:gpt-4o-mini-2024-07-18:ksu:binarypropaganda:A2evi4H3

I saved this YAML file under /usr/local/lib/python3.10/dist-packages/evals/completion_fns/classificationCF.yaml.

Despite this, I received the same error: ValueError: Could not find CompletionFn/Solver in the registry with ID eval.

Additionally, I encountered the same issue when trying to use !oaieval eval 'GPT-model-text-detection.dev.v0', resulting in:

ValueError: Could not find CompletionFn/Solver in the registry with ID eval

I attempted to use my fine-tuned model and gpt-4o-mini for the completion_fn, but the error persisted.

When I ran the command:

!oaieval gpt-4o-mini classificationEval --max_samples 25

I encountered the following issues:

[2024-09-06 14:36:37,830] [registry.py:271] Loading registry from /usr/local/lib/python3.10/dist-packages/evals/registry/evals
[2024-09-06 14:36:38,589] [registry.py:271] Loading registry from /root/.evals/evals
[2024-09-06 14:36:38,923] [oaieval.py:215] Run started: 2409061436387TVPDVL3
[2024-09-06 14:36:39,007] [data.py:94] Fetching /usr/local/lib/python3.10/dist-packages/evals/registry/data/classificationEval/LLMBinaryTestEval.jsonl
[2024-09-06 14:36:39,030] [eval.py:36] Evaluating 25 samples
[2024-09-06 14:36:39,035] [eval.py:144] Running in threaded mode with 10 threads!
  0% 0/25 [00:00<?, ?it/s]
Traceback (most recent call last):
  ...
  openai.NotFoundError: Error code: 404 - {'error': {'message': 'This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?', 'type': 'invalid_request_error', 'param': 'model', 'code': None}}

The error message suggests that the model being used might not be compatible with the v1/completions endpoint and may require the v1/chat/completions endpoint instead.
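
If I understand it correctly, the distinction in the Python SDK is the following (my own minimal example, not taken from the evals code):

from openai import OpenAI

client = OpenAI()

# Chat models such as gpt-4o-mini (and chat fine-tunes) go through v1/chat/completions:
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
)

# client.completions.create(...) targets the legacy v1/completions endpoint,
# which is what raised the 404 above for a chat-only model.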

I would appreciate any guidance on resolving the registry issues and ensuring the model is correctly configured for evaluation.

Again, big thanks for your help!

Unfortunately, the official evals repository does not yet support evaluating gpt-4o or gpt-4o-mini. However, with just a few modifications to the official repository, it can be made to work.

A PR has already been submitted for gpt-4o.

By using it as a reference and adding gpt-4o-mini to the list, you will be able to evaluate it as well.

For example, you can make changes like the following:

    "gpt-4o",
    "gpt-4o-mini",
    "gemini-pro",
    elif "gpt-4o" in spec["completion_fns"][0]:
        return "gpt-4o"
    elif "gpt-4o-mini" in spec["completion_fns"][0]:
        return "gpt-4o-mini"
       "gpt-4o": 128_000
       "gpt-4o-mini": 128_000

It’s something like the above.

You need to edit the repository to evaluate, but the changes themselves are not too extensive.
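
If you work from a clone of the official repository rather than patching the installed package, an editable install makes your edits the code that actually runs (the URL below is the official evals repo):

git clone https://github.com/openai/evals.git
cd evals
pip install -e .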


Hi @dignity_for_all,

After three hours :smile:, thanks to your help, I finally got the accuracy results!

I have a question:

I tried running the following code:

!oaieval eval 'classificationEval.dev.v0'

And also:

!oaieval eval 'classificationEval'

However, I keep getting the following error:

ValueError: Could not find CompletionFn/Solver in the registry with ID eval

Here is my current completion_fns configuration:

gpt-4o-mini:
  class: evals.completion_fns.basic:BasicCompletionFn
  args:
    completion_fn: gpt-4o-mini

I was able to get the accuracy by running:

!oaieval gpt-4o-mini classificationEval --max_samples 25

But I still need to understand how to use my fine-tuned model.

I have already added it to the list of models in the registry, but I know I also need to add it similarly to how gpt-4o-mini is set up. This part of the process is still unclear to me.
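
If I understand the flow correctly, my guess is that I should point oaieval at the completion function I registered earlier (classification/gpt-4o-mini, which wraps the fine-tuned model) instead of the plain model name, i.e. something like the command below, but please correct me if I am wrong:

!oaieval classification/gpt-4o-mini classificationEval --max_samples 25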

Once again, thank you so much for your help, @vb @dignity_for_all!

@dignity_for_all I attempted to use the following command as mentioned in your previous post:

! oaieval ft:gpt-3.5-turbo-1106:organization:************:----------

However, I encountered the same error:

openai.NotFoundError: Error code: 404