Issue with Consistent Use of Predefined Nomenclature in JSON Responses by OpenAI Assistant

I built an OpenAI Assistant that identifies the requests made in a document and maps each one to a category defined in a reference manual (a JSON file). The assistant is supposed to use this file as the single source of standardized category names, but it does not do so consistently. Below is a description of the problem and the approach taken so far:

Problem Description

The assistant is designed to read documents, identify the categories of requests made, and map them to the exact nomenclatures predefined in a JSON file. The JSON file (categories.json) includes two key fields for each category:

  1. Category Name (“CategoryName”): This field contains the exact nomenclature that should be used in the responses.
  2. Examples/Instructions (“ExamplesInstructions”): This field provides context and examples for when the category is applicable.

Despite clearly instructing the assistant to use only the nomenclature from the “CategoryName” field, it sometimes uses terms or phrases from the “ExamplesInstructions” field in its responses. This inconsistency leads to the use of incorrect or unrecognized nomenclature, which is not accepted by our system.
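
To make the failure concrete, here is a minimal sketch of a check that flags any returned label that is not an exact "CategoryName". The sample categories and the simulated response are hypothetical (the real categories.json is not reproduced in this post), and the helper names allowed_names and find_invalid are illustrative only.

```python
import json

# Hypothetical sample with the two fields described above; the real
# categories.json content is not shown in this post.
SAMPLE_CATEGORIES = json.loads("""
[
  {"CategoryName": "Moral Damages",
   "ExamplesInstructions": "Compensation for moral damages; pain and suffering"},
  {"CategoryName": "Data Provision",
   "ExamplesInstructions": "Request for data provision; disclosure of records"}
]
""")


def allowed_names(categories):
    """Collect the exact CategoryName strings the downstream system accepts."""
    return {c["CategoryName"] for c in categories}


def find_invalid(response_items, categories):
    """Return the response items whose CategoryName is not an exact match."""
    allowed = allowed_names(categories)
    return [item for item in response_items
            if item.get("CategoryName") not in allowed]


# Simulated assistant output: the second label was lifted from
# ExamplesInstructions instead of CategoryName.
response = [
    {"CategoryName": "Moral Damages", "Value": 5000.00},
    {"CategoryName": "Request for data provision", "Value": 0.00},
]

invalid = find_invalid(response, SAMPLE_CATEGORIES)
print(json.dumps(invalid, indent=4))
```

A check like this does not prevent the wrong label from being generated, but it lets the calling code reject or retry a response before it reaches the system that only accepts the predefined names.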

Approach Taken

Instructions Provided to the Assistant

The assistant was given the following detailed instructions to ensure the correct mapping:

  1. Analyze the document details thoroughly and identify all the requests made.
  2. Compare each identified request with the entries in the provided JSON file.
  3. Use only the exact nomenclature from the “CategoryName” field for mapping.
  4. Avoid using any terms or phrases from the “ExamplesInstructions” field in the final JSON response.
  5. Ensure no duplicate or overlapping requests are included.
  6. Prefer double requests over single ones if specified.
  7. Format values without thousands separators and with exactly two decimal places (for example, 5000.00).
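
Instructions 3 and 4 could also be expressed as a machine-checkable constraint rather than as prose. The sketch below builds a JSON Schema in which "CategoryName" is an enum of the names loaded from the file. The category entries, the "requests" wrapper key, and the function name build_response_schema are assumptions for illustration. Whether such a schema can be enforced at generation time depends on the model and endpoint in use, which should be checked against the OpenAI documentation; the schema is still usable for validating responses after the fact.

```python
import json


def build_response_schema(categories):
    """Build a JSON Schema that restricts CategoryName to the known names."""
    names = sorted({c["CategoryName"] for c in categories})
    return {
        "type": "object",
        "properties": {
            # "requests" is a hypothetical wrapper key for the list of mappings.
            "requests": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "CategoryName": {"type": "string", "enum": names},
                        "Value": {"type": "number"},
                    },
                    "required": ["CategoryName", "Value"],
                    "additionalProperties": False,
                },
            }
        },
        "required": ["requests"],
        "additionalProperties": False,
    }


# Hypothetical entries standing in for the contents of categories.json.
categories = [
    {"CategoryName": "Moral Damages",
     "ExamplesInstructions": "Compensation for moral damages"},
    {"CategoryName": "Data Provision",
     "ExamplesInstructions": "Request for data provision"},
]

schema = build_response_schema(categories)
print(json.dumps(schema, indent=2))
```

Because the enum is generated from the file, adding or renaming a category only requires updating categories.json, and the text of "ExamplesInstructions" never appears among the permitted values.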

Script Used

Here is the Python script used to map the requests:

import json

# Load the JSON file containing the categories
def load_categories(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return json.load(file)

# Return True when the category name appears, case-insensitively,
# as a substring of the request description
def check_correspondence(request, category):
    return category['CategoryName'].lower() in request['description'].lower()

# Function to map requests
def map_requests(document_requests, categories):
    mapped_requests = []
    for request in document_requests:
        found = False
        for category in categories:
            if check_correspondence(request, category):
                mapped_requests.append({
                    "CategoryName": category['CategoryName'],
                    "Value": 0.00 if request.get('value') is None else request['value']
                })
                found = True
                break
        if not found:
            mapped_requests.append({
                "CategoryName": "Category Not Found",
                "Value": 0.00 if request.get('value') is None else request['value']
            })
    return mapped_requests

# Path to the JSON file containing the categories
file_path = '/mnt/data/categories.json'

# Load the categories
categories = load_categories(file_path)

# Example of document requests
document_requests = [
    {"description": "Request for data provision", "value": None},
    {"description": "Compensation for moral damages", "value": 5000.00},
    # Add more requests as necessary
]

# Map the requests
mapped_requests = map_requests(document_requests, categories)

# Result in JSON format
result_json = json.dumps(mapped_requests, indent=4)
print(result_json)
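
Note that json.dumps serializes the float 5000.00 as 5000.0, so the script on its own does not satisfy the two-decimal requirement from instruction 7. A minimal sketch of one way to format the values follows. It assumes the receiving system accepts the amount as a string, which is not confirmed here, and format_value is a hypothetical helper name.

```python
import json
from decimal import Decimal, ROUND_HALF_UP


def format_value(value):
    """Render a value with no thousands separator and exactly two decimals."""
    if value is None:
        value = 0
    # Going through str() avoids carrying binary float noise into the Decimal.
    amount = Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return format(amount, "f")


# Hypothetical mapped requests, shaped like the script's output.
mapped = [
    {"CategoryName": "Moral Damages", "Value": 5000.00},
    {"CategoryName": "Category Not Found", "Value": None},
]

formatted = [
    {"CategoryName": m["CategoryName"], "Value": format_value(m["Value"])}
    for m in mapped
]
print(json.dumps(formatted, indent=4))
```

If the receiving system requires a JSON number rather than a string, the two-decimal form cannot be guaranteed by json.dumps alone, and the formatting would have to be applied where the value is finally rendered.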

Issues Faced

Despite the clear instructions and the script, the assistant still sometimes returns terms taken from the “ExamplesInstructions” field instead of the exact value of the “CategoryName” field. Our system does not recognize these variations, so any such response is rejected.
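
One possible workaround is to repair such labels after the fact, by mapping a returned phrase back to its canonical name. The sketch below uses difflib from the standard library. It assumes the examples inside "ExamplesInstructions" are separated by semicolons, which may not match the real file, and the sample entries, function names, and the 0.8 cutoff are illustrative choices rather than tested settings.

```python
import difflib


def build_alias_index(categories):
    """Map lowercase canonical names and example phrases to CategoryName."""
    index = {}
    for category in categories:
        name = category["CategoryName"]
        index[name.lower()] = name
        # Assumption: individual examples are separated by ';'.
        for phrase in category.get("ExamplesInstructions", "").split(";"):
            phrase = phrase.strip().lower()
            if phrase:
                index.setdefault(phrase, name)
    return index


def normalize_label(label, index, cutoff=0.8):
    """Return the canonical CategoryName for a label, or None if no match."""
    key = label.strip().lower()
    if key in index:
        return index[key]
    # Fall back to the closest known phrase above the similarity cutoff.
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None


# Hypothetical entries standing in for the contents of categories.json.
categories = [
    {"CategoryName": "Moral Damages",
     "ExamplesInstructions": "Compensation for moral damages; pain and suffering"},
    {"CategoryName": "Data Provision",
     "ExamplesInstructions": "Request for data provision; disclosure of records"},
]

index = build_alias_index(categories)
print(normalize_label("Compensation for moral damages", index))
print(normalize_label("request for data provision", index))
```

Labels that return None can be logged or sent back to the assistant for a retry instead of being forwarded with an unrecognized name.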

Request for Assistance

I am seeking advice or solutions from anyone who has successfully implemented a similar system. How can I ensure that the assistant consistently uses only the predefined nomenclature from the “CategoryName” field? Any suggestions, adjustments to the script, or alternative approaches would be greatly appreciated.

Thank you for your assistance.