I have job descriptions such as “Snr Stage Hd” or “Ftr/Trnr”, entered by humans using varying abbreviations, and I want to output the corrected versions as a Python dictionary, e.g. {"Snr Stage Hd": "Senior Stage Hand", "Ftr/Trnr": "Fitter Turner"}.
My code:
# jobDescriptionCompletions.py --- A script to supply a list of abbreviated/corrupted/errored job descriptions to the OpenAI API for completion
# The data is supplied in a text file and returned in a file formatted as a Python dictionary
import traceback

try:
    # Imports
    from openai import OpenAI

    # File paths & envs
    dictionaryPath = 'C:\\test\\temp\\'
    # Writing processed Electoral Roll pages to this location
    writeFilePath = dictionaryPath

    # Read the instruction from the text file
    with open("C:\\test\\test\\all_instruction.txt", 'r') as file:
        user_instruction = file.read().strip()

    # Read additional data from a text file (e.g., data.txt)
    with open("C:\\test\\test\\data1.txt", 'r') as file:
        additional_data = file.read().strip()

    # Combine the instruction and the additional data
    combined_input = f"{user_instruction}\n\nHere is the data to be operated on:\n{additional_data}"

    client = OpenAI()
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": combined_input
            }
        ]
    )

    writeFileName = writeFilePath + 'jobCompletionsDictionary.txt'
    # Redirect the output to a file
    with open(writeFileName, 'w') as output_file:
        output_file.write(completion.choices[0].message.content)

    # Optionally print a confirmation message
    print("The completion content has been written to: " + writeFileName)
except Exception as e:
    # to get detailed traceback
    print("Traceback from jobDescriptionCompletions.py")
    print(e)
    traceback.print_exc()
Instructions:
Convert the data supplied into a Python dictionary. Use the supplied list as dictionary keys and a corrected version of each as dictionary values:
The output is basically correct. The issues are:
- Variations: sometimes the output is fairly concise; other times it includes additional explanations, comments and suggestions.
- Format: sometimes a Python dictionary, other times a Python function that includes the dictionary.
- Surplus data: sometimes the input data is repeated, and the amount of additional commentary varies from run to run.
- Noise: additional ‘chatty’ text that is not required.
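As a stopgap for the noise and format issues listed above, the reply could be post-processed before it is written to the file. The following is only a minimal sketch: `extract_dict` is a hypothetical helper (not part of the script above), and it assumes the reply contains at least one dictionary literal made of plain string keys and values, possibly surrounded by commentary or wrapped in a function.

```python
import ast


def extract_dict(reply: str) -> dict:
    """Return the first Python dict literal found in a model reply.

    Hypothetical helper: assumes the reply contains a dictionary literal
    built from plain literals; surrounding commentary is ignored.
    """
    start = reply.find("{")
    while start != -1:
        # Try candidates from this "{" to each "}" after it, longest first.
        end = reply.rfind("}")
        while end > start:
            try:
                value = ast.literal_eval(reply[start:end + 1])
            except (ValueError, SyntaxError, TypeError):
                value = None
            if isinstance(value, dict):
                return value
            # Shrink the candidate to the previous closing brace.
            end = reply.rfind("}", start, end)
        start = reply.find("{", start + 1)
    raise ValueError("no dictionary literal found in reply")


reply = (
    "Sure! Here is the dictionary:\n\n"
    "job_titles = {\n"
    '    "Snr Stage Hd": "Senior Stage Hand",\n'
    '    "Ftr/Trnr": "Fitter Turner",\n'
    "}\n\n"
    "Let me know if you need anything else."
)
print(extract_dict(reply))
```

`ast.literal_eval` only evaluates literals, so it is safer than `eval` on model output. This tolerates the chatty text but does not make the model itself any more consistent.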
What should I do to get more consistent output?
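One direction that may be worth trying is to ask the API for JSON rather than Python source, using the Chat Completions `response_format={"type": "json_object"}` option together with `temperature=0` and a stricter system message. The sketch below only builds the request arguments and parses a reply; `build_request` and `parse_reply` are hypothetical helper names, and the system message wording is an assumption, not a tested prompt. JSON mode expects the word "JSON" to appear somewhere in the messages, and a low temperature reduces run-to-run variation without guaranteeing identical output.

```python
import json


def build_request(instruction: str, data: str) -> dict:
    """Build keyword arguments for client.chat.completions.create().

    Hypothetical helper; the system message wording is an assumption.
    """
    system_message = (
        "Reply with a single JSON object and nothing else. "
        "Keys are the job descriptions exactly as supplied; "
        "values are the corrected versions."
    )
    return {
        "model": "gpt-4o-mini",
        "temperature": 0,  # reduces, but does not eliminate, variation
        "response_format": {"type": "json_object"},  # JSON mode
        "messages": [
            {"role": "system", "content": system_message},
            {
                "role": "user",
                "content": f"{instruction}\n\nHere is the data to be operated on:\n{data}",
            },
        ],
    }


def parse_reply(content: str) -> dict:
    """Turn the JSON text returned by the model into a Python dict."""
    return json.loads(content)


# With the client from the script above, the call would look like:
#   completion = client.chat.completions.create(**build_request(user_instruction, additional_data))
#   jobs = parse_reply(completion.choices[0].message.content)
```

The resulting dictionary can then be written to the output file with `json.dump` or `repr`, depending on which format the downstream code expects.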