Best approach for JSON generation

Relatively new to OpenAI.

I’d like to be able to provide a prompt (e.g. "George Washington") and receive a JSON object back with various details (e.g. {"firstName": "George", "lastName": "Washington", "yearOfBirth": 1732, "yearOfDeath": 1799, "bio": "First President of the United States"}).

ChatGPT seems to handle this well. I’ve tried fine-tuning GPT-3 with mixed results. I’m thinking this might be better suited for Codex, but I’m not sure how to fine-tune that model. What would be some best practices for this use case?



I’ve been trying to do the same with GPT-3, and prompting with a sample has helped a lot. The API returns the result as a string, which you can easily parse into a dict with json.loads if you’re using Python.
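To make that concrete, here is a minimal sketch of parsing a model response string. The response text is hard-coded to the example object from the original question; a real completion will vary and can occasionally be invalid JSON, so it is worth catching the decode error.

```python
import json

# Simulated string returned by the completions API for the prompt
# "George Washington" (a real model's output will vary).
response_text = (
    '{"firstName": "George", "lastName": "Washington", '
    '"yearOfBirth": 1732, "yearOfDeath": 1799, '
    '"bio": "First President of the United States"}'
)

try:
    person = json.loads(response_text)
except json.JSONDecodeError:
    # Models occasionally emit malformed JSON; handle it rather than crash.
    person = None

print(person["firstName"], person["yearOfBirth"])  # George 1732
```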


Not sure how large your JSON objects are, but how many samples are you squeezing into the prompt? Which model are you prompting against?

I’m just using a single sample in the prompt, with around 7 key-value pairs, prompting text-davinci-003.
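A single-shot prompt like that might be assembled as below. This is just a sketch; the example fields mirror the JSON shape from earlier in the thread, and the "Input:"/"Output:" framing is one arbitrary convention, not anything the API requires.

```python
# One worked example shows the model the exact JSON shape to produce.
example = (
    'Input: Abraham Lincoln\n'
    'Output: {"firstName": "Abraham", "lastName": "Lincoln", '
    '"yearOfBirth": 1809, "yearOfDeath": 1865, '
    '"bio": "16th President of the United States"}\n'
)

def build_prompt(name: str) -> str:
    """Append the new query after the worked example."""
    return example + f"Input: {name}\nOutput:"

prompt = build_prompt("George Washington")
```

The resulting string is what you would send as the `prompt` of a completions request; ending it at "Output:" nudges the model to continue with a JSON object in the same shape.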

This is easy to do with the davinci base model. You cannot fine-tune any codex models at this time.

I released a tutorial / lab experiment yesterday on how to fine-tune with a single-line JSONL training file using the n_epochs parameter. You might find the results useful.
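For reference, a single-line JSONL training file in the legacy prompt/completion fine-tuning format can be written like this. The prompt text and "->" separator are illustrative choices, not requirements; the key point is that the completion field must be a plain string, so a JSON object used as a completion gets serialized with json.dumps first.

```python
import json

# Illustrative target object the model should learn to emit.
completion_obj = {
    "firstName": "George",
    "lastName": "Washington",
    "yearOfBirth": 1732,
    "yearOfDeath": 1799,
}

record = {
    "prompt": "George Washington ->",
    # Serialize the nested JSON to a string so "completion" stays a
    # plain string, as the fine-tuning format expects.
    "completion": " " + json.dumps(completion_obj),
}

# Each training example is exactly one line of the .jsonl file.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```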

Hope this helps.


Can you provide sample training data? I’m trying to figure out how to train my model with sample JSON, and the API doesn’t seem to like a JSON object as a property inside the overall JSONL file.

I don’t provide them as part of the JSONL file, but rather as part of the prompt message.

Yeah, exactly. Include a reference JSON object in your prompt (or as many as you can fit) and it should conform to that. It might forget after a while, so you may need to periodically refresh context.

Since this thread comes up near the top of search engine results, I thought I’d link here


JSON mode is currently enabled for gpt-4-turbo-preview and gpt-3.5-turbo-0125
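A sketch of what enabling JSON mode on a Chat Completions request looks like, shown here as the request payload rather than a live API call. The model name is one of those mentioned above and may change over time; note that JSON mode requires the word "JSON" to appear somewhere in the messages.

```python
import json

# Request payload for a Chat Completions call with JSON mode enabled.
request = {
    "model": "gpt-3.5-turbo-0125",
    "response_format": {"type": "json_object"},
    "messages": [
        # JSON mode requires that the messages mention JSON explicitly.
        {"role": "system",
         "content": "Return a JSON object with firstName, lastName, "
                    "yearOfBirth, yearOfDeath, and bio."},
        {"role": "user", "content": "George Washington"},
    ],
}

# With the official Python client this would be sent as:
#   client.chat.completions.create(**request)
# and the reply's message content is then safe to pass to json.loads.
print(json.dumps(request["response_format"]))
```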