Best approach for JSON generation

Relatively new to OpenAI.

I’d like to be able to provide a prompt (e.g. "George Washington") and receive a JSON object back with various details (e.g. {"firstName": "George", "lastName": "Washington", "yearOfBirth": 1732, "yearOfDeath": 1799, "bio": "First President of the United States"}).

ChatGPT seems to handle this well. I’ve tried fine-tuning GPT-3 with mixed results. I’m thinking this might be better suited for Codex, but I’m not sure how to fine-tune that model. What would be some best practices for this use case?



I’ve been trying to do the same with GPT-3, and prompting with a sample has helped a lot. The API returns the result as a string, which you can easily parse into a dict with json.loads if you’re using Python.
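To make that concrete, here is a minimal sketch of parsing a model response string. The response text is hard-coded to the example object from the original question; a real completion will vary and can occasionally be invalid JSON, so it is worth catching the decode error.

```python
import json

# Simulated string returned by the completions API for the prompt
# "George Washington" (a real model's output will vary).
response_text = (
    '{"firstName": "George", "lastName": "Washington", '
    '"yearOfBirth": 1732, "yearOfDeath": 1799, '
    '"bio": "First President of the United States"}'
)

try:
    person = json.loads(response_text)
except json.JSONDecodeError:
    # Models occasionally emit malformed JSON; handle it rather than crash.
    person = None

print(person["firstName"], person["yearOfBirth"])  # George 1732
```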


Not sure how large your JSON objects are, but how many samples are you squeezing into the prompt? Which model are you prompting against?

I’m just using a single sample in the prompt, with around 7 key-value pairs, prompting text-davinci-003.
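A single-shot prompt like that might be assembled as below. This is just a sketch; the example fields mirror the JSON shape from earlier in the thread, and the "Input:"/"Output:" framing is one arbitrary convention, not anything the API requires.

```python
# One worked example shows the model the exact JSON shape to produce.
example = (
    'Input: Abraham Lincoln\n'
    'Output: {"firstName": "Abraham", "lastName": "Lincoln", '
    '"yearOfBirth": 1809, "yearOfDeath": 1865, '
    '"bio": "16th President of the United States"}\n'
)

def build_prompt(name: str) -> str:
    """Append the new query after the worked example."""
    return example + f"Input: {name}\nOutput:"

prompt = build_prompt("George Washington")
```

The resulting string is what you would send as the `prompt` of a completions request; ending it at "Output:" nudges the model to continue with a JSON object in the same shape.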

This is easy to do with the davinci base model. You cannot fine-tune any codex models at this time.

I released a tutorial / lab experiment yesterday on how to fine-tune with a single-line JSONL training file using the n_epochs parameter. You might find the results useful.
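For reference, a single-line JSONL training file in the legacy prompt/completion fine-tuning format can be written like this. The prompt text and "->" separator are illustrative choices, not requirements; the key point is that the completion field must be a plain string, so a JSON object used as a completion gets serialized with json.dumps first.

```python
import json

# Illustrative target object the model should learn to emit.
completion_obj = {
    "firstName": "George",
    "lastName": "Washington",
    "yearOfBirth": 1732,
    "yearOfDeath": 1799,
}

record = {
    "prompt": "George Washington ->",
    # Serialize the nested JSON to a string so "completion" stays a
    # plain string, as the fine-tuning format expects.
    "completion": " " + json.dumps(completion_obj),
}

# Each training example is exactly one line of the .jsonl file.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```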

Hope this helps.


Can you provide sample training data? I’m trying to figure out how to train my model with sample JSON, and the API doesn’t seem to like a JSON object as a property inside the overall JSONL file.

I don’t provide them as part of the JSONL file, but rather as part of the prompt message.

Yeah, exactly. Include a reference JSON object in your prompt (or as many as you can fit) and it should conform to that. It might forget after a while, so you may need to periodically refresh context.

Since this thread comes up near the top of search engine results, I thought I’d link here


JSON mode is currently enabled for gpt-4-turbo-preview and gpt-3.5-turbo-0125
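A sketch of what enabling JSON mode on a Chat Completions request looks like, shown here as the request payload rather than a live API call. The model name is one of those mentioned above and may change over time; note that JSON mode requires the word "JSON" to appear somewhere in the messages.

```python
import json

# Request payload for a Chat Completions call with JSON mode enabled.
request = {
    "model": "gpt-3.5-turbo-0125",
    "response_format": {"type": "json_object"},
    "messages": [
        # JSON mode requires that the messages mention JSON explicitly.
        {"role": "system",
         "content": "Return a JSON object with firstName, lastName, "
                    "yearOfBirth, yearOfDeath, and bio."},
        {"role": "user", "content": "George Washington"},
    ],
}

# With the official Python client this would be sent as:
#   client.chat.completions.create(**request)
# and the reply's message content is then safe to pass to json.loads.
print(json.dumps(request["response_format"]))
```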