I’d like to be able to provide a prompt (e.g. “George Washington”) and receive a JSON object back with various details (e.g. {"firstName": "George", "lastName": "Washington", "yearOfBirth": 1732, "yearOfDeath": 1799, "bio": "First President of the United States"}).
ChatGPT seems to handle this well. I’ve tried fine-tuning GPT-3 with mixed results. I’m thinking this might be better suited for Codex, but I’m not sure how to fine-tune that model. What would be some best practices for this use case?
I’ve been trying to do the same with GPT-3, and prompting with a sample has helped a lot. The API returns the completion as a string, which can easily be parsed as JSON using json.loads if you’re using Python.
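For reference, here’s a minimal sketch of that flow using the (pre-1.0) openai Python package; the prompt wording, model name, and parameters are illustrative, not a recommendation:

```python
import json
import openai  # assumes the openai package is installed and OPENAI_API_KEY is set

# Show the model an example object so it copies the schema.
prompt = (
    'Return details about the person as a JSON object, e.g. '
    '{"firstName": "...", "lastName": "...", "yearOfBirth": 0, "yearOfDeath": 0, "bio": "..."}\n\n'
    "Person: George Washington\nJSON:"
)

response = openai.Completion.create(
    model="text-davinci-003",  # any completions-capable model should work here
    prompt=prompt,
    max_tokens=200,
    temperature=0,  # deterministic output helps keep the JSON well-formed
)

text = response["choices"][0]["text"].strip()
details = json.loads(text)  # raises json.JSONDecodeError if the model strayed from JSON
print(details["firstName"], details["yearOfBirth"])
```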
This is easy to do with the davinci base model. You cannot fine-tune any Codex models at this time.
I released a tutorial / lab experiment yesterday on fine-tuning with a single-line JSONL training file using the n_epochs parameter. You might find the results useful.
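In case it helps, this is roughly what that flow looks like with the (pre-1.0) openai Python package; the file name and epoch count below are made up for illustration:

```python
import openai  # assumes OPENAI_API_KEY is set in the environment

# Upload the JSONL training file, then start a fine-tune against a base model.
upload = openai.File.create(file=open("people.jsonl", "rb"), purpose="fine-tune")

job = openai.FineTune.create(
    training_file=upload["id"],
    model="davinci",   # base model only; Codex models can't be fine-tuned
    n_epochs=16,       # a tiny training file usually needs many epochs to have any effect
)
print(job["id"])  # poll this ID to follow training progress
```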
Can you provide sample training data? I’m trying to figure out how to train my model with sample JSON, and the API doesn’t seem to like JSON as a property inside the overall JSONL file.
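If it’s the same problem I hit, the completion field has to be a string, so a nested JSON object needs to be serialized (with its quotes escaped) rather than embedded as a raw object. A sketch in Python, with an illustrative “->” separator:

```python
import json

# The JSON details we want the model to learn to emit.
details = {
    "firstName": "George",
    "lastName": "Washington",
    "yearOfBirth": 1732,
    "yearOfDeath": 1799,
    "bio": "First President of the United States",
}

# One JSONL training line: json.dumps escapes the nested object's quotes,
# so the completion becomes a plain string the fine-tune endpoint will accept.
line = json.dumps({
    "prompt": "George Washington ->",                # the " ->" separator is illustrative
    "completion": " " + json.dumps(details) + "\n",  # leading space and trailing \n per the usual fine-tune conventions
})

with open("people.jsonl", "a") as f:
    f.write(line + "\n")
```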
Yeah, exactly. Include a reference JSON object in your prompt (or as many as you can fit) and the output should conform to that format. The model might drift after a while, so you may need to periodically refresh the context.
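Something like this, where the reference object(s) are re-sent with every request so the context stays fresh (the example person and separator are illustrative):

```python
# Build a few-shot prompt: each example pairs a name with its JSON object,
# so the model learns the schema from context alone.
examples = [
    ("Abraham Lincoln",
     '{"firstName": "Abraham", "lastName": "Lincoln", "yearOfBirth": 1809, '
     '"yearOfDeath": 1865, "bio": "16th President of the United States"}'),
]

def build_prompt(name: str) -> str:
    shots = "\n".join(f"{n} -> {obj}" for n, obj in examples)
    return f"{shots}\n{name} ->"

print(build_prompt("George Washington"))
```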