Fine-tuning a model with structured output

I’ve been having a lot of trouble formatting my JSONL format. I have examples where the user provides text with paragraphs separated by \n, and the assistant provides JSON highlighting parts of the text.

In openai’s fine tuning guide, there is a single example with structured output that looks like it isn’t even formatted right. (why do the 2 dicts take up 4 lines?)

In their example also, no escape characters are used for double quotes in the JSON output. Is this correct?

Does anyone have tips for formatting the JSONL file with JSON output? This single example really doesn’t provide a good example for how JSON should be formatted inside JSONL, so I don’t even have a starting point for how I should resolve errors.

When I upload my dataset in JSONL I get: There was an error uploading the file: Expected file to have JSONL format, where every line is a valid JSON dictionary. Line 1 is not a dictionary (HINT: line starts with: "{“me…”). Isn’t "{“me…” exactly what every chat training data file should start with?

Any tips are appreciated.

Hi there and welcome to the forum!

Here’s an example JSON structured assistant message I successfully used for finetuning.

{“role”: “assistant”, “content”: “{‘Region’:‘Value 1, Value 2’,‘Topic’:‘Value 1, Value 2’}”}

I hope this helps.

1 Like

Thanks for the reply. It looks like those single quotes work, as well as escaped double quotes. I was able to successfully finetune a model by escaping all newlines and double quotes with single backslashes.

I wish the online examples were properly formatted!


Great to hear. The online examples are correct. They just don‘t contain specific guidance for JSON structured responses.

1 Like

Hey thanks for sharing. I am wondering if you have any suggestions for incorporating JSON array into both system and assistant prompts? Any guidance would be very much appreciated!

Hi - what are you specifically trying to achieve? In general, I think there should be no problem having a JSON in both system and user message as long as it is clear what you are trying to get the model to do and how they interconnect.

1 Like

As someone new to coding and fine-tuning, primarily self-taught through YouTube tutorials, I’m still navigating the nuances and might make some novice errors, so your patience is appreciated!

I’m working on two distinct tasks and aiming to utilize a structured output format for both (I’ve updated mine using your single quote format):

  1. Extracting Key Information: The goal here is to parse specific details from a text and represent them in a JSON array format like [ { 'Topic_1': 'topic1', 'Details_1': 'details1' }, { 'Topic_2': 'topic2', 'Details_2': 'details2' } ].

  2. Labeling Topics: Based on the extracted information, the next step is to assign labels to each topic and format the output similarly in JSON, for example [ { 'Topic_1': 'topic1', 'Label_1': 'label1' }, { 'Topic_2': 'topic2', 'Label_2': 'label2' } ] (topics are included for reference).

I’ve run into issues with formatting JSON arrays within the system and assistance prompts. However, I’m open to exploring more efficient alternatives if any are available. Thanks in advance for your guidance and assistance!

Thanks for sharing the additional information.

As for (1), this looks perfectly fine for a finetuning case. The way I would set up your training prompts is to use the system message to describe in general what you are asking the model to achieve. As part of this you can instruct it to return the response in the pre-defined JSON format and include your generic JSON schema as shown here. Your user message would then be the raw text from which to parse these information, while the assistant output message would be the actual JSON with the specific topics and details for this example. If you provide it with a sufficient number of examples, your finetuned should work pretty well. You can further enhance the reliability of your finetuned model by including in the system prompt the list of topics to choose from (provided that is defined). This will help reducing variability in outputs further.

I am not sure I 100% understand what you are trying to do under (2) but the logic for creating a finetuned model would be the same. System message with instructions, which could include both the generic schema for the “input JSON” from (1) and the new JSON schema to be generated through the model response. User message would then consist of examples of the actual JSON produced in step (1) while the assistant message would include the actual JSON generated.

Does this make sense?

You could either set it up as two separate finetuned models or you could try and combine in into one. The latter approach would involve creating examples with two interactions. It’s more complex and might require more testing to get right.


Yes, this makes perfect sense! Thank you for the clarification.

I’m indeed fine-tuning two models. However, I’m unsure if it would be helpful to provide an example text and a corresponding JSON array output in the system prompt. Also, I’ve prepared 100 examples for each task in terms of training. Do you think that’s a good number to start given the nature of the tasks?

Very much appreciate your willingness to help!

1 Like

Yeah, 100 is a good starting point. Take a look at how the model performs afterwards and then you make a call on whether to add more examples.

As for the point on the example in system message. As per my earlier message, instead of providing a specific example including the generic schema is more beneficial for the model to understand what you are after. User and assistant message in your training data illustrate how it’s applied in practice.


Understood! Thank you so much! Will come back if further issues pop up!


Sorry to bother you again. I’ve completed fine-tuning a model with the training dataset. However, upon testing in the playground, the model did not perform as expected. The outputs were inconsistent with the instructions in the training dataset, almost as if the fine-tuning had no effect. Could you provide any insights into why this might be happening?

BTW, when testing in the playground, I didn’t include a system prompt, which was used during training. Is it necessary to use the system prompt in actual use as well to align with the training setup?

1 Like

@tyrealqian You must include the same system prompt in your finetuned model that you used in your training, otherwise your model won’t work.

So this is most likely the root cause.


Gotcha, thanks for all the tips and clarification! Much appreciate it!

1 Like

Apologies for reaching out again. I ran into format issues when parsing values from the outputs of my fine-tuned models (the success rate was slightly higher than 50%). Notably, I had multiple cases with Error: Expecting ‘,’. Could this be due to insufficient training, or might it be related to the use of single quotes in the training data and thus the output? Your insights would be greatly appreciated. Thank you!

Hi - could you share a specific example of input/output when that happened? That would help to better narrow down the potential root cause.



Absolutely! Thank you so much!

This is one of the outputs with an error:

[ {
‘Aspect_1’: ‘Auburn football game’,
‘Details_1’: ‘There is just nothing like an Auburn football game’
‘Aspect_2’: ‘traditions’,
‘Details_2’: ‘The magic begins before kick off with the traditional eagle’s flight’
‘Aspect_3’: ‘band’,
‘Details_3’: ‘the band gets your spirit into high gear with the Auburn fight song’
‘Aspect_4’: ‘JUMBOtron’,
‘Details_4’: ‘There is not a bad seat in the house with the JUMBOtron - that thing is crazy’
‘Aspect_5’: ‘fans’,
‘Details_5’: ‘fun times can be had by all fans young and old’

The JSON formatter indicates:

Parse error on line 1:
[ {
‘Aspect_1’: 'Auburn
Expecting ‘STRING’, ‘}’, got ‘undefined’

There are many other similar outputs with different errors here and there.

1 Like

I wonder if I could have avoided the formatting issue by using triple single quotes.

For instance: {‘’‘Topic1’‘’: ‘’‘value’‘’, ‘’‘Topic2’‘’: ‘’‘value’‘’}

I don’t think that should have mattered. I can’t pinpoint the issue right away but I will give it some thought throughout the day.

1 Like

Sorry - I probably won’t be of too much help on this specific one. I have my own developer that I draw on to debug these more specific technical challenges.