JSON data in training file

esteban.felipe · April 7, 2023, 8:18pm

I’m fine-tuning a model so that ChatGPT returns JSON documents that my application can understand.

For a specific prompt, I’d like ChatGPT to answer with something like this:

{
  "EmptyPageRouterSwitch_ID": {
    "id": "EmptyPageRouterSwitch_ID",
    "name": "RouterSwitchSymbol",
    "alias": "Empty Layout Page Switcher",
    "props": {
      "routes": {
        "nodes": [
          "EmptyPageRootRoute_ID",
          "ErrorPageRootRoute_ID",
          "NotFoundPageRootRoute_ID"
        ]
      },
      "redirects": {
        "nodes": []
      }
    },
    "states": [],
    "parentID": "BaseLayout_ID",
    "hiddenLayout": true,
    "schemaOverride": {
      "interaction": "only-editable"
    }
  }
}

Here are the steps I’m following to prepare my training JSONL file:

Write the entire output DSL on a single line.
Escape the JSON document.
Surround it with a code fence: " ```\n" before and "\n```" after.

A line in my JSONL file looks like this:
{"prompt":"generate an empty DSL ->","completion":" ```\\n{\\\"appDSL\\\":{\\\"nodes\\\":{\\\"EmptyPageRouterSwitch_ID\\\":{...
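For comparison, here is a minimal sketch of building such a line with Python's standard `json` module instead of escaping by hand. The helper name `make_training_line` and the closing fence are assumptions for illustration, not part of my pipeline. Note that in the sample line above, `\\n` and `\\\"` decode to a literal backslash followed by `n` or `"`, so the document may be escaped one time too many; serializing the whole record in one call applies the escaping exactly once.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def make_training_line(prompt: str, dsl: dict) -> str:
    """Return one JSONL line in the prompt/completion format shown above.

    Hypothetical helper: the fence wrapping mirrors the sample line, and the
    closing fence is an assumption.
    """
    # Step 1: serialize the DSL compactly so it fits on a single line.
    compact = json.dumps(dsl, separators=(",", ":"))
    # Step 3: wrap it in a code fence, with the leading space from the sample.
    completion = " " + FENCE + "\n" + compact + "\n" + FENCE
    # Step 2: json.dumps escapes quotes and newlines once, for the whole record.
    return json.dumps({"prompt": prompt, "completion": completion})


if __name__ == "__main__":
    dsl = {"appDSL": {"nodes": {"EmptyPageRouterSwitch_ID": {"states": []}}}}
    print(make_training_line("generate an empty DSL ->", dsl))
```

Reading a line back with `json.loads` and parsing the text between the fences should return the original DSL, which is a quick way to check the file before uploading it.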

I can provide thousands of prompts like this, but I’m not sure whether this is the correct approach. Can anybody provide some guidance?

earthsendangered · May 5, 2023, 11:33pm

I have a similar task and would love to know if you get an answer. I tested it with regular ChatGPT in the web browser, giving it instructions and examples. It seemed to do fine, but it takes the liberty of changing the JSON property names (I’d like it to follow my strict guidelines for the property names).
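Until the model is reliable, one way to catch renamed properties is to check each response against an allowed set of keys. This is only a rough sketch: `ALLOWED_KEYS` and `unexpected_keys` are hypothetical names, and the key list is borrowed from the example document in the first post rather than from my own schema.

```python
import json

# Assumed property names, taken from the example node in the first post.
ALLOWED_KEYS = {
    "id", "name", "alias", "props", "states",
    "parentID", "hiddenLayout", "schemaOverride",
}


def unexpected_keys(raw: str) -> dict:
    """Map each node ID to the sorted list of property names not in ALLOWED_KEYS."""
    doc = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    report = {}
    for node_id, node in doc.items():
        extra = sorted(set(node) - ALLOWED_KEYS)
        if extra:
            report[node_id] = extra
    return report
```

An empty result means every top-level property name in every node matched the list; anything else can be rejected or sent back to the model for another try.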
