JSON data in training file

I’m fine-tuning a model so that ChatGPT returns its output as JSON documents that my application can understand.

For a specific prompt, I’d like ChatGPT to answer with something like this:

{
  "EmptyPageRouterSwitch_ID": {
    "id": "EmptyPageRouterSwitch_ID",
    "name": "RouterSwitchSymbol",
    "alias": "Empty Layout Page Switcher",
    "props": {
      "routes": {
        "nodes": [
          "EmptyPageRootRoute_ID",
          "ErrorPageRootRoute_ID",
          "NotFoundPageRootRoute_ID"
        ]
      },
      "redirects": {
        "nodes": []
      }
    },
    "states": [],
    "parentID": "BaseLayout_ID",
    "hiddenLayout": true,
    "schemaOverride": {
      "interaction": "only-editable"
    }
  }
}

Here are the steps I’m following to prepare my training JSONL file:

  • Write the whole output DSL on a single line
  • Escape the JSON document
  • Surround it with a code fence (" ```\n" before it and "\n```" after it)

A line in my JSONL file looks like this:
{"prompt":"generate an empty DSL ->","completion":" ```\\n{\\\"appDSL\\\":{\\\"nodes\\\":{\\\"EmptyPageRouterSwitch_ID\\\":{...

I can provide thousands of prompts like this, but I’m not sure whether this is the correct approach. Can anybody provide some guidance?

I have a similar task and would love to know if you get an answer. I tested it with regular ChatGPT in the web browser, giving it instructions and examples. It seemed to do fine, but it takes the liberty of changing the JSON property names (I’d like it to stick strictly to my guidelines for the property names).
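
One way to catch renamed properties is to check each reply against a fixed set of allowed names before using it. The following is a rough sketch, not a definitive approach: `ALLOWED_KEYS` and `unexpected_keys` are hypothetical names, and the key set is simply taken from the example node in the question as a stand-in for the real guidelines.

```python
import json

# Assumed set of permitted node properties, copied from the example node above.
ALLOWED_KEYS = {
    "id", "name", "alias", "props", "states",
    "parentID", "hiddenLayout", "schemaOverride",
}


def unexpected_keys(reply_text: str) -> dict:
    """Map each node ID to the property names that are not in ALLOWED_KEYS."""
    doc = json.loads(reply_text)
    problems = {}
    for node_id, node in doc.items():
        extra = sorted(set(node) - ALLOWED_KEYS)
        if extra:
            problems[node_id] = extra
    return problems
```

An empty result means every node used only the expected names; anything else can be rejected or sent back to the model for another attempt.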