Fine Tuning with escaped strings

YourAverageDev · September 18, 2023, 3:10pm

I am trying to fine-tune a model, but all of the fine-tuning data is written as JSON strings, for example “Hello”. What if I want to include a " character in the fine-tuning data? Is my only option to escape it inside the JSON string? Do I have to write \", and likewise for things like line breaks that I want the model to learn? gpt-3.5-turbo is a fine-tuned version of gpt-3, so OpenAI must have done it some other way, without escaping the JSON strings.

_j · September 18, 2023, 3:30pm

You have cleverly outsmarted the forum by not putting your escaped text within backticks to make it preformatted_text.

{ "tip": "Here's the \"truth\":\nYou can and must escape quotes (\") and triples(\"\"\")" }

YourAverageDev · September 30, 2023, 3:17pm

Does the fine-tune preparation automatically do that or do I have to do it myself?

_j · September 30, 2023, 3:26pm

You will have to escape quotes within strings, as well as linefeeds. An unescaped quote inside a JSON string closes the string at that point, and the text that follows is then treated as invalid JSON.
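If the training file is generated by a script rather than typed by hand, the escaping does not need to be done manually. Here is a minimal sketch, assuming the chat fine-tuning format shown in this thread; `make_example` is a hypothetical helper name, not part of any OpenAI tooling. Python's standard `json.dumps` escapes quotes and linefeeds and returns the whole record on one line.

```python
import json

def make_example(system, user, assistant):
    """Build one JSONL line in the chat format (hypothetical helper)."""
    record = {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]
    }
    # json.dumps escapes " as \" and a linefeed as \n, so the record
    # always stays on a single physical line.
    return json.dumps(record, ensure_ascii=False)

line = make_example(
    "Marv is a factual chatbot that is also sarcastic.",
    "How far is the Moon from Earth?",
    'Around 384,400 kilometers. "Give or take" a few,\nlike that really matters.',
)
print(line)
```

Writing one such line per training example to a file gives a JSONL dataset in which quotes and line breaks inside the text are already escaped.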

You can also put the whole set of JSONL (JSON Lines: one JSON object per line, not a Python list) into a Python script and run it. It will produce no errors if the strings are correct, but will throw a syntax error on bad strings.

PS C:\Users\user\Documents\chat> .\teststrings.py
  File "C:\Users\user\Documents\chat\teststrings.py", line 3
    {"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers." Give or take a few, like that really matters."}]}
                                                                                                                                                                                                                                    ^
SyntaxError: invalid syntax
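Running the file as a Python script is only a rough check, because Python literals are not identical to JSON (for example, JSON's `true`, `false`, and `null` are not Python names). A stricter check, sketched here under the assumption that each line holds exactly one JSON object, parses every line with the standard `json` module; `find_bad_lines` is a hypothetical helper name.

```python
import json

def find_bad_lines(jsonl_text):
    """Return (line_number, error_message) for each line that is not valid JSON."""
    errors = []
    # Split on "\n" only: JSONL records are separated by linefeeds.
    for lineno, line in enumerate(jsonl_text.split("\n"), start=1):
        if not line.strip():
            continue  # ignore blank lines, such as a trailing newline
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, exc.msg))
    return errors

good = '{"messages": [{"role": "assistant", "content": "Around 384,400 kilometers. \\"Give or take\\" a few."}]}'
bad = '{"messages": [{"role": "assistant", "content": "Around 384,400 kilometers." Give or take a few."}]}'

for lineno, message in find_bad_lines(good + "\n" + bad):
    print(f"line {lineno}: {message}")
```

Only the second sample line is reported, since its unescaped quote ends the string early in the same way as the example above.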
