Any idea on how to prevent double quotes inside of paragraphs?

cyruszei · July 15, 2023, 5:57pm

So I am creating a prompt where I want it to return a paragraph, and then I JSON-parse the response from OpenAI’s ChatGPT.

The problem I run into a lot is that the paragraph string contains double quotes, and that breaks my JSON parsing.

Does anyone have a solution to either skip the double quotes or instruct GPT to remove them?

This is the prompt I have:
“When writing the paragraphs, if you need to use a double quote character, remember to escape it. For example, instead of writing “She said, “Hello!”” you should write “She said, \“Hello!\””.”

As you can see in the image, it doesn’t do it like that.
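A minimal Python sketch of the failure described above, assuming the model’s reply is a JSON object with a hypothetical "paragraph" key and a made-up sample sentence: an unescaped inner double quote ends the JSON string early, so json.loads raises json.JSONDecodeError, while the backslash-escaped version parses. json.dumps produces the correctly escaped form, which could be pasted into the prompt as a concrete example.

```python
import json

# Hypothetical model reply with raw double quotes inside the string value.
broken = '{"paragraph": "She said, "Hello!" and left."}'
# The same reply with each inner double quote escaped by a backslash.
valid = '{"paragraph": "She said, \\"Hello!\\" and left."}'


def try_parse(text):
    """Return the parsed object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        print(f"Could not parse: {err}")
        return None


print(try_parse(broken))  # prints the parse error, then None
print(try_parse(valid))   # {'paragraph': 'She said, "Hello!" and left.'}

# json.dumps shows exactly what correctly escaped output looks like.
print(json.dumps({"paragraph": 'She said, "Hello!" and left.'}))
# {"paragraph": "She said, \"Hello!\" and left."}
```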

charleswilliams1120 · July 15, 2023, 6:11pm

In the prompt, say: You NEVER use ", only ’
I made one for my daughter that never says “yes”, only “Yasss”.
My assumption is that it will play along as long as you stay within the ToS, so if you aren’t asking it to do something crazy, it usually does what you ask.

_j · July 15, 2023, 6:19pm

“use triple-quotes” (this is also a Python doc string), OR
“escape all double-quote characters within string output with backslash”, OR
“within string output, you must replace all double-quotes you’d normally produce with single quotes.”

“When writing the paragraphs”, as you have it, is not as strong as “in all generated output” or “in all responses”.
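To see what each of the three instructions above would produce and how it could be read back, here is a small Python sketch with a made-up sample sentence. The triple-quote option is Python syntax rather than JSON, so this sketch assumes it is read with ast.literal_eval instead of json.loads; that variant would still break if the paragraph itself ended with a double quote or contained three in a row.

```python
import ast
import json

# Option 1: a Python triple-quoted string. Not JSON, so parse it as a
# Python literal instead of with json.loads.
triple = '"""She said, "Hello!" and left."""'
# Option 2: inner double quotes escaped with a backslash (valid JSON).
escaped = '{"paragraph": "She said, \\"Hello!\\" and left."}'
# Option 3: inner double quotes replaced by single quotes (valid JSON).
single = '{"paragraph": "She said, \'Hello!\' and left."}'

print(ast.literal_eval(triple))          # She said, "Hello!" and left.
print(json.loads(escaped)["paragraph"])  # She said, "Hello!" and left.
print(json.loads(single)["paragraph"])   # She said, 'Hello!' and left.
```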

JustinC · July 15, 2023, 6:46pm

Which model are you using?

GPT-3 has always escaped quotes for me, every single time.

GPT-4 will USUALLY escape quotes, but sometimes it uses a fake quote character instead (the curly “), as seen below.

I’m using https://api.openai.com/v1/chat/completions with curl.
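For anyone not using curl, a rough Python equivalent of that request using only the standard library is sketched below. The model name comes from the GPT-3 example that follows; the system message wording is only an illustration of the “in all responses” phrasing suggested earlier, and reading the key from an OPENAI_API_KEY environment variable is an assumption. The request is built but not sent.

```python
import json
import os
import urllib.request

URL = "https://api.openai.com/v1/chat/completions"

body = {
    "model": "gpt-3.5-turbo-16k-0613",
    "messages": [
        # Illustrative instruction; the exact wording is an assumption.
        {"role": "system",
         "content": "In all responses, escape every double-quote character "
                    "inside JSON string values with a backslash."},
        {"role": "user",
         "content": "Share with me 5 different quotes about friendship"},
    ],
}

request = urllib.request.Request(
    URL,
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Assumes the key is stored in the OPENAI_API_KEY environment variable.
        "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
    },
    method="POST",
)

# Sending it needs network access and a valid key:
# with urllib.request.urlopen(request) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```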

Share with me 5 different quotes about friendship (GPT-3):

  "model": "gpt-3.5-turbo-16k-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1. \"A true friend is one who overlooks your failures and tolerates your success.\" - Doug Larson\n2. \"A real friend is one who walks in when the rest of the world walks out.\" - Walter Winchell\n3. \"Friendship is born at that moment when one person says to another, 'What! You too? I thought I was the only one.'\" - C.S. Lewis\n4. \"A friend is someone who gives you total freedom to be yourself.\" - Jim Morrison\n5. \"Friendship is the only cement that will ever hold the world together.\" - Woodrow Wilson"
      },

Share with me 5 different quotes about friendship (GPT-4):

"model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1. “True friendship comes when the silence between two people is comfortable.” - David Tyson\n2. “A real friend is one who walks in when the rest of the world walks out.” - Walter Winchell \n3. “In the end, we will remember not the words of our enemies, but the silence of our friends.” - Martin Luther King, Jr.\n4. “Lots of people want to ride with you in the limo, but what you want is someone who will take the bus with you when the limo breaks down.” - Oprah Winfrey\n5. “Friends are the family you choose.” - Jess C. Scott"
      },
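A Python sketch of reading the assistant text out of responses like the two above, with each content trimmed to one quote. The backslashes in the first excerpt belong to the response’s own JSON encoding, so after json.loads the text contains plain straight quotes; the curly quotes in the second are ordinary characters inside a JSON string and parse without any escaping. The straighten_quotes helper is a hypothetical, optional post-processing step and is only safe to apply after parsing.

```python
import json

# Trimmed-down responses modelled on the two excerpts above.
gpt35_raw = ('{"choices": [{"message": {"role": "assistant", "content": '
             '"1. \\"A friend is someone who gives you total freedom to be '
             'yourself.\\" - Jim Morrison"}}]}')
gpt4_raw = ('{"choices": [{"message": {"role": "assistant", "content": '
            '"1. \u201cFriends are the family you choose.\u201d - Jess C. Scott"}}]}')


def content_of(raw):
    """Return the assistant message text from a chat completions response."""
    return json.loads(raw)["choices"][0]["message"]["content"]


def straighten_quotes(text):
    """Hypothetical helper: replace curly double quotes with straight ones."""
    return text.replace("\u201c", '"').replace("\u201d", '"')


print(content_of(gpt35_raw))
print(content_of(gpt4_raw))
print(straighten_quotes(content_of(gpt4_raw)))
```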

cyruszei · July 15, 2023, 7:52pm

Will try this and see what the outcome is.

Thank you!

cyruszei · July 15, 2023, 7:53pm

I am using GPT-3, and now that you mention it, I haven’t tried GPT-4.

cyruszei · July 15, 2023, 7:54pm

Hmm, I’ll try that and see what it says. I didn’t know you could do it like that.

codie · July 15, 2023, 9:02pm

I’m not sure what format you are using, but if you end up not finding a good solution, you can use YAML instead. I’ve been using it a lot instead of JSON or other formats. If you use tab indentation, you also use far fewer tokens than JSON.

More specifically to your problem, though, YAML has something called a “literal block scalar” (introduced with a | after the key). You can dump whatever you want in it, including single and double quotes.
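To illustrate the literal block scalar, here is a deliberately simplified Python sketch. It is not a YAML parser: the read_literal_block helper is hypothetical and handles only a top-level “key: |” block with the default (clip) chomping; real code would use a YAML library such as PyYAML’s yaml.safe_load. It shows that quotes inside the indented block need no escaping. The sample is indented with spaces, since YAML does not permit tab characters in indentation.

```python
def read_literal_block(yaml_text, key):
    """Return the text of a top-level 'key: |' literal block scalar.

    Simplified sketch, not a YAML parser: it only handles a top-level key,
    space indentation, and the default "clip" chomping (one final newline).
    """
    lines = yaml_text.splitlines()
    start = lines.index(f"{key}: |") + 1  # ValueError if the key is absent
    body = []
    indent = None
    for line in lines[start:]:
        if not line.strip():
            body.append("")  # blank lines inside the block are kept
            continue
        current = len(line) - len(line.lstrip(" "))
        if indent is None:
            indent = current  # the first content line sets the indentation
        if indent == 0 or current < indent:
            break  # a less-indented line ends the block
        body.append(line[indent:])
    while body and body[-1] == "":
        body.pop()  # clip chomping: drop trailing blank lines...
    return "\n".join(body) + "\n" if body else ""  # ...but keep one newline


doc = (
    "title: Friendship\n"
    "paragraph: |\n"
    "  She said, \"Hello!\" and he replied, 'Hi.'\n"
    "  No escaping is needed inside the block.\n"
    "count: 1\n"
)

print(read_literal_block(doc, "paragraph"))
```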

_j · July 15, 2023, 9:14pm

That is not true. In fact, you get poorer semantic understanding, because it breaks up common word tokens and runs counter to the model’s training on code.

[Image: tokens - tab vs space]

codie · July 15, 2023, 9:28pm

Maybe that’s the case in your notional toy example, but I’ve used this on a variety of projects already, and we save about 20-35% in tokens. As long as the literal block scalars themselves don’t contain special formatting or escape characters, it is better. If escape characters are already present, it’s really best to just use a string. It’s becoming less important because we only use davinci 003 and the price is much cheaper. But regardless, some customers have high-volume use cases where 20-35% is a lot of money.

You should do some more experimenting and share your results. I know we use it on a case-by-case basis for our clients, but if you have techniques where JSON always comes out shorter across a variety of situations, I’m sure people would love to see them, considering how much more prevalent JSON is.

_j · July 15, 2023, 9:38pm

Above I show the chat models’ 100k tokenizer, which is twice as big as the completion engines’ tokenizer.

Davinci will shortly get a chat-model-based replacement and be taken out back and shot.

cyruszei · July 16, 2023, 8:21pm

Oh damn, the YAML structure works really, really well.
Wow, thank you!

cyruszei · July 16, 2023, 8:24pm

Why can’t I mark that answer as the solution? I can’t see any button or anything saying “Solution”.

jochenschultz · September 15, 2023, 9:03am

I can confirm that YAML is in fact the best format to extract from the response in most cases.
That said, I am not sure what happens when you want multiple YAML config files inside a YAML structure (yamlyamlyaml)…
