Any idea on how to prevent double quotes inside of paragraphs?

I am creating a prompt that should return a paragraph, and I then JSON-parse the response from OpenAI's ChatGPT.

The problem I get a lot is that inside the string paragraph, there are double quotes and that breaks my json parsing.

Does anyone have a solution to either skip the double quotes or instruct GPT to remove them?

This is the prompt I have:
“When writing the paragraphs, if you need to use a double quote character, remember to escape it. For example, instead of writing "She said, "Hello!"" you should write "She said, \"Hello!\"".”

If you look at the image, you can see that it doesn’t do it like that.

In the prompt, say: You NEVER use " only '
I made one for my daughter that never says yes, only Yasss.
I’m under the assumption it will play along as long as you follow the ToS, so if you aren’t asking it to do something crazy it usually does what you ask.

  1. “use triple-quotes” (this is also a Python doc string), OR
  2. “escape all double-quote characters within string output with backslash”, OR
  3. “within string output, you must replace all double-quotes you’d normally produce with single quotes.”

“when writing the paragraphs” as you have it is not as strong as “in all generated output” or “in all responses”.
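
For reference, here is a small Python sketch of why the parse fails and what correctly escaped output looks like. The `paragraph` key and the sample sentence are made up for illustration; only the standard `json` module is used.

```python
import json

# Hypothetical model output: the inner quotes are not escaped, so this is not valid JSON.
broken = '{"paragraph": "She said, "Hello!" and left."}'


def parses(text: str) -> bool:
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# What a correctly escaped response has to look like: json.dumps adds the backslashes.
escaped = json.dumps({"paragraph": 'She said, "Hello!" and left.'})

print(parses(broken))  # False
print(escaped)         # {"paragraph": "She said, \"Hello!\" and left."}
print(json.loads(escaped)["paragraph"])
```

Pasting a string like `escaped` into the prompt as a concrete example may work better than describing the escaping in words, though that is a guess and not something tested here.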


Which model are you using?

GPT-3 has always escaped quotes for me, every single time.

GPT-4 will USUALLY escape quotes, but sometimes it uses a fake quote character (curly quotes instead of the straight "), as seen below.

Using: https://api.openai.com/v1/chat/completions and curl.

Prompt “Share with me 5 different quotes about friendship”, GPT-3:

  "model": "gpt-3.5-turbo-16k-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1. \"A true friend is one who overlooks your failures and tolerates your success.\" - Doug Larson\n2. \"A real friend is one who walks in when the rest of the world walks out.\" - Walter Winchell\n3. \"Friendship is born at that moment when one person says to another, 'What! You too? I thought I was the only one.'\" - C.S. Lewis\n4. \"A friend is someone who gives you total freedom to be yourself.\" - Jim Morrison\n5. \"Friendship is the only cement that will ever hold the world together.\" - Woodrow Wilson"
      },

Prompt “Share with me 5 different quotes about friendship”, GPT-4:

"model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1. “True friendship comes when the silence between two people is comfortable.” - David Tyson\n2. “A real friend is one who walks in when the rest of the world walks out.” - Walter Winchell \n3. “In the end, we will remember not the words of our enemies, but the silence of our friends.” - Martin Luther King, Jr.\n4. “Lots of people want to ride with you in the limo, but what you want is someone who will take the bus with you when the limo breaks down.” - Oprah Winfrey\n5. “Friends are the family you choose.” - Jess C. Scott"
      },
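
A short Python sketch of what the curly quotes mean for parsing: inside a JSON string only the straight `"` has to be escaped, so a response like the GPT-4 one above still parses. The shortened `content` value and the `CURLY_TO_STRAIGHT` table are illustrative, not part of any API.

```python
import json

# Response content using typographic quotes (U+201C / U+201D), as in the GPT-4 output above.
raw = '{"content": "1. \u201cFriends are the family you choose.\u201d - Jess C. Scott"}'

# Parses without error: only the straight double quote needs escaping in a JSON string.
data = json.loads(raw)

# Optional: map curly quotes to straight ones after parsing, if downstream code expects ASCII.
CURLY_TO_STRAIGHT = str.maketrans(
    {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}
)
normalized = data["content"].translate(CURLY_TO_STRAIGHT)

print(normalized)
```

Normalizing after `json.loads` is the safe order; replacing the characters in the raw text first would reintroduce unescaped straight quotes.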

Will try this and see what the outcome will be.

thank you

I am using GPT3 and now that you mention it I haven’t tried GPT4

hmm, will try that and see what it says. I did not know that you could do it like that


I’m not sure what format you are using, but if you end up not finding a good solution, you can use YAML instead. I’ve been using it a lot instead of JSON or other formats. If you use tab indentations you use far fewer tokens than JSON as well.

More specifically to your problem though, there is something called a “literal block scalar” in YAML. You can dump whatever you want in it, including single and double quotes.
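
A rough Python sketch of what a literal block scalar looks like and how the text comes back out. The `title` and `paragraph` keys and the `read_literal_block` helper are hypothetical, and the helper is deliberately not a YAML parser; in practice a library such as PyYAML (`yaml.safe_load`) would be the usual choice.

```python
import textwrap

# Hypothetical model response in YAML. Everything indented under "paragraph: |"
# is taken literally, so quotes need no escaping.
response = textwrap.dedent('''\
    title: Greeting
    paragraph: |
      She said, "Hello!" and he replied, 'Hi.'
      Nothing here needs a backslash.
    ''')


def read_literal_block(text: str, key: str) -> str:
    """Minimal sketch: return the literal block scalar under a top-level key.

    Only handles the plain "key: |" form shown above.
    """
    lines = text.splitlines()
    start = lines.index(f"{key}: |") + 1
    block = []
    for line in lines[start:]:
        if line.strip() and not line.startswith(" "):
            break  # the next top-level key ends the block
        block.append(line)
    # Remove the block's indentation and keep a single trailing newline.
    return textwrap.dedent("\n".join(block)).strip("\n") + "\n"


print(read_literal_block(response, "paragraph"))
```

In the prompt, this would mean asking for the paragraph under a `|` key with indented lines, instead of inside a quoted JSON string.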


That is not true. In fact, you get poorer semantic understanding, because it breaks up common word tokens and runs counter to training on code.

[Image: tokens - tab vs space]

Maybe in your notional toy example this is the case, but I’ve used this on a variety of different projects already and we save about 20-35% in tokens. So long as the literal scalar blocks themselves don’t have special formatting or escape characters it is better. If the escape characters already exist, then best to just use a string really. It’s becoming less important because we only use davinci 003 and the price is much cheaper. But regardless some customers have high use cases where 20-35% is a lot of money.

You should do some more experimenting and share your results. I know we use it on a case-by-case basis for our clients, but if you have techniques for a variety of situations where JSON always comes out at fewer characters, then I’m sure people would love to see them, considering how much more prevalent it is.


Above I show the chat models’ 100k tokenizer, which is twice as big as the one the completion engines use.

davinci will shortly have a chat model-based replacement and be taken out back and shot.

oh damn, the YAML structure works really really well
wow, thank you


why can’t I mark that answer as solution? I can’t see any button or anything saying “solution”

Can confirm that YAML is in fact the best format to get back in the response in most cases.
Although I am not sure what happens when you want multiple YAML config files inside a YAML structure (yamlyamlyaml)…
