So, one of the things that tends to be helpful with these models is to tell them what to do, not what not to do.
It is also important to recognize that 100% accuracy is not always achievable, but we can do our best to increase that probability.
Perhaps instead of instructing it “not to return a property”, consider explicitly providing and exemplifying a “null” value for it to return. Any of “N/A”, “null”, or “Not found” could be the value it must return, which the rest of your program can then catch.
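For example, here is a minimal sketch of how that instruction could be worded and exemplified in the prompt (the field names and the “N/A” convention are just placeholders):

```python
# Rough sketch of a prompt that explicitly provides and exemplifies the "null" value.
# The properties ("company_name", "founded_year") and the "N/A" convention are placeholders.
system_prompt = """
Extract the following properties from the user's text and reply with JSON only:
- company_name
- founded_year

If a property is not present in the text, return the string "N/A" for it.

Example output when the founding year is missing:
{"company_name": "Acme Corp", "founded_year": "N/A"}
"""
```

The rest of your program can then treat “N/A” the same way it would treat a missing key.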
The other thing to note is that “certainty” is not really the forte of a probability machine. Confident, perhaps, but not certain.
Do you have the full prompt you could share with us? That way we could dive in more deeply and consider ways to improve it.
If you define your output as a complete JSON schema, suitable for validating the AI output, you can add description and example fields. Additionally, you can specify which properties are required. The contrast between properties that are required and those that are not, together with the descriptions, can inform the AI of what is optional.
Furthermore, you can specify where the information to complete the output should be sourced from. This could be AI knowledge, a reproduction of user input or user chat history, or documentation currently seen in context. You can also define whether a function should be called when the resources to fill a property are not available, or whether the AI must continue interviewing the user to obtain more information before sending the function call.
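For illustration, here is a rough sketch of what such a schema could look like, written as a Python dict (the property names, descriptions, and the null convention are invented; adapt them to your own task):

```python
# Sketch of a JSON Schema with descriptions, required properties, and an explicit
# "unknown" convention. The property names are made up for illustration.
extraction_schema = {
    "type": "object",
    "properties": {
        "company_name": {
            "type": "string",
            "description": "Company name exactly as it appears in the provided text.",
        },
        "founded_year": {
            "type": ["integer", "null"],
            "description": (
                "Year the company was founded, taken only from the provided text. "
                "Return null if the text does not mention it."
            ),
        },
    },
    # Requiring both keys while allowing null for founded_year tells the model
    # that every key must appear, but a value may legitimately be unknown.
    "required": ["company_name", "founded_year"],
    "additionalProperties": False,
}
```

You can paste the schema (or a trimmed version of it) into the prompt, use it with structured output / function calling, and also validate the model's reply against it with a library such as `jsonschema`.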
Echoing what the others have said: be as explicit and specific as possible about the expected JSON schema for the case where a property is not present in the text. Currently, your prompt is a bit vague in that regard, which most likely causes the issue. If the issue persists, add a reinforcing sentence to your instructions along the lines of “NEVER return a property if …”
I tried all the things mentioned here and decided to leave my feedback for future readers:
Instructing the model to “not return this”, “never do this”, etc… didn’t work so well. It definitely improves things but doesn’t get me there all the way. Empirically, I would say it takes me from 50-50 to 80-20 (out of 100 queries, how many of them have the JSON schema I expect?). I guess this is because completion models have a strong incentive to just … complete things?
Telling the model to return a “null” value explicitly (as @Macha suggested) is what worked the best for me (95-5 rate, maybe better). Now I always get back some value, but I can quickly check if it's nil and discard it.
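In case it helps future readers, here is a small sketch of the check I mean (the JSON reply is a made-up example):

```python
import json

# Hypothetical reply from a model instructed to return null for missing values.
raw = '{"company_name": "Acme Corp", "founded_year": null}'

record = json.loads(raw)

# Drop anything that came back as the agreed-upon "empty" value.
cleaned = {k: v for k, v in record.items() if v not in (None, "null", "N/A")}

print(cleaned)  # {'company_name': 'Acme Corp'}
```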
Welcome to prompting lol. As mentioned, we can’t reach 100%, but we can definitely try to bump it up to at least 95% or higher.
Don’t forget to mark a solution if you believe it has been resolved!
Pretty much! But this translates to other uses as well, like DALL-E. Emphasizing the “omission” counterintuitively seems to make the model focus more on the thing being omitted, not less. For example, “No hair” generates, well, hair, while “Bald” generates a person without hair. _j has mentioned this in other topics on this forum as well.
I do see the point about omission from an attention perspective. That said, I do think the approach can have merit depending on the use case. For instance, I use models quite a bit for classification tasks, drawing from a list of pre-defined categories. In these cases, when I reinforce the instructions by stating that the model should never deviate from the pre-defined list when selecting categories, I have historically achieved pretty high success rates.
But I definitely appreciate that in other circumstances it may not have the desired impact.
I guess at the end of the day, prompting really is an art and will always involve a bit of trial and error.
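To make that classification setup a bit more concrete, here is a rough sketch of the kind of reinforced instruction I mean (the categories and wording are invented for illustration):

```python
# Sketch of a classification prompt that reinforces a fixed category list.
# The categories and wording are invented for illustration.
CATEGORIES = ["billing", "technical issue", "feature request", "other"]

classification_prompt = f"""
Classify the user's message into exactly one of these categories:
{', '.join(CATEGORIES)}

Reply with the category name only. Never invent a category that is not in the
list above; if nothing fits, use "other".
"""
```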