I am passing gpt a list of names and a text and I want the API to return a list of the names that are present in the text AMONG the list of names I pass.
Ideally I would like the response to be a list such as “name_a,name_b,name_c”.
well, there’s JSON mode you could try.
Personally, my goto is to use the completions endpoint and end the prompt with [ or {.
Including a schema is typically a good idea. Typescript works quite well.
Thanks! What do you mean when you say “Personally, my goto is to use the completions endpoint and end the prompt with [ or {.” ? Do you have an example to share?
I’m talking about models like gpt-3.5-turbo-instruct or text-davinci-003. You have more control over the output, so if the last character in your input is { or {\n, the most likely next tokens will be part of the JSON structure — that way you can basically snuff out the chattiness that comes with the chat models.
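A rough sketch of that trick, under the assumption you’re using the legacy completions endpoint (the model name, prompt wording, and helper names below are just for illustration): end the prompt with `[`, then re-attach the `[` to the completion before parsing.

```python
import json

def build_prompt(names, text):
    """Build a prompt that ends with '[' so the model's most likely
    continuation is the rest of a JSON array."""
    return (
        "Return a JSON array of the names from the list that appear in the text.\n"
        f"Names: {json.dumps(names)}\n"
        f"Text: {text}\n"
        "Answer: ["
    )

def parse_completion(completion_text):
    """Re-attach the '[' we ended the prompt with, then parse as JSON."""
    return json.loads("[" + completion_text)

# The actual call would look something like this (requires the `openai`
# package and an API key), so it is left commented out here:
# from openai import OpenAI
# client = OpenAI()
# resp = client.completions.create(
#     model="gpt-3.5-turbo-instruct",
#     prompt=build_prompt(["Alice", "Bob"], "Alice met Carol."),
#     max_tokens=50,
# )
# names_found = parse_completion(resp.choices[0].text)

# With a typical raw completion such as '"Alice"]', the parser yields:
print(parse_completion('"Alice"]'))  # ['Alice']
```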
If you’re using Python, you could try this package. It uses pydantic under the hood, so you can specify the output to be a list.
It may be worth mentioning that the completions endpoint is now considered legacy. See here:
Most models that support the legacy Completions endpoint will be shut off on January 4th, 2024.
Hey, in my experience adding an instruction to return a JSON array (or any other object) in the system message, along with ‘JSON mode’ usually works well.
Maybe something like this:
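As a sketch of what that request could look like (the system text, names, and sample reply below are placeholders, not the original poster’s example) — JSON mode is enabled via response_format, and note it requires the word “JSON” to appear in the messages:

```python
import json

# Request body for the Chat Completions endpoint with JSON mode enabled.
payload = {
    "model": "gpt-3.5-turbo-1106",
    "response_format": {"type": "json_object"},
    "messages": [
        {
            "role": "system",
            "content": (
                "You extract names. Return a JSON object of the form "
                '{"names": ["name_a", "name_b"]} containing only the given '
                "names that appear in the text."
            ),
        },
        {
            "role": "user",
            "content": 'Names: ["Alice", "Bob"]\nText: Alice met Carol today.',
        },
    ],
}

# You would send this with e.g. client.chat.completions.create(**payload)
# using the `openai` package; the reply's message content is then a JSON
# string you can parse directly:
sample_reply = '{"names": ["Alice"]}'
found = json.loads(sample_reply)["names"]
print(found)  # ['Alice']
```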
I think we can do that.
It seems the task would only need clear instruction.
model: gpt-3.5-turbo-1106; top_p: 0.1
how reliable is that in your experience?
Requesting “no chatter” is not as strong as the “you are automated” description I used. You can go further and give the AI a complete explanation that its job is just part of backend code processing an API interface with no actual user, and that anything other than the specified output will crash the system.
You can see several hints here with minimal waste.
Saying both “python list” and “json array” is ambiguous.
I’ve always had best success with markdown; “output a bulleted list in markdown format” works well, and is easy to parse.
…is flexible. I couldn’t think of a good example for this topic where the items inside the “list” would need to be differentiated or identified as their own “array”.
(I had to use gpt-3.5-turbo or gpt-3.5-turbo-0301, as gpt-3.5-turbo-1106 includes “Jim” — there’s just more dumb decline in every iteration of AI model OpenAI makes…)
If referring to the same output as both a JSON array and a Python list, the only difference in output might be if you include a non-string `true` or `null`. For entity extraction, it is unlikely you’ll be searching for boolean truth or searching for nothing. The increased clarity that gets the first token to be correct is what I aimed for.
```
You are an automated entity extractor.
// output
```
A few methods to test/try here:

1. Treat the chat model like a completion model: provide an “Assistant: (blah blah)” or “Answer: (blah blah)” portion at the end of your prompt to prime the model to answer in a particular fashion. Here, (blah blah) is an exemplar answer, a description of the answer, or a placeholder answer.
2. Send your request with an additional “assistant” chat dictionary object doing the above. GPT will similarly treat that as a cue to prime its response to conform as you indicated.
3. Include in your system message an instruction about the output format you are looking for (bulleted list, Python list, flat numbered YAML, etc.).
4. Write a post-completion validator function to which you pass your response output; it applies a regex check for the structure type. If GPT has not produced the appropriate output type, it repeats the API call in a “try” block, limited to whatever your tolerance is for repeated calls.
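A rough sketch of that last validator/retry idea — here `call_model` is a stand-in for your actual API call, and the regex just checks for a comma-separated list of names:

```python
import re

def extract_list(call_model, prompt, max_attempts=3):
    """Call the model, validate the reply looks like a comma-separated
    list of names, and retry up to max_attempts times if it doesn't."""
    pattern = re.compile(r"^\s*\w+(\s*,\s*\w+)*\s*$")  # e.g. "name_a, name_b"
    for _ in range(max_attempts):
        try:
            reply = call_model(prompt)
        except Exception:
            continue  # transient API error: retry
        if pattern.match(reply):
            return [name.strip() for name in reply.split(",")]
    raise ValueError("model never produced a parseable list")

# Usage with a stubbed model call: the first reply is chatty and fails
# validation, the second parses cleanly.
fake_replies = iter(["Sure! Here are the names:", "Alice, Bob"])
result = extract_list(lambda p: next(fake_replies), "extract the names...")
print(result)  # ['Alice', 'Bob']
```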
“extract entities from text” isn’t a particularly thorough prompt, though.
With the prompt you posted, you might want to start with “here are the entities I’m interested in: …”
And then end the prompt with “out of the entities I’m interested in, the entities that are present in the text are: …”
The topic is not about quality but about the output format, which was satisfied with an example — satisfied even against models that like to chat with you.
“Entity extraction” is a fairly fundamental InstructGPT and language task one can refer to by name. Here it is small, dictionary-based, and supervised, rather than tuned or open-ended.
See Case #4
https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api
and some arXiv searches, etc.