Make the API return a Python list, or something I can rely on to be a Python list

I am passing GPT a list of names and a text, and I want the API to return, from among the names I pass, a list of only those that are present in the text.

Ideally, I would like the response to be a list such as “name_a,name_b,name_c”.

well, there’s JSON mode you could try.

Personally, my go-to is to use the completions endpoint and end the prompt with [ or {.

Including a schema is typically a good idea. TypeScript works quite well.

Thanks! What do you mean when you say “Personally, my go-to is to use the completions endpoint and end the prompt with [ or {.”? Do you have an example to share?

I’m talking about models like gpt-3.5-turbo-instruct/text-davinci-003. You have more control over the output, so if the last character in your input is { or {\n, the most likely next tokens will be part of the JSON that follows your schema; that way you can basically snuff out the chattiness that comes with the chat models.

something along those lines:
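A minimal sketch of the technique, assuming gpt-3.5-turbo-instruct and prompt wording that is my own invention:

```python
import json

def build_prompt(names, text):
    # Ending the prompt with "[" makes the most likely next tokens
    # the contents of a JSON array rather than conversational filler.
    return (
        "From the list of names below, return a JSON array containing "
        "only the names that appear in the text.\n"
        f"Names: {json.dumps(names)}\n"
        f"Text: {text}\n"
        "Answer: ["
    )

def parse_completion(completion_text):
    # The model continues from the "[", so prepend it before parsing.
    return json.loads("[" + completion_text)

# The actual call (openai package, API key required) would look like:
# completion = openai.completions.create(
#     model="gpt-3.5-turbo-instruct",
#     prompt=build_prompt(names, text),
#     max_tokens=100,
# )
# found = parse_completion(completion.choices[0].text)
```

Because the prompt already supplied the opening bracket, the completion tends to be nothing but the rest of the array, which `parse_completion` turns straight into a Python list.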


If you’re using Python, you could try this package. It uses pydantic under the hood, so you can specify the output to be a list.

It may be worth mentioning that the completion endpoint is now considered legacy.

See here

Most models that support the legacy Completions endpoint will be shut off on January 4th, 2024.


Hey, in my experience adding an instruction to return a JSON array (or any other object) in the system message, along with ‘JSON mode’ usually works well.

Maybe something like this:
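A sketch of that approach; the message wording and the wrapper key "names" are my assumptions, and JSON mode needs a supporting model such as gpt-3.5-turbo-1106 plus the word “JSON” somewhere in the messages:

```python
import json

def build_request(names, text):
    # JSON mode ("response_format") guarantees syntactically valid JSON,
    # but the instruction must still spell out the shape we want.
    system = (
        "You are an entity extractor. Respond in JSON with a single key "
        '"names" holding an array of the given names that appear in the text.'
    )
    user = f"Names: {', '.join(names)}\n\nText: {text}"
    return {
        "model": "gpt-3.5-turbo-1106",
        "response_format": {"type": "json_object"},
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

def parse_response(content):
    # JSON mode produces a JSON object, not a bare array,
    # hence the wrapper key.
    return json.loads(content)["names"]

# response = client.chat.completions.create(**build_request(names, text))
# found = parse_response(response.choices[0].message.content)
```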

I think we can do that.

  • give the AI an input list of names
  • give the AI a text passage
  • entity extraction of names into output list

It seems the task would only need clear instruction.


model:gpt-3.5-turbo-1106; top_p:0.1

  • no chat is in output

how reliable is that in your experience?

Requesting no chatter is not as strong as the “you are automated” description I used, or even giving the AI a complete explanation that its job is just part of backend code processing an API interface with no actual user, and that anything other than the specified output will crash the system.

You can see several such hints here with minimal waste.


Saying both “python list” and “json array” is ambiguous.
I’ve always had best success with markdown; “output a bulleted list in markdown format” works well, and is easy to parse.
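Parsing that markdown back into a Python list is cheap; a sketch, assuming the model uses “-”, “*”, or “+” bullets:

```python
def parse_bulleted_list(markdown_text):
    # Accept "-", "*", or "+" bullet markers and strip marker plus whitespace.
    items = []
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped[:2] in ("- ", "* ", "+ "):
            items.append(stripped[2:].strip())
    return items
```

Non-bullet lines (chatty preamble, blank lines) are simply ignored, which makes this tolerant of a little extra chatter around the list.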


…is flexible. I couldn’t even think of a good example for this topic where the items inside the “list” would need to be differentiated or identified as their own “array”.

(I had to use gpt-3.5-turbo or gpt-3.5-turbo-0301, as gpt-3.5-turbo-1106 includes “Jim” – there’s just more dumb decline with every iteration of AI model OpenAI makes…)

If you refer to the same output as both a JSON array and a Python list, the only difference in output might arise if you include a non-string true or null. For entity extraction, it is unlikely you’ll be searching for boolean truth or searching for nothing. The increased clarity that gets the first token to be correct is what I aimed for.

that system prompt - not as good as one you'll write yourself

You are an automated entity extractor.

  • you are provided a list of entities to discover within a text;
  • you are provided the text to search for matching entities;
  • your only output will be a python list (json array) containing entities which appear in the text.

// output

  • no chat is in output
  • output begins with [

A few methods to test/try here:

  • Treat the chat model like a completion model - provide an “Assistant: (blah blah)” or “Answer: (blah blah)” portion at the end of your prompt to prime the model to answer in a particular fashion. Here, (blah blah) is either an exemplar answer, a description of the answer, or a placeholder answer.

  • Send your request with an additional “assistant” chat dictionary object doing the above. GPT will similarly treat that as a cue to prime its response to conform as you indicated.

  • Include in your system message an instruction regarding the output format you are looking for (bulleted list, python list, flat numbered YAML, etc.).

  • Write a post-completion validator function to which you pass the response output; it applies a regex check for the expected structure. If GPT has not produced the appropriate output type, repeat the API call inside a “try” block, limited by whatever your tolerance is for repeated calls.
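The last bullet could be sketched like this, where call_model is a stand-in for whatever actually hits the API:

```python
import json
import re

def get_validated_list(call_model, prompt, max_retries=3):
    # Retry the model call until the output looks like a JSON array
    # (or we exhaust our tolerance for repeated calls).
    for _ in range(max_retries):
        output = call_model(prompt).strip()
        if re.fullmatch(r"\[.*\]", output, re.DOTALL):
            try:
                parsed = json.loads(output)
                if isinstance(parsed, list):
                    return parsed
            except json.JSONDecodeError:
                pass  # malformed despite the brackets; retry
    raise ValueError("model never produced a valid JSON array")
```

The regex is a cheap first gate; `json.loads` is the real validator, and the loop caps how many extra calls a chatty response can cost you.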


“extract entities from text” isn’t a particularly thorough prompt, though.
With the prompt you posted, you might want to start with “here are the entities I’m interested in: …”
And then end the prompt with “out of the entities I’m interested in, the entities that are present in the text are: …”

The topic is not about prompt quality but about output format, which the example satisfies, even against models that like to chat with you.

“Entity extraction” is a pretty fundamental InstructGPT and language task one can refer to by name. Here it is a small, dictionary-based, supervised variant, rather than fine-tuned or open-ended.

See Case #4

and some arXiv searches, etc.