Single pre-prompt with multiple prompts for classification task

I have a classification task. There is a list of labels with a short description of each. There are a few examples of classification. Then each individual task will be to classify ~100 strings following this prompt.

What is the best way to encode this? I’ve tried:

  1. Chat API, where for each of the ~100 strings, I give the prompt with examples, then ask to classify a single string. Very inefficient due to repeating the large prompt+examples for every string.

  2. Chat API, where a single ‘user’ message lists ~100 strings, then the assistant echoes back each string, plus the label. This sorta works, but it’s inefficient (strings much longer than labels), and the model sometimes hallucinates when echoing back the strings.

  3. Chat API, as above, but where the assistant outputs a list of 100 labels. This doesn’t work very well; the model gets confused about where it is in the task, i.e. which strings it’s labelling. Numbering the strings and labels helps a little, but not much.

  4. Chat API, where a single ‘user’ message lists one string, then the assistant echoes back its label, repeated ~100 times. Works, but just a worse version of solution 0, because I’m just paying for useless chat history that’s not used for the task.

What I want is an API where I provide a single pre-prompt, then N prompts, and get back N completions, where completion j is the completion following the pre-prompt plus prompt j.

Is this structure possible? Is there some other way? Should I be looking at the deprecated classification API, or fine-tuning?

This might just be me, but I’m not sure what you are asking for, are you asking to ask 100 questions and get 100 answers back in one prompt? or are you looking for the AI to decide if a piece of text is one of 100 defined things you give it?

The optimal text classification method, IMO, is 0. Using gpt 3.5 turbo should be an efficient way of doing this due to its low cost.

Generally speaking, the less tasks you give in a single prompt, the more likely it is to do a good job. You could try to increase the number of strings gradually (try 2, then 3, then 4 etc) in order to optimize efficiency : reliability.

Depending on your use-case, I would even run the query several times on the same string(s) in order to perform a majority / consensus classification task to increase the quality of your final classification.

I hope this helps.