Single pre-prompt with multiple prompts for classification task

I have a classification task. There is a list of labels with a short description of each. There are a few examples of classification. Then each individual task will be to classify ~100 strings following this prompt.

What is the best way to encode this? I’ve tried:

  1. Chat API, where for each of the ~100 strings, I give the prompt with examples, then ask to classify a single string. Very inefficient due to repeating the large prompt+examples for every string.

  2. Chat API, where a single ‘user’ message lists ~100 strings, then the assistant echoes back each string plus its label. This sort of works, but it’s inefficient (the strings are much longer than the labels), and the model sometimes hallucinates when echoing the strings back.

  3. Chat API, as above, but where the assistant outputs a list of 100 labels. This doesn’t work very well; the model gets confused about where it is in the task, i.e. which strings it’s labelling. Numbering the strings and labels helps a little, but not much.

  4. Chat API, where a single ‘user’ message lists one string, then the assistant echoes back its label, repeated ~100 times. Works, but it’s just a worse version of solution 1, because I’m paying for useless chat history that isn’t used for the task.

What I want is an API where I provide a single pre-prompt, then N prompts, and get back N completions, where completion j is the completion following the pre-prompt plus prompt j.

Is this structure possible? Is there some other way? Should I be looking at the deprecated classification API, or fine-tuning?
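For reference, the structure described above (one shared pre-prompt, N short prompts, N completions) can be approximated with the Chat API by putting the shared preamble in the system message and making one short call per string. A minimal sketch, where `complete` is a placeholder for whatever chat-completion call you use and the prompt wording is purely illustrative:

```python
from typing import Callable

def build_messages(preamble: str, string: str) -> list[dict]:
    # Shared pre-prompt (label list + few-shot examples) goes in the system
    # message; each string to classify is its own short user message.
    return [
        {"role": "system", "content": preamble},
        {"role": "user", "content": f"Classify: {string}"},
    ]

def classify_all(preamble: str, strings: list[str],
                 complete: Callable[[list[dict]], str]) -> list[str]:
    # `complete` wraps the actual API call, e.g. something along the lines of
    # lambda msgs: client.chat.completions.create(
    #     model="gpt-3.5-turbo", messages=msgs, max_tokens=5
    # ).choices[0].message.content
    return [complete(build_messages(preamble, s)) for s in strings]
```

This still resends the preamble per call, so it doesn't save tokens over option 1; it only keeps each completion short and independent.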

This might just be me, but I’m not sure what you’re asking for. Are you asking to send 100 questions and get 100 answers back in one prompt, or are you looking for the AI to decide whether a piece of text belongs to one of 100 defined classes you give it?

The optimal text classification method, IMO, is option 1. Using gpt-3.5-turbo should make this efficient, given its low cost.

Generally speaking, the fewer tasks you give in a single prompt, the more likely the model is to do a good job. You could increase the number of strings gradually (try 2, then 3, then 4, etc.) to find the best efficiency-to-reliability trade-off.
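To make that batch-size ramp concrete, here's a small sketch (the prompt wording and batch size are my own choices, not a recommendation) that chunks the strings and numbers each batch so the model's answers can be matched back by index:

```python
def chunks(items: list[str], size: int):
    # Yield consecutive batches of at most `size` strings.
    for i in range(0, len(items), size):
        yield items[i:i + size]

def batch_prompt(preamble: str, batch: list[str]) -> str:
    # Number the strings so answers like "3. LABEL" map back unambiguously.
    numbered = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(batch))
    return f"{preamble}\n\nReply with one '<number>. <label>' line per string:\n{numbered}"
```

You can then sweep `size` from 2 upward and measure label accuracy on a held-out sample until reliability starts to degrade.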

Depending on your use case, I would even run the query several times on the same string(s) and take a majority/consensus vote to increase the quality of your final classification.
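A minimal sketch of that consensus step (the threshold and the convention of returning `None` when there is no clear winner are my own choices):

```python
from collections import Counter

def consensus_label(labels: list[str], min_agreement: float = 0.5):
    # `labels` are the model's answers from several runs on the same string.
    # Return the most common label if it clears the agreement threshold,
    # otherwise None so the string can be flagged for manual review.
    label, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= min_agreement:
        return label
    return None
```

For example, three runs returning `["A", "A", "B"]` yield `"A"`, while `["A", "B", "C"]` yields `None`.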

I hope this helps.


Sharing my opinion very late on this: are your 100 classes group-able? If yes, I would suggest grouping, sub-grouping, sub-sub-grouping… and then calling the LLM multiple times to walk down the decision tree of these groups and end up with the actual class.
For example: decide between Animal and Plant; in the next step, if Animal, decide between carnivore and herbivore… eventually classify between Whale and Shark. You may use more than 2 classes at each step to shorten the tree depth.
Use ChatGPT to come up with the groups, and use those.
This will reduce your input size, as you may not need to provide all the examples, thereby reducing cost. It will, however, add latency.
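A sketch of that decision-tree walk, assuming a `choose(text, options)` helper that makes one LLM call picking among a handful of options (the tree below just mirrors the toy example above, and its structure is purely illustrative):

```python
# Hypothetical label tree: each internal node maps to its child options.
TREE = {
    "root": ["Animal", "Plant"],
    "Animal": ["Carnivore", "Herbivore"],
    "Carnivore": ["Whale", "Shark"],
}

def classify_tree(text: str, tree: dict, choose) -> str:
    # Walk from the root, letting one small LLM call pick a branch at each
    # level, until we reach a leaf (a node with no children in the tree).
    node = "root"
    while node in tree:
        node = choose(text, tree[node])
    return node
```

Each call only needs the few labels (and examples) relevant to the current node, which is where the input-size saving comes from; the trade-off is one round trip per tree level.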