Hi all! I'm currently working on a project that requires me to classify a large number of short texts into 6 different categories. I was originally going to fine-tune an LLM that shall remain nameless, but having read this paper I'm now thinking of using CoT with the API instead. So far the results are great on a small test set (about 750 pieces of text).
My question is: are these results likely to hold as I scale up to the full set (2.5 million pieces of text)? Also, I'm estimating that I'm currently inputting 262 tokens per document and outputting 2, but my costs seem higher than they should be for that based on the API pricing. Am I doing something wrong?
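For reference, this is roughly how I'm working out the expected cost (the per-token prices below are placeholders, not actual API rates):

```python
# Rough cost estimate for the full run. The per-million-token prices are
# placeholders -- substitute whatever the API actually charges.
N_DOCS = 2_500_000
INPUT_TOKENS_PER_DOC = 262
OUTPUT_TOKENS_PER_DOC = 2

INPUT_PRICE_PER_M = 1.00   # $ per 1M input tokens (placeholder)
OUTPUT_PRICE_PER_M = 3.00  # $ per 1M output tokens (placeholder)

input_cost = N_DOCS * INPUT_TOKENS_PER_DOC / 1_000_000 * INPUT_PRICE_PER_M
output_cost = N_DOCS * OUTPUT_TOKENS_PER_DOC / 1_000_000 * OUTPUT_PRICE_PER_M
print(f"estimated total: ${input_cost + output_cost:,.2f}")
```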
Thanks!