Structuring paper abstracts by topic?

Still trying to figure out the pricing here, which is clear as mud. As I'm reading the FAQ (which seems like the wrong place for this compared to the API docs), each sentence would cost me… >$0.05 to classify? Can that possibly be right?

Number of tokens in all of your documents
+ (Number of documents + 1) * 14
+ (Number of documents + 1) * Number of tokens in your query
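If I've read that formula right, it can be wrapped up as a little R function — a sketch based purely on my reading of the FAQ, with the 14-token per-document overhead and ada's $0.0008/1K rate plugged in as defaults:

```r
# Per-sentence cost of one Search call, as I read the FAQ's formula:
# tokens-in-all-docs + (ndocs + 1) * 14 + (ndocs + 1) * query tokens,
# billed at `rate` dollars per 1000 tokens (ada = 0.0008).
perSentenceCost <- function(ndocs, avgTokensPerDoc, queryTokens,
                            rate = 0.0008) {
  totalTokens <- ndocs * avgTokensPerDoc +
    (ndocs + 1) * 14 +
    (ndocs + 1) * queryTokens
  (totalTokens / 1000) * rate
}

perSentenceCost(1000, 30, 30)
# [1] 0.0592352
```

Same numbers as the worked example below, just reusable for playing with ndocs.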

So as I'm reading this, with my scenario of 5 topics at n=200 examples each (1,000 documents total), assuming something like 30 BPE tokens per sentence (scientific abstracts are fairly jargon- and notation-heavy), and using ada's $0.0008 per 1,000 BPEs, that would translate to

ndocs <- 1000; avgTokensPerDoc <- 30
(((ndocs * avgTokensPerDoc) +     # tokens in all documents
  (ndocs + 1) * 14 +              # fixed per-document overhead
  (ndocs + 1) * 30) / 1000) *     # query tokens per document
  0.0008                          # ada: $0.0008 per 1000 BPEs
# [1] 0.0592352

(I'm going to ignore the cost of the additional regular completion, because I can't figure that out from the FAQ. Come on, guys.)

Does this sound about right? It's higher than I expected, but another user seemed to get similarly high costs… At $0.06 a sentence, a single abstract could easily run a buck. I have a current set of ~6.5k abstracts and expect to get at least that many again in the future, so that's a bit steep. (Multiplying out at ~5 sentences per abstract would suggest a total cost of ~$1.9k just to parse topics for the current set.)
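For the record, that $1.9k figure is just the per-sentence cost above times an assumed ~5 sentences per abstract times the ~6.5k current abstracts — the 5-sentences-per-abstract number is my guess, not anything from the FAQ:

```r
perSentence <- 0.0592352   # per-sentence cost from the calculation above
abstracts   <- 6500        # current set
sentences   <- 5           # assumed average sentences per abstract
abstracts * sentences * perSentence
# [1] 1925.144
```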

The main cost here seems to come from the ndocs parameter blowing up the total number of tokens passed into Search. Is everyone using this with relatively few classified documents? If I imagine instead only n=5 examples per topic, then it's much more feasible, coming in at more like $50 total:

R> ndocs <- 5*5; avgTokensPerDoc <- 30
R> perSentenceCost <- (((ndocs * avgTokensPerDoc) +
+    (ndocs + 1) * 14 + (ndocs + 1) * 30) / 1000) * 0.0008
R> 6500 * 5 * perSentenceCost
[1] 49.244

This is fairly reasonable but a little discouraging. Splitting by topic is but one of many transformations I might want to apply: I looked into it because I thought it would be one of the easiest to get working, but it's already proving a bit daunting. Given that, I may be better off fine-tuning on the complete set of edits as before/after pairs instead of running multiple phases in a pipeline.