Structuring paper abstracts by topic?

Still trying to figure out the pricing here, which is clear as mud. As I'm reading the FAQ (which seems like the wrong place for this compared to the API docs), each sentence would cost me… >$0.05 to classify? Can that possibly be right?

Number of tokens in all of your documents
+ (Number of documents + 1) * 14
+ (Number of documents + 1) * Number of tokens in your query
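If I've read that formula right, it can be wrapped up as a little R function — a sketch based purely on my reading of the FAQ, with the 14-token per-document overhead and ada's $0.0008/1K rate plugged in as defaults:

```r
# Per-sentence cost of one Search call, as I read the FAQ's formula:
# tokens-in-all-docs + (ndocs + 1) * 14 + (ndocs + 1) * query tokens,
# billed at `rate` dollars per 1000 tokens (ada = 0.0008).
perSentenceCost <- function(ndocs, avgTokensPerDoc, queryTokens,
                            rate = 0.0008) {
  totalTokens <- ndocs * avgTokensPerDoc +
    (ndocs + 1) * 14 +
    (ndocs + 1) * queryTokens
  (totalTokens / 1000) * rate
}

perSentenceCost(1000, 30, 30)
# [1] 0.0592352
```

Same numbers as the worked example below, just reusable for playing with ndocs.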

So as I'm reading this, with my scenario of 5 topics at n=200 examples each (1,000 documents total), assuming something like 30 BPE tokens per sentence (scientific abstracts are fairly jargon- and notation-heavy), and using ada's $0.0008 per 1,000 BPEs, that would translate to

ndocs <- 1000; avgTokensPerDoc <- 30
(((ndocs * avgTokensPerDoc) +     # tokens in all documents
  (ndocs + 1) * 14 +              # fixed per-document overhead
  (ndocs + 1) * 30) / 1000) *     # query tokens per document
  0.0008                          # ada: $0.0008 per 1000 BPEs
# [1] 0.0592352

(I'm going to ignore the cost of the additional regular completion, because I can't figure that out from the FAQ. Come on, guys.)

Does this sound about right? It's higher than I expected, but another user seemed to get similarly high costs… At $0.06 a sentence, a single abstract could easily run a buck. I have a current set of ~6.5k abstracts and expect to get at least that many again in the future, so that's a bit steep. (Multiplying out at ~5 sentences per abstract would suggest a total cost of ~$1.9k just to parse topics for the current set.)
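For the record, that $1.9k figure is just the per-sentence cost above times an assumed ~5 sentences per abstract times the ~6.5k current abstracts — the 5-sentences-per-abstract number is my guess, not anything from the FAQ:

```r
perSentence <- 0.0592352   # per-sentence cost from the calculation above
abstracts   <- 6500        # current set
sentences   <- 5           # assumed average sentences per abstract
abstracts * sentences * perSentence
# [1] 1925.144
```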

The main cost here seems to come from the ndocs parameter blowing up the total number of tokens passed into Search. Is everyone using this with relatively few classified documents? If I imagine instead only n=5 examples per topic, then it's much more feasible, coming in at more like $50 total:

R> ndocs <- 5*5; avgTokensPerDoc <- 30
R> perSentenceCost <- (((ndocs * avgTokensPerDoc) +
+    (ndocs + 1) * 14 + (ndocs + 1) * 30) / 1000) * 0.0008
R> 6500 * 5 * perSentenceCost
[1] 49.244

This is fairly reasonable but a little discouraging. Splitting by topic is but one of many transformations I might want to apply: I looked into it because I thought it would be one of the easiest to get working, but it's already proving a bit daunting. Given that, I may be better off fine-tuning on the complete set of edits as before/after pairs instead of running multiple phases in a pipeline.