How to process large sections of text in API call?

I’m trying to perform sentiment analysis on 11k+ short-form user reviews for an event using the API (currently using GPT-4).
This is part of an automated analysis of multiple events, so not every event will have as many as 11k reviews, but a lot will.
What’s the best way to do this in the API to avoid token limits and rate limiting? 11k reviews usually come to hundreds of thousands of tokens, well beyond current limits.

I can’t really split it into chunks, because all reviews need to be analysed together to get a full picture of them, but I’m open to suggestions.

One thing to note is that an LLM can’t actually ‘look’ at the whole of a large context window at once, so it’s possible that you’re not actually doing what you think you’re doing.

If you’re really doing raw sentiment analysis (and assuming you can evaluate individual reviews after all and compile the results later), you can do it cheaply with embeddings after generating some training data; see the OpenAI Cookbook notebooks “Regression using the embeddings” and “Classification using embeddings”.

However, I’ll admit this is kinda contrived: if you just need plain old sentiment analysis, good old NLTK might be good enough.

Welcome to the dev forum @liam.wright

You can use the gpt-3.5-turbo-instruct model on the completions endpoint with batching, i.e. sending multiple reviews in a single API call.

The code for batching will look something like:

from openai import OpenAI

client = OpenAI()

reviews = []  # Your code for extracting the individual user reviews goes here

# Insert each review into a prompt template, giving one prompt per review
user_review_sentiment_prompt = [
    f"User Review: {review}\nSentiment:" for review in reviews
]

# The list of prompts is sent as a single batched request
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=user_review_sentiment_prompt,
    temperature=0,  # most consistent labels
    max_tokens=1,  # keep each answer to a single token, e.g. a one-word label
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

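The resulting response.choices list will contain one completion for every element of the user_review_sentiment_prompt list; match each choice back to its prompt using its index field, since the choices aren’t guaranteed to come back in order.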
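With 11k reviews you probably won’t want all of them in one request either, so one option is to send the prompts in fixed-size batches and put each completion back in order using its index. This is a minimal sketch: the batch size of 20 is a hypothetical default to tune against your rate limits, and complete is a hypothetical callable you supply that wraps client.completions.create(...) and returns its choices.

```python
from typing import Callable, List, Sequence


def classify_in_batches(
    prompts: Sequence[str],
    complete: Callable[[List[str]], list],
    batch_size: int = 20,  # hypothetical default; tune it to your rate limits
) -> List[str]:
    """Send prompts in batches and return one label per prompt, in input order."""
    labels: List[str] = []
    for start in range(0, len(prompts), batch_size):
        batch = list(prompts[start:start + batch_size])
        choices = complete(batch)
        # choice.index is the prompt's position within this batch, not overall
        batch_labels = [""] * len(batch)
        for choice in choices:
            batch_labels[choice.index] = choice.text.strip()
        labels.extend(batch_labels)
    return labels


# With the real client, `complete` could be, for example:
# complete = lambda batch: client.completions.create(
#     model="gpt-3.5-turbo-instruct", prompt=batch, temperature=0, max_tokens=1
# ).choices
```

Keeping the API call behind a callable also makes it easy to add retries or a pause between batches without touching the reordering logic.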

Thanks for replying, can you please elaborate a bit on these points:

you may not actually be doing what you think you’re doing

embeddings after generating some training data

I’m not at all familiar with embeddings

My ideal output is a set of 3 positive, key themes from all 11k reviews and likewise 3 negative.

If I use this method, I’m only going to get a review-by-review analysis, right?

Or are you suggesting performing an analysis of the response as well for the overall summary?

I’m assuming you think you must load all your reviews into one prompt. Is that correct?

The models only have limited attention: they can’t pay attention to all reviews at the same time, so whatever response you get likely won’t actually reflect all of the reviews you loaded into the context.


Embeddings are an easy way to encode some semantic meaning numerically. You can then do some linear algebra on them to extract meaning, but to get the transformation matrix, you need some training data.
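To make the linear algebra step concrete: given embeddings for some reviews you have already scored by hand (the training data), one simple option is an ordinary least-squares fit of a weight vector that maps an embedding to a score. This is a NumPy sketch assuming the embeddings are stacked as rows of a matrix; it is not the exact method of the Cookbook notebooks.

```python
import numpy as np


def fit_score_model(embeddings: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Least-squares fit of weights w (plus a bias term) so that [E, 1] @ w ≈ scores."""
    # Append a column of ones so the model can learn an intercept
    X = np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])
    w, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return w


def predict_scores(embeddings: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Apply the fitted weights (including the bias) to new embeddings."""
    X = np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])
    return X @ w
```

With 1536-dimensional vectors you would need many more labeled reviews than dimensions, or a regularized model, for a fit like this to generalize rather than memorize the training set.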

All that said and done, however, for a simple use case NLTK sentiment analysis will likely be easier, and may even give you better results in a lot of cases.
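For comparison, a minimal NLTK sketch using its VADER analyzer, assuming NLTK is installed and the vader_lexicon data can be downloaded. The ±0.05 cut-off on the compound score is the convention suggested by VADER’s authors. Note that this gives per-review polarity only, not themes.

```python
from typing import List, Sequence


def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map VADER's compound score (-1 to 1) to a label using the usual ±0.05 cut-off."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def vader_labels(reviews: Sequence[str]) -> List[str]:
    """Label each review with NLTK's VADER analyzer (requires `pip install nltk`)."""
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    analyzer = SentimentIntensityAnalyzer()
    return [label_from_compound(analyzer.polarity_scores(r)["compound"]) for r in reviews]
```

This runs locally with no API calls or token limits, which matters at 11k reviews per event.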

Yes, this will be a review-by-review analysis.

You could alter the prompt to give you a JSON object with “score” (from 0 to 10, 0 being worst and 10 being best) and “theme” fields. Then all you have to do is filter out the review(s) with the highest and lowest scores, and do whatever you want with the “theme”.

This approach ensures that the model’s attention is equally distributed to every single review.
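Once each review comes back as a JSON object, rolling them up into 3 positive and 3 negative key themes can happen in plain code. A sketch, assuming each model reply looks like {"score": 8, "theme": "friendly staff"}; the cut-offs (7 and above counts as positive, 3 and below as negative) are hypothetical choices.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def top_themes(
    replies: Iterable[str], n: int = 3, pos_min: int = 7, neg_max: int = 3
) -> Tuple[List[str], List[str]]:
    """Return the n most frequent themes among high-scoring and low-scoring reviews."""
    positive, negative = Counter(), Counter()
    for reply in replies:
        try:
            item = json.loads(reply)
            score = int(item["score"])
            theme = str(item["theme"]).strip().lower()
        except (ValueError, KeyError, TypeError):
            continue  # skip replies that aren't the expected JSON shape
        if score >= pos_min:
            positive[theme] += 1
        elif score <= neg_max:
            negative[theme] += 1
    return ([t for t, _ in positive.most_common(n)],
            [t for t, _ in negative.most_common(n)])
```

Counting exact strings is crude: “long queues” and “queues” would be counted separately, so in practice you may want to group near-duplicate themes first.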

You send the text and get back an array of numbers (a vector) computed by the AI model. Here I sent 5 texts and abbreviated each result to its first four values:

== Sample from 5 d=1536 vectors returned ==
0 [‘+0.07483459’, ‘-0.05935031’, ‘-0.00856189’, ‘-0.00781500’]
1 [‘+0.00903089’, ‘+0.01907911’, ‘-0.00238489’, ‘-0.00072495’]
2 [‘+0.08030009’, ‘-0.03861678’, ‘+0.00864688’, ‘+0.02503138’]
3 [‘+0.03054902’, ‘-0.06330244’, ‘-0.06083483’, ‘-0.00176743’]
4 [‘+0.02987156’, ‘-0.00951164’, ‘-0.01131394’, ‘-0.06322699’]
Abs(maximum) of all: 0.117826216
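A request that returns vectors like these might look as follows. The model name text-embedding-3-small is an assumption (it returns 1536-dimensional vectors, matching the sample, but the model used above isn’t stated); the client is passed in so the helper can be tried without network access.

```python
from typing import List, Sequence


def embed_texts(
    client, texts: Sequence[str], model: str = "text-embedding-3-small"
) -> List[List[float]]:
    """Return one embedding vector per input text, in input order."""
    response = client.embeddings.create(model=model, input=list(texts))
    # Sort by each item's index so the vectors line up with the input texts
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]


# Usage with the real SDK:
# from openai import OpenAI
# vectors = embed_texts(OpenAI(), ["Loved it! I would highly recommend!",
#                                  "I was dissatisfied with the product."])
```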

Then use math to measure how close they are. Here I show the cosine similarity between embedding 0 and each text (including itself), where the other texts are reviews that are easy to assign stars to:

== Cosine similarity comparisons ==
0:" My GPT still won’t retrieve a PDF file. Downvote." - match score: 1.0000
1:" Loved it! I would highly recommend!" - match score: 0.0730
2:" Was pretty good. It met the specs and not much els" - match score: 0.1639
3:" I was dissatisfied with the product. It wasn’t wel" - match score: 0.2612
4:" Absolutely the worst garbage ever created. I’m sui" - match score: 0.1892
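The match score here is cosine similarity. A plain-Python sketch of that calculation (OpenAI embeddings are documented as normalized to length 1, in which case the dot product alone gives the same number):

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compare_to_first(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Similarity of vector 0 to every vector (itself included), like the listing above."""
    return [cosine_similarity(vectors[0], v) for v in vectors]
```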

If you build a small vector database of example reviews, each stored alongside the score you assigned it, you can weigh a new review against all the stored embeddings, or try to slot it into place among the others to rank it.
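One way to weigh the input against the stored examples: estimate a new review’s score as the similarity-weighted average of the scores of its k most similar labeled examples. A sketch, assuming you already have embeddings for the labeled examples; k and the plain-list “database” are illustrative choices, not any specific vector database’s API.

```python
import math
from typing import List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def estimate_score(
    query: Sequence[float],
    labeled: List[Tuple[Sequence[float], float]],  # (embedding, score you assigned)
    k: int = 5,
) -> float:
    """Similarity-weighted average score of the k most similar labeled examples."""
    if not labeled:
        raise ValueError("need at least one labeled example")
    nearest = sorted(((_cosine(query, emb), score) for emb, score in labeled),
                     reverse=True)[:k]
    # Clamp negative similarities to zero so dissimilar examples don't pull the estimate
    weights = [max(sim, 0.0) for sim, _ in nearest]
    total = sum(weights)
    if total == 0:
        return sum(score for _, score in nearest) / len(nearest)
    return sum(w * score for w, (_, score) in zip(weights, nearest)) / total
```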

Then adjust the dynamic range: even the most glowing review ever will have some examples scored below it but none above it to offset that mass, so its weighted score gets pulled toward the middle.
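Adjusting the dynamic range could be as simple as stretching the estimated scores so the lowest and highest estimates in a batch land at the ends of your scale. A sketch, assuming a 0 to 10 scale; min-max rescaling is only one option (percentile ranks would be another), and it only makes sense across a reasonably large batch of reviews.

```python
from typing import List, Sequence


def stretch_scores(
    scores: Sequence[float], low: float = 0.0, high: float = 10.0
) -> List[float]:
    """Linearly rescale scores so their minimum maps to `low` and their maximum to `high`."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(low + high) / 2 for _ in scores]  # no spread to stretch
    return [low + (s - lo) * (high - low) / (hi - lo) for s in scores]
```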

Great explanation.

I’ll add to this that you can also use this particular method to try and find the most salient/interesting reviews by checking which ones are most unlike all other reviews (lowest match score to everything else).
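Comparing every review with every other one is about 121 million pairs for 11k reviews, but the average similarity can be computed in one pass: after normalizing each vector to unit length, the mean cosine similarity of v to the others is (v · S − 1) / (N − 1), where S is the sum of all N unit vectors. A NumPy sketch of that shortcut, assuming the embeddings are rows of a matrix:

```python
import numpy as np


def most_unusual(embeddings: np.ndarray, n: int = 3) -> np.ndarray:
    """Indices of the n rows with the lowest mean cosine similarity to all other rows."""
    # Normalize rows to unit length so dot products are cosine similarities
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    total = unit.sum(axis=0)
    # Mean similarity to the others: (v . sum_of_all - v . v) / (N - 1), with v . v = 1
    mean_sim = (unit @ total - 1.0) / (len(unit) - 1)
    return np.argsort(mean_sim)[:n]
```

This avoids building the full N × N similarity matrix, which for 11k reviews would take close to 1 GB in float64.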