Poor performance when asking ChatCompletion to translate a JSON file with nearly 100 lines of small texts

Hi, we are using the ChatCompletion API and asking GPT-3.5 to translate a small JSON file. The process is taking 3 minutes to return the translations. Any ideas on how to tune this? We would like the process to take just a few seconds.


You may be able to take advantage of parallelism, i.e. splitting the task into sections and running them in parallel, if that suits your use case. If your task cannot effectively take advantage of that, there is currently no low-cost way to improve inference speed. The only other option would be a dedicated instance, and that only starts to make sense at around the 450-million-tokens-per-day usage mark.

Can you give an example of the prompt taking 3 mins?

Hi Foxabilo, thanks for the input.
Sure. I also tried it in the playground, where it likewise takes a while.

You are a translator. Please translate the JSON to Brazilian Portuguese.

I cannot post the JSON since it contains some links.

Ok, well, you should be able to achieve around 5 words per second with GPT-3.5, so 60 seconds should give you around 300 words. Note that for JSON, even a single { is an entire token and must be included.

Thanks Foxabilo, that was really insightful and helped a lot in understanding the boundaries.

Keep in mind you pay by token, not by request. You may find you get better results if, instead of sending a whole JSON blob, you make individual requests with each item from the JSON. You’ll pay a bit more for the repeated instructions, but they look pretty short already. Smaller requests also leave less of a chance for the model to go off the rails somehow.
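A minimal sketch of the per-item approach, assuming the JSON is a flat object of strings. The `translate_text` function here is a hypothetical stand-in; in a real script it would wrap a ChatCompletion call with the translation instruction as the system message.

```python
import json

def translate_text(text):
    # Hypothetical placeholder -- a real implementation would call the
    # ChatCompletion API here and return the model's output. The prefix
    # below just makes the data flow visible.
    return f"[pt-BR] {text}"

def translate_json_per_item(blob):
    """Send one request per value instead of the whole JSON blob."""
    data = json.loads(blob)
    # One model call per field; the repeated instruction cost is small,
    # and each request stays short and hard to derail.
    return {key: translate_text(value) for key, value in data.items()}

blob = '{"greeting": "Hello", "farewell": "Goodbye"}'
result = translate_json_per_item(blob)
```

The per-item structure is also what makes parallelism straightforward later, since each field becomes an independent request.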

Sure, Novaphil, thanks for your input. We tried filtering only a few fields rather than the entire JSON, but even so it took almost 2 minutes. Appreciate your advice on the costs. We will try the parallelism.

Inference speed and token output rate improve when the input provided is small.

When the AI has to consider each of 4,000 input tokens while forming its answer (and also attend to the newly formed answer itself as it continues), the computing requirements are higher and the generation rate is lower than for a similar task operating on 100 tokens in and 100 tokens out.

Here’s a start on parallel requests. Then you just have to write the splitting and batching.
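A sketch of what that start might look like, using Python's standard `concurrent.futures`. Since API calls are network-bound, threads are enough; `translate_item` is a hypothetical stand-in for the real ChatCompletion request.

```python
import concurrent.futures

def translate_item(item):
    # Stand-in for a per-item ChatCompletion call. Uppercasing just makes
    # the result observable; the real function would return a translation.
    return item.upper()

def translate_in_parallel(items, max_workers=8):
    # executor.map preserves input order even though the requests
    # themselves complete concurrently.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(translate_item, items))

texts = ["hello", "goodbye", "thanks"]
translated = translate_in_parallel(texts)
```

With real API calls you would also want retry logic and a `max_workers` value that respects your rate limits, but the splitting and batching mentioned above slot in naturally around this shape.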
