I would like to run TTS over a large amount of my data; however, my rate limits (RPM) for the TTS models are low.
The TPM, however, is adequate for the purpose. When I try to make use of the TPM by batching, passing an array of strings as input, I get the following error:
Exception has occurred: BadRequestError
Error code: 400 - {'error': {'message': '1 validation error for Request\nbody -> input\n str type expected (type=type_error.str)', 'type': 'invalid_request_error', 'param': None, 'code': None}}
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://api.openai.com/v1/audio/speech'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400
During handling of the above exception, another exception occurred:
File "/Users/sukhmanjawa/Projects/py-tes/ssml.py", line 9, in <module>
response = client.audio.speech.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
openai.BadRequestError: Error code: 400 - {'error': {'message': '1 validation error for Request\nbody -> input\n str type expected (type=type_error.str)', 'type': 'invalid_request_error', 'param': None, 'code': None}}
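For reference, the failing call was roughly like the sketch below (the model, voice, and texts are placeholders; the point is the list passed to input):

from openai import OpenAI

client = OpenAI()

# Passing a list of strings to `input` is what triggers the 400;
# the speech endpoint only accepts a single string here.
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=["First segment to speak.", "Second segment to speak."],
)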
From the API reference:
input (string) - Required
The text to generate audio for. The maximum length is 4096 characters.
The error you show means something other than a string is being sent in that position (an array, which only works on the completions endpoint).
Rate limits for tts-1 are 100-500 requests per minute depending on tier, certainly suitable for parallel processing.
tts-1-hd, at 3 requests per minute across tiers, is the only one where you wouldn't be able to run lots of parallel jobs; it is better suited to one request at a time, waiting for the next minute after those three.
Do you just need techniques to make parallel async or threaded calls?
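For instance, a minimal threaded sketch (assuming the official Python SDK; the texts list and output filenames are hypothetical, and you would still want pacing if you approach your RPM limit):

from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()
texts = ["First passage to read aloud.", "Second passage to read aloud."]  # hypothetical inputs

def synthesize(job):
    index, text = job
    # one request per text; input must be a single string
    response = client.audio.speech.create(model="tts-1", voice="alloy", input=text)
    path = f"segment_{index}.mp3"
    with open(path, "wb") as f:
        f.write(response.content)
    return path

# a few workers stays well under a 100+ RPM limit; raise with care
with ThreadPoolExecutor(max_workers=4) as pool:
    for path in pool.map(synthesize, enumerate(texts)):
        print("wrote", path)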
You can simulate a “batch” by joining your text up to the character limit, prompting for a pause longer than any that would appear naturally in the material being read, and then splitting the returned audio with silence detection (a splitting sketch follows the example).
Example:
prompt = """
. The error you show is sending something other than a
string in that position (an array, which only works on “completion”).
[pause for 5 seconds] [another pause]
Rate limits for tts-1 are 100-500 per minute,
certainly suitable for parallel processing.
""".strip().replace('\n', " ")
Initially I also thought of silence detection for batching, but there are points in the audio (due to the nature of the audio I'm generating) where the model inserts pauses on its own, which could cause problems with the segmentation.
Another issue is obviously the cost: since pricing is per input character, inserting pause prompts as you suggest wouldn't be economical.
My use case is generating small but numerous segments of audio, and such pauses would add noticeably to the cost.