I'm using the Whisper API, and by my calculations I'm being overcharged by quite a bit: I'm billed for about 25% more audio than I'm actually sending. After noticing this, I had an idea: I sped up the files with ffmpeg before sending them to the API. I'm not sure whether this is explicitly allowed, but I scoured the ToS and found nothing prohibiting it. It was only a test with a fairly low volume of audio. Transcription accuracy is almost the same with a 2x speedup of the input file but, astonishingly, I'm being charged the same as if I hadn't sped the files up.
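For anyone who wants to reproduce the test, here is a minimal sketch of the speedup step. It assumes ffmpeg's `atempo` audio filter, which changes tempo without shifting pitch; the exact command used isn't stated above, and the function names and file names are placeholders.

```python
import subprocess


def build_speedup_command(src, dst, factor=2.0):
    """Build an ffmpeg argv that changes the audio tempo by `factor`.

    Hypothetical helper: `src` and `dst` are placeholder file paths.
    """
    # Conservative range: 0.5 to 2.0 is accepted by every ffmpeg version
    # that ships the atempo filter.
    if not 0.5 <= factor <= 2.0:
        raise ValueError("factor must be between 0.5 and 2.0")
    # -y overwrites dst if it already exists.
    return ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={factor}", dst]


def speed_up(src, dst, factor=2.0):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(build_speedup_command(src, dst, factor), check=True)
```

Calling `speed_up("in.mp3", "out.mp3")` should produce a file roughly half as long as the input, which is what makes the billing comparison possible.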
The API gives no indication of pricing in its response, only the tokens returned. Does the Whisper API actually charge per token rather than per minute? Is there any visibility into this?
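To make the comparison concrete, this is roughly the arithmetic I'd expect under per-minute billing. It's a sketch only: the rate is passed in as a parameter (the `0.006` in the example is a placeholder, so check the pricing page), the function names are made up, and I'm assuming cost scales linearly with audio duration with no per-file rounding.

```python
def expected_cost(durations_s, price_per_minute):
    """Expected charge if billing is strictly per minute of audio sent.

    durations_s: audio lengths in seconds, one entry per file.
    Assumption: no per-file rounding or minimum charge.
    """
    total_minutes = sum(durations_s) / 60
    return total_minutes * price_per_minute


def overcharge_pct(billed, expected):
    """How far the billed amount exceeds the expected amount, in percent."""
    return (billed - expected) / expected * 100


# Two files of 10 minutes each, at a placeholder rate.
original = expected_cost([600, 600], 0.006)
# The same files after a 2x speedup are half as long.
sped_up = expected_cost([300, 300], 0.006)
```

If billing were purely per minute, `sped_up` would be half of `original`. Being charged the same for both is what suggests the charge tracks something else, such as the tokens returned.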