In my experience, the gpt-4-1106-preview model is about as fast as gpt-3.5-turbo-1106, but actually two to three times slower than gpt-3.5-turbo-16k.
I’m on Tier 4, so I don’t think speed has anything to do with tier. At least there hasn’t been a timeout in a few days, which is the main thing. But it’s true that a little more speed would make a big difference, because 1106 is really great, especially for calling functions in parallel and following instructions.
At the moment a simple multi-function call (one function called twice with different params) with a three-line system prompt takes me 8–10 seconds… The speed seems very unstable and varies over the course of the day, because sometimes the same request takes only 3–4 seconds… So it’s surely an API overload issue.
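One way to check whether the slowness really varies with time of day is to time the same request repeatedly and look at the spread. A minimal sketch of such a timing harness; the `call_api` stub here is a placeholder you would replace with your actual request (e.g. `client.chat.completions.create(...)` from the `openai` Python library):

```python
import time
import statistics

def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def summarize(samples):
    """Return (min, median, max) of a list of latencies."""
    return min(samples), statistics.median(samples), max(samples)

# Placeholder for the real API request -- swap in your own call, e.g.:
# client.chat.completions.create(model="gpt-4-1106-preview", messages=..., tools=...)
def call_api():
    time.sleep(0.01)  # simulated latency
    return "ok"

latencies = [time_call(call_api)[1] for _ in range(5)]
print(summarize(latencies))
```

Logging these numbers at different hours would show whether the 3–4 s vs 8–10 s gap correlates with peak load.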
OMG… Are you sure that all that consumption is yours?
I have been experimenting and testing for months so I can expand my services, but that is impossible: the consumption is extremely high and does not correspond to the real token count.
I have generated keys in a completely secure way, and they showed consumption without ever being used.