I’ve been using a very simple LangChain script to create a chatbot powered by gpt-4-turbo.
Until December 4th, the speed was consistently fast, but over the past two days it has suddenly become very slow (dropping from roughly 30 tokens/s to about 10 tokens/s).
I haven’t made any changes to the code.
Additionally, I have been recording the model’s fingerprint, and new fingerprints began appearing around December 4th, which suggests OpenAI started updating the model then:
- 2023-12-01 14:35:15,002 - openai_fp_log - model: gpt-4-1106-preview, fingerprint: fp_a24b4d720c
- 2023-12-01 14:39:27,898 - openai_fp_log - model: gpt-4-1106-preview, fingerprint: fp_a24b4d720c
- 2023-12-01 14:42:57,834 - openai_fp_log - model: gpt-4-1106-preview, fingerprint: fp_a24b4d720c
- 2023-12-04 20:00:22,212 - openai_fp_log - model: gpt-4-1106-preview, fingerprint: fp_2eb0b038f6
- 2023-12-04 20:56:04,723 - openai_fp_log - model: gpt-4-1106-preview, fingerprint: fp_a24b4d720c
- 2023-12-04 20:59:42,837 - openai_fp_log - model: gpt-4-1106-preview, fingerprint: fp_a24b4d720c
- 2023-12-05 09:48:43,225 - openai_fp_log - model: gpt-4-1106-preview, fingerprint: fp_d2455ee9e0
- 2023-12-05 10:10:41,939 - openai_fp_log - model: gpt-4-1106-preview, fingerprint: fp_a24b4d720c
- 2023-12-05 11:12:09,103 - openai_fp_log - model: gpt-4-1106-preview, fingerprint: fp_a24b4d720c
- 2023-12-05 11:34:53,838 - openai_fp_log - model: gpt-4-1106-preview, fingerprint: fp_a24b4d720c
- 2023-12-05 11:43:53,015 - openai_fp_log - model: gpt-4-1106-preview, fingerprint: fp_a24b4d720c
- 2023-12-05 12:17:15,388 - openai_fp_log - model: gpt-4-1106-preview, fingerprint: fp_a24b4d720c
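(For context, this is roughly how I capture the fingerprint — a minimal sketch, assuming the v1 OpenAI Python SDK, where chat completion responses expose a `system_fingerprint` attribute; the logger name `openai_fp_log` matches the log lines above:)

```python
import logging

# Logger whose format matches the "openai_fp_log" lines above
logging.basicConfig(format="%(asctime)s - %(name)s - %(message)s")
fp_log = logging.getLogger("openai_fp_log")
fp_log.setLevel(logging.INFO)

def log_fingerprint(response, model_name):
    """Log the model name and system fingerprint of a chat completion.

    `response` is expected to expose `.system_fingerprint`, as chat
    completion objects in the openai v1 SDK do; logs None if missing.
    """
    fp = getattr(response, "system_fingerprint", None)
    fp_log.info("model: %s, fingerprint: %s", model_name, fp)
    return fp
```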
The code I’m using is as follows (imports shown for completeness):

```python
from langchain.llms import OpenAI
from langchain.chains import ConversationChain

llm = OpenAI(model_name=_self.openai_model, temperature=0, streaming=True)
chain = ConversationChain(llm=llm, memory=memory, verbose=True)
chain.run("xxxxxx")
```
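(This is how I measure the rate — a rough sketch: it times an iterable of streamed text chunks, and since each streamed chunk from the API is roughly one token, chunks per second approximates tokens/s:)

```python
import time

def measure_rate(chunks):
    """Consume an iterable of streamed text chunks and return
    (full_text, chunks_per_second).

    With OpenAI streaming, each chunk is roughly one token, so the
    rate approximates tokens/s.
    """
    pieces = []
    start = time.monotonic()
    for piece in chunks:
        pieces.append(piece)
    elapsed = time.monotonic() - start
    rate = len(pieces) / elapsed if elapsed > 0 else float("inf")
    return "".join(pieces), rate
```

In practice one would feed this the generator returned by the LLM’s streaming interface rather than a pre-built list.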
Additionally, I’ve noticed that when I input the same question, gpt-4-turbo’s responses in English arrive over twice as fast as those in Chinese, even though the response length and semantic content are similar.
Has the speed of the gpt-4-turbo model really decreased? Or is my code simply not adapted to the new version of the model? Will the speed improve in the future release version?