When I hit my token limit in my api requests, for example, x-ratelimit-remaining-tokens: 253; then I make another api request with 1000 tokens in the prompt. The api returns a 200 response. Why? I was expecting an error to deal with, instead of worrying that I may get unreliable response without warning.
for i in range(100):
log_info(f"{i} starts")
response = completions_with_backoff(
messages=[{"role": "user", "content": "Say this is a test" * 2000}],
model="gpt-3.5-turbo",
)
log_info(f"{i} ends")
log_info(response.content)
for k in response.headers:
if "ratelimit" in k:
log_info(f"{k}: {response.headers.get(k)}")
log_info("==============================================")
log:
2024-04-10 00:21:52,636 - 5 starts
2024-04-10 00:21:54,340 - 5 ends
2024-04-10 00:21:54,342 - b’{\n “id”: “chatcmpl-9C8WH114MCq3Vv1dpfQmMhhBgx0uH”,\n “object”: “chat.completion”,\n “created”: 1712679713,\n “model”: “gpt-3.5-turbo-0125”,\n “choices”: [\n {\n “index”: 0,\n “message”: {\n “role”: “assistant”,\n “content”: “This is a test.”\n },\n “logprobs”: null,\n “finish_reason”: “stop”\n }\n ],\n “usage”: {\n “prompt_tokens”: 10007,\n “completion_tokens”: 5,\n “total_tokens”: 10012\n },\n “system_fingerprint”: “fp_b28b39ffa8”\n}\n’
2024-04-10 00:21:54,343 - x-ratelimit-limit-requests: 10000
2024-04-10 00:21:54,344 - x-ratelimit-limit-tokens: 60000
2024-04-10 00:21:54,345 - x-ratelimit-remaining-requests: 9994
2024-04-10 00:21:54,346 - x-ratelimit-remaining-tokens: 4787
2024-04-10 00:21:54,346 - x-ratelimit-reset-requests: 46.001s
2024-04-10 00:21:54,347 - x-ratelimit-reset-tokens: 55.212s
2024-04-10 00:21:54,348 - ==============================================
2024-04-10 00:21:54,348 - 6 starts
2024-04-10 00:21:59,509 - 6 ends
2024-04-10 00:21:59,511 - b’{\n “id”: “chatcmpl-9C8WM76pU8Qw5yEsLHgy78rBoQEzQ”,\n “object”: “chat.completion”,\n “created”: 1712679718,\n “model”: “gpt-3.5-turbo-0125”,\n “choices”: [\n {\n “index”: 0,\n “message”: {\n “role”: “assistant”,\n “content”: “This is a test”\n },\n “logprobs”: null,\n “finish_reason”: “stop”\n }\n ],\n “usage”: {\n “prompt_tokens”: 10007,\n “completion_tokens”: 4,\n “total_tokens”: 10011\n },\n “system_fingerprint”: “fp_b28b39ffa8”\n}\n’
2024-04-10 00:21:59,512 - x-ratelimit-limit-requests: 10000
2024-04-10 00:21:59,513 - x-ratelimit-limit-tokens: 60000
2024-04-10 00:21:59,514 - x-ratelimit-remaining-requests: 9993
2024-04-10 00:21:59,514 - x-ratelimit-remaining-tokens: 273
2024-04-10 00:21:59,515 - x-ratelimit-reset-requests: 57.783s
2024-04-10 00:21:59,516 - x-ratelimit-reset-tokens: 59.726s
2024-04-10 00:21:59,516 - ==============================================
2024-04-10 00:21:59,517 - 7 starts
2024-04-10 00:22:09,082 - 7 ends
2024-04-10 00:22:09,085 - b’{\n “id”: “chatcmpl-9C8WWiVhBzteWovKLD8K7dvMlMwfK”,\n “object”: “chat.completion”,\n “created”: 1712679728,\n “model”: “gpt-3.5-turbo-0125”,\n “choices”: [\n {\n “index”: 0,\n “message”: {\n “role”: “assistant”,\n “content”: “This is a test.”\n },\n “logprobs”: null,\n “finish_reason”: “stop”\n }\n ],\n “usage”: {\n “prompt_tokens”: 10007,\n “completion_tokens”: 5,\n “total_tokens”: 10012\n },\n “system_fingerprint”: “fp_b28b39ffa8”\n}\n’
2024-04-10 00:22:09,085 - x-ratelimit-limit-requests: 10000
2024-04-10 00:22:09,086 - x-ratelimit-limit-tokens: 60000
2024-04-10 00:22:09,087 - x-ratelimit-remaining-requests: 9992
2024-04-10 00:22:09,087 - x-ratelimit-remaining-tokens: 312
2024-04-10 00:22:09,088 - x-ratelimit-reset-requests: 1m5.013s
2024-04-10 00:22:09,089 - x-ratelimit-reset-tokens: 59.687s
2024-04-10 00:22:09,090 - ==============================================
2024-04-10 00:22:09,090 - 8 starts
2024-04-10 00:22:18,679 - 8 ends
2024-04-10 00:22:18,682 - b’{\n “id”: “chatcmpl-9C8WfoShWYeCKNjYgcJQV2GBlK2cA”,\n “object”: “chat.completion”,\n “created”: 1712679737,\n “model”: “gpt-3.5-turbo-0125”,\n “choices”: [\n {\n “index”: 0,\n “message”: {\n “role”: “assistant”,\n “content”: “Say this is a test”\n },\n “logprobs”: null,\n “finish_reason”: “stop”\n }\n ],\n “usage”: {\n “prompt_tokens”: 10007,\n “completion_tokens”: 5,\n “total_tokens”: 10012\n },\n “system_fingerprint”: “fp_b28b39ffa8”\n}\n’
2024-04-10 00:22:18,683 - x-ratelimit-limit-requests: 10000
2024-04-10 00:22:18,683 - x-ratelimit-limit-tokens: 60000
2024-04-10 00:22:18,684 - x-ratelimit-remaining-requests: 9991
2024-04-10 00:22:18,685 - x-ratelimit-remaining-tokens: 295
2024-04-10 00:22:18,686 - x-ratelimit-reset-requests: 1m13.309s
2024-04-10 00:22:18,686 - x-ratelimit-reset-tokens: 59.704s
2024-04-10 00:22:18,687 - ==============================================