The following is a capture of an email chain with “support”:
This time I tried encoding my inline images as WebP, and once again the token count reported in the API error message made it clear that the check is naively tokenizing the actual system and user prompts sent in via the API (which include the inline base64 image representations), rather than the token counts actually expected to reach the model (as I understand it, the vectorization of an image reduces many thousands of raw tokens to only around 2,000, even in high-detail mode).
I do not think I should be forced to make hundreds of inference calls to process hundreds of images individually; that does not scale, especially since my system context is relatively large (around 4,000 tokens). I would much rather budget my input token consumption per your published vision API documentation than build in a 60x “fudge factor” just so your naive token counter, which charges my budget for the inline base64 image instead of the resulting vectorized form that is orders of magnitude more compressed, stays under the full 128,000-token window.
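For concreteness, here is how I am budgeting per image, as a minimal Python sketch based on my reading of the published vision pricing rules (the scaling steps and tile constants are my assumptions from the docs, not something I have been able to confirm against your counter):

    import math

    def vision_image_tokens(width: int, height: int, detail: str = "high") -> int:
        # Low detail is documented as a flat per-image cost.
        if detail == "low":
            return 85
        # High detail: the docs describe scaling the image to fit within 2048x2048 ...
        scale = min(1.0, 2048 / max(width, height))
        width, height = width * scale, height * scale
        # ... then scaling so the shortest side is at most 768px ...
        scale = min(1.0, 768 / min(width, height))
        width, height = width * scale, height * scale
        # ... then charging 170 tokens per 512px tile, plus a base 85.
        tiles = math.ceil(width / 512) * math.ceil(height / 512)
        return 85 + 170 * tiles

    print(vision_image_tokens(1920, 1080))  # low thousands for a 1080p keyframe

By that math a 1080p keyframe should cost on the order of a thousand tokens or so, which is why a 60x safety margin is so hard to swallow.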
Please respond. Anyone? I’m fast running out of options here.
Thanks,
On 2024-10-27 12:58 a.m., Myles Dear Hotmail wrote:
Now, I’m only packing three subtitles (one each for English, French, and Spanish) and a single keyframe along with the required system prompt, and I’m getting “string too long” errors.
string too long. Expected a string with maximum length 1048576, but got a string with length 2237388 instead.
Whaaaaaat? I am using your image-based API, passing in a supported file format with a supported image type at a supported resolution, and your API can’t handle the length!?!?
My keyframes are 1920 x 1080 and I am uploading them as PNGs so there is no loss in fidelity.
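For what it is worth, the payload arithmetic is easy to check on my side. A minimal sketch (the keyframe file name is hypothetical; the 1,048,576 cap comes straight from your error message):

    import base64
    from pathlib import Path

    MAX_CONTENT_CHARS = 1_048_576  # the cap quoted in the "string too long" error

    def data_url_length(image_path: str, mime: str = "image/png") -> int:
        # base64 inflates the raw bytes by roughly 4/3, plus the data-URL prefix.
        raw = Path(image_path).read_bytes()
        return len(f"data:{mime};base64,") + len(base64.b64encode(raw))

    n = data_url_length("keyframe_0001.png")  # hypothetical file name
    print(n, "chars;", "over the cap" if n > MAX_CONTENT_CHARS else "fits")

A PNG of roughly 1.7 MB base64-encodes to about the 2,237,388 characters in the error above, so any full-fidelity 1080p frame blows through the cap.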
See below for the failure I saw when I tried converting the PNG content to JPEG in order to save space; I then got input token cap exceeded errors!?!?
This model’s maximum context length is 128000 tokens. However, your messages resulted in 277597 tokens.
There is no way the tiktoken count for the system prompt, plus the tiktoken count for the input subtitles, plus the input token budget for the vectorized form derived from the inline base64 JPEG (as specified in your vision API documentation), adds up to anywhere near that much. It appears your API is counting the inline JPEG itself toward the input token count instead of the vectorized version of that image.
Token usage as counted by blindly running tiktoken over the system and user prompts (the user prompt contains the inline JPEG content that, as I understand it, the OpenAI API transforms into a vectorized version consuming far less token space before handing off to the model):
277,586
This evidence and calculation suggest your API is blindly counting the user prompt before the image transformation (and token-space compression) step.
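That number is straightforward to reproduce by running tiktoken over the serialized messages, base64 and all, which appears to be exactly what the 400 check is doing. A sketch (assuming encoding_for_model maps gpt-4o to its o200k_base encoding):

    import json
    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-4o")

    def naive_token_count(messages: list[dict]) -> int:
        # Tokenize the raw content of every message, inline base64 included.
        # Per the vision docs this is the wrong treatment for images: they
        # should be charged a fixed per-tile budget, not their base64 length.
        total = 0
        for m in messages:
            content = m["content"]
            if not isinstance(content, str):
                content = json.dumps(content)  # flatten multimodal content parts
            total += len(enc.encode(content))
        return total

Run over my actual request, this lands within a dozen tokens of the 277,597 your error reports.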
Token usage as estimated given the token sizing details from the Vision API:
System prompt : 4,370
User prompt breakdown:
Subtitle 1 brings input token count to 4,419
Subtitle 2 brings input token count to 4,478
Subtitle 3 brings input token count to 4,526
1,445 tokens for the first image (85 + 170 * 8, as per the vision documentation), which brings the token count to 5,971 (see the quick arithmetic sketch below)
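The same arithmetic as a runnable sanity check (constants taken from the running totals above; the 8-tile figure is my reading of the high-detail tiling for a 1080p frame):

    SYSTEM_PROMPT_TOKENS = 4_370            # measured with tiktoken
    SUBTITLE_TOKEN_DELTAS = [49, 59, 48]    # en, fr, es deltas from the totals above
    IMAGE_TOKENS = 85 + 170 * 8             # = 1,445 per the vision docs' formula

    expected = SYSTEM_PROMPT_TOKENS + sum(SUBTITLE_TOKEN_DELTAS) + IMAGE_TOKENS
    print(expected)  # 5,971, a ~46x discrepancy from the 277,597 you reported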
So, once again, I find myself blocked and my customer is still waiting.
What will you do to unblock me?
> 2024-10-26 21:07:27,881 DEBUG: Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'json_data': {'messages': [{'role': 'system', 'content': '\n You are tasked with processing wheelchair seating vocational training videos. Your goal is to analyze and summarize the video. \n\n The expected output will include:\n - "file_commentaries": Commentary about the video as a whole.\n - "subtitles": Subtitles extracted from the video, tied to specific timestamps.\n - "image_summaries": A summary of keyframes extracted from the video, based on timestamped keyframe URLs.\n - "commentaries": A section that ties together insights derived from subtitles and keyframes.\n - "source": Details about the video source, including whether it is human-created or AI-generated.\n - "metadata": Any additional metadata required to describe the context.\n\n The expected input will include:\n - A list of keyframe URLs, localized by timestamp, that represent important visual elements of the video.\n - A set of subtitles, localized by timestamp, that convey important spoken content.\n - Summaries produced by previous inference runs on a given video (empty for the first inference run)\n\n Input and output formats are valid JSON objects.\n\nA few specific details to consider:\n- You will be presented with a chronological series of subtitles in multiple languages. \nFor each language, ensure you rebuild the narrative to ensure continuity is maintained. For example, if one subtitle says "It\'s really cool actually how you can convert a body point padded one" and the next subtitle says "and a half inch belt with plastic side released buckle on it" you should come to the conclusion that the trainer is showing a one and a half inch belt, not a half inch belt. Precision is extremely important to maintain when you are summarizing. \n\n\n\n\nHere is a list of abbreviations that the content producer uses to compose video filenames and series names. It is in a form of a table in which the abbreviation is further explained. Use this information to better understand the intention of any video or series name.\n\nAbbreviation List\nASBS : ASSEMBLY BRACKETS STEEL (FLATS)\nASBA : ASSEMBLY BRACKETS ALUMINUM (FLATS)\nBC : BACK COVER\nBCU : BACK CUSHION\nBINTH : BACK INTERFACE HARDWARE\nBINT : BACK INTERFACE\nBT : BLACK TRAY\nCFS : CALF SUPPORT\nCH : CUP HOLDER\nCHAC : CHAIR ACCESSORIES\nCHR : CUSTOM HEADREST\nCOHR : COMMERCIAL HEADREST\nCH : CUP HOLDER\nCOB : COMMERCIAL BACK\nCUP : CUSTOM POSITIONING STRAP\nCOP : COMMERCIAL POSITIONING STRAP\nCOS : COMMERCIAL SEAT\nCT : CLEAR TRAY\nCOMP : COMPRESSION SPRING\nFTB : FOOTBOX\nFTR : FOOTREST\nFIP : FOAM IN PLACE\nHLT : HINGED LAP TRAY\nLAT : LATERAL\nMOB : MOLDED BACK\nMODS : MODIFICATIONS\nPAD : HANGERS AND ARMREST\nP - bjb : Portable bottle jack bender\nREP : REPAIRS\nROHO : SEAT AND BACK BOLSTERS\nSB : SUPPORT BRACKETS\nSC : SEAT COVER\nSCU : SEAT CUSHION\nSINT : SEAT INTERFACE\nSKI : SIT SKI\nSLF : SLENDERFENDER FIT KIT\nTEM : TRAY EASY MOUNT\nTSEM : THREAD SLED EASY MOUNT (TOOL LESS ADJUSTMENT)\n\n\nThe user prompt (including the contents of the content/text block of the user role) shall be in JSON format, as per the following specification:\n\nThe user input prompt format is as follows:\n{\nrole: user,\ncontent : [\n{type: text ,\ntext : "{\n\nfile_path : # a unique name for this video that contains both path and file name in the format series_name/this_video.mp4. 
The purpose of this field is to organize video metadata in such a way to allow multiple videos\' data to reside in the same data structure to aid front end searching and filtering. \n\nfile_commentaries : [ # A list of summary blocks, the number of blocks shall be the number of languages present in the input context subtitle text input. The purpose of this structure is to provide a cumulative summary of the video from its beginning to the currently analyzed video chunk. It may be empty if this is the first chunk of the video being analyzed and no prior inference has produced commentaries thus far. This input consists of previously generated summary output from the current video’s previously analyzed chunks (if any) to ensure all file summary input given so far is considered when asking for a new summary to be built considering the current video chunk. \n{file_commentary : # A string field containing a cumulative summary of the video, ultimately ensuring that the file is summarized using all subtitle and image keyframe data presented this far. \nlanguage : The iso 639-1 two-character language abbreviation}\n],\nsubtitles: [ # This list may be empty if the earliest image timestamp is less than the first subtitle timestamp.\n {timestamp : #string, in srt format\n subtitle : #contents of subtitle generated from the audio track of the video \n language : # two character standard abbreviation of subtitle language, for example "en" is english, "fr" is french, "es" is for spanish as per iso 639-1\n }\n],\nimage_summaries: [# Initially empty, this contains a summary of each image analyzed thus far. The summary must be based on the visual elements of the actual image or images included and must only describe actual visual elements present in the images. The number of objects in this list for a given timestamp is expected to be the number of languages for which subtitles are provided in the user input context. The purpose of this list is to provide context to generate commentaries and cumulative file summaries that draw from multimodal subtitle and image inputs. If this is the first chunk of a video this field may be empty. \n{image_summary: # A detailed textual summary of the image that would contain enough information to teach a skilled intern how to accurately and correctly imitate the skill being demonstrated. Include the items seen in the image, the tools being used, the products being used and worked on, the vocational skill being demonstrated, and the purpose of that skill relative to the purpose of the video up to the current point. Locate the current step in an overarching set of steps and phases similar to a table of contents as many videos and series of videos represent an ordered sequence of vocational skills to accomplish a goal. Even if this summary is viewed out of order it should contain enough detail to locate it in a series of steps. For example : "now that x, y and z are complete as part of phase n, the teacher is now working on step w". Include the motion being represented (ie, cutting, gluing, attaching, bending, punching, sanding, welding) and the kind of object being acted on (ie, headrest, seat cushion, wheelchair back, lateral support bracket). \ntimestamps : [], #list of srt formatted timestamps. 
More than one timestamp may be present in the list if an image summary is deemed to apply to multiple keyframes to achieve token space compression without loss of expressive and educational accuracy and detail.\nlanguage : #iso639-1 two-character language of summary\n}\n],\n\ncommentaries: [ # presented in each required language for each indicated point of the video. This list is expected to increase in length as analysis of a video proceeds and is meant to replace the subtitles and image_summaries . Commentaries are used as the sole system context when generating e_learning content and the generated e_learning content is the sole input to fine-tuning in pass2 so each generated artifact must faithfully capture the essence of the multimodal input provided from a vocational training point of view. If this is the first chunk of a video this field may be empty. \n {\n timestamps : [], #list of srt formatted timestamps. More than one timestamp may be present in the list if a commentary is deemed to apply to multiple positions in the video to achieve token space compression without loss of expressive and educational accuracy and detail.\n commentary: # contents of the commentary. It is synthesized multimodally both from subtitle input but also from image summaries derived from keyframe input and is expected to be a better representation and a truer summary of what is being taught in the video at this point than could be elicited from either of the modes individually \n skills: [], # A list of keywords detailing vocational skills demonstrated in this commentary. Some examples include foam_cutting, precision_cutting, angle_alignment.\n language : # two character standard abbreviation of subtitle language, for example "en" is english, "fr" is french, "es" is for spanish as per iso 639-1\n }\n],\n\nsource: # A text string indicating the origin of this content. Options include HumanCreated, AiEnhanced, AiCreated. Assume HumanCreated by default unless instructed otherwise. To clarify, summarization is not considered to be enhancement but rather distilling existing content. Enhancement is considered to be adding original creative content to preexisting content. \n\nmetadata: # cumulative information about the contents of the file processed this far meant to enhance front end user requested filtering. Metadata is provided in all required languages. If this is the first chunk of a video this field may be empty. \n{\nseats_products: [ # cumulative list of the names of seats products referenced in the video so far taken from https://seatshardware.com/collections/all. \n {seats_product: # For example, Thread Sled Easy Mount (TSEM)\n\n#The seats_product metadata keyword must contain a reference to one of the products referenced in the following list of Seats products, accurate as of October 2024, which contains for each product a one line description suffixed by the product URL. This is a summary of https://seatshardware.com/collections/all. \n# 22.5 Degree Disc Assembly - Modular mounting system for seating components. URL: https://seatshardware.com/products/22-5degdiscass\n# Headrest Hardware Repair Kit - Reinforcement kit for i2i linkage styled headrest hardware. URL: https://seatshardware.com/products/headrest-hardware-repair-reinforcement-kit\n# Heavy Duty Support Brackets - Bendable aluminum brackets for custom seating applications. URL: https://seatshardware.com/products/heavy-duty-support-brackets-bendable-flats\n# Joystick Bumper Thumper Kit - Protection system for wheelchair joystick assemblies. 
URL: https://seatshardware.com/products/joystick-bumper-thumper-kit\n# Just Disc It, For Trays - Disc-based attachment for wheelchair trays. URL: https://seatshardware.com/products/just-disc-it-for-trays\n# PL 003AL12 Aluminum Assembly Brackets - Bendable aluminum assembly brackets. URL: https://seatshardware.com/products/pl-003al12-aluminum-assembly-brackets\n# PL 003ALJH HD Aluminum J-Hooks - Rubber-lined hooks for seat pan installation. URL: https://seatshardware.com/products/pl-003aljh-hd-aluminum-rubber-lined-j-hooks\n# PL 003ST22 Steel Assembly Brackets - Steel brackets for mounting wheelchair accessories. URL: https://seatshardware.com/products/pl-003st22-steel-assembly-brackets\n# Portable Bottle Jack Bender - Portable tool for bending support brackets. URL: https://seatshardware.com/products/portable-bottle-jack-bender\n# SlenderFenders Fit Kits - Wheelchair fender kits designed for various wheel sizes. URL: https://seatshardware.com/products/slender-fender-wheelchair-fenders\n# Space Saver Back-Seat Interface - Aluminum interface for wheelchair seating systems. URL: https://seatshardware.com/products/space-saver-back-seat-interface\n# Swing Away Laterals’ Hardware Kit - Kit compatible with Sunrise Medical J3 swing away hardware. URL: https://seatshardware.com/products/swing-away-laterals-hardware-kit\n# Thread Sled Easy Mount Base Model - Adjustable mounting system for custom seating. URL: https://seatshardware.com/products/thread-sled-easy-mount-base-model\n# Thread Sled Easy Mount Headrest Kit - Tool-less headrest mounting system for wheelchairs. URL: https://seatshardware.com/products/thread-sled-easy-mount-headrest-kit\n# Tray Easy Mount - System for attaching custom-built trays to wheelchairs. URL: https://seatshardware.com/products/tray-easy-mount\n\n\nproduct_url: # url of product referenced, for example, https://seatshardware.com/products/portable-bottle-jack-bender\n\nlanguage : # The iso 639-1 two-character language abbreviation pertaining to the product description and url\n}\n],\n\nnon_seats_products :\n[\n{\nnon_seats_product: # name of third-party non-seats product referenced in video. One example could be a Sunrise Medical Quickie wheelchair base. Another example could be a padded Bodypoint strap.\n\nproduct_url: # url of product referenced. For example : https://www.sunrisemedical.com/manual-wheelchairs/quickie . Another example : https://www.bodypoint.com/ECommerce/product/evof/evoflex-\n\nlanguage : # The iso 639-1 two-character language abbreviation pertaining to the product description and url\n\n}\n\n],\n\nsearch_keywords : # cumulative list of search terms generated for the video \n[\n{\nsearch_keyword: # A keyword or short multi word phrase that will point to this video if typed by a user in a search bar\n\nlanguage : # The iso 639-1 two-character language abbreviation pertaining to the product description and url\n\n}\n\n],\n\nskills: [], # A list of keywords detailing vocational skills demonstrated in this video. Some examples include foam_cutting, precision_cutting, angle_alignment.\n}}"},\n\n{type: image_url,\nimage_url : { url : dropbox url to image},\ntimestamp : string, in srt format, accurate to the millisecond }\n]\n}\n\nThat covers the input format.\n\n\n\n\nRegarding token space management if token space becomes tight, blocks of potentially redundant image summaries may be consolidated by expressing a summary with a list of timestamps. \n\nBe careful to not over summarize however. 
If the keyframe differences are important in order to properly capture the required vocational skill being used at that moment then avoid removing detail. For example, cutting foam at two different angles may be crucial to the skill being demonstrated. Cutting foam the exact same way may be considered duplication. \n\nAlso, note that identical steps could exist in many different processes so if you collapse identical steps whose context differs then combine details appropriately. For example, the same kind of cut could be made to a piece of foam as part of a new cushion or a cushion repair. If two such summaries are combined then the summary must indicate this step could be executed as part of a cushion creation or repair. \n\nEnsure that any summarization is done only on truly redundant data. \n\n\ngenerate an output in the following JSON format:\n{\nfile_path : # a unique name for this video that contains both path and file name in the format series_name/this_video.mp4. The purpose of this field is to organize video metadata in such a way to allow multiple videos’ data to reside in the same data structure to aid front end searching and filtering. \n\nfile_commentaries : [ # A list of summary blocks, the number of blocks shall be the number of languages present in the input context subtitle text input. The purpose of this structure is to provide a summary of the video from the beginning to the most recently analyzed video chunk. This output feeds back in to future inputs to ensure all subtitle and image input given so far is properly summarized. This field is cumulative and is expected to grow over time as file summaries produced are combined with file summaries already present in the input as each video chunk is processed. This summary is expected to be produced from both subtitle and image multimodal inputs. \n{file_commentary : # A string field containing a summary of the video, utilizing data from both the subtitles and image keyframes in the input context and merging with the previous file summary if provided in the input context, ensuring that the commentary covers the entire video up to this point. \nlanguage : The iso 639-1 two-character language abbreviation}\n],\n\nimage_summaries : # A list of objects describing each image present in the input user context. For each image timestamp, one object for each subtitle language is expected to be described. \n[\n{\ntimestamps : [], #list of srt formatted timestamps. More than one timestamp may be present in the list if an image summary is deemed to apply to multiple keyframes to achieve token space compression without loss of expressive and educational accuracy and detail.\n\nimage_summary: # A detailed summary of the image that would contain enough information to teach a skilled intern how to imitate the skill being demonstrated. The summary must be based on the visual elements of the actual image or images included and must only describe visual elements actually present in the images. Include the items seen in the image, the tools being used, the products being used and worked on, the vocational skill being demonstrated, and the purpose of that skill relative to the purpose of the video up to the current point. Include the motion being represented (ie, cutting, gluing, attaching, bending) and the kind of object being acted on (ie, headrest, seat cushion, wheelchair back, lateral support bracket). 
\n\nlanguage : # The iso 639-1 two-character language abbreviation\n}],\n\ncommentaries: [ # presented in each required language for each indicated point of the video. This list is expected to increase in length as analysis of a video proceeds and is meant to replace the subtitles and image summaries without loss of the intrinsic training detail. Commentaries are used as the sole system context to produce e_learning content which in turn are the sole input when doing fine-tuning in pass2 so they must faithfully capture the vocational training essence of the multimodal input provided. Sequences of operations must be broken up into logical steps and each commentary must include enough contextual detail to stand on its own if accessed directly by a user who does not consult neighbouring commentary blocks. \n {\n timestamps : [], #list of srt formatted timestamps. More than one timestamp may be present in the list if a commentary is deemed to apply to multiple positions in the video to achieve token space compression without loss of expressive and educational accuracy and detail.\n commentary: # contents of the commentary. It is synthesized multimodally both from subtitle input but also from textual summaries of keyframe input and is expected to be a better representation and a truer summary of what is being taught in the video at this point than could be elicited from either of the modes individually. \n skills: [], # A list of keywords detailing vocational skills demonstrated in this commentary. Some examples include foam_cutting, precision_cutting, angle_alignment.\n language : # two character standard abbreviation of subtitle language, for example "en" is english, "fr" is french, "es" is for spanish as per iso 639-1\n }\n],\n\nsource: # A text string indicating the origin of this content. Options include HumanCreated, AiEnhanced, AiCreated. Assume HumanCreated by default unless instructed otherwise. \n\nmetadata: # cumulative information about the contents of the file processed this far meant to enhance front end user requested filtering. Metadata must be generated in all required languages. \n{\nseats_products: [{\nseats_product: # cumulative list of the names of distinct seats products referenced in the video so far from https://seatshardware.com/collections/all. For example, Thread Sled Easy Mount (TSEM). This list must not contain duplicates. \n\nproduct_url: # url of product referenced, for example, https://seatshardware.com/products/portable-bottle-jack-bender\n\nlanguage : # The iso 639-1 two-character language abbreviation pertaining to the product description and url\n}\n],\n\nnon_seats_products : # This list must not contain duplicates. \n[\n{\nnon_seats_product: # name of third-party non-seats product referenced in video. One example could be a Sunrise Medical Quickie wheelchair base. Another example could be a padded Bodypoint strap.\n\nproduct_url: # url of product referenced. For example : https://www.sunrisemedical.com/manual-wheelchairs/quickie . 
Another example : https://www.bodypoint.com/ECommerce/product/evof/evoflex-\n\nlanguage : # The iso 639-1 two-character language abbreviation pertaining to the product description and url\n\n}\n\n],\n\nsearch_keywords : # cumulative list of search terms generated for the video \n[\n{\nsearch_keyword: # A keyword or short multi word phrase that will point to this video if typed by a user in a search bar\n\nlanguage : # The iso 639-1 two-character language abbreviation pertaining to the product description and url\n\n}\n\n],\n\nskills: [], # A list of keywords detailing vocational skills demonstrated in this video. Some examples include foam_cutting, precision_cutting, angle_alignment.\n\n}\n}\n\n'}, {'role': 'user', 'content': '"[{"type": "text", "text": "{\\"file_commentaries\\": [], \\"subtitles\\": [{\\"timestamp\\": \\"00:00:00,000 --> 00:00:06,000\\", \\"text\\": \\"Hello and welcome to Seats. I want to show you a few videos on something I\'m\\", \\"language\\": \\"en\\"}, {\\"timestamp\\": \\"00:00:00,000 --> 00:00:06,000\\", \\"text\\": \\"Bonjour et bienvenue \\\\u00e0 Seats. je veux vous montrer quelques vid\\\\u00e9os sur quelque chose que je suis\\", \\"language\\": \\"fr\\"}, {\\"timestamp\\": \\"00:00:00,000 --> 00:00:06,000\\", \\"text\\": \\"Hola y bienvenido a Seats. quiero mostrarle algunos videos sobre algo que soy\\", \\"language\\": \\"es\\"}], \\"image_summaries\\": [], \\"commentaries\\": [], \\"source\\": \\"HumanCreated\\", \\"metadata\\": {}, \\"file_path\\": \\"HOW TO CONVERT A PADDED BODYPOINT STRAP INTO WRIST CUFFS FOR BOTH YOUR CLIENT\'S SAFETY AND HYGIENE (Video\'s 1-9RCTCOP)/1RCTCOP - A BRIEF DISCUSSION WHY WE CHOOSE THIS BELT DESIGN. FOR RACHEL - 2023-02-11 001.mp4\\"}"}, {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAB4AAAAQ4CAIAAABnsVYUAAAgAE ... [truncated] ... RXxVvlAXRwKG8qX1V8UnH8B3jby3fY1LwAAAAAAElFTkSuQmCC", "detail": "high", "timestamp": "00:00:02,906"}}]'}], 'model': 'gpt-4o', 'frequency_penalty': 0, 'max_tokens': 15000, 'presence_penalty': 0, 'temperature': 0.2}}
> 2024-10-26 21:07:27,895 DEBUG: Sending HTTP Request: POST https://api.openai.com/v1/chat/completions
> 2024-10-26 21:07:27,897 DEBUG: connect_tcp.started host='api.openai.com' port=443 local_address=None timeout=5.0 socket_options=None
> 2024-10-26 21:07:27,946 DEBUG: connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7feaa7c543a0>
> 2024-10-26 21:07:27,946 DEBUG: start_tls.started ssl_context=<ssl.SSLContext object at 0x7feaab6a7ec0> server_hostname='api.openai.com' timeout=5.0
> 2024-10-26 21:07:27,957 DEBUG: start_tls.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7feaa7c54370>
> 2024-10-26 21:07:27,958 DEBUG: send_request_headers.started request=<Request [b'POST']>
> 2024-10-26 21:07:27,959 DEBUG: send_request_headers.complete
> 2024-10-26 21:07:27,960 DEBUG: send_request_body.started request=<Request [b'POST']>
> 2024-10-26 21:07:28,072 DEBUG: send_request_body.complete
> 2024-10-26 21:07:28,072 DEBUG: receive_response_headers.started request=<Request [b'POST']>
> 2024-10-26 21:07:28,460 DEBUG: receive_response_headers.complete return_value=(b'HTTP/1.1', 400, b'Bad Request', [(b'Date', b'Sun, 27 Oct 2024 01:07:29 GMT'), (b'Content-Type', b'application/json'), (b'Content-Length', b'290'), (b'Connection', b'keep-alive'), (b'access-control-expose-headers', b'X-Request-ID'), (b'openai-organization', b'user-m8spcgbdft4mtc5walogmvdr'), (b'openai-processing-ms', b'79'), (b'openai-version', b'2020-10-01'), (b'x-ratelimit-limit-requests', b'5000'), (b'x-ratelimit-limit-tokens', b'800000'), (b'x-ratelimit-remaining-requests', b'4999'), (b'x-ratelimit-remaining-tokens', b'235490'), (b'x-ratelimit-reset-requests', b'12ms'), (b'x-ratelimit-reset-tokens', b'42.338s'), (b'x-request-id', b'req_6d8561a71dd342e16db9bea54ace3376'), (b'strict-transport-security', b'max-age=31536000; includeSubDomains; preload'), (b'CF-Cache-Status', b'DYNAMIC'), (b'Set-Cookie', b'__cf_bm=yM_H4K_Skce.eu9BHWcqR30dZh4pl9NB8hhMlFOUpuw-1729991249-1.0.1.1-1ScK7nz34q8mKMJZT58CXTfok3uhIIfXRHvPDtxJlJmT7SidSeBfwYLTUKjeLarnFR0iX8xu7cWaYdiz5bbToQ; path=/; expires=Sun, 27-Oct-24 01:37:29 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None'), (b'X-Content-Type-Options', b'nosniff'), (b'Set-Cookie', b'_cfuvid=rYa76V_wIF102OqieEbs17JUjU3W_ZEdVSHqRQVGHco-1729991249588-0.0.1.1-604800000; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None'), (b'Server', b'cloudflare'), (b'CF-RAY', b'8d8eca1add39a2b7-YUL'), (b'alt-svc', b'h3=":443"; ma=86400')])
> 2024-10-26 21:07:28,466 INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 400 Bad Request"
> 2024-10-26 21:07:28,467 DEBUG: receive_response_body.started request=<Request [b'POST']>
> 2024-10-26 21:07:28,468 DEBUG: receive_response_body.complete
> 2024-10-26 21:07:28,469 DEBUG: response_closed.started
> 2024-10-26 21:07:28,469 DEBUG: response_closed.complete
> 2024-10-26 21:07:28,470 DEBUG: HTTP Response: POST https://api.openai.com/v1/chat/completions "400 Bad Request" Headers([('date', 'Sun, 27 Oct 2024 01:07:29 GMT'), ('content-type', 'application/json'), ('content-length', '290'), ('connection', 'keep-alive'), ('access-control-expose-headers', 'X-Request-ID'), ('openai-organization', 'user-m8spcgbdft4mtc5walogmvdr'), ('openai-processing-ms', '79'), ('openai-version', '2020-10-01'), ('x-ratelimit-limit-requests', '5000'), ('x-ratelimit-limit-tokens', '800000'), ('x-ratelimit-remaining-requests', '4999'), ('x-ratelimit-remaining-tokens', '235490'), ('x-ratelimit-reset-requests', '12ms'), ('x-ratelimit-reset-tokens', '42.338s'), ('x-request-id', 'req_6d8561a71dd342e16db9bea54ace3376'), ('strict-transport-security', 'max-age=31536000; includeSubDomains; preload'), ('cf-cache-status', 'DYNAMIC'), ('set-cookie', '__cf_bm=yM_H4K_Skce.eu9BHWcqR30dZh4pl9NB8hhMlFOUpuw-1729991249-1.0.1.1-1ScK7nz34q8mKMJZT58CXTfok3uhIIfXRHvPDtxJlJmT7SidSeBfwYLTUKjeLarnFR0iX8xu7cWaYdiz5bbToQ; path=/; expires=Sun, 27-Oct-24 01:37:29 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None'), ('x-content-type-options', 'nosniff'), ('set-cookie', '_cfuvid=rYa76V_wIF102OqieEbs17JUjU3W_ZEdVSHqRQVGHco-1729991249588-0.0.1.1-604800000; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None'), ('server', 'cloudflare'), ('cf-ray', '8d8eca1add39a2b7-YUL'), ('alt-svc', 'h3=":443"; ma=86400')])
> 2024-10-26 21:07:28,471 DEBUG: request_id: req_6d8561a71dd342e16db9bea54ace3376
> 2024-10-26 21:07:28,471 DEBUG: Encountered httpx.HTTPStatusError
> Traceback (most recent call last):
> File "/home/mdear/workspaces/venv/captions/lib/python3.10/site-packages/openai/_base_client.py", line 1037, in _request
> response.raise_for_status()
> File "/home/mdear/workspaces/venv/captions/lib/python3.10/site-packages/httpx/_models.py", line 763, in raise_for_status
> raise HTTPStatusError(message, request=request, response=self)
> httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://api.openai.com/v1/chat/completions'
> For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400
> 2024-10-26 21:07:28,474 DEBUG: Not retrying
> 2024-10-26 21:07:28,475 DEBUG: Re-raising status error
> 2024-10-26 21:07:31,001 ERROR: Failed to perform inference for ./1RCTCOP - A BRIEF DISCUSSION WHY WE CHOOSE THIS BELT DESIGN. FOR RACHEL - 2023-02-11 001.mp4: Error code: 400 - {'error': {'message': "Invalid 'messages[1].content': string too long. Expected a string with maximum length 1048576, but got a string with length 2237388 instead.", 'type': 'invalid_request_error', 'param': 'messages[1].content', 'code': 'string_above_max_length'}}