How to cut input text based on OpenAI tokens

Hi, I have this call to the OpenAI API, and I use tiktoken to count the number of tokens before I send the request. The problem is that when a text exceeds the 16k-token limit I try to cut the text, but at the moment it doesn’t work correctly. For example, the code currently cuts a 30k-token text not to the maximum of 16k, but to about 3.5k tokens. Is there a way in Python to cut the text correctly based on tokens? This is the code:

            num_tokens = self.count_tokens_from_text(text, "cl100k_base")
            print("Tokens:", num_tokens)
            model = 'gpt-3.5-turbo-16k'
            max_tokens = 16000  # Maximum tokens allowed by OpenAI API
            if num_tokens > max_tokens:
                # Crop the text to max_tokens
                text = text[:max_tokens + 1]  # Note the +1 here


            print(self.count_tokens_from_text(text, "cl100k_base"))

            input_text = "Develop a 5-sentence script appropriate for a YouTube-style video, "\
                         "using the provided Wikipedia article as the primary source of information."\
                         "I will provide you with a list of image descriptions."\
                         "These are the images that will be used in the video and "\
                         "I want the script to be based on these images."\
                         "Avoid unnecessary details such as suggestions about the pictures in [] brackets "\
                         "and the 'Narrator:' part before each paragraph. "\
                         "Deliver the text as one cohesive string and most importantly the script "\
                         "should be no longer than 5 sentences no matter the length of the article provided."\
                         "This is the provided Wikipedia article: "\
                         f"{text} These are the image descriptions: {image_titles}"

            input_text_tokens = self.count_tokens_from_text(input_text, "cl100k_base")
            print("Input script tokens:", input_text_tokens)

            completion = client.chat.completions.create(
                model=model,
                messages=[
                    {
                        "role": "system",
                        "content": "Develop a 5-sentence script appropriate for a YouTube-style video, "
                                   "using the provided Wikipedia article as the primary source of information."
                                   "I will provide you with a list of image descriptions."
                                   "These are the images that will be used in the video and "
                                   "I want the script to be based on these images."
                                   "Avoid unnecessary details such as suggestions about the pictures in [] brackets "
                                   "and the 'Narrator:' part before each paragraph. "
                                   "Deliver the text as one cohesive string and most importantly the script "
                                   "should be no longer than 5 sentences no matter the length of the article provided."
                    },
                    {
                        "role": "user",
                        "content": f"These are the image descriptions: {image_titles}."
                                   f"And this is the provided Wikipedia article: {text}"
                    },
                ],
            )

            message = completion.choices[0].message
            answer = message.content
            output_text_tokens = self.count_tokens_from_text(answer, "cl100k_base")
            print("Output script tokens:", output_text_tokens)
            return answer

Encoding the whole text to get a token count takes a bit of computation time, so you’ll want to minimize how often you do it.

First, do an informed search: compare the string length to the token count, which gives you a measured characters-per-token ratio. Cut near the character position that ratio predicts, then count tokens on that prefix to see whether you landed higher or lower than your maximum.
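A minimal sketch of that first informed cut. Here `count_tokens` is a stand-in for your own tiktoken-based counter (e.g. a wrapper around `tiktoken.get_encoding("cl100k_base").encode`); the function name is illustrative:

```python
def informed_first_cut(text, count_tokens, max_tokens):
    """Use one full token count to estimate characters per token,
    then cut near the character position that should land on
    roughly max_tokens tokens."""
    total_tokens = count_tokens(text)
    if total_tokens <= max_tokens:
        return text  # already under the limit, nothing to cut
    chars_per_token = len(text) / total_tokens
    return text[: int(max_tokens * chars_per_token)]
```

This gets you close in a single extra slice, but the result can still be slightly over or under the limit, which is why the bisection step follows.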

Then do a binary search on the character position, re-counting tokens on each candidate prefix, to home in on the cut point closest to the token count you want.
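The bisection step can be sketched like this; `count_tokens` is again a placeholder for your tiktoken counter, and the search finds the longest character prefix that stays within the token budget:

```python
def crop_to_max_tokens(text, count_tokens, max_tokens):
    """Binary-search for the longest character prefix whose
    token count does not exceed max_tokens."""
    if count_tokens(text) <= max_tokens:
        return text
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2  # bias upward so the loop terminates
        if count_tokens(text[:mid]) <= max_tokens:
            lo = mid  # prefix fits, try a longer one
        else:
            hi = mid - 1  # prefix is too long, shrink
    return text[:lo]
```

This needs only about log2(len(text)) encode calls, each on a prefix, instead of repeatedly re-encoding by trial and error.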

Finally, split naturally around that point, at a word or sentence boundary.