Can’t get ChatGPT to count characters correctly when rewording text

Can someone tell me why ChatGPT (GPT-4o) finds it difficult to count characters when rewording or rewriting text? For example, I have the following prompt:

"Please provide a speculative cover tailored for a Data Engineer position at company ‘Newton Investment Management’. Please ensure it fits within a character limit of 1900 characters, including spaces:

“Newton Investment Management
Primary Business Focus: Newton, a subsidiary of BNY Mellon, manages assets across various strategies, focusing on long-term growth and income generation.
Use of Data Engineering and Analytics: Newton Investment Management has adopted advanced data analytics for portfolio construction and risk management. The firm integrates machine learning and AI to analyze financial data, client behavior, and market trends.”

I get the following message just before GPT-4o provides the output:

Here’s a speculative cover letter tailored for a Data Engineer position at Newton Investment Management, ensuring it fits within the 1900-character limit, including spaces:

After the output, I get the following message from GPT-4o:

This version contains 1,872 characters including spaces, fitting well within the 1900-character limit. Let me know if you’d like any further adjustments!

However, the actual character count with spaces is 2360.

Is this a problem with GPT-4o?


Language models like GPT don’t actually see individual characters the way a human would. Instead, they work with tokens, which are words or pieces of words. This page does a good job of demonstrating how that system works.

This is why it’s notoriously bad at counting letters in words; it doesn’t see the individual letters, it just sees tokens. Your use-case is something that LLMs currently struggle with a lot.

A solution would be to use a simple line of code on your end that automatically checks if the output is under 1900 characters, and if it isn’t, prompt the AI to shorten its output. Come to think of it, you could even just ask the AI to check it itself using the Code Interpreter in the Assistants API.
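A minimal sketch of that check, assuming the model's reply is already available as a Python string. The names `check_length` and `follow_up_prompt`, and the wording of the follow-up message, are illustrative only and not part of any API.

```python
def check_length(text: str, limit: int = 1900) -> tuple[bool, int]:
    """Return (fits, count), where count is len(text) with spaces included."""
    count = len(text)
    return count <= limit, count


def follow_up_prompt(count: int, limit: int = 1900) -> str:
    """Build a hypothetical follow-up message asking the model to shorten."""
    return (
        f"Your last answer was {count} characters including spaces. "
        f"Please shorten it to at most {limit} characters."
    )


reply = "x" * 2360  # stand-in for the 2360-character letter from the question
fits, count = check_length(reply)
if not fits:
    print(follow_up_prompt(count))
```

Note that `len()` counts Unicode code points, which may differ slightly from what a web form counts if the text contains emoji or combining characters.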


Hi @carlton, as @turbolucius said, GPT can’t count characters reliably because of tokenization. Using the Code Interpreter might work, since it can check the count quickly with a Python script.

What mostly works for me is specifying the limit in words instead of characters. The result is approximately right most of the time.


If you have a hard character limit like that, you probably have to set a target that is well under it, typically based on word count. So probably stay around 300 words or fewer.
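As a rough illustration of where a figure like 300 words can come from: the sketch below assumes about six characters per English word including the trailing space, plus a small safety margin. Both numbers are assumptions rather than measured values, and `word_budget` is a hypothetical helper name.

```python
def word_budget(char_limit: int, chars_per_word: float = 6.0,
                safety_margin: float = 0.05) -> int:
    """Estimate a word target that should stay under char_limit.

    chars_per_word and safety_margin are assumptions; tune them on your own text.
    """
    return int(char_limit * (1 - safety_margin) / chars_per_word)


print(word_budget(1900))  # 300
```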

You could also write a retry loop that takes the most recent answer and counts its words or characters. If the count exceeds your limit, send the previous answer back in a follow-up prompt together with its word/character count and the instruction to make the answer shorter so it stays under the x-word limit. Then just run it over and over until the answer is short enough.
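A minimal sketch of such a retry loop. Here `generate` is a hypothetical callable that sends a prompt to the model and returns its text reply; it stands in for whatever client you actually use, and the retry wording and `max_attempts` default are assumptions.

```python
from typing import Callable


def shorten_until_fits(generate: Callable[[str], str], prompt: str,
                       limit: int = 1900, max_attempts: int = 5) -> str:
    """Call generate() repeatedly until the answer fits within limit characters.

    generate is a hypothetical stand-in for a real model call.
    """
    answer = generate(prompt)
    for _ in range(max_attempts):
        if len(answer) <= limit:
            return answer
        # Feed the previous answer and its measured length back to the model.
        retry_prompt = (
            f"The text below is {len(answer)} characters including spaces. "
            f"Rewrite it to be under {limit} characters.\n\n{answer}"
        )
        answer = generate(retry_prompt)
    if len(answer) > limit:
        raise ValueError(
            f"Still {len(answer)} characters after {max_attempts} retries"
        )
    return answer
```

Capping the number of attempts matters because nothing guarantees the model will ever get under the limit; the count is measured in code each round rather than trusted from the model.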

Yeah this happened to me the other day - no matter how many times I asked it to keep the characters under a certain limit it was over!

Hi @carlton :wave:

We can try an approach that uses a Python script.
If you use a custom GPT, you need to enable the Code Interpreter & Data Analysis tool.

I created a custom GPT to test this, and I used the following prompt in it:

You are 🐢Polepole🐢 Cover Letter Pro.
Please provide a speculative cover letter tailored for a {X position} at {X company}. Ensure that the content fits within a character limit of 1900 characters, including spaces.

The cover letter should begin with the following format:

[Your Name]  
[Your Address]  
[City, State, ZIP]  
[Email Address]  
[Phone Number]  
[Date]

[Address to a Title]
[Company Name]
[Company Address]  
[City, State, ZIP]


Once the cover letter is generated, use the following Python code to:

1. Count the components for the following:
   - Letters (A-Z, a-z) 
   - Numbers (0-9)
   - Spaces
   - Punctuation Marks
   - Emojis
   - Line Breaks
   - Words
   - Paragraphs
   - Total Characters

2. Trim the cover letter if it exceeds the 1900-character limit, label it as the "Exceeded Version," and create one or more trimmed versions, each labeled "1st Trimmed Version," "2nd Trimmed Version," etc.

3. Display the results of all versions (Exceeded and Trimmed) in the following table format, comparing components side by side:

| Component      | Exceeded Version | 1st Trimmed Version | ... |
|--------------------|----------------------|-------------------------|-----|
| Letters        | X                    | Y                       |     |
| Numbers        | X                    | Y                       |     |
| Spaces         | X                    | Y                       |     |
| Punctuation    | X                    | Y                       |     |
| Emojis         | X                    | Y                       |     |
| Line Breaks    | X                    | Y                       |     |
| Words          | X                    | Y                       |     |
| Paragraphs     | X                    | Y                       |     |
| Total Characters| X                    | 1900                    |     |

4. Ask the user if they would like to download the cover letter in a Microsoft Word file. If there are multiple versions, ask which one they prefer.

5. Create the Word file without adding an extra title and provide a download link for the user.

### Python Code:

```python
import os
import re

from docx import Document
from prettytable import PrettyTable

# Directory to save the file
save_directory = "/path/to/save/directory/"  # Change this to the correct directory

CHAR_LIMIT = 1900

# Personal and company details
header = """[Your Name]
[Your Address]
[City, State, ZIP]
[Email Address]
[Phone Number]
[Date]

[Address to a Title]
[Company Name]
[Company Address]  
[City, State, ZIP]\n\n"""


def trim_to_limit(text, limit, ellipsis="..."):
    """Cut text at a word boundary so the result, ellipsis included, fits within limit."""
    if len(text) <= limit:
        return text
    # Reserve room for the ellipsis so the trimmed text never exceeds the limit
    return text[:limit - len(ellipsis)].rsplit(' ', 1)[0] + ellipsis


def ordinal(n):
    """Return 1st, 2nd, 3rd, 4th, ... for a positive integer."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def count_text_components(text):
    letters = len(re.findall(r'[A-Za-z]', text))
    numbers = len(re.findall(r'\d', text))
    spaces = text.count(' ')
    punctuation = len(re.findall(r'[^\w\s]', text))
    emojis = len(re.findall(r'[^\w\s,]', text))  # Simple regex to catch emojis
    line_breaks = text.count('\n')
    words = len(text.split())
    paragraphs = (text.count('\n\n') + 1) if text.strip() != "" else 0

    return {
        "Letters": letters,
        "Numbers": numbers,
        "Spaces": spaces,
        "Punctuation": punctuation,
        "Emojis": emojis,
        "Line Breaks": line_breaks,
        "Words": words,
        "Paragraphs": paragraphs,
        "Total Characters": len(text)
    }


def create_word_file(text, version_name):
    doc = Document()
    # Write the content directly without adding a title
    doc.add_paragraph(text)

    # Make sure the save directory exists
    os.makedirs(save_directory, exist_ok=True)

    # File name and path
    file_name = f"{version_name}_cover_letter.docx"
    file_path = os.path.join(save_directory, file_name)

    # Save the document
    doc.save(file_path)

    return file_path


# Generate the cover letter (replace this with GPT's generated content)
cover_letter_body = "GPT-generated speculative cover letter body here"

# Add the header to the cover letter
cover_letter = header + cover_letter_body

# All versions by label, starting with the untrimmed letter
versions = {"Exceeded Version": cover_letter}

# Trim the exceeded version if necessary
trimmed_version = cover_letter
count = 1
while len(trimmed_version) > CHAR_LIMIT:
    trimmed_version = trim_to_limit(trimmed_version, CHAR_LIMIT)
    versions[f"{ordinal(count)} Trimmed Version"] = trimmed_version
    count += 1

# Table comparing the components of every version side by side
table = PrettyTable()
table.add_column("Component", list(count_text_components(cover_letter)))
for version_name, version_text in versions.items():
    table.add_column(version_name, list(count_text_components(version_text).values()))

# Print the table
print(table)

# Ask user if they need a Word file
needs_word_file = input("Would you like to download the cover letter as a Microsoft Word file? (yes/no): ")

if needs_word_file.strip().lower() == 'yes':
    if len(versions) > 1:
        options = list(versions)
        print(f"There are {len(options)} versions available: {', '.join(options)}.")
        preferred_version = input(f"Which version would you like to download? ({'/'.join(options)}): ").strip()
        if preferred_version not in versions:
            preferred_version = options[-1]  # Fall back to the last trimmed version
        file_path = create_word_file(versions[preferred_version], preferred_version)
    else:
        file_path = create_word_file(cover_letter, "Final Version")

    print(f"The file has been created! You can download it here: {file_path}")
```

Output:

Report:

.docx File


Pole pole, oh my word, this is amazing. I was just about to go to bed when I saw this pop up in my email. I wasn’t going to bother to look at it in detail now as it’s quite late, but after looking at the first few lines I was enthralled. I felt compelled to say a big thank you for this. I’m just sooooooooo excited to try it out myself in the morning.

Thank you so much


A great example of how to write a useful GPT.

I wasn’t aware that GPT4o could be instructed to take action based on the output of python code.

This is really great to know. Thanks for sharing this technique.

Just a comment on how you’re detecting “emoji”. The regex r'[^\w\s,]' will match any character that’s not a word character, whitespace, or a comma. So things like #, $, !, and other punctuation characters will be counted as “emoji”. If you intended to count only the Unicode emoji characters, the following would be more accurate.

```python
import re

# Character class covering common emoji code-point ranges.
# There is no trailing "+", so findall() returns one match per emoji when counting.
emoji_pattern = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F700-\U0001F77F"  # alchemical symbols
    "\U0001F780-\U0001F7FF"  # Geometric Shapes Extended
    "\U0001F800-\U0001F8FF"  # Supplemental Arrows-C
    "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
    "\U0001FA00-\U0001FA6F"  # Chess Symbols
    "\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
    "\U00002702-\U000027B0"  # Dingbats
    "\U000024C2"             # circled M
    "\U0001F200-\U0001F251"  # enclosed ideographic supplement
    # A single range from U+24C2 to U+1F251 would also match CJK and other scripts.
    "]"
)
```
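A quick check of the difference on a made-up sample string. To keep the sketch short, the narrower pattern here includes only three emoji ranges, so it is an illustration rather than a complete emoji matcher.

```python
import re

sample = "Hello, world! #1 🐢"

# Original pattern: anything that is not a word character, whitespace, or comma.
broad = re.findall(r"[^\w\s,]", sample)

# Narrower pattern restricted to a few emoji code-point ranges.
emoji_only = re.findall(
    "[\U0001F300-\U0001F5FF\U0001F600-\U0001F64F\U0001F900-\U0001F9FF]",
    sample,
)

print(broad)       # ['!', '#', '🐢']
print(emoji_only)  # ['🐢']
```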

Just to get this included in a topic like this.

If you want to get the GPT models to actually count small numbers of letters, as in a single word or a few words, you can separate each letter with a hyphen, i.e.

t-h-i-s

and the model will now get each letter as its own token. This allows the model to deal with them far more easily.
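Preparing the text this way takes one line of Python. `spell_out` is a hypothetical helper name, and whether each letter really ends up as exactly one token depends on the tokenizer, which this sketch does not verify.

```python
def spell_out(word: str, sep: str = "-") -> str:
    """Insert sep between every character so each letter stands alone."""
    return sep.join(word)


print(spell_out("this"))        # t-h-i-s
print(spell_out("strawberry"))  # s-t-r-a-w-b-e-r-r-y
```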


Hi Polepole,

I’ve just had the opportunity to test out the prompt you kindly provided. I entered the following into ChatGPT:

Please a speculative cover as before, but this time for company ‘Standard Life Investments’ using the following information about the company: "Standard Life Investments
Location: Edinburgh, UK

Overview: Standard Life Investments, now operating under abrdn (formerly Standard Life Aberdeen), manages global assets across various portfolios, including equities, fixed income, and multi-asset strategies.

Use of Data Analytics: Standard Life Investments integrates data analytics extensively to support its investment decision-making and risk management. They use AI and machine learning tools to analyze vast financial datasets, monitor market movements, and optimize asset allocation strategies. This data-driven approach helps the firm identify market opportunities, assess risks, and refine its investment portfolios.

Benefits:

Improved Investment Decisions: Data analytics helps the firm enhance accuracy in predicting market trends and optimize portfolio performance.
Risk Mitigation: Advanced analytics allows for real-time monitoring and effective risk management, ensuring resilience during market fluctuations"

Ensure that the content fits within a character limit of 1900 characters, including spaces.

[The rest of the message repeated the numbered instructions and the Python code from Polepole’s post above, unchanged except for save_directory, which was set to "C:\Users\Carlton\Documents\ChatGPT".]

However I got the message:

“I cannot execute Python code, but I can help explain how the code works and guide you through manually generating a speculative cover letter for Standard Life Investments (now operating under abrdn) and outline the code logic to count and format the components.”

Which model are you using?

You can use it with GPT-4, GPT-4o, or a custom GPT (with the Code Interpreter & Data Analysis tool selected).


I am using GPT-4o.

I should mention that when I use your prompt directly in ChatGPT it works fine, but when I use it with my GPT called ‘Resume Cover Letter’ (see image below), I get the issue.

image

Go to My GPTs.

Select the pencil icon to edit your GPT.
Add the original instructions I provided.
Select Code Interpreter & Data Analysis.
It should be shown in green.
Then save it with the UPDATE button at the top right.

After updating, provide only your details and the company details.

Please see the images I provided in my previous messages.
I also shared my chat history.

This is what you should do:


The AI doesn’t use a word-embedding tokenizer. Instead, it uses an OpenAI encoder that has an exclusive set of non-joinable number tokens from 000 to 999. So if you want it to count words, letters, or other things, you can follow each letter with a running number at the same token cost as a hyphen or other separator.

The previously impossible becomes simple.


System: (general procedures)

User:

#Target questions

  1. How many letters are in the document?

  2. At what character position(s) does the word “document” appear?

  3. At what character position do non-letter glyphs like comma or parenthesis appear?

  4. What is the total count of non-letter glyphs?

#Input document
{document}

gpt-4o-mini:

  1. Output the entire user input document, letter by letter, with incrementing counts:
Y001o002u003 004r005e006c007e008i009v010e011 012a013 014d015e016s017c018r019i020p021t022i023o024n025 026l027i028s029t030 031o032f033 034t035h036e037 038t039y040p041e042 043o044f045 046w047o048r049d050s051 052o053r054 055p056h057r058a059s060e061s062 063y064o065u066 067a068r069e070 071t072o073 074e075x076t077r078a079c080t081 082a083n084d085 086r087e088t089u090r091n092,093 094a095l096o097n098g099 100w101i102t103h104 105a106 107d108o109c110u111m112e113n114t115 116s117e118c119t120i121o122n123.124 125Y126o127u128 129t130h131e132n133 134r135e136a137d138 139c140a141r142e143f144u145l146l147y148 149t150h151r152o153u154g155h156 157t158h159a160t161 162l163a164b165e166l167e168d169 170d171o172c173u174m175e176n177t178 179(180l181a182b183e184l185e186d187 188w189i190t191h192 193w194o195r196d197 198I199D200 201n202u203m204b205e206r207s208 209t210h211a212t213 214a215p216p217e218a219r220 221i222m223m224e225d226i227a228t229e230l231y232 233a234f235t236e237r238 239t240h241e242 243w244o245r246d247 248i249n250 251t252w253o254 255s256q257u258a259r260e261b262r263a264c265k266e267t268h269a270t271 272a273p274p275e276a277r278 279i280m281m282e283d284i285a286t287e288l289y290 291a292f293t294e295n296t297i298o299n300 301t302o303 304a305w306o307r308d309 310t311h312a313t314 315m316e317e318t319s320 321a322n323y324 325o326f327 328t329h330e331 332c333r334i335t336e337r338i339a340 341p342r343o344v345i346d347e348d349 350i351n352t353h354e355 356d357e358s359c360r361i362p363t364i365o366n367 368l369i370s371t372,373 374y375o376u377 378o379u380t381p382t383 384t385h386a387t388 389w390o391r392d393's394 395n396u397m398b399e400,401 402c403r404e405a406t407i408n409g410 411a412 413c414o415m416m417a418-419s420e421p422a423r424a425t426e427d428 429l430i431s432t433o434f435 436a437l438l439 440I441D442 443n444u445m446b447e448r449 450p451o452s453i454t455i456o457n458s459 460t461h462a463t464 465h466a467v468e469 470w471o472r473d474s475 476m477e478e479t480i481n482g483 484t485h486e487 488c489r490i491t492e493r494i495a496 
497p498p499e500a501r502 503s504y505s506t507e508m509a510t511i512c513a514l515l516y517.518
  1. How many letters are in the document?

    • There are 518 letters in the document.
  2. At what character position(s) does the word “document” appear?

    • The word “document” appears at character positions 115-122.
  3. At what character position do non-letter glyphs like comma or parenthesis appear?

    • Non-letter glyphs appear at the following character positions: 003, 012, 024, 036, 048, 060, 072, 084, 096, 108, 120, 124, 132, 144, 156, 168, 180, 192, 204, 216, 228, 240, 252, 264, 276, 288, 300, 312, 324, 336, 348, 360, 372, 384, 396, 408, 420, 432, 444, 456, 468.
  4. What is the total count of non-letter glyphs?

    • The total count of non-letter glyphs is 40.
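The numbered format in the transcript can also be generated in Python, and the same helper code gives exact answers to compare against the model's. This is only a sketch: `number_characters` and `word_positions` are hypothetical names, and `doc` is a short stand-in, not the original input document.

```python
def number_characters(text: str) -> str:
    """Append a zero-padded 1-based index after every character."""
    return "".join(f"{ch}{i:03d}" for i, ch in enumerate(text, start=1))


def word_positions(text: str, word: str) -> list[tuple[int, int]]:
    """Return 1-based (start, end) positions of every occurrence of word."""
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append((start + 1, start + len(word)))
        start = text.find(word, start + 1)
    return positions


doc = "You receive a document"
print(number_characters("You r"))       # Y001o002u003 004r005
print(word_positions(doc, "document"))  # [(15, 22)]
print(sum(ch.isalpha() for ch in doc))  # 19 letters, spaces excluded
```

Indexes above 999 need more than three digits, so the three-digit scheme only covers documents up to 999 characters.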

Using the above and another technique of building a positional dictionary, and structured output to mask the precursors, a dumb meme is expanded on:

user:

#Target questions

Using your letter counting techniques, how many letter “r” are in the document?

#Input document

‘’’
Aurora moved through the strawberry field, the early morning mist still clinging to the leaves. Her fingers brushed against the vibrant red berries, each one plump and dewy from the night’s cool air. The scent of the earth and sweet strawberries filled her senses as she carefully plucked each fruit, dropping them into the woven basket at her side. The sun slowly began to rise, casting a golden glow over the rows of plants, and Aurora felt a quiet contentment in the simple, rhythmic task. The field stretched endlessly before her, a sea of green vines and red gems, and she knew she’d be out here for hours, but she didn’t mind. There was peace in this, a world away from the noise of the village.

As she worked, Aurora’s mind wandered, lost in memories of her grandmother, who had taught her how to care for the plants. She could still hear her voice, soft but firm, explaining the importance of patience and gentleness. “You don’t just pick the strawberries,” she had said. “You listen to them.” Aurora had laughed at the idea then, but now, years later, she understood what her grandmother meant. Each berry seemed to tell its own story—the small, tart ones, the overripe, sweet ones, each with a place in the cycle of the field. By midday, her basket was heavy, and her heart was light, filled with the quiet wisdom the land had shared with her once again.
‘’’

gpt-4o (gpt-4o-mini started making patterns instead of facts)

Referring to the output document and the letter count report, there are 97 occurrences of the letter “r” in the document.

Intermediary details

{
001: 003,
002: 005,
003: 016,
004: 028,
005: 033,
006: 034,
007: 096,
008: 107,
009: 120,
010: 129,
011: 136,
012: 140,
013: 141,
014: 172,
015: 177,
016: 185,
017: 194,
018: 203,
019: 216,
020: 229,
021: 232,
022: 233,
023: 238,
024: 251,
025: 256,
026: 265,
027: 270,
028: 283,
029: 288,
030: 291,
031: 300,
032: 307,
033: 318,
034: 329,
035: 334,
036: 337,
037: 342,
038: 347,
039: 370,
040: 373,
041: 407,
042: 418,
043: 429,
044: 431,
045: 434,
046: 437,
047: 444,
048: 453,
049: 457,
050: 460,
051: 471,
052: 472,
053: 475,
054: 478,
055: 493,
056: 498,
057: 503,
058: 508,
059: 519,
060: 520,
061: 523,
062: 524,
063: 527,
064: 536,
065: 537,
066: 544,
067: 551,
068: 552,
069: 557,
070: 558,
071: 561,
072: 566,
073: 575,
074: 576,
075: 579,
076: 588,
077: 595,
078: 596,
079: 603,
080: 607,
081: 613,
082: 625,
083: 626,
084: 629,
085: 641,
086: 642,
087: 653,
088: 655,
089: 661,
090: 662,
091: 665,
092: 666,
093: 669,
094: 670,
095: 673,
096: 679,
097: 680
}
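For checking a report like this, the same kind of positional dictionary can be built deterministically. A minimal sketch, with a hypothetical function name and case-insensitive matching as an assumption (so that a capital "R" would count too):

```python
def letter_positions(text: str, letter: str) -> dict[int, int]:
    """Map occurrence number (1-based) to character position (1-based)."""
    target = letter.lower()
    hits = (pos for pos, ch in enumerate(text, start=1) if ch.lower() == target)
    return dict(enumerate(hits, start=1))


positions = letter_positions("strawberry", "r")
print(positions)       # {1: 3, 2: 8, 3: 9}
print(len(positions))  # 3
```

Running it on the full input document gives an exact count and position list to compare with the model's answer.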