Tokenization fun: long character strings of only 1 token, for prompt and code formatting and more

ChatGPT’s tokenizer encodes many long runs of characters as a single token each. We can use them to craft some radical token reductions.

For example, here is a section header only four tokens long, built by picking long sequences that include a trailing newline:

################################################################################
 instructions----------------------------------------------------------------------
################################################################################

This makes for some interesting code compression techniques.
You can make prompt sections or code comments fed to the AI abundantly clear at little expense.
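A minimal sketch of building such a header in code, assuming the rule widths from the example above (80 hash characters, 70 dashes). The names `HASH_RULE`, `DASH_RULE` and `section_header` are made up for illustration, and the code does not verify token counts itself:

```python
# Rule widths are assumptions taken from the example header and the token list.
HASH_RULE = "#" * 80  # listed as a single token, with or without a trailing "\n"
DASH_RULE = "-" * 70  # listed as a single token, with or without a trailing "\n"


def section_header(title: str) -> str:
    """Return a three-line banner: hash rule, ' title' plus dash rule, hash rule."""
    return f"{HASH_RULE}\n {title}{DASH_RULE}\n{HASH_RULE}"


header = section_header("instructions")
print(header)
```

If the `tiktoken` package is installed, `len(tiktoken.get_encoding("cl100k_base").encode(header))` reports the token count, assuming `cl100k_base` is the encoding in question.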

I expect that lower-numbered tokens, being more common, will carry more semantic meaning, such as “section divider”.

List of longest tokens, some of no interest removed.
Notes:

  • runs of spaces are collapsed in this listing, so infer their length from the len column; below 80, there’s a token for every length of space-only run
  • a few begin with a newline that is not shown
number len token characters
058041 128 ’ ’
090281 114 ‘//----------------------------------------------------------------------------------------------------------------’
067308 113 ’ ----------------------------------------------------------------------------------------------------------------’
057443 098 ‘//------------------------------------------------------------------------------------------------’
060910 097 ’ ------------------------------------------------------------------------------------------------’
087645 097 ‘/************************************************************************************************’
061730 096 ‘////////////////////////////////////////////////////////////////////////////////////////////////’
099575 096 ‘------------------------------------------------------------------------------------------------’
087867 095 ’ ’
080972 091 ’ ’
066372 087 ’ ’
052576 083 ’ ’
051662 082 ‘//--------------------------------------------------------------------------------’
067001 082 ’ =================================================================================’
088668 082 ‘//------------------------------------------------------------------------------\n\n’
094783 082 ’ ******************************************************************************/\n\n’
095404 082 ‘//================================================================================’
099421 082 ‘////////////////////////////////////////////////////////////////////////////////\n\n’
037814 081 ‘/*******************************************************************************\n’
040474 081 ’ --------------------------------------------------------------------------------’
046915 081 ‘//------------------------------------------------------------------------------\n’
059970 081 ‘////////////////////////////////////////////////////////////////////////////////\n’
076733 081 ‘/********************************************************************************’
077651 081 ’ ********************************************************************************’
080472 081 ’ ******************************************************************************/\n’
080504 081 ‘*******************************************************************************/\n’
086100 081 ‘/******************************************************************************/\n’
091831 081 ‘################################################################################\n’
029327 080 ‘////////////////////////////////////////////////////////////////////////////////’
041587 080 ‘################################################################################’
044550 080 ‘--------------------------------------------------------------------------------’
045243 080 ‘//-----------------------------------------------------------------------------\n’
054297 080 ‘/******************************************************************************\n’
062794 080 ‘********************************************************************************’
064495 080 ‘================================================================================’
077838 080 ‘///////////////////////////////////////////////////////////////////////////////\n’
080549 080 ‘###############################################################################\n’
086886 080 ’ ******************************************************************************\n’
052915 079 ’ -----------------------------------------------------------------------------\n’
069233 079 ‘/*****************************************************************************\n’
080039 079 ‘//----------------------------------------------------------------------------\n’
083150 079 ’ =============================================================================\n’
095565 079 ‘//---------------------------------------------------------------------------\n\n’
018499 078 ‘//----------------------------------------------------------------------------’
058408 078 ‘//---------------------------------------------------------------------------\n’
059007 078 ‘/*----------------------------------------------------------------------------’
059108 078 ‘/****************************************************************************\n’
064639 078 ’ ----------------------------------------------------------------------------\n’
065327 078 ‘//****************************************************************************’
084995 078 ‘/////////////////////////////////////////////////////////////////////////////\n’
090419 078 ’ ============================================================================\n’
023382 077 ‘/****************************************************************************’
024794 077 ’ ----------------------------------------------------------------------------’
029745 077 ’ ****************************************************************************’
062016 077 ‘#----------------------------------------------------------------------------’
062351 077 ’
072089 077 ‘/***************************************************************************\n’
079858 077 ’ ---------------------------------------------------------------------------\n’
087301 077 ’ ############################################################################’
023152 076 ‘----------------------------------------------------------------------------’
028283 076 ‘////////////////////////////////////////////////////////////////////////////’
033142 076 ‘############################################################################’
034619 076 ‘(76 asterisks)’
043181 076 ’
099043 076 ’ --------------------------------------------------------------------------\n’
017995 075 ’ **************************************************************************’
026973 075 ‘--------------------------------------------------------------------------\n’
072609 075 ’ //////////////////////////////////////////////////////////////////////////’
081651 075 ’ -------------------------------------------------------------------------\n’
039469 074 ’ =========================================================================’
066181 074 ‘//************************************************************************’
092016 074 ’ -------------------------------------------------------------------------’
097682 074 ’ ------------------------------------------------------------------------\n’
098106 074 ’ /************************************************************************’
010758 073 ’ ************************************************************************’
011625 073 ‘/************************************************************************’
094959 073 ’ ########################################################################’
005714 072 ‘************************************************************************’
036210 072 ‘////////////////////////////////////////////////////////////////////////’
070162 072 ‘########################################################################’
090795 072 ’ ----------------------------------------------------------------------\n’
063218 071 ‘----------------------------------------------------------------------\n’
096281 071 ’ //////////////////////////////////////////////////////////////////////’
054365 070 ‘----------------------------------------------------------------------’
044301 068 ‘////////////////////////////////////////////////////////////////////’
036720 067 ’ //----------------------------------------------------------------’
068650 067 ’ /*----------------------------------------------------------------’
087094 067 ’ //================================================================’
010090 066 ‘//----------------------------------------------------------------’
016564 066 ’ =================================================================’
024037 066 ‘//================================================================’
030966 066 ‘/*----------------------------------------------------------------’
045539 066 ‘/*================================================================’
056364 066 ‘//****************************************************************’
078250 066 ’ *----------------------------------------------------------------’
100090 066 ’ /****************************************************************’
008634 065 ’ ----------------------------------------------------------------’
020767 065 ‘/****************************************************************’
023090 065 ’ ****************************************************************’
049598 065 ’ ################################################################’
073105 065 ‘#================================================================’
076611 065 ‘#----------------------------------------------------------------’
082143 065 ‘//--------------------------------------------------------------\n’
087173 065 ’ (65 periods)’
003598 064 ‘----------------------------------------------------------------’
004170 064 ‘(64 asterisks)’
008316 064 ‘================================================================’
010024 064 ‘////////////////////////////////////////////////////////////////’
013368 064 ‘################################################################’
043370 064 ‘(64 periods)’
048033 064 ‘________________________________________________________________’
080619 064 ‘%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%’
071763 063 ’ ==============================================================’
087733 061 ’ ------------------------------------------------------------’
065676 060 ‘////////////////////////////////////////////////////////////’
098945 060 ‘############################################################’
066296 057 ’ ********************************************************’
067260 057 ‘/********************************************************’
054248 056 ‘********************************************************’
062009 056 ‘////////////////////////////////////////////////////////’
091139 056 ‘########################################################’
083946 052 ‘////////////////////////////////////////////////////’
067105 051 ’ //------------------------------------------------’
029686 050 ’ =================================================’
031990 050 ‘//------------------------------------------------’
074163 050 ‘/*------------------------------------------------’
077608 050 ‘//================================================’
018528 049 ’ ------------------------------------------------’
068262 049 ’ ################################################’
072937 049 ’ ************************************************’
079472 049 ‘/************************************************’
096048 049 ’ \n’
009412 048 ‘------------------------------------------------’
019312 048 ‘================================================’
028506 048 ‘////////////////////////////////////////////////’
030955 048 ‘################################################’
047677 048 ‘************************************************’
079094 045 ’ \n’
089905 042 ’ \n \n’
057697 041 ’ \n’
062461 041 ’ ****************************************’
074062 041 ‘/****************************************’
041173 040 ‘****************************************’
083679 040 ‘########################################’
046908 037 ’ \n’
061710 035 ’ //////////////////////////////////’
074639 035 ’ //--------------------------------’
082109 035 ’ __________________________________’
036875 034 ’ =================================’
049357 034 ‘//--------------------------------’
060750 034 ’ \n \n’
079447 034 ‘("--------------------------------’
020309 033 ’ --------------------------------’
023815 033 ’ ********************************’
026343 033 ‘/********************************’
034742 033 ’ \n’
055402 033 ’ ################################’
082473 033 ’ (32 periods)’
001435 032 ‘--------------------------------’
001725 032 ‘********************************’
003135 032 ‘================================’
003986 032 ‘////////////////////////////////’
005135 032 ‘################################’
016972 032 ‘(32 periods)’
017925 032 ‘________________________________’
034110 032 ‘%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%’
066749 032 ‘~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~’

Validate your one-token sequence by clearing this tokenizer until it reads 0 tokens, then pasting the sequence in.

Actual token numbers used internally are one lower, since I didn’t start at 0…
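As a small sketch of that off-by-one correction, a row from the list can be converted back to the internal ID like this. The helper name `parse_row` is hypothetical, and the row layout (number, length, quoted token) is assumed from the list above:

```python
def parse_row(row: str) -> tuple[int, int]:
    """Return (internal_token_id, length) for one 'number len token' row."""
    # Split only twice so any spaces inside the quoted token are left alone.
    number, length, _token = row.split(maxsplit=2)
    # The listed numbers start at 1, so the internal ID is one lower.
    return int(number) - 1, int(length)


print(parse_row("041587 080 (80 hash characters)"))  # (41586, 80)
```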


It’s really interesting to see that the same character, repeated a different number of times, gets its own unique token number. I wonder what prompted this during the encoding…

The tokens are derived from encoding efficiency on the training corpus, along with others that were built manually (like all numbers up to 999).
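To illustrate the idea, here is a toy byte-pair-encoding style merge loop. It is only a sketch of the general technique, not the actual training procedure behind ChatGPT’s tokenizer, and the corpus and function names are made up. When a corpus is full of long divider lines, repeated merges of the most frequent adjacent pair quickly collapse a whole run into one token:

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Return the most common adjacent token pair across all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of pair with one joined token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def train_toy_bpe(corpus, num_merges):
    """Start from single characters and apply num_merges greedy merges."""
    sequences = [list(text) for text in corpus]
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        sequences = [merge_pair(seq, pair) for seq in sequences]
    return sequences


# Hypothetical corpus: five 8-dash divider lines and one unrelated string.
corpus = ["-" * 8] * 5 + ["ab"]
result = train_toy_bpe(corpus, 3)
print(result[0])   # ['--------']
print(result[-1])  # ['a', 'b']
```

Three merges are enough here because each merge doubles the length of the dash token, which also hints at why so many of the long tokens have lengths like 32 and 64.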

Just programmers doing programmer stuff:

This is how I’m going to define any quirks in my systems from now on XD


List of longest tokens, some of no interest removed.

They’re all interesting to me, could you share the longest ones?