ChatGPT’s tokenizer encodes many long runs of characters as a single token. We can get some radical token savings out of them.
For example, here is a section header only four tokens long, built from long sequences, two of which end in a newline:
################################################################################
instructions----------------------------------------------------------------------
################################################################################
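As a rough sketch, the header above can be assembled from the pieces that are expected to be one token each. This assumes the dash run is 70 dashes plus a newline (the entry listed as 063218) and that the encoding is `cl100k_base`. The helper names `section_header` and `count_tokens` are made up for illustration, and `count_tokens` needs the third-party `tiktoken` package.

```python
# Pieces of the example header. The token numbers in the comments are the
# ones listed in the table in this post; they are not checked here.
RULE = "#" * 80                # listed as 041587, len 80
RULE_NL = RULE + "\n"          # listed as 091831, len 81
DASHES_NL = "-" * 70 + "\n"    # listed as 063218, len 71 (assumed dash count)


def section_header(title: str) -> list[str]:
    """Return the header as the four pieces expected to be one token each."""
    return [RULE_NL, title, DASHES_NL, RULE]


def count_tokens(text: str) -> int:
    """Count tokens with tiktoken (third-party; fetches the vocabulary on first use)."""
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding
    return len(enc.encode(text))


pieces = section_header("instructions")
header = "".join(pieces)
print(header)
print(f"{len(header)} characters in {len(pieces)} pieces")
```

With `tiktoken` installed, `count_tokens(header)` should confirm whether the whole header really comes out at four tokens. The same call works as a check for any single sequence you expect to be one token.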
This makes for some interesting code compression techniques.
You can make prompt sections or code comments fed to the AI abundantly clear at little expense.
I expect that lower-numbered tokens, which correspond to more common sequences, will carry more semantic meaning for the model, such as “section divider”.
List of the longest tokens, with some of no interest removed.
Notes:
- whitespace is collapsed in this listing, so infer the number of spaces from the len column; for every length below 80 there is a token made up of only spaces
- a few tokens begin with a leading carriage return that is not pictured
number | len | token characters |
---|---|---|
058041 | 128 | ‘(128 spaces)’ |
090281 | 114 | ‘//----------------------------------------------------------------------------------------------------------------’ |
067308 | 113 | ’ ----------------------------------------------------------------------------------------------------------------’ |
057443 | 098 | ‘//------------------------------------------------------------------------------------------------’ |
060910 | 097 | ’ ------------------------------------------------------------------------------------------------’ |
087645 | 097 | ‘/************************************************************************************************’ |
061730 | 096 | ‘////////////////////////////////////////////////////////////////////////////////////////////////’ |
099575 | 096 | ‘------------------------------------------------------------------------------------------------’ |
087867 | 095 | ‘(95 spaces)’ |
080972 | 091 | ‘(91 spaces)’ |
066372 | 087 | ‘(87 spaces)’ |
052576 | 083 | ‘(83 spaces)’ |
051662 | 082 | ‘//--------------------------------------------------------------------------------’ |
067001 | 082 | ’ =================================================================================’ |
088668 | 082 | ‘//------------------------------------------------------------------------------\n\n’ |
094783 | 082 | ’ ******************************************************************************/\n\n’ |
095404 | 082 | ‘//================================================================================’ |
099421 | 082 | ‘////////////////////////////////////////////////////////////////////////////////\n\n’ |
037814 | 081 | ‘/*******************************************************************************\n’ |
040474 | 081 | ’ --------------------------------------------------------------------------------’ |
046915 | 081 | ‘//------------------------------------------------------------------------------\n’ |
059970 | 081 | ‘////////////////////////////////////////////////////////////////////////////////\n’ |
076733 | 081 | ‘/********************************************************************************’ |
077651 | 081 | ’ ********************************************************************************’ |
080472 | 081 | ’ ******************************************************************************/\n’ |
080504 | 081 | ‘*******************************************************************************/\n’ |
086100 | 081 | ‘/******************************************************************************/\n’ |
091831 | 081 | ‘################################################################################\n’ |
029327 | 080 | ‘////////////////////////////////////////////////////////////////////////////////’ |
041587 | 080 | ‘################################################################################’ |
044550 | 080 | ‘--------------------------------------------------------------------------------’ |
045243 | 080 | ‘//-----------------------------------------------------------------------------\n’ |
054297 | 080 | ‘/******************************************************************************\n’ |
062794 | 080 | ‘********************************************************************************’ |
064495 | 080 | ‘================================================================================’ |
077838 | 080 | ‘///////////////////////////////////////////////////////////////////////////////\n’ |
080549 | 080 | ‘###############################################################################\n’ |
086886 | 080 | ’ ******************************************************************************\n’ |
052915 | 079 | ’ -----------------------------------------------------------------------------\n’ |
069233 | 079 | ‘/*****************************************************************************\n’ |
080039 | 079 | ‘//----------------------------------------------------------------------------\n’ |
083150 | 079 | ’ =============================================================================\n’ |
095565 | 079 | ‘//---------------------------------------------------------------------------\n\n’ |
018499 | 078 | ‘//----------------------------------------------------------------------------’ |
058408 | 078 | ‘//---------------------------------------------------------------------------\n’ |
059007 | 078 | ‘/*----------------------------------------------------------------------------’ |
059108 | 078 | ‘/****************************************************************************\n’ |
064639 | 078 | ’ ----------------------------------------------------------------------------\n’ |
065327 | 078 | ‘//****************************************************************************’ |
084995 | 078 | ‘/////////////////////////////////////////////////////////////////////////////\n’ |
090419 | 078 | ’ ============================================================================\n’ |
023382 | 077 | ‘/****************************************************************************’ |
024794 | 077 | ’ ----------------------------------------------------------------------------’ |
029745 | 077 | ’ ****************************************************************************’ |
062016 | 077 | ‘#----------------------------------------------------------------------------’ |
062351 | 077 | ’ |
072089 | 077 | ‘/***************************************************************************\n’ |
079858 | 077 | ’ ---------------------------------------------------------------------------\n’ |
087301 | 077 | ’ ############################################################################’ |
023152 | 076 | ‘----------------------------------------------------------------------------’ |
028283 | 076 | ‘////////////////////////////////////////////////////////////////////////////’ |
033142 | 076 | ‘############################################################################’ |
034619 | 076 | ‘(76 asterisks)’ |
043181 | 076 | ’ |
099043 | 076 | ’ --------------------------------------------------------------------------\n’ |
017995 | 075 | ’ **************************************************************************’ |
026973 | 075 | ‘--------------------------------------------------------------------------\n’ |
072609 | 075 | ’ //////////////////////////////////////////////////////////////////////////’ |
081651 | 075 | ’ -------------------------------------------------------------------------\n’ |
039469 | 074 | ’ =========================================================================’ |
066181 | 074 | ‘//************************************************************************’ |
092016 | 074 | ’ -------------------------------------------------------------------------’ |
097682 | 074 | ’ ------------------------------------------------------------------------\n’ |
098106 | 074 | ’ /************************************************************************’ |
010758 | 073 | ’ ************************************************************************’ |
011625 | 073 | ‘/************************************************************************’ |
094959 | 073 | ’ ########################################################################’ |
005714 | 072 | ‘************************************************************************’ |
036210 | 072 | ‘////////////////////////////////////////////////////////////////////////’ |
070162 | 072 | ‘########################################################################’ |
090795 | 072 | ’ ----------------------------------------------------------------------\n’ |
063218 | 071 | ‘----------------------------------------------------------------------\n’ |
096281 | 071 | ’ //////////////////////////////////////////////////////////////////////’ |
054365 | 070 | ‘----------------------------------------------------------------------’ |
044301 | 068 | ‘////////////////////////////////////////////////////////////////////’ |
036720 | 067 | ’ //----------------------------------------------------------------’ |
068650 | 067 | ’ /*----------------------------------------------------------------’ |
087094 | 067 | ’ //================================================================’ |
010090 | 066 | ‘//----------------------------------------------------------------’ |
016564 | 066 | ’ =================================================================’ |
024037 | 066 | ‘//================================================================’ |
030966 | 066 | ‘/*----------------------------------------------------------------’ |
045539 | 066 | ‘/*================================================================’ |
056364 | 066 | ‘//****************************************************************’ |
078250 | 066 | ’ *----------------------------------------------------------------’ |
100090 | 066 | ’ /****************************************************************’ |
008634 | 065 | ’ ----------------------------------------------------------------’ |
020767 | 065 | ‘/****************************************************************’ |
023090 | 065 | ’ ****************************************************************’ |
049598 | 065 | ’ ################################################################’ |
073105 | 065 | ‘#================================================================’ |
076611 | 065 | ‘#----------------------------------------------------------------’ |
082143 | 065 | ‘//--------------------------------------------------------------\n’ |
087173 | 065 | ’ (65 periods)’ |
003598 | 064 | ‘----------------------------------------------------------------’ |
004170 | 064 | ‘(64 asterisks)’ |
008316 | 064 | ‘================================================================’ |
010024 | 064 | ‘////////////////////////////////////////////////////////////////’ |
013368 | 064 | ‘################################################################’ |
043370 | 064 | ‘(64 periods)’ |
048033 | 064 | ‘________________________________________________________________’ |
080619 | 064 | ‘%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%’ |
071763 | 063 | ’ ==============================================================’ |
087733 | 061 | ’ ------------------------------------------------------------’ |
065676 | 060 | ‘////////////////////////////////////////////////////////////’ |
098945 | 060 | ‘############################################################’ |
066296 | 057 | ’ ********************************************************’ |
067260 | 057 | ‘/********************************************************’ |
054248 | 056 | ‘********************************************************’ |
062009 | 056 | ‘////////////////////////////////////////////////////////’ |
091139 | 056 | ‘########################################################’ |
083946 | 052 | ‘////////////////////////////////////////////////////’ |
067105 | 051 | ’ //------------------------------------------------’ |
029686 | 050 | ’ =================================================’ |
031990 | 050 | ‘//------------------------------------------------’ |
074163 | 050 | ‘/*------------------------------------------------’ |
077608 | 050 | ‘//================================================’ |
018528 | 049 | ’ ------------------------------------------------’ |
068262 | 049 | ’ ################################################’ |
072937 | 049 | ’ ************************************************’ |
079472 | 049 | ‘/************************************************’ |
096048 | 049 | ’ \n’ |
009412 | 048 | ‘------------------------------------------------’ |
019312 | 048 | ‘================================================’ |
028506 | 048 | ‘////////////////////////////////////////////////’ |
030955 | 048 | ‘################################################’ |
047677 | 048 | ‘************************************************’ |
079094 | 045 | ’ \n’ |
089905 | 042 | ’ \n \n’ |
057697 | 041 | ’ \n’ |
062461 | 041 | ’ ****************************************’ |
074062 | 041 | ‘/****************************************’ |
041173 | 040 | ‘****************************************’ |
083679 | 040 | ‘########################################’ |
046908 | 037 | ’ \n’ |
061710 | 035 | ’ //////////////////////////////////’ |
074639 | 035 | ’ //--------------------------------’ |
082109 | 035 | ’ __________________________________’ |
036875 | 034 | ’ =================================’ |
049357 | 034 | ‘//--------------------------------’ |
060750 | 034 | ’ \n \n’ |
079447 | 034 | ‘("--------------------------------’ |
020309 | 033 | ’ --------------------------------’ |
023815 | 033 | ’ ********************************’ |
026343 | 033 | ‘/********************************’ |
034742 | 033 | ’ \n’ |
055402 | 033 | ’ ################################’ |
082473 | 033 | ’ (32 periods)’ |
001435 | 032 | ‘--------------------------------’ |
001725 | 032 | ‘********************************’ |
003135 | 032 | ‘================================’ |
003986 | 032 | ‘////////////////////////////////’ |
005135 | 032 | ‘################################’ |
016972 | 032 | ‘(32 periods)’ |
017925 | 032 | ‘________________________________’ |
034110 | 032 | ‘%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%’ |
066749 | 032 | ‘~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~’ |
Validate your one-token sequence by clearing this tokenizer until it reads 0 tokens, then pasting the sequence in.
The actual token numbers used internally are one lower than those listed, since I didn’t start counting at 0…
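A listing like the one above could be regenerated along these lines. This is only a sketch, not the script used for this post: it assumes the `cl100k_base` encoding and the third-party `tiktoken` package, and the function names are made up for illustration. `format_row` adds one to each ID so the numbers line up with the table, per the note about the numbers being one higher than the internal ones.

```python
def longest_tokens(vocab: dict[int, bytes], min_len: int = 32) -> list[tuple[int, int, str]]:
    """Return (token_id, length, text) rows, longest first, ties by token_id."""
    rows = []
    for token_id, raw in vocab.items():
        # Some tokens are partial UTF-8 sequences; show those with U+FFFD.
        text = raw.decode("utf-8", errors="replace")
        if len(text) >= min_len:
            rows.append((token_id, len(text), text))
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


def format_row(token_id: int, length: int, text: str) -> str:
    """Format one row like the table, with the number shifted up by one."""
    return f"{token_id + 1:06d} | {length:03d} | {text!r} |"


def load_vocab(name: str = "cl100k_base") -> dict[int, bytes]:
    """Read every token's bytes from tiktoken (third-party, assumed installed)."""
    import tiktoken

    enc = tiktoken.get_encoding(name)
    vocab = {}
    for token_id in range(enc.n_vocab):
        try:
            vocab[token_id] = enc.decode_single_token_bytes(token_id)
        except KeyError:
            continue  # this ID is not assigned to any token
    return vocab


# Tiny stand-in vocabulary so the sketch runs without tiktoken.
toy = {0: b"a", 1: b"-" * 32, 2: b"#" * 40, 3: b"=" * 32}
for row in longest_tokens(toy):
    print(format_row(*row))
```

To run it against the real vocabulary, pass `load_vocab()` in place of `toy`. Using `repr` for the token text keeps spaces and `\n` visible instead of letting them collapse.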