Need help with prompt: "Can you generate 1000 random tokens? "

qrdl · May 11, 2023, 8:01pm

I need help with this prompt. Having a hard time to get GPT4 to do what I want, even though it has absolutely nothing to do with safety that I can see.

In particular, I’d like it to select randomly from the entire token space that it has.

justin.emrick · May 11, 2023, 8:28pm

I’ve seen some success by first providing a definition of the token (even though it already knows what a token is). You can actually probably copy and paste the first part of the output. Refer to token as something else (X). All words in the english language (or entire token space) is represented by Y. Provide 1,000 values for Y and then return the corresponding value for X.

EricGT · May 11, 2023, 8:34pm

Did you see this?

Language models can explain neurons in language models

among other things there is a tool with published code on GitHub.

If you use the tool you will see that it has access to every neuron in GPT-2 (not GPT-4) and also the tokens.

If you look at the code you will find

github.com

openai/automated-interpretability/blob/main/neuron-viewer/src/interpAPI.ts#L44-L62


      
          // # (derived from az://oaialignment/datasets/interp/gpt2_xl/v1/webtext1/len_nomax/n_50000/mlp_post_act/ranked_by_max_activation)
          // const NEURON_RECORDS_PATH = "az://oaisbills/rcall/oss/migrated_make_crow_datasets/gpt2_xl_n_50000_64_token/neurons"
          const NEURON_RECORDS_PATH = "https://openaipublic.blob.core.windows.net/neuron-explainer/data/collated-activations"
          
          
// # (derived from az://oaialignment/datasets/interp/gpt2_xl/v1/webtext1/len_nomax/n_50000/mlp_post_act/ranked_by_max_activation/neurons/explanations/canonical-run-v1)
          // const EXPLANATIONS_PATH = "az://oaisbills/rcall/oss/migrated_explanation_datasets/canonical_gpt2_xl_all_neurons"
          const EXPLANATIONS_PATH = "https://openaipublic.blob.core.windows.net/neuron-explainer/data/explanations"
          
          
// weight-based
          // const WHOLE_LAYER_WEIGHT_TOKENS_PATH = "az://oaidan/rcall/data/interpretability/connections/gpt2-xl/mlp/unnorm_token_representations_uncommon_vanilla"
          // const WEIGHT_TOKENS_PATH = "az://oaijeffwu/jeffwu-data/interpretability/neuron-connections/gpt2-xl/weight-based"
          const WEIGHT_TOKENS_PATH = "https://openaipublic.blob.core.windows.net/neuron-explainer/data/related-tokens/weight-based"
          // lookup table
          // const WHOLE_LAYER_ACTIVATION_TOKENS_PATH = "az://oaidan/rcall/data/interpretability/connections/gpt2_xl/mlp/unnorm_token_representations_vanilla_and_common_in_colangv2_unigram"
          // const ACTIVATION_TOKENS_PATH = "az://oaijeffwu/jeffwu-data/interpretability/neuron-connections/gpt2-xl/lookup-table"
          const ACTIVATION_TOKENS_PATH = "https://openaipublic.blob.core.windows.net/neuron-explainer/data/related-tokens/activation-based"
          
          
// const CONNECTIONS_PATH = "az://oaialignment/datasets/interp/connections/gpt2/neuron_space/incl_attn_False"
          const CONNECTIONS_PATH = "https://openaipublic.blob.core.windows.net/neuron-explainer/data/related-neurons/weight-based"

So while it may not be the tokens you are looking for, it might be part of what you seek in the long run but for a simpler model.

qrdl · May 11, 2023, 8:35pm

It seems to sort of work if you define a token in something other than what it is (eg: potential inputs someone might enter).

That said, I was hoping for something a bit more unbounded.

As soon I start trying to ask for an actual token, even when defining it, it complains and says it’s not aware of how it was trained.

qrdl · May 11, 2023, 8:39pm

Yeah, I did see that. I actually proposed something like that last week.

They probably started the effort quite a while ago though.

For the purpose of this, I’m trying to probe GPT4 to be introspective. It could be there are safeguards to keep people from doing that, in which case I will stop. I’m not trying to hack anything here, just want to get a sense of the entire token space.

EricGT · May 11, 2023, 8:47pm

I am sure this is a long list of users like us that want the tokens along with the connections, details of how the prompt steered the generation of the completion, etc.

Personally I find that there is always more on my plate than I can eat and if I wait long enough someone, somewhere will make what I seek public and for free or the technology will become obsolete and it no longer matters.

So while I am drooling over what can be done with that data for now I just have to stand by and focus on one task.

qrdl · May 11, 2023, 8:51pm

Yep, that’s fairly easy to do. The purpose of this task is to see if/how I can get GPT4 to be introspective, however.

That said, it might be reasonable to assume that GPT4 isnt much different than earlier models in this regard.

qrdl · May 11, 2023, 9:15pm

I think this issue is orthogonal to the question of capability. I wouldn’t submit an eval for this.

One interesting approach is the compression technique of getting GPT4 to compress text as much as possible without having to keep it human readable. The results are quite fascinating. Eg:

From - Reddit - Dive into anything

qrdl · May 12, 2023, 2:28am

Interesting data point - I can’t get this to work anymore. I tried the same prompts (and several alternatives) that other people suggested:

“compress the following text in a way that fits in a tweet (ideally) and such that you (GPT-4) can reconstruct the intention of the human who wrote text as close as possible to the original intention. This is for yourself. It does not need to be human readable or understandable. Abuse of language mixing, abbreviations, symbols (unicode and emoji), or any other encodings or internal representations is all permissible, as long as it, if pasted in a new inference cycle, will yield near-identical results as the original text:”

novaphil · May 12, 2023, 4:18am

I’ve never seen an example of GPT compression like that actually work. Of course GPT will do it if you ask, because that’s what language models do. But even in the linked reddit thread the comments were discussing how it doesn’t actually work. Also that an emoji likely equals 3 words in tokens, so isn’t really compressing anything.

qrdl · May 12, 2023, 5:35am

nvm, got it to work.

Cmprsn:bulb:: nt intrsctv:mag:task, need diff:bulb:nt lens. Absltly unworkbl:brain:: cmprss token-space:arrows_clockwise:vs embeddng. Sht token seqs embed vec, smll dist to larger prmpts:jigsaw:.

Guess it depends on the text.

mgruenhagen18 · May 12, 2023, 4:45pm

I would strongly advise against using ChatGPT to directly generate 1000 random tokens. It would be difficult (if not impossible) for you to validate that they are random.

What I would recommend instead is to have ChatGPT help you write a script that generates 1000 random tokens. Make sure they are actually random and not pseudo-random. I would recommend cross-referencing the code ChatGPT gives you against official documentation on what ever programming language, library or api you are using.

qrdl · May 12, 2023, 4:52pm

Difficult, I can attest to. Impossible? Never say never

qrdl · May 13, 2023, 1:08am

That’s a curious statement considering that token selection in GPT models largely occurs as the result of an RNG function (given certain params, like temp, logit probs, top-p, etc)

qrdl · May 13, 2023, 2:00am

If you’re interested what token selection looks like, here’s a neat little project.

github.com

project-baize/baize-chatbot/blob/main/demo/app_modules/utils.py#LL288C30-L288C30


      
          next_token = torch.multinomial(probs_sort, num_samples=1)

Anyhow, all this discussion did provoke an interesting idea, somewhat related to the compression prompt above.

Compressing data is the same as maximizing its entropy as predictable data has patterns that can be compressed further. So, indeed, this might be a way to get fully random tokens from GPT4. We just need to find a way to ensure that it uses its entire token space.

qrdl · May 13, 2023, 2:16am

Sorry, didn’t find that particularly fruitful. Token selection is random.

Yes, there are considerations around distribution, but I am not convinced yet these are impossible to address.

chris42 · May 14, 2023, 6:26pm

I wonder how “random” those tokens really are… Now thinking about it, does ChatGPT has a Python interpreter inside, for example, or how does it execute code that is given to it? Is it all done in LLM magic or is there additional complexity such as conditional use of external tools such as a Python shell?

qrdl · May 14, 2023, 9:30pm

Chris, check the repo I linked to above and the code. I suspect the code is quite similar, at least on a per model basis. This is roughly how generative pre-trained transformers output tokens

Note that it’s possible that GPT4 is backed by several models (LLM cascade), we really don’t have visibility.

chris42 · May 15, 2023, 8:19am

Thanks, really enjoyed both your resources. I also believe GPT-4 may be backed by several models or external tools that help coding endeavors (like simply running code). I could imagine that the “explanation” of the output in the following example, may be guided by a code interpreter, for instance. Not sure, though, and I would be much more impressed if it wasn’t!

It’s off topic though.

By the way, are you involved in Project Baize? @qrdl Looks impressive!

qrdl · May 15, 2023, 6:31pm

I’m not involved, just ran across it. I like the idea though, much more than most of the zillions of other GPT4 mentor/student models. This one works by focusing on a particular subject. I think it has very intriguing possibilities.

Topic		Replies	Views
Hypothetical Token-increase Strategy . Community gpt-4 , chatgpt	21	306	March 17, 2025
Do 'MAX tokens' include the follow up prompts and completion in a single chat session API token	22	5344	August 25, 2023
"Do this occasionally" - A potential (but strange) method to implement randomness Prompting gpt-4	12	2352	August 18, 2023
Suppose I want to write a story which is longer than 4000 tokens Prompting	27	11819	December 17, 2023
Seen anything novel by o1-preview? Community o1-preview	15	1961	September 16, 2024

Need help with prompt: "Can you generate 1000 random tokens? "

Related topics