Language models can explain neurons in language models

While the OpenAI paper is interesting, what is more interesting is the GitHub code and tools associated with the paper:

Automated interpretability

and a tool for viewing neuron activations and explanations: Neuron Viewer.


Good eye. Thanks for sharing.

My to-read list never ends…

The problem is that if the list grows to the point where reading it takes more hours than there are in a day, we will have to move to a planet with a longer day; don’t forget your towel.

Ah, yes, the Time Enough at Last problem! :wink:

Hope your week is going well. Working on anything cool?


A Twilight Zone fan. :slightly_smiling_face:

Makes me wonder if there is a Twilight Zone GPT?

From this post

The method I am using to accomplish this is bootstrapping with T-diagrams (see “Bootstrapping with T-Diagrams - Computerphile” and “Tombstone diagram”).

Since I have not seen this technique used or noted for prompt engineering, I am going to name it “Prompt Bootstrapping”.