Hi, are the token embeddings learned together with the other weights in the Codex (or indeed in the GPT-3) model? I couldn't find a description of how the embeddings arise in the Codex/GPT papers, but I may have missed it in the references. Any links to further info on this would be much appreciated. Thanks.
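For context, here is a minimal sketch of how this works in the publicly released GPT-2 code (the GPT-3 and Codex papers state they reuse the GPT-2 architecture, so presumably the same applies, though the papers don't spell it out). The class and variable names below (`TinyGPT`, `tok_emb`, etc.) are hypothetical; the point is that the embedding table is an ordinary trainable parameter updated by the same optimizer as everything else:

```python
# Sketch only (PyTorch): token embeddings as a jointly trained parameter,
# following the pattern in the open-source GPT-2 implementation.
import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    def __init__(self, vocab_size=50257, d_model=768, n_ctx=1024):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)  # learned token embeddings
        self.pos_emb = nn.Embedding(n_ctx, d_model)       # learned positional embeddings (GPT-2 style)
        layer = nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.tok_emb.weight         # weight tying, as in GPT-2

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        x = self.blocks(x)  # causal mask omitted for brevity
        return self.lm_head(x)

model = TinyGPT()
# One optimizer over *all* parameters, embedding table included:
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
# model.tok_emb.weight.requires_grad is True, so gradients from the LM
# loss flow back into the embedding table: it is learned jointly, not
# initialized from a separate pretrained embedding model.
```

So, as far as the open implementations show, there is no separate embedding-training stage: the table starts from random initialization and is shaped entirely by the language-modeling objective.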