Hello,
I am trying to understand the attention mechanism and have some basic questions.
- Is the output of a single attention head a vector or a scalar? (Or even a matrix?)
- The attention head applies a softmax function to a matrix. Is the output of this softmax a vector or a scalar? If it is a vector, over which dimension is the softmax applied?
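To make the questions concrete, here is a minimal NumPy sketch of a single scaled dot-product attention head; the shapes and random values are just a toy example I made up, not from any particular implementation:

```python
import numpy as np

# Toy shapes (my own assumption): sequence length 4, key/value dimension 8.
seq_len, d_k = 4, 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((seq_len, d_k))  # queries
K = rng.standard_normal((seq_len, d_k))  # keys
V = rng.standard_normal((seq_len, d_k))  # values

# Attention scores: one score per (query, key) pair -> a (seq_len, seq_len) matrix.
scores = Q @ K.T / np.sqrt(d_k)

# Softmax applied row-wise (over the last axis, i.e. over the keys),
# so each row of weights sums to 1.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Head output: a weighted sum of the value vectors -> a (seq_len, d_k) matrix,
# i.e. one output vector per input position.
out = weights @ V

print(weights.shape)  # (4, 4)
print(out.shape)      # (4, 8)
```

This is exactly the part my questions are about: whether `weights` and `out` should be thought of as scalars, vectors, or matrices, and along which axis the softmax runs.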