Text embeddings vs word embeddings

I have a question that I don't think is well covered in the documentation:
OpenAI text embedding models such as text-embedding-ada-002 take a string of text as input and output a single embedding vector. However, this doesn't seem to be a step in standard transformer training. In transformer training, each individual token is embedded and self-attention is applied across all tokens; that procedure doesn't appear to produce any sentence-level embedding. So how is the text embedding trained?

If you google “train sentence embeddings” you will find a lot of information.

But think of the embedding, at a high level, as a dimensionality reduction technique (as opposed to something high-dimensional like one-hot encodings).
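One common way (used, for example, in Sentence-BERT-style models) to go from per-token transformer outputs to a single sentence vector is pooling. Here's a minimal sketch with made-up shapes and random values standing in for real hidden states:

```python
import numpy as np

# Hypothetical per-token outputs from a transformer's last hidden layer:
# one 768-dimensional vector per token (12 tokens here; shapes are illustrative).
token_embeddings = np.random.rand(12, 768)

# Mean pooling: average over the token axis to collapse the sequence
# into a single fixed-size sentence embedding.
sentence_embedding = token_embeddings.mean(axis=0)

print(sentence_embedding.shape)  # (768,)
```

The pooled vector is then typically fine-tuned with a contrastive or similarity objective so that semantically similar texts land close together.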

Also, you can take the embedding vector and feed it into your own neural network for further insights. But most people are happy just comparing the vectors (e.g., with cosine similarity) to measure similarity of meaning.
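For comparing vectors, the usual measure is cosine similarity. A minimal sketch with toy 3-dimensional vectors standing in for real embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors:
    # 1.0 = same direction (similar meaning), 0.0 = orthogonal.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([1, 0, 0], [1, 0, 0]))  # 1.0
print(cosine_similarity([1, 0, 0], [0, 1, 0]))  # 0.0
```

With real embedding vectors you would pass the two arrays returned by the embeddings endpoint into the same function.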