What happens when we feed text directly to GPT-4? Embedding or tokenization?

I am a little confused about when tokenization comes into the picture and when embedding does.
I understand the difference between the two: embeddings capture the relationships between tokens.

My question is: when we send prompts to the Chat Completions API, we don't call the Embeddings API first; we just feed the text in directly. Is the backend creating embeddings on its own?
And where does tokenization fit in then?
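To make my mental model concrete, here is a toy sketch of what I imagine happens internally. The whitespace tokenizer, the tiny vocabulary, and the made-up 3-dimensional vectors are all illustrative assumptions on my part; GPT-4 actually uses a BPE tokenizer and a learned embedding matrix inside the model, not the separate Embeddings API.

```python
# Toy sketch of the conceptual pipeline (NOT the real GPT-4 internals).

# Step 1 — Tokenization: the prompt text is split into integer token IDs.
# (Real models use a BPE tokenizer; this whitespace split is just for illustration.)
vocab = {"hello": 0, "world": 1}
text = "hello world"
token_ids = [vocab[word] for word in text.split()]

# Step 2 — Embedding lookup: inside the model, each token ID selects a row
# of a learned embedding matrix. These toy 3-dim vectors stand in for that.
embedding_matrix = [
    [0.1, 0.2, 0.3],  # row for token 0 ("hello")
    [0.4, 0.5, 0.6],  # row for token 1 ("world")
]
embeddings = [embedding_matrix[t] for t in token_ids]

print(token_ids)   # -> [0, 1]
print(embeddings)  # -> [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
```

If this picture is right, then tokenization and embedding both happen on the backend automatically, and the separate Embeddings API is only for when you want the vectors yourself (e.g. for search or clustering).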