In the early days of completion models, we used the terms “prompt” and “completion”. Like a human prompter or teleprompter, the “prompt” guided the LLM, and the text the model produced to finish it was the “completion”.
While “prompt” is still predominantly used, I’ve noticed OpenAI and other LLM companies use both “completion” and “response” interchangeably.
As someone who is teaching AI concepts, I’d like to be as accurate as possible. Are they synonyms, or is “response” more specific to a type of completion… or something else?
My perspective, based only on reading the docs and discussing these topics with the models themselves, is that:
Completion is a sort of industry-specific term that relates more to how the internal transformer architecture of an LLM actually works: the model is “completing” the statistically most probable continuation of your prompt. It is also “completing” your call to the model; in the backend, you SEND a REQUEST and GET a COMPLETION for that request (i.e., the request is “completed” once you receive the response).
Response is a general term, intuitive and easy to understand, though perhaps slightly off in a technical sense from what’s actually happening when you are working with an LLM. That’s really nitpicky in my opinion, though, and in a general-understanding sort of way the API RESPONSE absolutely is the same thing as the API COMPLETION.
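To make that request/completion cycle concrete, here’s a minimal sketch using the OpenAI Python SDK’s legacy completions endpoint. The model name is illustrative (gpt-3.5-turbo-instruct is one of the few completion-style models still served), so treat this as a sketch rather than a definitive usage:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# You SEND a request containing a raw prompt...
completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # illustrative legacy-style model
    prompt="The capital of France is",
    max_tokens=5,
)

# ...and GET a "completion" back: the model's statistical
# continuation of the prompt text.
print(completion.choices[0].text)  # e.g. " Paris."
```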
The documentation does seem to change and evolve in terms of the terminology used at a programmatic level, but overall I definitely perceive the terms as interchangeable, dependent more on who you’re talking to and what you’re talking about than anything else!
Yes, agreed that “completion” has been used to mean “complete the statistically probable continuation of the initial prompt” in the past. That was certainly true when the first completion models (now considered legacy) came out three years ago. I’ve been using that term with my students.
But as I teach other LLMs and API abstractions, I’m now seeing “response” come up more often as the term for the result from an LLM. To make it more confusing and harder to teach, OpenAI has both a Completions API and a Responses API.
I’m hoping OpenAI is not using the standard Microsoft trick of usurping technical definitions for marketing. As teachers, we have to be accurate. Marketing people don’t have to be.
More accurately: Before ChatML was introduced, LLMs such as Davinci were “Completion Models”. They would finish whatever input was provided.
Then, ChatML was introduced. It can be considered a structured wrapper: it helped prevent prompt injections and the emitting/retrieving of training data, allowed for a more structured response, and as a result helped with alignment. So “Completions” was updated to “ChatCompletions”, indicating the underlying structure used.
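For illustration, here’s roughly what that wrapper looks like from the API side, as a hedged sketch: the messages list is serialized into a ChatML-style template before the underlying model completes it (the exact template varies by model; the one shown in the comment is an approximation, and the model name is illustrative):

```python
# The messages below are serialized into something ChatML-shaped
# before the model "completes" it, roughly:
#
#   <|im_start|>system
#   You are a helpful assistant.<|im_end|>
#   <|im_start|>user
#   What is the capital of France?<|im_end|>
#   <|im_start|>assistant
#
from openai import OpenAI

client = OpenAI()

chat_completion = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)

# The structured wrapper means you get a message back,
# not raw continuation text.
print(chat_completion.choices[0].message.content)
```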
Now, “completion” models are practically non-existent. Everybody uses the ChatML format because, realistically, 99% of people use the models in a chat format anyway, and the completion format provides too much control.
So, they shouldn’t be used interchangeably. “Completions” is a historic term; “Responses” is the more modern and accurate one for current models.
Yes, I understand the OpenAI view, but it is inconsistent with some other LLM providers who use “completion” (and some use “response”). Mistral uses “completion” for their MoE/CoT models. Ollama uses “completion”. LangChain uses “Response”.
Of course, with the newer OpenAI Responses API, the answer coming back is not just a “completion” in the statistical sense; it could be the result of a collection of completions, function calls, etc. Other vendors have similar newer APIs.
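A sketch of that difference, assuming the Responses API’s built-in web search tool is available (the tool choice and model name here are illustrative, not a definitive usage):

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o-mini",  # illustrative model name
    input="What's the weather in Paris? Answer in one sentence.",
    tools=[{"type": "web_search_preview"}],  # built-in tool; availability may vary
)

# response.output is a list of items (tool calls, messages, ...),
# not a single completion; output_text is a convenience accessor
# that concatenates the text parts.
print(response.output_text)
```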
So perhaps from a teaching POV, we can say “completions” are statistical results from a model, and “responses” are results from a GenAI service (which may be a collection of model completions, external tool calls, etc.).