Title says it all: I’d like to know what version of GPT the latest text embedding model uses behind the scenes — GPT-3, GPT-3.5, or GPT-4?
The prior ada, babbage, curie, and davinci embedding models based on GPT-3 used 1024, 2048, 4096, and 12288 dimensions respectively. There were many specialized versions of them tuned for particular tasks.
The new text-embedding-ada-002 model uses a unique 1536 dimensions, which is one-eighth the size of davinci-001 embeddings.
It also uses the cl100k_base token encoding shared by the 3.5 and 4 chat models.
Released in late 2022, it postdates the completion of GPT-4’s initial, then-unannounced training run.
Nothing else is disclosed about “what it is”. You put in text, you get out a semantic vector. Its performance is rated somewhere between the babbage and davinci embedding models, depending on the evaluator.
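Whatever the underlying model is, the practical interface is just that vector: you compare two embeddings, typically with cosine similarity. A minimal sketch in plain Python (using toy 3-dimensional vectors as stand-ins for the real 1536-dimensional ones returned by the API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for two 1536-dimensional embedding vectors:
v1 = [0.1, 0.2, 0.3]
v2 = [0.1, 0.2, 0.25]
print(cosine_similarity(v1, v2))  # close to 1.0 for semantically similar texts
```

Note that OpenAI’s embedding vectors are returned normalized to unit length, so the dot product alone gives the same ranking as full cosine similarity.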
More GPT-3 model info (all of these are to be shut off in January):
The ada, babbage, curie, and davinci named versions of GPT-3 are 350 million, 1.3 billion, 6.7 billion, and 175 billion parameters respectively. They correspond to the GPT-3 models with the same parameter counts in the original “Language Models are Few-Shot Learners” paper. The eight GPT-3 model sizes discussed in that paper are:
- GPT-3-small: 125 million parameters
- GPT-3-medium: 350 million parameters (same as ada)
- GPT-3-large: 760 million parameters
- GPT-3-xl: 1.3 billion parameters (same as babbage)
- GPT-3-2.7B: 2.7 billion parameters
- GPT-3-6.7B: 6.7 billion parameters (same as curie)
- GPT-3-13B: 13 billion parameters
- GPT-3-175B: 175 billion parameters (same as davinci)
Thank you for your reply. So what model is actually behind it is not disclosed?
Yes, it seems that the specific model behind text-embedding-ada-002 is not publicly disclosed. It’s a bit disappointing, isn’t it? However, it’s fascinating to know that it uses elements from the 3.5 and 4 chat models. Do you have any insights on why OpenAI might be keeping this information under wraps?
The position is mostly “proprietary”: AI is dangerous, only we can handle its capabilities, and we need trade secrets.
The same goes for the replacement -instruct model, the base completion model babbage-002, or fine-tuning on top of gpt-3.5-turbo: no hint of what kind of reinforcement learning or training techniques these use, what the models actually are in terms of parameter count or corpus size, or what you’d normally expect from a model card.
I think that takes a negative view of things. Under the system we have to work with, large amounts of capital are required to push a resource-heavy technology forward, and the only way to achieve the levels of investment required to do this effectively is to offer investors IP with value. You can’t generate the billions needed now, and the hundreds of billions more that will be required to keep moving toward AGI, with open source. I’m a huge OS advocate and I contribute to and use OS all the time, but it’s sadly not a suitable framework for technology of this complexity.
GPT-4 Technical Report
Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
(GPT-4 experimentation follows)
Indeed, and I believe both are valid points. Like most people who spend a lot of time around these models, I don’t consider GPT-4 to be a risk, but wanting to use common sense when it comes to risk management is not a bad thing.