I found there is an API for text embeddings, can I use this embedding as an input for GPT? I’m not sure if this is possible.
The output of the embeddings endpoint is a very large vector
{
"object": "embedding",
"embedding": [
0.0023064255,
-0.009327292,
-0.0028842222,
(1500 more dimensional values
],
"index": 0
}
This is useful for comparing algorithmically with other embeddings results to find how similar the language sent is. This can be employed not just for scoring, but also for searching and information retrieval.
As you can imagine, the AI can make no sense of a long list of numbers.
You can use this technology to, for example, make an actions API that returns AI-powered knowledge database. However, furnishing that to public GPTs means you have to pay the bill.
Thank you for your reply.
I’m sorry, I made a misunderstanding.
I know normal GPT cannot accept embeddings as an input. I meant to say does OpenAI provide other APIs that can take text embeddings as an input?
What I want to do is I want to encode the image to text context and give API images as text embeddings.
You would use GPT-4-Vision to generate the text description of the image.
Then use ada-002 to embed the above text to create your vector.
So a 2-step process.
This is an entirely valid question that many AI researchers are asking. It’s the holy grail for real understanding, and like many things it’s more complicated than it seems.
This is misleading. All operations in a neural networks are ‘long list of numbers’ - also known as high-dimensional vectors.
tokens are a hacky and ugly midpoint to vectors. The question shows insight.
This is useful for comparing algorithmically
That’s because vectors are representations of meaning and they are how neural networks ‘think’ - the weights are fixed, the vectors do the work.
Not applicable to someone asking in a roundabout method, “how can I make a ChatGPT GPT do xxx”, though.
“Embeddings” is being used ambiguously, like “stick some data in somewhere”, when it should be clear that it has a very distinct meaning in natural language AI processing.
It’s probably semantics, but embeddings, or a list of numbers, as input to a language model generally results in gibberish.
We get this question a lot, and @_j is right, basically the input needs to be text, but yes, the internals are numbers.
The OP want’s to encode an image to text and then form an embedding. So it’s a 2-step process.
First have an AI model describe what is in the image and output text. Then take the output text and embed it to create the embedding vector.
What is the use-case behind converting your image to text and then to embeddings ?
we can directly integrate text and multiple images
if this is possible, we don’t have to give them to GPT4 separately
GPT4 can understand text and images without connecting them manually by texts
For example, now we have to specify text and images with “first image is …” or “second images is …”
You can’t pass embeddings. They would be destroyed by tokenisation and no longer embeddings if you were silly enough to try.
The OP want’s to encode an image to text and then form an embedding. So it’s a 2-step process.
This is very fascinating, cool stuff. I do the same kind of things with my own and open source models, where you can stay in the latent and embedding space.
Yep, clarity is key. How you think about things is pretty much the key to success with llms and neural networks.