I’ve read about embeddings, and from what I’ve seen the direction is always “text =(model)=> vectors”. Is it possible to feed vectors into the model’s prompt, in order to save some tokens?
IMHO it looks like what fine-tuning does; however, fine-tuning seems to be an offline operation, while I would like to provide the vectors during a live prompt.
Is that possible, in practice or even in theory?
There’s just one problem: GPT can’t do the required math.
Do you mean the LLM itself, 3.5-turbo or gpt-4? Those models are trained on tokens as input; I’m not sure what value a vector would be to the model. It speaks tokenese, and you are proposing a conversation in vectorish.
The vectors are also quite big, usually. For example, ada-002 embeddings are 1536 floats. Supposing each float costs at least 4 tokens when written out as text, that is over 6000 tokens. Not much compression.
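You can sanity-check that estimate with a rough back-of-the-envelope calculation. This sketch serializes a random 1536-dimensional vector to text and applies the common heuristic of roughly 4 characters per token (the random vector and the 8-decimal formatting are assumptions for illustration; a real tokenizer such as tiktoken would give an exact count):

```python
import random

random.seed(0)

EMBED_DIM = 1536  # ada-002 embedding dimension

# Stand-in for an embedding vector: 1536 floats in [-1, 1].
vec = [random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]

# Serialize the way you would have to paste it into a prompt.
text = ", ".join(f"{v:.8f}" for v in vec)

# Rough heuristic: ~4 characters per token.
approx_tokens = len(text) // 4

print(f"{len(text)} characters, ~{approx_tokens} tokens")
```

Even with this crude estimate the serialized vector lands in the thousands of tokens, which is the point: pasting raw floats into a prompt saves nothing.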
Besides using embeddings in the traditional sense, you could take the embedding vectors and use them as input to your own neural network. A simple feed-forward network is a good place to start: the input is the vector of 1536 floats (or whatever your embedding dimension is), followed by however many hidden layers, ending with your final output layer. So if you build a binary classifier, your output layer has dimension 2.
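As a minimal sketch of that idea, here is a forward pass of such a classifier in pure Python: 1536 inputs, one hidden layer, a 2-way softmax output. The layer sizes, random weights, and random “embedding” are all placeholder assumptions; in practice you would train the weights (e.g. with PyTorch) on real embedding vectors:

```python
import math
import random

random.seed(0)
EMBED_DIM, HIDDEN, CLASSES = 1536, 64, 2  # assumed sizes for illustration

def feed_forward(x, w1, b1, w2, b2):
    """One hidden layer (ReLU) followed by a softmax output layer."""
    hidden = [max(0.0, sum(xi * wi for xi, wi in zip(x, row)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(hi * wi for hi, wi in zip(hidden, row)) + b
              for row, b in zip(w2, b2)]
    # Numerically stable softmax over the logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Untrained random weights, just to show the shapes involved.
w1 = [[random.gauss(0, 0.01) for _ in range(EMBED_DIM)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [[random.gauss(0, 0.01) for _ in range(HIDDEN)] for _ in range(CLASSES)]
b2 = [0.0] * CLASSES

# Stand-in for an ada-002 embedding vector.
embedding = [random.gauss(0, 1) for _ in range(EMBED_DIM)]
probs = feed_forward(embedding, w1, b1, w2, b2)
print(probs)  # two class probabilities summing to 1
```

This is the sense in which the vectors are useful downstream: they are features for your own model, not something the chat model can read in a prompt.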