Upload text data, such as articles, product descriptions, and company rules, into OpenAI (API).
A custom chatbot replies based on the uploaded documents (API).
I have studied this forum and the OpenAI documentation.
I am not sure I have the idea right, which is:
Upload text documents as embeddings.
But
I have no idea how the prompt and completion refer to my embeddings (which are lists of numbers).
In the API documentation, there is no parameter for Completion that accepts document embeddings.
To my knowledge, embeddings are not used in this manner.
The training data used to fine-tune a model is actual text, not embedding vectors.
I have not seen any example from OpenAI where fine-tuning is accomplished by using embedding vectors as the values in the prompt/completion key-value pairs, @zhihong0321.
If you have an OpenAI reference which demonstrates this approach, please share the reference.
(Text) Embeddings are numerical representations of text, represented as a (unit) vector (when using the OpenAI API). This vector can be compared against another vector using linear algebra, most commonly the “dot product” (among other methods), and the results can be ranked numerically. Embedding vectors are used for search, classification, etc.
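To make the “dot product” comparison concrete, here is a minimal sketch in plain Python. It assumes toy 3-dimensional vectors in place of real OpenAI embeddings (which have many more dimensions) and normalizes them to unit length, so the dot product equals cosine similarity; the labels and numbers are made up for illustration.

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (length 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a: list[float], b: list[float]) -> float:
    """Dot product; for unit vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

# Toy "embeddings" (hypothetical values, not real API output).
query = normalize([1.0, 0.0, 1.0])
candidates = {
    "refund policy": normalize([0.9, 0.1, 1.1]),
    "office hours": normalize([0.0, 1.0, 0.1]),
    "shipping times": normalize([0.5, 0.5, 0.5]),
}

# Rank candidates by dot product with the query, highest first.
ranked = sorted(candidates, key=lambda name: dot(query, candidates[name]), reverse=True)
print(ranked)  # ['refund policy', 'shipping times', 'office hours']
```

Because every vector has length 1, a score near 1.0 means “very similar” and a score near 0 means “unrelated”, which is what makes the numeric ranking meaningful.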
OpenAI fine-tuning is the process of training an OpenAI model to change its generated output text based on the input text.
If you want to build a custom chatbot to reply to a prompt with your company data, you might need fine-tuning.
If you just want to search a DB and return the relevant text based on a semantic match, you can use embeddings.
Many well-developed applications use both. They search their DB for a good match using either a full-text DB search (for short phrases and keywords) or a vector-based semantic search (for longer strings and text), and if no high-scoring reply is found, they query a GPT model for a reply.
Then they take the GPT reply (if it meets certain criteria), add it to the DB (generating a vector for it), and in the future that reply can be matched via the same search process as above.
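The search-then-fallback-then-store flow described above could be sketched roughly like this. It is only an illustration under several assumptions: `toy_embed` is a hypothetical stand-in for a real embedding call (a normalized word count over a tiny fixed vocabulary), `fake_gpt` is a hypothetical placeholder for a GPT completion call, the 0.8 threshold is an arbitrary cut-off for “high scoring”, and the “certain criteria” check is simply “the reply is non-empty”. Only the vector-based search is shown, not the full-text search. Following the post, the vector is generated from the reply text itself.

```python
import math
from typing import Callable, Optional

# HYPOTHETICAL stand-in for a real embedding call: a normalized
# bag-of-words count over a tiny fixed vocabulary.
VOCAB = ["refund", "return", "shipping", "days", "hours", "office", "open", "price"]

def toy_embed(text: str) -> list[float]:
    words = [w.strip(".,?!:;") for w in text.lower().split()]
    v = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

class ReplyStore:
    """DB rows of (reply text, vector): search first, else generate and store."""

    def __init__(self, embed: Callable[[str], list[float]], threshold: float):
        self.embed = embed
        self.threshold = threshold  # ASSUMED cut-off for a "high scoring" match
        self.rows: list[tuple[str, list[float]]] = []

    def add(self, reply: str) -> None:
        # Generate a vector for the new reply so later searches can match it.
        self.rows.append((reply, self.embed(reply)))

    def best_match(self, question: str) -> Optional[tuple[float, str]]:
        # Score every stored reply against the question; None if the DB is empty.
        q = self.embed(question)
        return max(((dot(q, vec), reply) for reply, vec in self.rows), default=None)

    def answer(self, question: str, generate: Callable[[str], str]) -> tuple[str, str]:
        match = self.best_match(question)
        if match is not None and match[0] >= self.threshold:
            return match[1], "db"  # good semantic match found in the DB
        reply = generate(question)  # no good match: ask the model instead
        if reply.strip():  # ASSUMED acceptance criterion: reply is non-empty
            self.add(reply)
        return reply, "gpt"

calls: list[str] = []

def fake_gpt(question: str) -> str:
    # HYPOTHETICAL placeholder for a GPT completion call.
    calls.append(question)
    return "A refund takes up to 5 days."

store = ReplyStore(toy_embed, threshold=0.8)
first = store.answer("How many days for a refund?", fake_gpt)
second = store.answer("Refund: how many days?", fake_gpt)
print(first)   # ('A refund takes up to 5 days.', 'gpt')
print(second)  # ('A refund takes up to 5 days.', 'db')
```

The first question finds nothing in the empty DB, so the model is called and its reply is stored with a vector; the second, similar question is then answered from the DB without another model call.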
The key is to have a DB of text where each row of text has an embedding vector. That way, the DB can be searched by vectorizing the search text, taking the “dot product” (for example) between each vector in the DB and the search-term vector, then ranking the results and picking the highest-ranked match (if that is what you want).
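As a rough sketch of “a DB of text with an embedding vector per row”, the example below uses Python's built-in `sqlite3` with an in-memory database. The table layout (`docs` with `body` and `embedding` columns), the JSON storage of vectors, and the hand-made 3-dimensional vectors are all assumptions for illustration; in practice the vectors would come from an embeddings API, and at scale a dedicated vector index would usually replace this linear scan.

```python
import json
import math
import sqlite3

def normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def insert_doc(conn: sqlite3.Connection, body: str, embedding: list[float]) -> None:
    # Each row stores the text plus its unit-length embedding, serialized as JSON.
    conn.execute(
        "INSERT INTO docs (body, embedding) VALUES (?, ?)",
        (body, json.dumps(normalize(embedding))),
    )

def search(conn: sqlite3.Connection, query_embedding: list[float],
           top_k: int = 3) -> list[tuple[float, str]]:
    # Score every row against the (already vectorized) search text, then rank.
    q = normalize(query_embedding)
    rows = conn.execute("SELECT body, embedding FROM docs").fetchall()
    scored = [(dot(q, json.loads(emb)), body) for body, emb in rows]
    scored.sort(reverse=True)  # highest dot product first
    return scored[:top_k]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, embedding TEXT)")

# Hand-made vectors (hypothetical); real ones would come from an embeddings API.
insert_doc(conn, "Returns are accepted within 30 days.", [0.9, 0.1, 0.0])
insert_doc(conn, "Our office opens at 9am.", [0.0, 0.2, 0.9])
insert_doc(conn, "Shipping takes 3-5 business days.", [0.3, 0.9, 0.1])

results = search(conn, [1.0, 0.2, 0.0])
best_score, best_text = results[0]
print(best_text)  # Returns are accepted within 30 days.
```

Picking `results[0]` corresponds to “picking the highest-ranked match”; keeping the top few instead is a simple change to `top_k`.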
Hi sir!
May I know where to find a project or GitHub repository that uses this method? I have thought of almost the same concept for my case, but I still can't implement it and need some direction. Thank you, sir.