Need Guidance about Custom Data Training. [Thanks]

zhihong0321 · February 24, 2023, 4:14pm

My goal:

Upload text data such as articles, product descriptions, and company rules to OpenAI (via the API).
Have a custom chatbot reply based on the uploaded documents (via the API).

I have studied this forum and the OpenAI documentation.
I am not sure I have the idea right, which is:

Upload the text documents as embeddings.
But I have no idea how the prompt and completion refer to my embeddings (which are lists of numbers).
In the API documentation, the Completion endpoint has no parameter for including document embeddings.

Thanks in advance.

ruby_coder · February 25, 2023, 2:04am

This is not correct.

To my knowledge, embeddings are not used in this manner.

Training data used to fine-tune a model should be actual text, not embedding vectors.

@zhihong0321, I have not seen any example by OpenAI where fine-tuning is accomplished by using embedding vectors as the values in the prompt/completion key-value pairs.

If you have an OpenAI reference which demonstrates this approach, please share the reference.

Thanks so much.
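
To illustrate the point about text-based training data, here is a minimal sketch. It assumes the JSONL prompt/completion layout used by the fine-tuning API at the time of this thread; the example rows are made up, and a real data set would need many more of them.

```python
import json

# Hypothetical example rows: both fields hold plain text, not embedding vectors.
examples = [
    {"prompt": "What is the return policy?",
     "completion": "Items can be returned within 30 days of purchase."},
    {"prompt": "Is the office open on public holidays?",
     "completion": "No, the office is closed on public holidays."},
]

def to_jsonl(rows):
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"

jsonl_text = to_jsonl(examples)
print(jsonl_text)
```

The resulting text can be saved to a file and uploaded as training data; nothing in it is a vector.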

zhihong0321 · February 26, 2023, 4:37am

Thanks for the reminder.

I tried both the embeddings and fine-tuning APIs.
I am not quite sure of the technical difference,

but fine-tuning is obviously much easier, needing just a few API calls,
while embeddings are much more complex because they involve indexing.

I am currently continuing to test my data via fine-tuning to see whether I can achieve my desired result.

ruby_coder · February 26, 2023, 5:06am

In a nutshell,

(Text) Embeddings are numerical representations of text, represented as a (unit) vector (using the OpenAI API). This vector can be compared against another vector using linear algebra, commonly the “dot product” (among other methods), and the results ranked numerically. Embedding vectors are used for search, classification, etc.
OpenAI fine-tuning is the process of training an OpenAI model to change its generative output text based on the input text.

If you want to build a custom chatbot to reply to a prompt with your company data, you might need fine-tuning.

If you want to just search a DB and return the required text based on a semantic match, you can use embeddings.

Many well-developed applications will use both: they search their DB for a good match using either a full-text DB search (for short phrases and keywords) or a vector-based semantic search (for larger strings and text), and then, if no high-scoring reply is found, query a GPT model for a reply.

Then take the GPT reply and, if it meets certain criteria, add it to the DB (generate a vector for it) so that in the future that reply can be matched via the same search process as above.
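
A rough sketch of that search-then-fallback flow, under loud assumptions: `toy_embed` and `stub_model` are hypothetical stand-ins for a real embeddings call and a real GPT call, the keyword vocabulary is invented, and the `threshold` value is arbitrary.

```python
import math

VOCAB = ["refund", "shipping", "holiday"]  # hypothetical toy vocabulary

def toy_embed(text):
    """Stand-in for a real embeddings call: unit-length keyword counts."""
    lowered = text.lower()
    counts = [float(lowered.count(word)) for word in VOCAB]
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else counts

def stub_model(question):
    """Stand-in for a GPT call; a real app would query the model here."""
    return "Refund requests are handled by the support team."

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def answer(question, db, embed, ask_model, threshold=0.8):
    """Return (reply, source), where source is "db" or "model"."""
    q_vec = embed(question)
    best_score, best_reply = -1.0, None
    for row in db:
        score = dot(q_vec, row["vector"])
        if score > best_score:
            best_score, best_reply = score, row["reply"]
    if best_reply is not None and best_score >= threshold:
        return best_reply, "db"
    # No high-scoring match: fall back to the model, then store the reply
    # with its own vector so the same search can find it next time.
    reply = ask_model(question)
    db.append({"vector": embed(reply), "reply": reply})
    return reply, "model"

db = []
first = answer("How do I get a refund?", db, toy_embed, stub_model)
second = answer("Is a refund possible?", db, toy_embed, stub_model)
print(first[1], second[1])
```

A real application would also check the model's reply against its own acceptance criteria before storing it; that check is left out here.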

The key is to have a DB of text and for each row of text in the DB to have an embedding vector. In that way, the DB can be searched by vectorizing the search text and then taking the “dot product” (for example) between the vectors in the DB and the search term vector, then ranking the results and picking the highest ranked match (if that is what you want).
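
The ranking step described above can be sketched as follows. The three-dimensional vectors are hand-written toy values, not real embeddings (real ones come from an embeddings API and have far more dimensions), and the row texts are invented.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy DB: each row of text has its own (made-up) vector.
db = [
    {"text": "Refunds are issued within 30 days.", "vector": normalize([0.9, 0.1, 0.0])},
    {"text": "Our office is closed on public holidays.", "vector": normalize([0.1, 0.9, 0.2])},
    {"text": "Shipping takes 3 to 5 business days.", "vector": normalize([0.2, 0.1, 0.9])},
]

def search(query_vector, rows, top_k=2):
    """Score every row against the query vector and return the top_k matches."""
    q = normalize(query_vector)
    scored = [(dot(q, row["vector"]), row["text"]) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Pretend this is the vector for "How do I get my money back?"
query_vector = [0.8, 0.2, 0.1]
results = search(query_vector, db)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Scanning every row is fine for a small DB; larger collections usually rely on a vector index instead.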

Hope this helps.

abhi3hack · June 2, 2023, 11:55am

@ruby_coder, can you have a look at my query about an approach to solving a similar project?
I am unable to understand how to progress with it.

Byun · September 26, 2023, 5:09am

Hi sir!
May I know where to find a project or GitHub repository that uses this method? I’ve thought of almost the same concept for my case, but I still can’t implement it and need direction. Thank you, sir.

SomebodySysop · September 26, 2023, 7:36am

Fine-tuning vs. embedding: https://www.youtube.com/watch?v=9qq6HTr7Ocw&t=110s&ab_channel=DavidShapiro~AI

Guide to fine-tuning: https://www.youtube.com/watch?v=Oawp_McdHzU

GPT4 Tutorial: How to chat with multiple pdf files - The Chat Completion Process (R.A.G. / Embeddings)
- https://youtu.be/Ix9WIZpArm0
- https://www.youtube.com/watch?v=ih9PBGVVOO4
