I’m currently weighing which approach would yield better results: embeddings or list prompting.
Let’s consider a scenario with 50 different documents. With embeddings, I can log user inputs and use a vector store to route each input, determining which document or documents the AI needs to read before providing an answer. This approach will work well if I can manage the vector store effectively.
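Here’s a rough sketch of what I mean by the embedding route (a minimal example assuming the OpenAI Python client v1+ and numpy; the model name and the `embed`/`route` helpers are just illustrative, not a settled design):

```python
import numpy as np
from openai import OpenAI  # assumes the openai package, v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of strings; returns an (n, d) array of vectors."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

# One-time step: embed all 50 documents and keep the vectors around.
documents = ["...document 1 text...", "...document 2 text..."]  # placeholder docs
doc_vectors = embed(documents)

def route(user_input: str, k: int = 3) -> list[int]:
    """Return the indices of the k documents most similar to the input."""
    q = embed([user_input])[0]
    # Cosine similarity: normalize both sides, then take dot products.
    doc_norm = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    sims = doc_norm @ (q / np.linalg.norm(q))
    return list(np.argsort(sims)[::-1][:k])  # top-k, most similar first
```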
On the other hand, without embeddings, using only prompts to GPT-4, I can provide titles or short descriptions that help the model grasp the user’s inquiry and select the correct document for a more accurate response.
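And a sketch of the list-prompting version (again illustrative only; the titles and descriptions are placeholders, and a second call would still be needed to actually answer from the chosen document):

```python
from openai import OpenAI  # assumes the openai package, v1+

client = OpenAI()

# Placeholder catalog: one (title, short description) pair per document.
catalog = [
    ("Refund policy", "How refunds and returns are handled"),
    ("Shipping times", "Delivery estimates by region"),
    # ... entries for all 50 documents
]

def pick_document(question: str) -> str:
    """Ask GPT-4 to choose the best-matching document by number."""
    listing = "\n".join(
        f"{i}. {title}: {desc}" for i, (title, desc) in enumerate(catalog)
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": "Given the numbered list of documents, reply with "
                           "only the number of the one best suited to answer "
                           "the user's question.",
            },
            {"role": "user", "content": f"Documents:\n{listing}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content.strip()
```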
Currently, I am trying to determine which process will work better in terms of efficiency (less work, better results, lower cost, difficulty, accuracy, etc.). Can anyone clearly indicate which approach they would recommend, and why?
If I’m understanding you correctly (please let me know if I’ve misunderstood), the answer is simple:
better results: Packing the prompt
lower cost: Almost certainly embedding (unless very long docs and very few queries)
difficulty: Embedding would definitely be easier; it doesn’t require you to semantically determine which content to include, as prompt-packing does.
accuracy: Same as “better results” in this context.
I should clarify that my understanding is that you’re comparing (a) embedding vs (b) packing prompts with content, assuming the original documents are relatively short.
Big picture, I can’t imagine a scenario where embedding wouldn’t be a lot easier and less expensive. In addition, because the documents must be relatively short (based on the point about adding them into prompts), the accuracy delta probably wouldn’t be consequential.
The biggest difference, I suspect, is that “list prompting” wouldn’t scale well (and let’s not get into LLM performance degrading with longer prompts…).
The embedding approach scales extremely well because increasing your dataset does not, all other things being equal, increase the number of tokens being handled per query.
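As a rough illustration (with made-up numbers: 50 documents averaging ~500 tokens each), packing everything into the prompt costs roughly 50 × 500 = 25,000 tokens per query and grows linearly with every document you add, while retrieving, say, the top 3 matches costs roughly 3 × 500 = 1,500 tokens per query whether you index 50 documents or 5,000.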