Combing domain facts with model generated text

pawnxing · April 26, 2023, 1:06pm

I’m a SWE with very little knowledge of AI language models.

I’m having fun and mostly success building a API client, using the chat completions endpoint. One thing I want to avoid is a snowball of prompts with the AI as the chat conversation grows.

Question: I have a set of domain facts that are not part of ChatGPT language models. I would like to have ChatGPT weave in their generative model text with these domain facts. As an example, these facts maybe a pricing catalog in the shape of: product id, name, stock, price.

When I exclude the catalog in a call to chatGPT it makes something up. Such as when asked a very narrow and specific question like: “What is the price of XYZ?”.

I want to avoid having to include a system prompt with this catalog for every call to the chat completions API. I see that there is an endpoint to upload a file, can something like this be leveraged fro this? Do I need fine-tuning, embeddings?

anon10827405 · April 26, 2023, 1:37pm

Weave is a great word.

Typically one would use a vector database (such as Weaviate or Pinecone) with the embeddings of your catalog items to return specific information. You may want to consider using sparse embeddings to prioritize keywords rather than semantic relevance.

For testing (and fun) purposes a hybrid would be great, and more adaptable! (For example adding a 10/90 split weight between dense/sparse embeddings, because … who knows maybe it’s better??)

Once you have a robust retrieval system, your next question will (mainly because it was mine) how do I optimize and reduce the amount of data? Fortunately, there’s an answer for (almost) everything

github.com

openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3b0435cb",
   "metadata": {},
   "source": [
    "# Question answering using embeddings-based search\n",
    "\n",
    "GPT excels at answering questions, but only on topics it remembers from its training data.\n",
    "\n",
    "What should you do if you want GPT to answer questions about unfamiliar topics? E.g.,\n",
    "- Recent events after Sep 2021\n",
    "- Your non-public documents\n",
    "- Information from past conversations\n",
    "- etc.\n",
    "\n",
    "This notebook demonstrates a two-step Search-Ask method for enabling GPT to answer questions using a library of reference text.\n",
    "\n",

This file has been truncated. show original

Topic		Replies	Views
Fine-tuning 3.5 turbo to act as conversational AI like Non-Playable Character in games API fine-tuning	4	1594	October 4, 2023
Expanding GPT domain knowledge Community chatgpt	4	1467	June 22, 2023
Best method of injecting relatively large amount of context to be leveraged in a response API	10	11707	December 17, 2023
How to create FAQ on internal company data? API	6	4433	December 18, 2023
Question answering with extended number of chunks API embeddings , chatgpt , fine-tuning	13	2392	June 6, 2023

Combing domain facts with model generated text

Related topics