Fine tuning vs. Embedding

curt.kennedy · January 14, 2023, 11:44pm

If you want a detailed walkthrough of using embeddings to answer questions about Mars, I would mimic this tutorial.

Basically, you embed all your facts about Mars. When a question comes in, you embed it too. You then correlate the incoming question with the entire set of embedded facts (for example, by taking the dot product or cosine similarity of the vectors). Based on the top correlations, you pull the top facts from the database and form a prompt out of them, truncating to fit the limited size of the prompt window. Finally, you ask GPT-3 to answer the question based on the top correlated facts in your prompt.
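The retrieve-then-prompt loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tutorial's code: `toy_embed` is a hypothetical stand-in for a real embedding model (it only hashes words into a fixed-size vector, so it matches words rather than meaning), "correlation" is computed as cosine similarity, and the prompt budget is counted in characters rather than tokens to keep it dependency-free.

```python
import math
import re
import zlib

DIM = 256  # size of the toy vectors (hypothetical; a real model has its own size)


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model: hash each word
    into one of DIM buckets, count occurrences, then L2-normalize."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # toy_embed returns unit-length vectors, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def top_facts(question, facts, fact_vecs, k=3):
    """Embed the question, score it against every stored fact, keep the best k."""
    q = toy_embed(question)
    ranked = sorted(zip(facts, fact_vecs), key=lambda fv: cosine(q, fv[1]), reverse=True)
    return [fact for fact, _ in ranked[:k]]


def build_prompt(question, ranked_facts, max_chars=1000):
    """Pack the highest-ranked facts into a prompt, stopping before the budget
    is exceeded (characters here; a real implementation would count tokens)."""
    header = "Answer the question using only the facts below.\n\nFacts:\n"
    footer = f"\nQuestion: {question}\nAnswer:"
    body = ""
    for fact in ranked_facts:
        line = f"- {fact}\n"
        if len(header) + len(body) + len(line) + len(footer) > max_chars:
            break  # truncate: the remaining, lower-ranked facts are dropped
        body += line
    return header + body + footer


if __name__ == "__main__":
    facts = [
        "Mars has two moons, Phobos and Deimos.",
        "Olympus Mons on Mars is the tallest volcano in the solar system.",
        "A day on Mars lasts about 24.6 hours.",
    ]
    fact_vecs = [toy_embed(f) for f in facts]  # embed once, store next to the text
    question = "How long is a day on Mars?"
    prompt = build_prompt(question, top_facts(question, facts, fact_vecs, k=2))
    print(prompt)  # this string is what would be sent to the completion model
```

Because the facts are embedded once up front, answering a question costs one embedding call for the question plus a similarity pass over the stored vectors.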

This is probably the best way to extract specific knowledge.

If you fine-tune, the model might not stick to your facts as closely as you'd like, because your training data has to overcome the noise from the entire set of GPT-3 coefficients (which were trained on the internet and may not contain your facts).

When it comes to vector databases, you can probably ditch them if you have fewer than a million embedded facts, but you (or someone helping you) would need to be proficient with databases and comfortable with some amount of coding to do this on your own. So don’t get scared away by Pinecone or Weaviate.
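To make the "no vector database" option concrete, here is a rough sketch of an exhaustive in-memory search. It is one possible layout, not a recommended production design: fact texts live in SQLite (Python's built-in `sqlite3`), embeddings live in a single NumPy matrix whose row i belongs to the fact with id i, and random vectors stand in for real embeddings. For a sense of scale, assuming 1536-dimensional float32 vectors (the dimension depends on the embedding model), a million facts would take about 1,000,000 × 1536 × 4 bytes ≈ 6.1 GB, which is worth checking against available RAM before going this route.

```python
import sqlite3

import numpy as np


def build_store(facts, embeddings):
    """Keep fact texts in SQLite and their embeddings in one NumPy matrix.
    Row i of the matrix belongs to the fact stored with id i."""
    conn = sqlite3.connect(":memory:")  # pass a file path instead to persist
    conn.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")
    conn.executemany("INSERT INTO facts (id, text) VALUES (?, ?)", list(enumerate(facts)))
    conn.commit()
    matrix = np.array(embeddings, dtype=np.float32)  # copy, so the caller's data is untouched
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)  # unit rows: dot product == cosine
    return conn, matrix


def search(conn, matrix, query_vec, k=5):
    """Brute-force top-k: score the query against every stored vector."""
    q = np.array(query_vec, dtype=np.float32)
    q /= np.linalg.norm(q)
    scores = matrix @ q  # one similarity per stored fact
    k = min(k, len(scores))
    top = np.argpartition(-scores, k - 1)[:k]  # the k best, in no particular order
    top = top[np.argsort(-scores[top])]  # order just those k, best first
    results = []
    for i in top:
        (text,) = conn.execute("SELECT text FROM facts WHERE id = ?", (int(i),)).fetchone()
        results.append((text, float(scores[i])))
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    facts = [f"fact #{i}" for i in range(1_000)]
    embeddings = rng.normal(size=(len(facts), 1536))  # random stand-ins for real embeddings
    conn, matrix = build_store(facts, embeddings)
    for text, score in search(conn, matrix, embeddings[123], k=3):
        print(f"{score:.3f}  {text}")
```

`np.argpartition` picks the k best scores without sorting all n of them, so only the k winners are sorted afterwards; the full scan is still linear in the number of stored facts.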

github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c4ca8276-e829-4cff-8905-47534e4b4d4e",
   "metadata": {},
   "source": [
    "# Question Answering using Embeddings\n",
    "\n",
    "Many use cases require GPT-3 to respond to user questions with insightful answers. For example, a customer support chatbot may need to provide answers to common questions. The GPT models have picked up a lot of general knowledge in training, but we often need to ingest and use a large library of more specific information.\n",
    "\n",
    "In this notebook we will demonstrate a method for enabling GPT-3 able to answer questions using a library of text as a reference, by using document embeddings and retrieval. We'll be using a dataset of Wikipedia articles about the 2020 Summer Olympic Games. Please see [this notebook](fine-tuned_qa/olympics-1-collect-data.ipynb) to follow the data gathering process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9e3839a6-9146-4f60-b74b-19abbc24278d",
   "metadata": {},
   "outputs": [],

This file has been truncated.
