Enhancing Dataset Consistency: OpenAI Model Responses Restricted to Dataset Scope, Default Reply for Unknown Queries

I have fine tuned a model in OpenAI using my dataset. But if I give a question that is not in the dataset the model still answers the question. I don’t need that.

I have given my question and answer as prompt completion pair. I only need to get answer if they are in my dataset. If not then I need to show a Default answer " Sorry out of my context"

How?

Welcome to the developer forum!

Fine tuning is not a reliable method for teaching the model new data, it’s a way to teach it new patterns and new ways of handling data.

I think you should look into embeddings and then retrieve from your embedded knowledge base based on the user query, you can then pass that data as demarked context, and tell it that if the query cannot be answered with the information in the context markers it should answer “unknown”

Information regarding embeddings can be found here

1 Like

If you only want replies to specific questions, then maybe use FAQ database.

fine-tune works on top of its current knowledge, so it will answer anything. You could try using a prompt that tells it only to answer questions on specific topic(s). - ( do specify the topic(s) names, as it wont be aware of which topics were in the training data compared with the topics its already aware of.).

eg. you are a helpful assistant that is only able to answer questions on {…}; any questions outside of {…} please reply with “Sorry out of my context”

I already tried promt engineering. It works but when I ask questions from my dataset. It doesn’t answer as per my dataset completion it takes data from openAI as well.

Typically you demark the data you wish the model to use, pehaps with ### markers at the start and end and then you can instruct the model to only answer queries using that context, you can generate the context at question time using embedding retrieval.

Can I use my fine tuned model along with the embedding retrieval.

Sure, if you have fine tuned your model to respond in a certain way, you can leverage that. If your fine tune was only aimed at adding data then it is likley that it will have little value, but if it was for teaching it the correct way to respond, then you will still get those advantages. As it stands, GPT-3.5-Turbo or GPT-4 will usually provide better performance when used with embeddings and basic prompt engineering.

I used promt and completion as given in the fine tune documentation to fine tune my davinci model.

I understand, what I think you are missing is that you wanted the model to work from your company dataset, and finetuning is not the correct way to go about that. As I mentioned, the embeddings route is the way to go if you wish to leverage your existing documentation with the AI.

import openai

# Set up your OpenAI API key
openai.api_key = "YOUR_API_KEY"

def retrieve_answer(prompt):
    # Use the OpenAI API to get the model's response
    response = openai.Completion.create(
        engine="text-davinci-002",  # Replace with the appropriate engine
        prompt=prompt,
        max_tokens=150,  # Adjust the max tokens as needed to control the response length
        stop=None,  # You can provide a list of strings to stop the response if needed
    )

    # Check if the response contains a valid answer
    if response['choices'][0]['text'].strip() == '':
        return "Sorry, I couldn't find an answer to your question."

    return response['choices'][0]['text'].strip()

# Retrieve answers for each question
for question in questions:
    prompt = f"{context} {question}"
    answer = retrieve_answer(prompt)
    print(f"Question: {question}\nAnswer: {answer}\n")

context = "Your Spotify is skipping because Uninstalling and reinstalling the application generally fixes most issues with skipping, followed by turning your device off and on. Here are some of the fixes you can try on different devices based on solutions verified by other Spotify Community users and moderators. Spotify Stations will be deprecated on May 16th, 2022. Spotify is releasing a transferring tool that allows you to migrate your favorite stations to Your Library on the Spotify application and easily continue listening to your favorite music. Spotify Radio is a perfect tool to get access to a big collection of songs based on any artist, album, playlist, or song."

questions = [
    "How can I fix Spotify skipping?",
    "When will Spotify Stations be deprecated?",
    "What is Spotify Radio?",
    # Add more questions here
]

is this the way to go forward. I didn’t understand the embedding when I looked into the cookbook github

Be worth giving this a watch