GPT 3.5 Turbo not utilising external data provided in .json file

vinniejstanzani · April 4, 2024, 11:12pm

The title explains my issue. Is there anything wrong with the script? I don’t get any errors, but when I ask the assistant a question related to the JSON file, it gives me a generic ChatGPT answer.

# Python program to translate
# speech to text and text to speech
import speech_recognition as sr
import pyttsx3
import os
import json
from dotenv import load_dotenv

# Load environmental variables from .env file
load_dotenv()

# Get the OpenAI API key from the environmental variable
OPENAI_KEY = os.getenv('OPENAI_KEY')

import openai

# Check if the API key is available
if OPENAI_KEY is None:
    raise ValueError("OpenAI API key is not set. Please set it in your environment variables.")

# Set the OpenAI API key
openai.api_key = OPENAI_KEY

# Function to convert text to speech
def SpeakText(command): 
    # Initialize the engine
    engine = pyttsx3.init()
    engine.say(command)
    engine.runAndWait()

# Initialize the recognizer
r = sr.Recognizer()

def record_text():
    # Loop in case of errors
    while True:
        try:
            # use the microphone as source for input
            with sr.Microphone() as source2:
                # Prepare recognizer to receive input
                r.adjust_for_ambient_noise(source2, duration=0.1)
                print("I'm listening")
                # Listens for the user's input
                audio2 = r.listen(source2)
                # Using Google to recognize audio
                MyText = r.recognize_google(audio2)
                return MyText
        except sr.RequestError as e:
            print("Could not request results; {0}".format(e))
        except sr.UnknownValueError:
            # UnknownValueError means the speech could not be understood
            print("Could not understand the audio")

def send_to_chatGPT(messages, model="gpt-3.5-turbo"):
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        max_tokens=100,
        n=1,
        stop=None,
        temperature=0.5,
    )
    message = response.choices[0].message.content
    messages.append(response.choices[0].message)
    return message

# Load personal data from JSON file
def load_personal_data(file_path):
    with open(file_path, 'r') as file:
        personal_data = json.load(file)
    return personal_data.get('messages', [])

# Path to the JSON file containing personal data
personal_data_file = 'personal_data.json'

# Load personal data from JSON file
personal_messages = load_personal_data(personal_data_file)

# Add personal messages to the beginning of the messages list
messages = personal_messages.copy()

while True:
    text = record_text()
    messages.append({"role": "user", "content": text})
    response = send_to_chatGPT(messages)
    SpeakText(response)
    print(response)

And this is the json file:

{
    "messages": [
        {
            "role": "system",
            "content": "An AI assistant like Jarvis from Iron man, named ASTRA."
        },
        {
            "role": "user",
            "content": "Who owns the ABC company?"
        },
        {
            "role": "assistant",
            "content": "ABC is owned by Mr. XYZ."
        }
    ]
}

I am completely new to coding and to the community. I apologise in advance if I posted this in the wrong category :).

arstamyanarthur5 · April 4, 2024, 11:39pm

It has to be a JSONL file, not JSON.

Can’t I comment with just text here? It keeps saying <<sorry the body seems unclear is it a sentence?>>

Edit:
I’m new here too and that was my first comment. I think that when we write only one sentence, or a short sentence, the system doesn’t allow us to post it as a comment.

_j · April 5, 2024, 1:06am

You cannot fool the overtrained AI into new knowledge by giving it “pretend answering”. You are trying to start messages with:

[{'role': 'system', 'content': 'An AI assistant like Jarvis from Iron man, named ASTRA.'}, {'role': 'user', 'content': 'Who owns the ABC company?'}, {'role': 'assistant', 'content': 'ABC is owned by Mr. XYZ.'}]

but if you add a new user question after that, the assistant is likely to say “I’m sorry, my last answer was in error as I don’t have any record of an ABC company”.

OpenAI made a significant oversight in not giving the chat message format a “new knowledge” type of role. Therefore, we have to make the injected knowledge look like neither a generated answer nor information the user already knows, and instead give it a new type of annotated container. What works well is a message inserted before the current user message, in the assistant role, that says something like “Here’s more information I automatically retrieved to help me answer the next question: xxx.”

The knowledge can be just documentation text, not simulated chat. You can keep your system message separate and focused on the AI identity and abilities.
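
A minimal sketch of that message layout, assuming the knowledge is already loaded as plain text. The helper name `build_messages` and the ASTRA system prompt wording are only illustrative; the resulting list is what would be passed as `messages` to the chat completion call.

```python
def build_messages(system_prompt, knowledge, history, user_text):
    """Assemble one request, placing the retrieved knowledge just before the new question."""
    injected = {
        "role": "assistant",
        "content": (
            "Here's more information I automatically retrieved "
            "to help me answer the next question:\n\n" + knowledge
        ),
    }
    return (
        [{"role": "system", "content": system_prompt}]  # identity only
        + list(history)                                  # real past turns
        + [injected, {"role": "user", "content": user_text}]
    )


# The history holds only genuine user/assistant turns; the injected
# message is rebuilt for every request and never stored in it.
history = []
msgs = build_messages(
    "You are ASTRA, an AI assistant like Jarvis from Iron Man.",
    "ABC is owned by Mr. XYZ.",
    history,
    "Who owns the ABC company?",
)
```

Rebuilding the list on every turn keeps the injected text directly ahead of the newest question instead of letting it drift further back as the chat grows.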

Then there is the speech-to-text step, which is only approximately accurate. See here all the ways it won’t work right. You should print out what is being transcribed for diagnosis. Using Whisper, through an API under your control, will significantly improve the quality.
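
One way to follow that advice, as a rough sketch rather than a drop-in replacement: it assumes the newer `openai` Python package (version 1.0 or later, whereas the script above uses the older `openai.ChatCompletion` interface) and the `whisper-1` model. The function name `transcribe_with_whisper` is made up for illustration, and the WAV bytes are assumed to come from `audio2.get_wav_data()` in `record_text()`.

```python
import os


def transcribe_with_whisper(wav_bytes, client=None):
    """Send WAV bytes to the Whisper API and print the transcript for diagnosis."""
    if client is None:
        # Assumption: openai>=1.0 is installed; imported lazily so the
        # function can also be exercised with a stand-in client.
        from openai import OpenAI
        client = OpenAI(api_key=os.getenv("OPENAI_KEY"))
    result = client.audio.transcriptions.create(
        model="whisper-1",
        file=("speech.wav", wav_bytes),  # (filename, content) pair
    )
    # Print exactly what the chat model will receive
    print(f"Transcribed: {result.text!r}")
    return result.text
```

Inside `record_text()` this would stand in for the `r.recognize_google(audio2)` call, for example `MyText = transcribe_with_whisper(audio2.get_wav_data())`.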

vinniejstanzani · April 5, 2024, 7:08am

Thank you so much!

I’ve made some edits to the script with some help from ChatGPT… Should this work?

# Python program to translate
# speech to text and text to speech
import speech_recognition as sr
import pyttsx3
import os
import json
from dotenv import load_dotenv

# Load environmental variables from .env file
load_dotenv()

# Get the OpenAI API key from the environmental variable
OPENAI_KEY = os.getenv('OPENAI_KEY')

import openai

# Check if the API key is available
if OPENAI_KEY is None:
    raise ValueError("OpenAI API key is not set. Please set it in your environment variables.")

# Set the OpenAI API key
openai.api_key = OPENAI_KEY

# Function to convert text to speech
def SpeakText(command): 
    # Initialize the engine
    engine = pyttsx3.init()
    engine.say(command)
    engine.runAndWait()

# Initialize the recognizer
r = sr.Recognizer()

def record_text():
    # Loop in case of errors
    while True:
        try:
            # use the microphone as source for input
            with sr.Microphone() as source2:
                # Prepare recognizer to receive input
                r.adjust_for_ambient_noise(source2, duration=0.1)
                print("I'm listening")
                # Listens for the user's input
                audio2 = r.listen(source2)
                # Using Google to recognize audio
                MyText = r.recognize_google(audio2)
                return MyText
        except sr.RequestError as e:
            print("Could not request results; {0}".format(e))
        except sr.UnknownValueError:
            # UnknownValueError means the speech could not be understood
            print("Could not understand the audio")

def send_to_chatGPT(messages, model="gpt-3.5-turbo"):
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        max_tokens=100,
        n=1,
        stop=None,
        temperature=0.5,
    )
    message = response.choices[0].message.content
    messages.append(response.choices[0].message)
    return message

# Function to load additional information from a text file
def load_additional_info(file_path):
    with open(file_path, 'r') as file:
        additional_info = file.read()
    return additional_info

# Path to the text file containing additional information
additional_info_file = 'additional_info.txt'

# Load additional information from the text file
additional_info = load_additional_info(additional_info_file)

# Add the additional information to the beginning of the messages list
messages = [{"role": "assistant", "content": additional_info}]

while True:
    text = record_text()
    
    # Add user's question
    messages.append({"role": "user", "content": text})
    
    # Get response from GPT-3.5 Turbo
    response = send_to_chatGPT(messages)
    
    # Speak the response
    SpeakText(response)
    
    # Print the response
    print(response)

With a txt file structured like this:

Here's more information I automatically retrieved to help me answer the next question: 

ABC is owned by Mr. XYZ. The company was founded in 1990 and specializes in technology solutions for the healthcare industry. They are headquartered in New York City and have offices worldwide. Mr. XYZ has been the CEO since the company's inception and has led ABC to become a leading provider in its field.

_j · April 5, 2024, 7:52am

Let’s make an AI in the playground!

This shows how messages would look if you are also recording past user and assistant messages and passing them back as prior messages – the main purpose of the chat message format. I first ask the AI what it can do.

I give the AI its identity in a format it already expects. The file injection is placed into that same system message with the origin clearly noted (as if it was retrieved from an automatic semantic search database). You could add it as another system message or assistant message, either before or after the past chat, and experiment yourself to see what performs best, as there is no single answer for all AI models.
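
A small sketch of those placement options, so they can be compared side by side. Everything named here (`build_request`, `RETRIEVAL_HEADER`, the `placement` values) is hypothetical and only meant for experimenting; the separate-message variants use the system role, but the assistant role could be tried the same way.

```python
RETRIEVAL_HEADER = "[Retrieved automatically from the knowledge database]"


def build_request(identity, retrieved_text, past_chat, user_text, placement="system"):
    """Build the messages list with the file text placed in one of three spots."""
    knowledge = f"{RETRIEVAL_HEADER}\n{retrieved_text}"
    new_turn = {"role": "user", "content": user_text}
    if placement == "system":
        # Identity and retrieved text share a single system message
        system = {"role": "system", "content": f"{identity}\n\n{knowledge}"}
        return [system] + list(past_chat) + [new_turn]
    system = {"role": "system", "content": identity}
    note = {"role": "system", "content": knowledge}
    if placement == "before_chat":
        return [system, note] + list(past_chat) + [new_turn]
    if placement == "after_chat":
        return [system] + list(past_chat) + [note, new_turn]
    raise ValueError(f"unknown placement: {placement!r}")


past_chat = [
    {"role": "user", "content": "What can you do?"},
    {"role": "assistant", "content": "I can answer questions about ABC."},
]
request = build_request(
    "You are ASTRA, an AI assistant like Jarvis from Iron Man.",
    "ABC is owned by Mr. XYZ.",
    past_chat,
    "Who owns the ABC company?",
)
```

Switching `placement` between runs with the same question is a quick way to see which layout a given model follows best.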
