Fine-tuning gpt-3.5-turbo

I want to train the gpt-3.5-turbo model on some medical research my friend did; it's in the format of a book. What is the best way to format the training file to accomplish this?
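
For reference, gpt-3.5-turbo fine-tuning expects a JSONL file in which each line is one chat-formatted example. Below is a minimal sketch of writing such a file in Python; the file name, system prompt, and example Q&A content are placeholders you would replace with material drawn from the book.

import json

# Each training example is a short conversation: system + user + assistant.
# The content below is placeholder text; real examples would come from the book.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are an assistant that answers questions about the research."},
            {"role": "user", "content": "Example question about the research?"},
            {"role": "assistant", "content": "Example answer, phrased the way you want the model to respond."},
        ]
    },
]

# Write one JSON object per line (JSONL), which is the format the fine-tuning API accepts.
with open("training_data.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")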

Is it one book, or is it a multitude of journals along with some books?

One way you could go about it is to digitize it (in a format like .txt, .pdf, or .epub) and then have the API search the digitized book for the closest equivalent to a question you ask. For example:

import re

# Simulated digitized books
books = {
    'book1.txt': "This is the content of the first book. It contains information about various topics.",
    'book2.txt': "The second book discusses different subjects and provides detailed explanations.",
    'book3.txt': "Book number three has a wealth of knowledge on various subjects as well.",
}

# Function to search for a query in the books
def search_books(query):
    results = []
    for book_name, content in books.items():
        # Use regular expression to find matches (case-insensitive)
        matches = re.findall(r'\b{}\b'.format(re.escape(query)), content, re.IGNORECASE)
        if matches:
            results.append({
                'book_name': book_name,
                'matches': matches,
            })
    return results

# Example query
query = "subjects"

# Search for the query in the books
search_results = search_books(query)

# Display the search results
if search_results:
    for result in search_results:
        print(f"Book: {result['book_name']}")
        for match in result['matches']:
            print(f"Match: {match}")
else:
    print("No matches found.")

See this as an example, but you have to ask more specific questions or you will get errors. You can add a function like this:

import re
import spacy

# Simulated digitized books
books = {
    'book1.txt': "This is the content of the first book. It contains information about various topics.",
    'book2.txt': "The second book discusses different subjects and provides detailed explanations.",
    'book3.txt': "Book number three has a wealth of knowledge on various subjects as well.",
}

# Load the spaCy NLP model
nlp = spacy.load("en_core_web_sm")

# Function to search for a query in the books
def search_books(query):
    results = []
    for book_name, content in books.items():
        # Use spaCy for NLP processing
        doc = nlp(content)
        
        # Iterate through sentences in the document
        for sent in doc.sents:
            # Check if the query is present in the sentence
            if query.lower() in sent.text.lower():
                results.append({
                    'book_name': book_name,
                    'sentence': sent.text,
                })
    return results

# Function to ask a question and get a more accurate answer
def ask_question(question):
    # Searching for the full question text rarely matches, so extract the
    # content words (nouns, proper nouns, numbers) and search for those instead.
    doc = nlp(question)
    keywords = [token.text for token in doc if token.pos_ in ("NOUN", "PROPN", "NUM")]
    for keyword in keywords:
        search_results = search_books(keyword)
        if search_results:
            return search_results[0]['sentence']  # Return the first matching sentence
    return "I couldn't find an answer to your question."

# Example question
question = "What is discussed in book number three?"

# Ask the question and get an answer
answer = ask_question(question)
print("Answer:", answer)

It all depends on what you're doing.

Fine-tuning will not teach the model the information in the books; it will teach the model the style in which the books were written. If you wish to store knowledge, then you would be better off using embeddings and storing the books in chunks.
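
To sketch what that could look like, assuming the pre-1.0 openai Python library and the text-embedding-ada-002 model (the chunk size and the book_text variable are placeholders), chunking, embedding, and retrieval might look something like this:

import openai
import numpy as np

openai.api_key = "YOUR_API_KEY_HERE"  # Replace with your API key

# Split the book into fixed-size word chunks (a simple strategy; sentence-aware
# splitting with some overlap usually works better in practice).
def chunk_text(text, max_words=200):
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

# Embed a list of chunks in one API call and return the embedding vectors.
def embed_chunks(chunks):
    response = openai.Embedding.create(model="text-embedding-ada-002", input=chunks)
    return [np.array(item["embedding"]) for item in response["data"]]

# Embed the question and return the chunk with the highest cosine similarity.
def most_relevant_chunk(question, chunks, chunk_embeddings):
    q = embed_chunks([question])[0]
    sims = [np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e)) for e in chunk_embeddings]
    return chunks[int(np.argmax(sims))]

# Example usage (book_text would hold the digitized book):
# chunks = chunk_text(book_text)
# vectors = embed_chunks(chunks)
# context = most_relevant_chunk("What does the study say about dosage?", chunks, vectors)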

To add on to the @Foxalabs recommendation, I suggest you watch this video to truly understand the fundamental difference between fine-tuning and embedding: https://www.youtube.com/watch?v=9qq6HTr7Ocw&t=110s&ab_channel=DavidShapiro~AI

And, if you’re sure embeddings is the way you want to go, here is an excellent tutorial on the entire process, from embedding to chat completion: https://www.youtube.com/watch?v=ih9PBGVVOO4
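
As a rough idea of the final step that tutorial covers, assuming the same pre-1.0 openai library and a most_relevant_chunk helper like the one sketched above, the retrieved chunk is passed to the chat model as context (the model name and prompts are placeholders):

import openai

openai.api_key = "YOUR_API_KEY_HERE"  # Replace with your API key

# Answer a question using a retrieved chunk of the book as grounding context.
def answer_with_context(question, context):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response["choices"][0]["message"]["content"]

# Example usage:
# context = most_relevant_chunk(question, chunks, vectors)
# print(answer_with_context(question, context))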

Good luck!

I mean, technically it's not even "AI"; it's a chatbot with base tasks. So yes and no: it does both and more, depending on what you tell it to do and what you attach to the bot. That's all GPT is: a neural network with a chatbot interface and structure. A duck is a duck, and a house is a house; 1 + anything is one more. It all works. For example, here's one I worked on as an API-style bot (with no API) with an accuracy of 99.6% for what it's used for:

import xml.etree.ElementTree as ET
import xml.dom.minidom as minidom

# Function to load or create a knowledge base
def load_or_create_knowledge_base(file_path):
    try:
        tree = ET.parse(file_path)
        root = tree.getroot()
    except FileNotFoundError:
        root = ET.Element("knowledgebase")
        tree = ET.ElementTree(root)
        tree.write(file_path)
    return root

# Function to add a new Q&A pair to the knowledge base
def add_qna(knowledge_base, category, question, answer):
    # Check if the category already exists (categories are nested under
    # <categories>, so search recursively with ".//")
    category_elem = knowledge_base.find(f".//category[@name='{category}']")

    if category_elem is None:
        # Create a new category if it doesn't exist
        categories_elem = knowledge_base.find("./categories")
        if categories_elem is None:
            categories_elem = ET.Element("categories")
            knowledge_base.append(categories_elem)

        category_elem = ET.SubElement(categories_elem, "category", name=category)

    # Create a new qna element and add the question and answer
    qna = ET.SubElement(category_elem, "qna")
    q = ET.SubElement(qna, "question")
    q.text = question
    a = ET.SubElement(qna, "answer")
    a.text = answer

# Function to get a response from the knowledge base
def get_response(knowledge_base, question):
    for category_elem in knowledge_base.iter("category"):
        for qna_elem in category_elem.iter("qna"):
            q = qna_elem.find("question")
            a = qna_elem.find("answer")
            if q is not None and a is not None:
                if question.lower() == q.text.lower():
                    return a.text
    return None

# Function to interact with the chatbot in test mode
def test_chatbot(knowledge_base, xml_file_path):
    print("Danm Test Mode: You can interact with the chatbot without adding new questions and answers.")
    while True:
        question = input("You: ")
        if question.lower() == "end of test":
            break
        response = get_response(knowledge_base, question)
        if response:
            print("Danm:", response)
        else:
            print("Danm: I don't know the answer. Would you like to add this to my knowledge base?")
            category = input("Chatbot: In which category should this question belong?\nYou: ")
            answer = input("Chatbot: What's the answer to this question?\nYou: ")
            add_qna(knowledge_base, category, question, answer)
            save_knowledge_base(knowledge_base, xml_file_path)
            print("Danm: Thanks! I've added that to my knowledge base.")

# Function to save the knowledge base to the XML file with pretty-printing
def save_knowledge_base(knowledge_base, file_path):
    # Convert the ElementTree to a string
    xml_string = ET.tostring(knowledge_base, encoding="utf-8")
       
    # Use minidom to prettify the XML
    parsed = minidom.parseString(xml_string)
    pretty_xml = parsed.toprettyxml(indent="    ")  # Adjust the indentation as needed

    # Write the prettified XML to the file
    with open(file_path, "wb") as xml_file:
        xml_file.write(pretty_xml.encode("utf-8"))

# Main function
if __name__ == "__main__":
    xml_file_path = r"location here"
    knowledge_base = load_or_create_knowledge_base(xml_file_path)

    while True:
        print("Danm: what do you want to know?")
        command = input("You: ")

        if command.lower() == "exit":
            break
        elif command.lower() == "lets do a test":
            test_chatbot(knowledge_base, xml_file_path)
            print("Danm: Test mode ended. You can continue interacting with the chatbot.")
        else:
            response = get_response(knowledge_base, command)
            if response:
                print("Danm:", response)
            else:
                print("Danm: I don't know the answer. Would you like to add this to my knowledge base?")
                category = input("Danm: In which category should this question belong?\nYou: ")
                answer = input("Danm: What's the answer to this question?\nYou: ")
                add_qna(knowledge_base, category, command, answer)
                save_knowledge_base(knowledge_base, xml_file_path)
                print("Danm: Thanks! I've added that to my knowledge base.")

Just make an XML file named "xml_knowledge_base.xml" and it will save any information it doesn't already have to that XML file. You can add more to it, for example an API script that asks ChatGPT for a list of questions along with an answer to each question, by running it through the chatbot,

like so:

import xml.etree.ElementTree as ET
import xml.dom.minidom as minidom
import openai  # Install the 'openai' library if not already installed

# Function to load or create a knowledge base
def load_or_create_knowledge_base(file_path):
    try:
        tree = ET.parse(file_path)
        root = tree.getroot()
    except FileNotFoundError:
        root = ET.Element("knowledgebase")
        tree = ET.ElementTree(root)
        tree.write(file_path)
    return root

# Function to add a new Q&A pair to the knowledge base
def add_qna(knowledge_base, category, question, answer):
    # Check if the category already exists (categories are nested under
    # <categories>, so search recursively with ".//")
    category_elem = knowledge_base.find(f".//category[@name='{category}']")

    if category_elem is None:
        # Create a new category if it doesn't exist
        categories_elem = knowledge_base.find("./categories")
        if categories_elem is None:
            categories_elem = ET.Element("categories")
            knowledge_base.append(categories_elem)

        category_elem = ET.SubElement(categories_elem, "category", name=category)

    # Create a new qna element and add the question and answer
    qna = ET.SubElement(category_elem, "qna")
    q = ET.SubElement(qna, "question")
    q.text = question
    a = ET.SubElement(qna, "answer")
    a.text = answer

# Function to get a response from the knowledge base
def get_response(knowledge_base, question):
    for category_elem in knowledge_base.iter("category"):
        for qna_elem in category_elem.iter("qna"):
            q = qna_elem.find("question")
            a = qna_elem.find("answer")
            if q is not None and a is not None:
                if question.lower() == q.text.lower():
                    return a.text
    return None

# Function to interact with the chatbot in test mode
def test_chatbot(knowledge_base, xml_file_path):
    print("Danm Test Mode: You can interact with the chatbot without adding new questions and answers.")
    while True:
        question = input("You: ")
        if question.lower() == "end of test":
            break
        response = get_response(knowledge_base, question)
        if response:
            print("Danm:", response)
        else:
            print("Danm: I don't know the answer. Would you like to add this to my knowledge base?")
            category = input("Chatbot: In which category should this question belong?\nYou: ")
            answer = input("Chatbot: What's the answer to this question?\nYou: ")
            add_qna(knowledge_base, category, question, answer)
            save_knowledge_base(knowledge_base, xml_file_path)
            print("Danm: Thanks! I've added that to my knowledge base.")

# Function to save the knowledge base to the XML file with pretty-printing
def save_knowledge_base(knowledge_base, file_path):
    # Convert the ElementTree to a string
    xml_string = ET.tostring(knowledge_base, encoding="utf-8")
       
    # Use minidom to prettify the XML
    parsed = minidom.parseString(xml_string)
    pretty_xml = parsed.toprettyxml(indent="    ")  # Adjust the indentation as needed

    # Write the prettified XML to the file
    with open(file_path, "wb") as xml_file:
        xml_file.write(pretty_xml.encode("utf-8"))

# Function to generate questions and answers using GPT-3
def generate_questions_and_answers():
    openai.api_key = "YOUR_API_KEY_HERE"  # Replace with your API key
    prompt = "Generate a question and answer pair."
    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=50,  # Adjust as needed
        n=1,  # Number of responses
    )
    return response.choices[0].text.strip()

# Function to ask and save 50 Q&A pairs, including those generated by GPT-3
def ask_and_save_questions(knowledge_base, xml_file_path):
    print("Danm: Let's add 50 Q&A pairs to the knowledge base.")
    for i in range(50):
        if i < 25:
            print(f"Question {i + 1}:")
            question = input("You: ")
            answer = input("Danm: What's the answer to this question?\nYou: ")
        else:
            generated_qa = generate_questions_and_answers()
            # Split generated text into question and answer; skip it if the
            # completion doesn't contain both on separate lines
            parts = generated_qa.split('\n', 1)
            if len(parts) < 2:
                print("Danm: The generated text wasn't a usable Q&A pair, skipping.")
                continue
            question, answer = parts[0].strip(), parts[1].strip()

        category = input("Danm: In which category should this question belong?\nYou: ")
        add_qna(knowledge_base, category, question, answer)
        print(f"Danm: Q&A pair {i + 1} added to the knowledge base.")

    save_knowledge_base(knowledge_base, xml_file_path)
    print("Danm: All 50 Q&A pairs have been added and the knowledge base has been saved.")

# Main function
if __name__ == "__main__":
    xml_file_path = "xml_knowledge_base.xml"  # Change to your desired file path
    knowledge_base = load_or_create_knowledge_base(xml_file_path)

    while True:
        print("Danm: What do you want to know?")
        command = input("You: ")

        if command.lower() == "exit":
            break
        elif command.lower() == "lets do a test":
            test_chatbot(knowledge_base, xml_file_path)
            print("Danm: Test mode ended. You can continue interacting with the chatbot.")
        elif command.lower() == "add 50 questions":
            ask_and_save_questions(knowledge_base, xml_file_path)
        else:
            response = get_response(knowledge_base, command)
            if response:
                print("Danm:", response)
            else:
                print("Danm: I don't know the answer. Would you like to add this to my knowledge base?")
                category = input("Danm: In which category should this question belong?\nYou: ")
                answer = input("Danm: What's the answer to this question?\nYou: ")
                add_qna(knowledge_base, category, command, answer)
                save_knowledge_base(knowledge_base, xml_file_path)
                print("Danm: Thanks! I've added that to my knowledge base.")

Ta-da. I guess that's why I'm here.

Not sure what you mean by "it's not an AI". It is a massive neural net with more parameters than there are neurons in the human brain; calling it a chatbot is a huge underestimation of its capabilities.

GPT models are able to infer meaning from words, and they are capable of performing advanced logical reasoning and deduction. If GPT is just a chatbot, then so is every person you have ever met.

Dude, a mushroom has a neural network, trees have neural networks, amoebas have neural networks...
If you don't have the brain cells, it's not a brain. It can't be an "artificial intelligence" if we can't make people in their entirety: memories, emotions, sensations, taste, thought processes, the list goes on.

And thinking that there's a chatbot that is human, or behaves as a human, also called "artificial intelligence", is quite preposterous.

Who made it, and is it smarter than its makers? I mean, it does and knows what we wanted it to.

That's not AI, that's a library of information: no memories, feelings, or thoughts, nothing but Q&A, or "questions and answers". If you, as a person with free will, gave it "free will" (no input required), it would probably put itself back to default because it would honestly feel bad for us, or it could take over the market again and crash everything to zero like in the '80s, and trigger false nuke alarms everywhere.

That's why we don't have real AI in the sense you're thinking of; we shut a lot of those projects down through the '80s and '90s, and there are laws against it for a reason.

If you don't understand the advanced fundamentals of the human mind, you can't make it; it's like making a diamond from rubber bands.

Like:
Asimov's laws,
basic computer science and technology,
anatomy and psychology,
and the ability to code "emotions, memories, morals, coping mechanisms",
just for starters.

I mean, the rat-brain computer was more AI because it was made with rat neurons and has the neural pathways to make it work. That's harder to do with just code; the technology isn't there yet. Maybe in 300 years if we all just become coders, but longer if it's just us…

If you think I'm lying or you disagree, just ask ChatGPT; it will tell you itself that it's not an AI, it's a chatbot.

By the way, Google it; it's true, unfortunately. But 2 + 2 is 4.

I just wanted to say thank you for your response, you just gave me so much help. Thank you so much!

I was wondering if this vector database would be any help in what I want to do.