In need of some expert advice

dan.benton-smith · September 23, 2023, 12:36pm

Hey everyone, I am a little late to the party when it comes to the Open AI API so please go easy on me.

My first foray into the Open AI API was to create a chatbot for a clients website (online guitar courses and lessons).

The chatbot has an initial discussion with the user, gathers information such as current ability level, favourite genres, favourite artists etc etc.

Once this information has been gathered I then use Open AI’s function calling feature to execute a python script.

The Python script loads a large .xlsx file (containing all courses, lessons, meta data etc) into a Pandas dataframe and uses the users criteria to return 3 course recommendations.

Currently this does work pretty well but is really slow in fetching the recommendations.

Here are my questions:

Is there a better and/or more efficient way of doing this?
In my case, would it be better to “train” a GPT model on my own data for carefully crafted and instant responses? If so where would I even start with this?

Again, apologies for my lack of understanding here.

Many thanks in advance.

N2U · September 23, 2023, 12:55pm

No need to apologize my friend, welcome to the developer community forum!

This sounds like a great project, one way of optimizing this is using embeddings, but in your case it might be better with an optimized search function. If you show us your code and some example data, we can do a lot more to help you

dan.benton-smith · September 23, 2023, 1:37pm

Hey man, thanks.

I’ll post my Flask endpoint code below to give you an idea of what’s happening behind the scenes, just for some context this is interacting with my React front end:

@app.route('/api/courses-chat', methods=['POST'])
def handle_chat():
    #---------------------------------------------------------------------------------
    # Set Open AI Key (Kims)
    #---------------------------------------------------------------------------------

    openai.api_key = ""

    #---------------------------------------------------------------------------------
    # Define function variable for Open AI
    #---------------------------------------------------------------------------------

    function = [
        {
            "name": "get_courses",
            "description": "Get course recommendations based on users criteria",
            "parameters": {
                "type": "object",
                "properties": {
                    "skill_level": {
                        "type": "string",
                        "description": "The users current guitar ability or skill level"
                    },
                    "techniques": {
                        "type": "string",
                        "description": "Guitar techniques the user would like to learn or focus on"
                    },
                    "artists": {
                        "type": "string",
                        "description": "Artists and guitarists that the user likes"
                    },
                    "tutor": {
                        "type": "string",
                        "description": "Licklibrary tutors that the user likes"
                    }
                },
                "required": ["skill_level", "techniques", "artists", "tutor"]
            }
        }
    ]

    #---------------------------------------------------------------------------------
    # Define system prompt
    #---------------------------------------------------------------------------------

    with open("systemPrompt.txt") as file:
        system = file.read()

    #---------------------------------------------------------------------------------
    # Define User prompt
    #---------------------------------------------------------------------------------

    user = "I need help learning the guitar"
    user2 = "I am a beginner guitarist. I like BB King and I would like to focus on guitar licks. I also really like Danny Gill"

    #---------------------------------------------------------------------------------
    # Get conversation history
    #---------------------------------------------------------------------------------

    conversationHistory = request.json.get('conversationHistory', [])
    # Ensure every message in conversationHistory has both a 'role' and a 'content'
    for message in conversationHistory:
        if not ('role' in message and 'content' in message):
            return jsonify({"response": "Invalid message format in conversation history."})

    # If conversationHistory is empty or lacks the system message, initialize it accordingly
    if not conversationHistory or conversationHistory[0]['role'] != 'system':
        conversationHistory.insert(0, {'role': 'system', 'content': system})

    if len(conversationHistory) == 1:
        conversationHistory.append({"role": "assistant", "content": "Hey, how can I help you on your Guitar journey today?"})
    print(conversationHistory)
    
    #---------------------------------------------------------------------------------
    # Define function to be called (get_courses)
    #---------------------------------------------------------------------------------

    import pandas as pd

    def get_courses(skill_level, techniques, artists, tutor):
        # Load the dataset
        df = pd.read_excel('products.xlsx')
        
        # Prepare the criteria
        keywords = skill_level.split() + techniques.split() + artists.split() + tutor.split()
        
        # Function to count keyword matches in a string
        def count_matches(string):
            string = str(string)  # Convert non-string values to strings
            return sum(1 for keyword in keywords if keyword.lower() in string.lower())
        
        # Apply the function to each specified column and sum the results
        df['match_count'] = (
            df['description'].apply(count_matches) +
            df['meta_description'].apply(count_matches) +
            df['artists'].apply(count_matches) +
            df['tutors'].apply(count_matches) +
            df['tags'].apply(count_matches) +
            df['genres'].apply(count_matches) +
            df['skills'].apply(count_matches) +
            df['techniques'].apply(count_matches)
        )
        
        # Sort the DataFrame based on match count and take the top 3
        top_courses = df.sort_values('match_count', ascending=False).head(3)
        
        # Create an array of objects with the specified properties
        courses_array = top_courses[['id', 'name', 'slug', 'image', 'store_image', 'tutors']].to_dict(orient='records')
        
        # Convert the top courses to JSON
        top_courses_json = top_courses.to_json(orient='records')
        
        # Create a dictionary to hold both values
        result = {
            "top_courses": json.loads(top_courses_json), 
            "courses_array": courses_array
        }
        
        # Convert the result dictionary to a JSON string and return
        return json.dumps(result)


    #---------------------------------------------------------------------------------
    # First call to Open AI Model
    #---------------------------------------------------------------------------------

    response_one = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=conversationHistory,
            functions=function,
            function_call="auto"
        )

    #---------------------------------------------------------------------------------
    # Get response and print data
    #---------------------------------------------------------------------------------

    output = response_one.choices[0].message
    print(output)

    #---------------------------------------------------------------------------------
    # Second call to Open AI API - First we check if function_call property is present in the returned object
    #---------------------------------------------------------------------------------

    if "function_call" in output and output.function_call is not None:
        # Get the criteria object
        params = json.loads(output.function_call.arguments)
        chosen_function = eval(output.function_call.name)
        user_criteria = chosen_function(**params)
        #print(f"Params: {params}")

        # Call Open AI sending users criteria
        response_two = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=[
                {"role": "system", "content": system},
                {"role": "function", "name": output.function_call.name, "content": user_criteria}
                ],
            functions=function,
        )
        assistant = response_two['choices'][0]['message']['content']
        
        # Parse the user_criteria to get the overview and courses_data
        user_criteria_data = json.loads(user_criteria)
        top_courses = user_criteria_data.get('top_courses', [])
        courses_data = user_criteria_data.get('courses_array', [])
        
        # Create a summary string based on the top_courses data
        overview = "Here are the top 3 courses based on your preferences:\n"
        for i, course in enumerate(top_courses, 1):
            overview += f"{i}. {course['name']} by {course['tutors']}\n"
        
        # Structure the response
        response_data = {
            "overview": overview,
            "courses_data": courses_data
        }
        
        return jsonify(response_data)
    else:
        return jsonify({"response": response_one['choices'][0]['message']['content']})

Here is a small sample from my dataset:

id	name	slug	product_code	type	active	ultimate_collection	only_show_in_course	description	meta_description	alt_tag	trailer	preview_duration	free_to_all	unlisted	noindex	artists	tutors	tags	genres	skills	techniques	series
1408	(I Can’t Get No) Satisfaction Guitar Backing Track	i-cant-get-no-satisfaction-guitar-backing-track	NULL	backing-track	1	NULL	NULL	This Rolling Stones easy guitar backing track lets you play that famous Keith Richards ‘Satisfaction‘ guitar riff with a full band.	This Rolling Stones easy guitar backing track lets you play that famous Keith Richards ‘Satisfaction‘ guitar riff with a full band.	(I Can’t Get No) Satisfaction	NULL	90	NULL	NULL	NULL	Rolling Stones	NULL	NULL	Rock	Pop	Suitable For All	Easy
7623	12 Bar Blues Progression	12-bar-blues-progression	RDR0413	song-lesson	1	NULL	1	NULL	NULL	12 Bar Blues Progression	NULL	180	NULL	NULL	NULL	Joe Bonamassa	Danny Gill	Eric Clapton	Eric Johnson	Joe Bonamassa	Blues	Rock
37152	‘80s Pop F#m Guitar Backing Track	80s-pop-fm-guitar-backing-track	RDR0003	backing-track	1	NULL	NULL	F# minor/Aeolian (A major) scales are a good choice for soloing over this dreamy guitar backing track, but more memorable melodies can be created with F#m pentatonic.	F# minor/aeolian ballad guitar backing track	LTP Your Own Rock Solos Minor Jamtrax	NULL	90	NULL	NULL	NULL	Stuart Bull	Stuart Bull	NULL	Pop	80’s Rock	NULL	NULL

N2U · September 23, 2023, 2:49pm

Always happy to help, let’s get back to your questions:

Absolutely, your code has multiple areas that could benefit from optimization.

1. Configuration and Initialization

Move the OpenAI API key initialization and function definitions outside of the route function. They remain constant and don’t need to be re-initialized for each request.

2. File Operations

Cache the contents of systemPrompt.txt and products.xlsx to avoid repeated I/O operations on each request.

3. Pandas DataFrame Optimization

Optimizing Pandas DataFrame operations can significantly improve performance, especially when dealing with large datasets.

Original Code

You apply the count_matches function to each column separately and then sum them, which involves multiple passes over the DataFrame.

Optimized Code

Vectorization: Use Pandas’ built-in methods that are optimized for vector operations.
Single Pass: Aggregate all relevant text columns into a single column and apply the count_matches function once.
Use nlargest: Get the top 3 rows based on match count without sorting the entire DataFrame.

Here’s a sample snippet I asked GPT to write:

def get_courses(skill_level, techniques, artists, tutor):
    # Use the already loaded df
    global df
    
    # Prepare the criteria
    keywords = skill_level.split() + techniques.split() + artists.split() + tutor.split()
    
    # Aggregate all relevant columns into a single string column
    df['all_text'] = df[['description', 'meta_description', 'artists', 'tutors', 'tags', 'genres', 'skills', 'techniques']].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
    
    def count_matches(string):
        return sum(1 for keyword in keywords if keyword.lower() in string.lower())
        
    # Apply count_matches once to the aggregated column
    df['match_count'] = df['all_text'].apply(count_matches)
    
    # Use nlargest to get the top 3
    top_courses = df.nlargest(3, 'match_count')
    # ... (rest of your code)

For specialized responses, consider fine-tuning an existing OpenAI model. The OpenAI Cookbook has a detailed guide on this, which you can find here:

Hope this helps!

Topic		Replies	Views
How can I optimize my GPT4 workflow API gpt-4	11	798	May 10, 2024
Send me your GPT problems, I'll solve them for free and make a YouTube video Community	77	8097	January 3, 2024
Fine-tuned model sometimes repeats itself verbatim Prompting	10	3588	November 6, 2023
Building the Ultimate Chatbot: What Do You Think of My Strategy? API	30	6290	December 18, 2023
Adding my own data to ChatGPT API	27	37423	May 1, 2024