In need of some expert advice

Hey everyone, I am a little late to the party when it comes to the Open AI API so please go easy on me.

My first foray into the Open AI API was to create a chatbot for a clients website (online guitar courses and lessons).

The chatbot has an initial discussion with the user, gathers information such as current ability level, favourite genres, favourite artists etc etc.

Once this information has been gathered I then use Open AI’s function calling feature to execute a python script.

The Python script loads a large .xlsx file (containing all courses, lessons, meta data etc) into a Pandas dataframe and uses the users criteria to return 3 course recommendations.

Currently this does work pretty well but is really slow in fetching the recommendations.

Here are my questions:

  • Is there a better and/or more efficient way of doing this?

  • In my case, would it be better to “train” a GPT model on my own data for carefully crafted and instant responses? If so where would I even start with this?

Again, apologies for my lack of understanding here.

Many thanks in advance.

1 Like

No need to apologize my friend, welcome to the developer community forum!

This sounds like a great project, one way of optimizing this is using embeddings, but in your case it might be better with an optimized search function. If you show us your code and some example data, we can do a lot more to help you :wink:

Hey man, thanks.

I’ll post my Flask endpoint code below to give you an idea of what’s happening behind the scenes, just for some context this is interacting with my React front end:

@app.route('/api/courses-chat', methods=['POST'])
def handle_chat():
    #---------------------------------------------------------------------------------
    # Set Open AI Key (Kims)
    #---------------------------------------------------------------------------------

    openai.api_key = ""

    #---------------------------------------------------------------------------------
    # Define function variable for Open AI
    #---------------------------------------------------------------------------------

    function = [
        {
            "name": "get_courses",
            "description": "Get course recommendations based on users criteria",
            "parameters": {
                "type": "object",
                "properties": {
                    "skill_level": {
                        "type": "string",
                        "description": "The users current guitar ability or skill level"
                    },
                    "techniques": {
                        "type": "string",
                        "description": "Guitar techniques the user would like to learn or focus on"
                    },
                    "artists": {
                        "type": "string",
                        "description": "Artists and guitarists that the user likes"
                    },
                    "tutor": {
                        "type": "string",
                        "description": "Licklibrary tutors that the user likes"
                    }
                },
                "required": ["skill_level", "techniques", "artists", "tutor"]
            }
        }
    ]

    #---------------------------------------------------------------------------------
    # Define system prompt
    #---------------------------------------------------------------------------------

    with open("systemPrompt.txt") as file:
        system = file.read()

    #---------------------------------------------------------------------------------
    # Define User prompt
    #---------------------------------------------------------------------------------

    user = "I need help learning the guitar"
    user2 = "I am a beginner guitarist. I like BB King and I would like to focus on guitar licks. I also really like Danny Gill"

    #---------------------------------------------------------------------------------
    # Get conversation history
    #---------------------------------------------------------------------------------

    conversationHistory = request.json.get('conversationHistory', [])
    # Ensure every message in conversationHistory has both a 'role' and a 'content'
    for message in conversationHistory:
        if not ('role' in message and 'content' in message):
            return jsonify({"response": "Invalid message format in conversation history."})

    # If conversationHistory is empty or lacks the system message, initialize it accordingly
    if not conversationHistory or conversationHistory[0]['role'] != 'system':
        conversationHistory.insert(0, {'role': 'system', 'content': system})

    if len(conversationHistory) == 1:
        conversationHistory.append({"role": "assistant", "content": "Hey, how can I help you on your Guitar journey today?"})
    print(conversationHistory)
    
    #---------------------------------------------------------------------------------
    # Define function to be called (get_courses)
    #---------------------------------------------------------------------------------

    import pandas as pd

    def get_courses(skill_level, techniques, artists, tutor):
        # Load the dataset
        df = pd.read_excel('products.xlsx')
        
        # Prepare the criteria
        keywords = skill_level.split() + techniques.split() + artists.split() + tutor.split()
        
        # Function to count keyword matches in a string
        def count_matches(string):
            string = str(string)  # Convert non-string values to strings
            return sum(1 for keyword in keywords if keyword.lower() in string.lower())
        
        # Apply the function to each specified column and sum the results
        df['match_count'] = (
            df['description'].apply(count_matches) +
            df['meta_description'].apply(count_matches) +
            df['artists'].apply(count_matches) +
            df['tutors'].apply(count_matches) +
            df['tags'].apply(count_matches) +
            df['genres'].apply(count_matches) +
            df['skills'].apply(count_matches) +
            df['techniques'].apply(count_matches)
        )
        
        # Sort the DataFrame based on match count and take the top 3
        top_courses = df.sort_values('match_count', ascending=False).head(3)
        
        # Create an array of objects with the specified properties
        courses_array = top_courses[['id', 'name', 'slug', 'image', 'store_image', 'tutors']].to_dict(orient='records')
        
        # Convert the top courses to JSON
        top_courses_json = top_courses.to_json(orient='records')
        
        # Create a dictionary to hold both values
        result = {
            "top_courses": json.loads(top_courses_json), 
            "courses_array": courses_array
        }
        
        # Convert the result dictionary to a JSON string and return
        return json.dumps(result)


    #---------------------------------------------------------------------------------
    # First call to Open AI Model
    #---------------------------------------------------------------------------------

    response_one = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=conversationHistory,
            functions=function,
            function_call="auto"
        )

    #---------------------------------------------------------------------------------
    # Get response and print data
    #---------------------------------------------------------------------------------

    output = response_one.choices[0].message
    print(output)

    #---------------------------------------------------------------------------------
    # Second call to Open AI API - First we check if function_call property is present in the returned object
    #---------------------------------------------------------------------------------

    if "function_call" in output and output.function_call is not None:
        # Get the criteria object
        params = json.loads(output.function_call.arguments)
        chosen_function = eval(output.function_call.name)
        user_criteria = chosen_function(**params)
        #print(f"Params: {params}")

        # Call Open AI sending users criteria
        response_two = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=[
                {"role": "system", "content": system},
                {"role": "function", "name": output.function_call.name, "content": user_criteria}
                ],
            functions=function,
        )
        assistant = response_two['choices'][0]['message']['content']
        
        # Parse the user_criteria to get the overview and courses_data
        user_criteria_data = json.loads(user_criteria)
        top_courses = user_criteria_data.get('top_courses', [])
        courses_data = user_criteria_data.get('courses_array', [])
        
        # Create a summary string based on the top_courses data
        overview = "Here are the top 3 courses based on your preferences:\n"
        for i, course in enumerate(top_courses, 1):
            overview += f"{i}. {course['name']} by {course['tutors']}\n"
        
        # Structure the response
        response_data = {
            "overview": overview,
            "courses_data": courses_data
        }
        
        return jsonify(response_data)
    else:
        return jsonify({"response": response_one['choices'][0]['message']['content']})

Here is a small sample from my dataset:

id name slug product_code type active ultimate_collection only_show_in_course description meta_description image store_image alt_tag trailer preview_duration free_to_all unlisted noindex artists tutors tags genres skills techniques series
1408 (I Can’t Get No) Satisfaction Guitar Backing Track i-cant-get-no-satisfaction-guitar-backing-track NULL backing-track 1 NULL NULL This Rolling Stones easy guitar backing track lets you play that famous Keith Richards ‘Satisfaction‘ guitar riff with a full band. This Rolling Stones easy guitar backing track lets you play that famous Keith Richards ‘Satisfaction‘ guitar riff with a full band. (I Can’t Get No) Satisfaction NULL 90 NULL NULL NULL Rolling Stones NULL NULL Rock Pop Suitable For All Easy
7623 12 Bar Blues Progression 12-bar-blues-progression RDR0413 song-lesson 1 NULL 1 NULL NULL 12 Bar Blues Progression NULL 180 NULL NULL NULL Joe Bonamassa Danny Gill Eric Clapton Eric Johnson Joe Bonamassa Blues Rock
37152 ‘80s Pop F#m Guitar Backing Track 80s-pop-fm-guitar-backing-track RDR0003 backing-track 1 NULL NULL F# minor/Aeolian (A major) scales are a good choice for soloing over this dreamy guitar backing track, but more memorable melodies can be created with F#m pentatonic. F# minor/aeolian ballad guitar backing track LTP Your Own Rock Solos Minor Jamtrax NULL 90 NULL NULL NULL Stuart Bull Stuart Bull NULL Pop 80’s Rock NULL NULL
1 Like

Always happy to help, let’s get back to your questions:

Absolutely, your code has multiple areas that could benefit from optimization.

1. Configuration and Initialization

Move the OpenAI API key initialization and function definitions outside of the route function. They remain constant and don’t need to be re-initialized for each request.

2. File Operations

Cache the contents of systemPrompt.txt and products.xlsx to avoid repeated I/O operations on each request.

3. Pandas DataFrame Optimization

Optimizing Pandas DataFrame operations can significantly improve performance, especially when dealing with large datasets.

Original Code

You apply the count_matches function to each column separately and then sum them, which involves multiple passes over the DataFrame.

Optimized Code
  1. Vectorization: Use Pandas’ built-in methods that are optimized for vector operations.
  2. Single Pass: Aggregate all relevant text columns into a single column and apply the count_matches function once.
  3. Use nlargest: Get the top 3 rows based on match count without sorting the entire DataFrame.

Here’s a sample snippet I asked GPT to write:

def get_courses(skill_level, techniques, artists, tutor):
    # Use the already loaded df
    global df
    
    # Prepare the criteria
    keywords = skill_level.split() + techniques.split() + artists.split() + tutor.split()
    
    # Aggregate all relevant columns into a single string column
    df['all_text'] = df[['description', 'meta_description', 'artists', 'tutors', 'tags', 'genres', 'skills', 'techniques']].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
    
    def count_matches(string):
        return sum(1 for keyword in keywords if keyword.lower() in string.lower())
        
    # Apply count_matches once to the aggregated column
    df['match_count'] = df['all_text'].apply(count_matches)
    
    # Use nlargest to get the top 3
    top_courses = df.nlargest(3, 'match_count')
    # ... (rest of your code)

For specialized responses, consider fine-tuning an existing OpenAI model. The OpenAI Cookbook has a detailed guide on this, which you can find here:

Hope this helps!

4 Likes