Building a Cost-Effective Support Bot with Multiple GPT Models - Your Thoughts?

Hey everyone!

I’m working on a support agent bot to handle customer queries, and I’m trying to strike the right balance between cost-effectiveness and response quality.

Models: I’m thinking of using GPT-3.5-turbo, GPT-4, and GPT-4o in a tiered approach:

  • GPT-3.5-turbo: For simple FAQ lookups
  • GPT-4: For searching an external knowledge base and potentially generating initial responses
  • GPT-4o: For analyzing past interactions and refining responses (or directly generating them for complex queries)

I have a FAQ JSON file, an external knowledge base, and a dataset of past support interactions to draw from.

My goal is to minimize costs by using the less powerful models when possible and reserving the heavy-duty GPT-4o for complex cases. But I also want to ensure high-quality responses by leveraging all available information sources.

Here’s a simplified version of the code I’m working with:

Please note that this is just to illustrate the concept. I don’t want to refine the code if the idea isn’t worth investing time in.

import json
import requests  # Forgot to import this before, oops! Needed for the KB/API calls later

# Load FAQ data from JSON file
with open("faq.json", "r") as f:
    faq_data = json.load(f)

# Ext knowledge base URL (replace with your actual URL)
kb_url = "" 

# Path to past interactions CSV (replace with your actual path)
past_interactions_csv = "support_interactions.csv"

# Quick and dirty complexity check - could use some improvement later
def get_query_complexity(user_query):
    if len(user_query) < 10:  # Short question, prob simple
        return "low"
    elif len(user_query) < 30:  # Medium length, might be trickier
        return "medium"
    else:                       # Long question, likely complex
        return "high"

# Look up user question in FAQ
def search_faq(user_query):
    for q, a in faq_data.items():
        if q.lower() == user_query.lower():  # Case-insensitive match
            return a
    return None  # Not found in FAQ

# Search our knowledge base for relevant info (needs real API call later)
def search_kb(user_query, model="gpt-4"): 
    # TODO: Replace with actual KB API call using 'requests' or similar
    # For now, just return some dummy results
    return ["Account Troubleshooting", "Billing FAQs"]  

# Analyze past interactions (will need NLP magic here eventually)
def analyze_past_interactions(user_query, model="gpt-4o"):
    # TODO: Implement analysis of CSV data using NLP techniques
    return "Hmm, found some similar issues from the past..."

# Generate response using specified GPT model
def generate_response(model, info, past_interactions=None):
    # TODO: Replace with REAL API calls to OpenAI GPT models
    response = f"Response generated by {model} using this info: {info}"
    if past_interactions:
        response += f" and considering: {past_interactions}"
    return response

# Refine response with past interactions (GPT-4o's special touch)
def refine_response(model, initial_response, past_interactions):
    # TODO: Make this a smarter refinement using GPT-4o's capabilities
    return f"{initial_response} (Further refined by {model})"

# Main function to get the support response
def get_support_response(user_query):
    faq_answer = search_faq(user_query)
    if faq_answer:  # If found in FAQ, that's our answer!
        return generate_response("gpt-3.5-turbo", faq_answer)

    kb_results = search_kb(user_query, "gpt-4")  
    past_interactions = analyze_past_interactions(user_query, "gpt-4o")
    # If found in KB...
    if kb_results:
        complexity = get_query_complexity(user_query)
        if complexity == "high":
            return generate_response("gpt-4o", kb_results, past_interactions)
        else:  # Less complex, so GPT-4 can handle it initially
            initial_response = generate_response("gpt-4", kb_results)
            return refine_response("gpt-4o", initial_response, past_interactions)

    # If nothing found...
    return "Sorry, I couldn't find anything helpful. Please contact support directly."
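In case it helps anyone picture the TODO in generate_response: here’s a rough sketch of what a real call could look like, using the Chat Completions REST endpoint via the requests library already imported above. The system prompt and the way I stuff info/past_interactions into the user message are just placeholders you’d want to tune:

```python
import os
import requests

def generate_response(model, info, past_interactions=None):
    # Build a simple context string from whatever we gathered upstream
    context = f"Relevant info: {info}"
    if past_interactions:
        context += f"\nSimilar past interactions: {past_interactions}"

    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": model,
            "messages": [
                # Placeholder system prompt - tune for your support tone
                {"role": "system", "content": "You are a concise, friendly support agent."},
                {"role": "user", "content": context},
            ],
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

Since it takes the model name as a parameter, the same function works for all three tiers.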

What do you think of this approach? Is it a good way to balance cost and quality? Do you have any suggestions for improvements or alternative strategies? I’m open to all feedback!

Thanks in advance!

I would suggest working towards an MVP before trying to prematurely optimize it.

You may also find that using embeddings is a better solution for understanding the complexity behind a question. An arbitrary cut-off won’t cut it. A follow-up question could be 4 words but deeply involve the whole conversation.
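To make the embeddings idea concrete: instead of a length cut-off, you embed the question and the conversation so far and compare them. A minimal sketch (the 0.8 threshold and the is_followup helper are my own guesses, and embed here is a hypothetical wrapper around OpenAI's embeddings endpoint):

```python
import math
import os
import requests

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def embed(text):
    # Hypothetical wrapper around the OpenAI embeddings REST endpoint
    resp = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": "text-embedding-3-small", "input": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

def is_followup(question_vec, conversation_vec, threshold=0.8):
    # A 4-word question whose embedding sits close to the conversation's
    # embedding is likely a follow-up, regardless of its length
    return cosine_similarity(question_vec, conversation_vec) >= threshold
```

Then your router can treat high-similarity short questions as "high" complexity instead of "low", which fixes exactly the 4-word-follow-up case.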


Agree with this. Test different models before you decide on complex vs simple.

Also, the newer models sometimes are cheaper.
