I am really trying to get OpenAI into my business to serve as an assistant that gives my employees information about the company. I have uploaded vectors to Pinecone and I cannot figure out how to get my bot to communicate with it. I am using Google Cloud Functions, and I want to get my bot integrated into Google Chat as a chat bot. I have tried to use RAG for this, but I can't get it to work. My hope is that it can also use real-time data to respond with current events, to keep it flexible in how it is used. I am a complete novice to all of this, so I have minimal knowledge. I have used YouTube guides and then ChatGPT to help me figure this out, and nothing has worked. Please help. My logs do not provide me much information at all. I would love a detailed guide to get this accomplished, or some direction. Thank you!!
Hi Zack,
While I can’t provide specific advice without seeing your code / backend setup, here are some tips and best practices to get you started:
Understanding the RAG Process
Retrieval-Augmented Generation (RAG) is a process that combines information retrieval and language generation. This process typically involves:
Retrieval: When a user asks a question, the system searches a knowledge base (e.g., a database, documents, or other forms of information) to retrieve relevant information. This step is important because it provides context and factual information that the language model can leverage to provide more complete and informed answers.
Generation: The retrieved information is then passed to a language model (e.g., GPT-4) to generate a response that is cohesive and natural, incorporating both the user's query and the information retrieved from the knowledge base. By combining these two stages, RAG systems produce answers that are more factually accurate and contextually aware than the language model alone, which makes them especially valuable for tasks that depend on information retrieval and processing.
Very basic workflow
1 - Define a Retrieval Function
Using the Pinecone API, define a retrieval function. At a minimum, your retrieval function should take a single argument (the query text), but you can make it more flexible by adding optional arguments as needed. For instance:
retrieval_function(text='What are the employee benefits at our company?', **kwargs)
In its simplest form, this function can return plain text that summarizes the retrieved information. You might also include metadata such as similarity scores, source information, or the ranking of the retrieved text to help you assess the quality of the results.
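As a minimal sketch of such a function, assuming the legacy openai/pinecone-client interfaces and an index whose vectors carry their source text under metadata['text'] (the index name and key placeholders below are hypothetical):

import openai
import pinecone

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder
pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="YOUR_ENVIRONMENT")
index = pinecone.Index("your-index-name")  # placeholder index name

def retrieval_function(text, top_k=5, **kwargs):
    """Embed the query, then return the best-matching chunks with their scores."""
    embedding = openai.Embedding.create(
        input=[text],
        engine="text-embedding-ada-002"
    )['data'][0]['embedding']
    results = index.query(vector=embedding, top_k=top_k, include_metadata=True, **kwargs)
    # Assumes each vector was upserted with its source text in metadata['text'].
    return [(m['score'], m['metadata'].get('text', '')) for m in results['matches']]

Returning the similarity scores alongside the text makes it easy to inspect retrieval quality before you wire in the language model.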
2 - Submit the Retrieved Text to the OpenAI API
After you retrieve text, pass it along to the OpenAI API to generate a response. There are guides that describe how to do this.
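As a rough sketch (reusing the hypothetical retrieval_function above), the generation step might look like this:

def generate_answer(query):
    # Retrieve context and prepend it to the user's question.
    contexts = [text for _, text in retrieval_function(query)]
    augmented = "\n\n".join(contexts) + "\n\n-----\n\n" + query
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using the provided context."},
            {"role": "user", "content": augmented},
        ],
    )
    return response['choices'][0]['message']['content']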
3 - Deliver Results to the User
After you process the input through the language model, share the results with the user. Make sure the output is clear and aligns with the user’s original question.
Additional Considerations
Pay attention to the development process. As I've learned from building RAG applications, every decision you make during development can affect the end result.
There are many RAG methods and models available, but starting with a basic implementation will give you a foundation to build on as you gain experience and knowledge.
Explore the Cookbooks from OpenAI and LangChain; you will learn a lot from them.
This is probably really messy code but again, I have limited knowledge about what I am doing. Thank you for such a detailed response.
main.py file
import os
import logging

import openai
import pinecone
from flask import Flask, request, jsonify
from auth_util import is_request_valid  # Import auth utility

# Initialize Flask app
app = Flask(__name__)

# Hard-coded API keys and configuration
OPENAI_API_KEY = ""
PINECONE_API_KEY = ""
PINECONE_ENVIRONMENT = ""
PROJECT_NUMBER = ""
NAMESPACE = ""

# Configure OpenAI API key
openai.api_key = OPENAI_API_KEY

# Initialize Pinecone
pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_ENVIRONMENT)
index = pinecone.Index("chateau-assistant")  # Ensure this matches your actual index name

# Configure logging for troubleshooting
logging.basicConfig(level=logging.INFO)

# Define the route for handling Google Chat requests
@app.route('/', methods=['POST'])
def handle_chat():
    """Handles incoming messages from Google Chat."""
    if not is_request_valid(request):
        logging.warning("Unauthorized request.")
        return jsonify({"text": "Unauthorized request."}), 403
    event_data = request.get_json()
    # argumentText can be missing on some event types; fall back to the raw text
    message = event_data.get('message', {})
    user_message = (message.get('argumentText') or message.get('text', '')).strip()
    response = process_rag_query(user_message)
    return jsonify({"text": response})

def process_rag_query(query):
    """Retrieve relevant context from Pinecone and generate an answer."""
    try:
        # Generate dense embedding from OpenAI
        query_embedding = openai.Embedding.create(
            input=[query],
            engine="text-embedding-ada-002"
        )['data'][0]['embedding']
        # Query Pinecone
        results = index.query(
            vector=query_embedding,
            namespace=NAMESPACE,
            top_k=5,
            include_metadata=True
        )
        # Extract relevant context texts for response augmentation
        contexts = [match['metadata'].get('text', '') for match in results['matches']]
        augmented_query = "\n\n".join(contexts) + "\n\n-----\n\n" + query
        # Generate response from OpenAI API
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are an AI that answers questions based on retrieved information."},
                {"role": "user", "content": augmented_query}
            ]
        )
        return response['choices'][0]['message']['content']
    except Exception as e:
        logging.error(f"Error processing RAG query: {e}")
        return "There was an error processing your request. Please try again later."

# Entry point for Google Cloud Functions
def function(request):
    # Push the incoming request onto the Flask context so that flask.request
    # is bound correctly inside handle_chat().
    with app.request_context(request.environ):
        return app.full_dispatch_request()

# Debugging and entry for local testing
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
auth_util.py file
import logging

from oauth2client import client

# Constants for Google Chat verification
CHAT_ISSUER = 'chat@system.gserviceaccount.com'
PUBLIC_CERT_URL_PREFIX = 'https://www.googleapis.com/servic'
AUDIENCE = ''  # The token's intended audience, typically your Google Cloud project number

def is_request_valid(request):
    """Verify the validity of a bearer token received by an app, using OAuth2Client."""
    try:
        auth_header = request.headers.get('Authorization')
        if not auth_header:
            logging.warning('Authorization header is missing')
            return False
        token = auth_header.split(' ')[1]
        token = client.verify_id_token(token, AUDIENCE, cert_uri=PUBLIC_CERT_URL_PREFIX + CHAT_ISSUER)
        logging.info("Verified token: %s" % token)
        return token['iss'] == CHAT_ISSUER
    except Exception as e:
        logging.error(f"Token verification failed: {e}")
        return False
datastore_util.py file
import logging

from google.cloud import ndb
from models import Thread  # Import the Thread model

# Initialize Datastore Client
datastore_client = ndb.Client()

def store_messages(thread_id, messages=None):
    """Stores a list of messages for the specified thread_id.

    Args:
        thread_id (str): Unique identifier for the thread.
        messages (list): List of messages to store.

    Uses get_or_insert() to ensure only one Thread entity exists per thread_id.
    """
    messages = messages or []  # Avoid the mutable-default-argument pitfall
    if not thread_id:
        logging.warning("No thread_id provided; cannot store messages.")
        return
    with datastore_client.context():
        try:
            thread = Thread.get_or_insert(thread_id)  # Ensures a single Thread entity per thread_id
            # message_history is None on a freshly created entity, so guard before checking keys
            if thread.message_history and 'messages' in thread.message_history:
                thread.message_history["messages"].extend(messages)  # Append messages to existing history
            else:
                thread.message_history = {"messages": messages}  # Initialize message history if empty
            thread.put()
            logging.info(f"Messages stored successfully for thread ID: {thread_id}")
        except Exception as e:
            logging.error(f"Error storing messages for thread ID {thread_id}: {e}")

def get_thread(thread_id):
    """Retrieves the Thread object for a given thread_id.

    Args:
        thread_id (str): Unique identifier for the thread.

    Returns:
        Thread: Thread object if it exists; None otherwise.
    """
    if not thread_id:
        logging.warning("No thread_id provided; cannot retrieve thread.")
        return None
    with datastore_client.context():
        try:
            thread_obj = Thread.get_by_id(thread_id)
            if thread_obj:
                logging.info(f"Thread retrieved successfully for thread ID: {thread_id}")
            else:
                logging.info(f"No thread found for thread ID: {thread_id}")
            return thread_obj
        except Exception as e:
            logging.error(f"Error retrieving thread for thread ID {thread_id}: {e}")
            return None
models.py file
from google.cloud import ndb

class Thread(ndb.Model):
    message_history = ndb.JsonProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)

    def get_messages(self):
        # message_history may be unset on a freshly created entity
        return (self.message_history or {}).get('messages', [])
Here are some of my observations:
1 - You are not storing the conversation history accurately; each conversation is treated as a separate entity. This prevents the chatbot from using previous turns to extend the conversation. For example, when the user says "Find me the document related to HR benefits," the chatbot will retrieve something. That's nice! But if the user then says "Find more," the chatbot will not understand that the user is asking for more information about the HR benefits document, because it does not have the context of the previous conversation.
To solve this, feed the model's responses back into the stored conversation history between the user and the chatbot. As you will have multiple users, you must also hold a conversation ID for each user so that you can keep track of each user's conversation history.
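Here is a sketch of that idea, reusing your datastore_util helpers and Google Chat's thread name as the conversation ID (assuming messages are stored in the chat-completions role/content format; handle_with_history is a hypothetical name):

import openai

from datastore_util import store_messages, get_thread

def handle_with_history(event_data):
    # Google Chat events include a thread identifier; using it as the key
    # keeps each user's (or space's) conversation history separate.
    thread_id = event_data['message']['thread']['name']
    user_message = (event_data['message'].get('argumentText') or '').strip()

    thread = get_thread(thread_id)
    history = thread.get_messages() if thread else []

    messages = history + [{"role": "user", "content": user_message}]
    response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    answer = response['choices'][0]['message']['content']

    # Persist both turns so a follow-up like "Find more" still has context.
    store_messages(thread_id, [
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": answer},
    ])
    return answer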
2 - You are not using the "function calling" option of the latest release. Function calling is a powerful tool that makes your app more flexible and more reliable, and it prevents unnecessary costs. To clarify: say I ask the chatbot, "Hi, how are you?" For that question the chatbot does not need to retrieve any data from the database; it can simply return, "I am fine, thank you. How can I help you?" Some questions need retrieval and some can be answered without any database query, so implementing function calling can save you costs. Without it, you pay for every question the user asks, since you always make a database query and then summarize the result with the LLM.
Here is the link to the function calling documentation: https://platform.openai.com/docs/guides/function-calling
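A minimal sketch of this pattern with the legacy ChatCompletion interface (the function name and the retrieval_function below are hypothetical; with the 1.x SDK the equivalent parameter is tools, but the idea is the same):

import json
import openai

# Describe the retrieval capability so the model can decide when to use it.
functions = [{
    "name": "search_company_docs",
    "description": "Search the company knowledge base for relevant information.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query."}
        },
        "required": ["query"],
    },
}]

def answer(user_message):
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": user_message}],
        functions=functions,
        function_call="auto",  # let the model chat directly or request a search
    )
    message = response['choices'][0]['message']
    if message.get('function_call'):
        # Only now do you pay for the embedding and the Pinecone query.
        args = json.loads(message['function_call']['arguments'])
        context = retrieval_function(args['query'])  # hypothetical retriever
        followup = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "user", "content": user_message},
                message,
                {"role": "function", "name": "search_company_docs",
                 "content": str(context)},
            ],
        )
        return followup['choices'][0]['message']['content']
    return message['content']  # e.g. small talk, answered without any retrieval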
3 - Use newer, faster models such as gpt-4o or gpt-4o-mini.
4 - You can also use a better embedding model: text-embedding-3-small is a good choice (there is a short sketch of points 3 and 4 after this list).
Also, check the LangChain documentation; they have a nice flow for building a chatbot. Here is a nice document: Build a Retrieval Augmented Generation (RAG) App | LangChain
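As a quick sketch of points 3 and 4 together (same legacy client as in the code above; with the 1.x SDK the calls are client.embeddings.create and client.chat.completions.create):

import openai

# Newer embedding model (point 4): cheaper and stronger than ada-002.
embedding = openai.Embedding.create(
    input=["What are the employee benefits at our company?"],
    model="text-embedding-3-small",
)['data'][0]['embedding']

# Newer chat model (point 3): faster and cheaper than plain gpt-4.
response = openai.ChatCompletion.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize the retrieved context."}],
)

One caveat: if you switch embedding models, you need to re-embed and re-upsert your documents, because vectors produced by different models are not comparable to each other.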
Thank you again for the direction. I think I set up the function calls correctly. Additionally, I have been successful in getting my text and PDF documents converted and uploaded to my vector database.
I seem to be struggling with connecting to Pinecone through Google Cloud. Basically, all I'm trying to do is have the bot reference my index on Pinecone when a user on Google Chat asks it a question, use that for context, and produce a response with GPT-4. I will keep working on it and see how it goes. I'm setting it up in Visual Studio and trying to debug the problems that way, seeing if I can simulate Google Chat and have it handled on my local machine, then make it serverless.
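For the local simulation, one approach is to POST a trimmed-down, hypothetical version of a Chat MESSAGE event at the Flask app (assuming the app from main.py is running on port 8080 and is_request_valid() is temporarily stubbed out to return True for local runs):

import requests

# Trimmed stand-in for a Google Chat MESSAGE event (real events carry many more fields).
fake_event = {
    "type": "MESSAGE",
    "message": {
        "argumentText": "What are the employee benefits at our company?",
        "thread": {"name": "spaces/AAAA/threads/BBBB"},  # hypothetical thread name
    },
}

resp = requests.post("http://localhost:8080/", json=fake_event)
print(resp.json())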