Hi, I am making a bot that scrapes a web page and reads the titles of its posts along with their URLs.
When a user asks for information about something, I need to check whether any of the texts is related to the question and return its URL, but I don't know how to set this up.
The web page contains hundreds of URLs and can change at any moment. I make a request to get the URLs and save them in a .csv file. I can't create an embedding for each one on every call because it would be costly and slow.
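Simplified, the collection step is something like this (the page URL and the CSS selector here are just placeholders; the column names are the ones the bot reads back below):

import pandas as pd
import requests
from bs4 import BeautifulSoup

def scrape_posts_to_csv(page_url, csv_path):
    # page_url and the "a.post-title" selector are hypothetical placeholders
    html = requests.get(page_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for link in soup.select("a.post-title"):
        rows.append({"Texts": link.get_text(strip=True), "Urls": link["href"]})
    # same column names ("Texts", "Urls") that send_initial_context reads back
    pd.DataFrame(rows).to_csv(csv_path, index=False)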
I tried putting the info in table format, sending it to ChatGPT, and then sending the user's message afterwards, but the model gets confused. This is what I have so far:
# settings, OpenAIRepository, OpenAISingleton and user_request_template
# are project-specific and imported elsewhere.
import pandas as pd
from string import Template

def send_initial_context(self):
    """
    Sends the texts and URLs as the beginning of the conversation context.
    """
    # read the scraped titles and URLs from the csv
    path = f'{settings.PATH_CSV_FILE}urls.csv'
    df = pd.read_csv(path)
    # markdown table header that prefixes the first chunk
    string = (
        "########## This is a table about posts ##########\n"
        "|Index|Texts|Urls|\n"
        "|---|---|---|\n"
    )
    openai_repository = OpenAIRepository()
    for index, content in df.iterrows():
        new_string = f"|{index}|{content['Texts']}|{content['Urls']}|\n"
        # accumulate rows while the chunk stays under the token limit,
        # otherwise flush the current chunk as a user message and start a new one
        if OpenAISingleton.is_string_below_limit_token(string + new_string):
            string += new_string
        else:
            openai_repository.post_user_message({
                'role': 'user',
                'content': string
            })
            string = new_string
    # flush whatever is left in the last chunk
    openai_repository.post_user_message({
        'role': 'user',
        'content': string
    })

def send_message(self, array_chat_json):
    # resend the whole table before every question
    self.send_initial_context()
    # only the first sentence of the user's message goes into the template
    text = Template(user_request_template).substitute(frase=array_chat_json['content'].split('.')[0])
    openai_repository = OpenAIRepository()
    array_chat_json = {
        'role': 'user',
        'content': text
    }
    openai_repository.post_user_message(array_chat_json)
    msg = openai_repository.post_retrieve_message()
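For reference, the first chunk that reaches the model ends up looking like this (the two rows are made-up examples; only the header comes from the code above):

########## This is a table about posts ##########
|Index|Texts|Urls|
|---|---|---|
|0|How to deploy the service|https://example.com/posts/deploy|
|1|Release notes for v2.3|https://example.com/posts/release-2-3|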
What can I do to find the matching title when a user asks something and return its URL?