Web scraping and ChatGPT API

Hi, I am making a bot that uses web scraping to read the titles of posts, together with their URLs, from a web page.

When a user asks for information about something, I need to check whether any post title is related to the question and return its URL, but I don't know how to establish that match.

The web page contains hundreds of URLs and can change at any moment. I make a request to get the URLs and save them in a .csv file. I can't create an embedding for every title on every call because that would be costly and slow.
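
The cost concern above could be reduced by embedding each title only once and reusing the stored vector until the title changes. A minimal sketch, assuming a hypothetical `embed_fn` callable that wraps whatever embedding API is used and returns a list of floats; `title_key`, `sync_embeddings` and the cache layout are illustrative names, not part of my code:

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


def title_key(title: str) -> str:
    """Stable key for a title, so unchanged titles are recognised across scrapes."""
    return hashlib.sha256(title.strip().lower().encode("utf-8")).hexdigest()


def sync_embeddings(
    titles: List[str],
    cache: Dict[str, Vector],
    embed_fn: Callable[[str], Vector],  # hypothetical wrapper around an embedding API
) -> Dict[str, Vector]:
    """Return a cache covering exactly `titles`, embedding only the unseen ones."""
    fresh: Dict[str, Vector] = {}
    for title in titles:
        key = title_key(title)
        if key in cache:
            fresh[key] = cache[key]       # already embedded: no API call
        else:
            fresh[key] = embed_fn(title)  # new or changed title: one API call
    return fresh  # titles that disappeared from the page are dropped
```

The returned cache could be saved next to urls.csv (for example as JSON), so each scrape only pays for titles that are new since the previous one.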

I tried putting the info in table format and sending it to ChatGPT before sending the user's message, but it gets confused.

def send_initial_context(self):
        """
        This method sends the texts and URLs to begin the context of the conversation.
        """

        # reading csv
        path = f'{settings.PATH_CSV_FILE}urls.csv'
        df = pd.read_csv(path)

        # table header (built line by line to avoid stray indentation and doubled newlines)
        string = (
            "########## This is a table about posts ##########\n\n"
            "|Index|Texts|Urls|\n"
            "|---|---|---|\n"
        )

        openai_repository = OpenAIRepository()

        for index, content in df.iterrows():
            
            new_string = f"|{index}|{content['Texts']}|{content['Urls']}|\n"
            
            if OpenAISingleton.is_string_below_limit_token(string+new_string):
                string += new_string  # new_string already ends with "\n"
            else:
                openai_repository.post_user_message({
                    'role':'user',
                    'content':string
                })
                string = new_string
        
        openai_repository.post_user_message({
            'role':'user',
            'content':string
        })

def send_message(self, array_chat_json):

        self.send_initial_context()

        text = Template(user_request_template).substitute(frase=array_chat_json['content'].split('.')[0])

        openai_repository = OpenAIRepository()

        array_chat_json = {
            'role':'user',
            'content':text
        }

        openai_repository.post_user_message(array_chat_json)
        msg = openai_repository.post_retrieve_message()
        return msg

What can I do to match a user's question to a post title and return its URL?
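
One possible approach is to embed the user's question, compare it with the stored title vectors, and return the URL of the closest title, or nothing when the best score is too low. A minimal sketch, assuming all vectors come from the same embedding model; `cosine`, `best_url`, the row layout and the `min_score` threshold of 0.75 are hypothetical and would need tuning on real data:

```python
import math
from typing import List, Optional, Sequence, Tuple

# (title, url, title_vector); this layout is an assumption for illustration
Row = Tuple[str, str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_url(
    question_vec: Sequence[float],
    rows: List[Row],
    min_score: float = 0.75,  # hypothetical cut-off for "related enough"
) -> Optional[str]:
    """Return the URL whose title is most similar to the question, or None."""
    best_score, best = 0.0, None
    for _title, url, vec in rows:
        score = cosine(question_vec, vec)
        if score > best_score:
            best_score, best = score, url
    return best if best_score >= min_score else None
```

With this kind of lookup, only the few best-scoring rows (rather than the whole table) would need to be passed to ChatGPT if a natural-language answer is wanted.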
