Scraping a website and feeding it to OpenAI to make a chatbot

virgoprakhar · February 21, 2024, 12:14am

Hi Folks

I am trying to scrape my internal Confluence pages (there can be 100 or more of them under the root URL) and feed them into OpenAI to build a chatbot. The chatbot should help users find the relevant pages, answer their queries, and provide hyperlinks to the pages that contain those answers.
I have been trying to do this with Beautiful Soup, but since there are so many pages, I am unable to stay within the token limit.
Please let me know what approach I can adopt to build a library that:
a) scrapes any number of provided URLs and their subpages
b) stays within the token limit
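
For requirement (a), a minimal sketch of a breadth-first crawl that stays under the root URL. It uses Python's standard-library `html.parser` rather than Beautiful Soup so that it runs without extra packages. The names `LinkAndTextParser` and `crawl`, the injected `fetch` callable, and the `max_pages` cap are all assumptions made for illustration; authentication against an internal Confluence instance is left to whatever `fetch` you supply.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkAndTextParser(HTMLParser):
    """Collects href targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def crawl(root_url, fetch, max_pages=200):
    """Breadth-first crawl of pages whose URL starts with root_url.

    fetch(url) must return the page HTML as a string (hypothetical helper,
    supplied by the caller). Returns a dict mapping URL to plain text.
    """
    pages = {}
    queue = deque([root_url])
    seen = {root_url}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        parser = LinkAndTextParser()
        parser.feed(fetch(url))
        pages[url] = " ".join(parser.text_parts)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(root_url) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

In practice `fetch` might wrap an authenticated HTTP client; the sketch does no rate limiting or error handling, so treat it as a starting point only. Requirement (b) is usually handled after crawling, by splitting the text into chunks and sending only the relevant ones to the model.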

anon22939549 · February 21, 2024, 12:42am

You’ll want to look into text embeddings [1] for doing dynamic retrieval of context.

[1] Text Embeddings API Reference
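
As a rough illustration of what "dynamic retrieval" with embeddings can look like: split each page into chunks, embed every chunk once, then at query time embed the question and keep only the closest chunks. The helper names (`chunk_words`, `build_index`, `top_k`), the 200-word window, and the word-based splitting are assumptions for this sketch. `openai_embed` shows one way to call the Embeddings API with the `openai` Python package (v1 or later) and the `text-embedding-3-small` model; it needs an API key and is only one possible `embed` function.

```python
import math


def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping word windows (a rough stand-in for token-based chunking)."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + max_words]) for i in range(0, last_start, step)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(pages, embed, max_words=200, overlap=20):
    """pages: {url: text}. embed: callable taking a list of strings, returning a list of vectors.

    Returns a list of (url, chunk_text, vector) rows.
    """
    index = []
    for url, text in pages.items():
        chunks = chunk_words(text, max_words, overlap)
        if chunks:
            vectors = embed(chunks)
            index.extend((url, chunk, vec) for chunk, vec in zip(chunks, vectors))
    return index


def top_k(query_vector, index, k=3):
    """Return the k index rows most similar to the query vector, best first."""
    ranked = sorted(index, key=lambda row: cosine(query_vector, row[2]), reverse=True)
    return ranked[:k]


def openai_embed(texts, model="text-embedding-3-small"):
    """Assumed wiring to the OpenAI Embeddings API; needs the `openai` package and OPENAI_API_KEY."""
    from openai import OpenAI  # imported lazily so the rest runs without the package

    response = OpenAI().embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]
```

Because every index row keeps its source URL, the retrieved chunks can double as the hyperlinks the chatbot shows to the user.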

virgoprakhar · February 21, 2024, 2:00am

Thanks. Can you elaborate on what the architectural flow would be here?

Would it be:
scrape all web pages -> create embeddings -> store them in a CSV or similar -> use the query plus embeddings to answer? In that case, how do we limit the embeddings data we store, and how do we send only the user query plus the relevant embeddings, to lower the token usage?
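
One way the storage and token-limit parts of this flow could be sketched, under stated assumptions: embeddings are stored once in a CSV (vectors serialized as JSON), and at query time only the text of the best-matching chunks is packed into the prompt until a token budget is reached. Note that it is the chunk text, not the embedding vectors, that goes into the prompt; the vectors are only used to rank chunks. The function names, the CSV column names, and the 4-characters-per-token estimate are illustrative; a tokenizer library such as `tiktoken` would give exact counts.

```python
import csv
import io
import json


def save_index(rows, fileobj):
    """Write (url, chunk_text, vector) rows to CSV; the vector is stored as a JSON string."""
    writer = csv.writer(fileobj)
    writer.writerow(["url", "chunk", "embedding"])
    for url, chunk, vector in rows:
        writer.writerow([url, chunk, json.dumps(vector)])


def load_index(fileobj):
    """Read back the rows written by save_index."""
    reader = csv.DictReader(fileobj)
    return [(r["url"], r["chunk"], json.loads(r["embedding"])) for r in reader]


def approx_tokens(text):
    """Very rough token estimate (about 4 characters per token for English text)."""
    return max(1, len(text) // 4)


def select_within_budget(ranked, max_tokens, count_tokens=approx_tokens):
    """ranked: (url, chunk) pairs, best match first. Keep chunks while the budget allows."""
    chosen, used = [], 0
    for url, chunk in ranked:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            continue  # too big for the remaining budget; a later, smaller chunk may still fit
        chosen.append((url, chunk))
        used += cost
    return chosen


def build_prompt(question, chosen):
    """Assemble a prompt that carries only the selected chunks and their source URLs."""
    context = "\n\n".join(f"Source: {url}\n{chunk}" for url, chunk in chosen)
    return (
        "Answer using only the context below and cite the Source URLs you used.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    buffer = io.StringIO(newline="")
    save_index([("https://wiki.example.com/issues", "Restart the worker.", [0.1, 0.2])], buffer)
    buffer.seek(0)
    rows = load_index(buffer)
    picked = select_within_budget([(url, chunk) for url, chunk, _ in rows], max_tokens=50)
    print(build_prompt("How do I fix the worker error?", picked))
```

A CSV is workable for a few hundred pages; the token usage per query is then bounded by `max_tokens` plus the question, regardless of how many pages were scraped.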

anon22939549 · February 21, 2024, 2:13am

If they’re your pages, why are you scraping them?

You should have the raw text already.

virgoprakhar · February 21, 2024, 3:26am

I don't have the raw text. These are my team's pages, such as the common production issues page, the architecture page for our AWS flow, etc., and many more.
We want to make a chatbot that can answer any user query, for example: "I got this error in prod today, what's the possible resolution?"
Ideally, for a query like that, it should automatically refer to the prod support common issues page and answer based on the existing data.
That is a very small use case; queries can also be wider and span multiple pages.
