How do I scrape the content of an HTML page and pass it to an LLM without exceeding the token limit? How should I structure the payload?
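One common way to stay under a token limit is to drop the markup, keep only the visible text, and split that text into chunks sized to a budget. Below is a minimal sketch using only the Python standard library. It assumes roughly 4 characters per token, which is a rule of thumb and not a real tokenizer count; `html_to_text` and `chunk_text` are hypothetical helper names, not part of any LLM SDK.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script>, <style>, and <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_text(text, max_tokens=1000, chars_per_token=4):
    """Split text on whitespace into chunks within an approximate budget.

    chars_per_token=4 is an assumption; swap in a real tokenizer count
    if the model's limit must be respected exactly. A single word longer
    than the budget still becomes its own (oversized) chunk.
    """
    budget = max_tokens * chars_per_token
    chunks, current, length = [], [], 0
    for word in text.split():
        if current and length + 1 + len(word) > budget:
            chunks.append(" ".join(current))
            current, length = [], 0
        length += len(word) + (1 if current else 0)
        current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then go into its own request (for example, one user message per chunk with the same instructions), which keeps every payload under the limit at the cost of more calls.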
What I do is first convert the PDF to images. There may be a quicker way, but I use CloudConvert to automate that step, then feed the array of images into Vision.
We are looking at doing something similar. We were thinking about taking all of the PDFs, parsing them, embedding the text, and then using the embeddings to search and generate the doc. We have around 400 docs. Based on @con3ro11's answer, I wonder if I'm making it too complicated.
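For reference, the parse, embed, and search idea can be sketched in a few lines. This is only an illustration: `embed` here is a toy word-count vector standing in for a real embedding model, and `top_k` is a hypothetical helper name. In practice you would replace `embed` with a call to whatever embedding API you use and store the vectors instead of recomputing them.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy stand-in for a real embedding model: a sparse word-count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=3):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

Only the top few chunks are then placed in the prompt, so the payload size depends on `k` and the chunk size, not on the 400 documents.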