I’m currently working on a LangChain app that reads multiple PDF and XLSX files. At the moment I’m hitting my token limit after only a couple of MB. If I want to feed the LLM with, let’s say, 10-15 GB of data, would embedding be the right approach for this? Are there any good alternatives?
Hi and welcome to the Developer Forum!
You might want to look at rate limiting your requests so that you stay within your current limits. LangChain will add extra tokens for its internal prompts, so that may take some effort to work out. If you have a large data-processing requirement, embedding can be of use, but it depends on how you subsequently use that data.
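To make the "stay within your limits" part concrete: the usual first step is splitting each document into chunks small enough that any single request fits under the model's token limit. This is a minimal pure-Python sketch of that idea (LangChain ships its own splitters, e.g. `RecursiveCharacterTextSplitter`, which you'd normally use instead); the chunk sizes and the ~4-characters-per-token heuristic are rough assumptions, not exact counts.

```python
def split_text(text, chunk_size=1000, overlap=100):
    """Split text into overlapping character chunks.

    The overlap keeps some context across chunk boundaries so a
    sentence cut in half still appears whole in one of the chunks.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


def estimate_tokens(chunk):
    # Very rough heuristic: ~4 characters per token for English text.
    # For real budgeting you'd count with a tokenizer such as tiktoken.
    return len(chunk) // 4
```

Each chunk can then be sent (or embedded) as its own request, so no single call blows past the per-request token limit.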
Thank you for the quick response. But if I’m already hitting a token limit with small datasets, I do wonder whether the embedding approach makes sense in general. Wouldn’t splitting the requests significantly increase the response time and potentially also lower the quality of the data returned? What about fine-tuning for larger datasets, or running something locally (a Hugging Face LLM, for example)?
Hey champ and welcome to the community forum!
10-15 GB is a lot of text, and it definitely won’t fit inside the context window of any of OpenAI’s models. This sounds like an application for retrieval-augmented generation (RAG); you’ll find more information in this paper if you want to know more
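The core RAG idea in miniature: embed every chunk once, store the vectors, and at query time retrieve only the few most similar chunks to put into the prompt, so the context window never has to hold all 10-15 GB. The sketch below is a toy illustration with a deterministic hashed bag-of-words "embedding"; a real pipeline would call an embedding model (e.g. via the OpenAI embeddings API) and a vector store such as FAISS or Chroma, and the function names here are my own, not LangChain's.

```python
import math
import zlib


def embed(text, dim=64):
    # Toy embedding: hash each word into a fixed-size vector slot.
    # Stands in for a real embedding model, which would return a
    # dense semantic vector instead of word counts.
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k=2):
    # Rank stored chunks by similarity to the query embedding and
    # return the top_k -- only these few chunks go into the LLM prompt.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]
```

The retrieval step is why RAG scales: the corpus size only affects the one-time indexing cost, while every query still sends just `top_k` small chunks to the model.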