I'm using LangChain and the OpenAI API to build a virtual assistant that answers questions about a PDF, which I converted to Markdown to make it easier for the model to read. I split it into 512-token chunks with LangChain's RecursiveCharacterTextSplitter. The text is small, roughly 400 lines with no extra whitespace, and even so text-embedding-ada-002-v2 shows very high consumption, approximately 12,000 tokens per request. I've been trying to lower this consumption for a while, but I can't find much information about it. This is my first time working with AI, and I believe the consumption shouldn't be this high.
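For reference, this is how I've been sanity-checking the token counts with tiktoken (as far as I understand from OpenAI's docs, cl100k_base is the encoding ada-002 actually bills by; that part is my assumption):

import tiktoken

# Encoding that text-embedding-ada-002 uses, per OpenAI's documentation
enc = tiktoken.get_encoding("cl100k_base")

def ada_tokens(text: str) -> int:
    # Count tokens the way the embeddings endpoint should bill them
    return len(enc.encode(text))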
I tokenize the text with GPT2TokenizerFast and split it with RecursiveCharacterTextSplitter:
from transformers import GPT2TokenizerFast
from langchain.text_splitter import RecursiveCharacterTextSplitter

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2", clean_up_tokenization_spaces=True)

def tokens(text: str) -> int:
    # Length function for the splitter: number of GPT-2 tokens in the text
    return len(tokenizer.encode(text))

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,      # max tokens per chunk, measured by `tokens`
    chunk_overlap=24,    # tokens shared between consecutive chunks
    length_function=tokens,
)
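Then I split the Markdown and check how many tokens would actually be sent for embedding (markdown_text here is a placeholder for the converted PDF contents):

# Split the Markdown text into LangChain Documents
chunks = splitter.create_documents([markdown_text])

# Total tokens that will go to the embeddings endpoint
total = sum(tokens(c.page_content) for c in chunks)
print(f"{len(chunks)} chunks, {total} tokens in total")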
Then I create the embeddings with OpenAIEmbeddings and store them in a FAISS index so I can run similarity search over the chunks when answering:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings()  # defaults to text-embedding-ada-002
db = FAISS.from_documents(chunks, embeddings)
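The only mitigation I've found so far is persisting the index so the embeddings are computed once instead of on every run; a sketch of what I mean, assuming FAISS.save_local / load_local behave as the LangChain docs describe:

import os

INDEX_DIR = "faiss_index"  # hypothetical local path

if os.path.isdir(INDEX_DIR):
    # Reuse the stored vectors: no embedding tokens are consumed here
    db = FAISS.load_local(INDEX_DIR, embeddings)
else:
    # Embed once, then persist so later runs cost nothing
    db = FAISS.from_documents(chunks, embeddings)
    db.save_local(INDEX_DIR)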
Is it possible to reduce this expense?