I’m using text-embedding-ada-002 as the embedding model. I split Hebrew text into chunks with the parameters below, embed each chunk, and store the vectors in Pinecone.
Chunk size: 512
Chunk overlap: 20
However, when I query the index, the matches returned by Pinecone and the output generated by the LLM are not what I expect. How can I improve this?
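For reference, this is roughly my pipeline (a minimal sketch; the index name `hebrew-docs` and the chunk ID scheme are placeholders, and the API keys come from environment variables):

```ts
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Pinecone } from "@pinecone-database/pinecone";

// Split into 512-character chunks with 20 characters of overlap.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 512,
  chunkOverlap: 20,
});

// OPENAI_API_KEY is picked up from the environment by default.
const embeddings = new OpenAIEmbeddings({ model: "text-embedding-ada-002" });

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index("hebrew-docs"); // placeholder index name

// Chunk, embed, and upsert one document.
async function ingest(text: string): Promise<void> {
  const chunks = await splitter.splitText(text);
  const vectors = await embeddings.embedDocuments(chunks);
  await index.upsert(
    vectors.map((values, i) => ({
      id: `chunk-${i}`, // placeholder ID scheme
      values,
      metadata: { text: chunks[i] },
    }))
  );
}

// Embed the question and fetch the closest chunks.
async function query(question: string) {
  const vector = await embeddings.embedQuery(question);
  return index.query({ vector, topK: 5, includeMetadata: true });
}
```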
Thanks for the suggestions. I had tried text-embedding-3-small, but I had trouble retrieving the correct data compared with text-embedding-ada-002. I’ll try changing the model again and see whether it helps.
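My understanding is that the switch itself is a one-line change on the embedding side, but a model like text-embedding-3-large returns 3072-dimensional vectors (ada-002 and 3-small return 1536), so the Pinecone index has to be recreated with a matching dimension and everything re-embedded. A sketch of what I’m planning; the index name and region are placeholders:

```ts
import { OpenAIEmbeddings } from "@langchain/openai";
import { Pinecone } from "@pinecone-database/pinecone";

// Swapping the model is one line, but note the dimension change below.
const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-large" });

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
await pinecone.createIndex({
  name: "hebrew-docs-3-large", // placeholder index name
  dimension: 3072, // must match text-embedding-3-large's output size
  metric: "cosine",
  spec: { serverless: { cloud: "aws", region: "us-east-1" } }, // placeholder region
});
```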
I’m using RecursiveCharacterTextSplitter from the langchain/text_splitter JS package. I wasn’t able to find anything related to semantic chunking in the LangChain JS package. Are you aware of any alternatives?
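In the meantime I’m experimenting with rolling my own: embed each sentence, then start a new chunk wherever the similarity between adjacent sentences drops. This is only a rough sketch; the sentence-splitting regex and the 0.8 threshold are naive guesses I still need to tune for Hebrew text:

```ts
import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings({ model: "text-embedding-ada-002" });

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Cut a new chunk wherever neighbouring sentences are dissimilar.
export async function semanticChunks(
  text: string,
  threshold = 0.8 // arbitrary; tune per corpus
): Promise<string[]> {
  const sentences = text
    .split(/(?<=[.!?])\s+/) // naive sentence boundary
    .filter((s) => s.trim().length > 0);
  if (sentences.length === 0) return [];

  const vectors = await embeddings.embedDocuments(sentences);

  const chunks: string[] = [];
  let current: string[] = [sentences[0]];
  for (let i = 1; i < sentences.length; i++) {
    if (cosine(vectors[i - 1], vectors[i]) < threshold) {
      chunks.push(current.join(" "));
      current = [];
    }
    current.push(sentences[i]);
  }
  chunks.push(current.join(" "));
  return chunks;
}
```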
Whether you use LangChain or not, logging the inputs sent to the LLM and the outputs it returns can help you identify where failures occur.
Tools like W&B might be useful for this.
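Even something as simple as appending each call to a JSONL file goes a long way. A minimal illustration; `callLLM` is a stand-in for however you invoke the model:

```ts
import { appendFileSync } from "node:fs";

// Wrap an LLM call so every prompt/output pair is written to a log file,
// making it possible to trace whether retrieval or generation went wrong.
async function loggedCall(
  callLLM: (prompt: string) => Promise<string>,
  prompt: string
): Promise<string> {
  const output = await callLLM(prompt);
  appendFileSync(
    "llm-calls.jsonl",
    JSON.stringify({ ts: new Date().toISOString(), prompt, output }) + "\n"
  );
  return output;
}
```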