I have a website with over 2,000 documents that I want to make searchable. I’ve tried a couple of techniques to achieve this, but I ran into some issues.
The first approach I tried was to convert the documents into text files and send them, along with the search query, to a completion API. However,
this method failed for documents longer than roughly 3,000 to 4,000 words: I received a maximum-token exception.
The second approach I tried was to convert a document into a JSONL file and upload it using a file endpoint. This worked,
but I don't want to use it because it requires fine-tuning a model, and I have a large amount of data that I want to search through.
Is there a better way to achieve this?
I am using .NET, but if anybody knows a solution in another language, I can convert it.
For the search itself, you can use Solr. It can quickly index and search all kinds of documents (PDF, DOC, TXT, HTML, etc.).
You can then use the ChatGPT API to convert natural-language search queries into Solr queries. ChatGPT acts as an interpreter (human -> Solr).
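To make the interpreter idea concrete, here is a minimal sketch in Python (easy to port to .NET). The prompt text, the field names `title` and `body`, and the local Solr URL are all assumptions for illustration; the actual model call is omitted, since it depends on which API client you use.

```python
import urllib.parse

# Hypothetical prompt asking the model to act as a human->Solr interpreter.
# The field names 'title' and 'body' are assumptions; use your Solr schema's fields.
PROMPT_TEMPLATE = (
    "You are an interpreter that converts natural-language search requests "
    "into Solr query syntax. The index has the fields 'title' and 'body'. "
    "Return only the Solr query string, nothing else.\n\n"
    "Request: {request}"
)

def build_prompt(request: str) -> str:
    """Build the instruction you would send to the chat model."""
    return PROMPT_TEMPLATE.format(request=request)

def solr_select_url(base_url: str, solr_query: str, rows: int = 10) -> str:
    """Build a Solr /select URL for the query string the model returned."""
    params = urllib.parse.urlencode({"q": solr_query, "rows": rows, "wt": "json"})
    return f"{base_url}/select?{params}"

# Example: suppose the model translated "shipping contracts" into the
# Solr query below; you would then fetch this URL against your Solr core.
url = solr_select_url("http://localhost:8983/solr/docs",
                      "body:shipping AND title:contract")
```

The point of the split is that the LLM only produces a query string; Solr does the actual retrieval over your 2,000+ documents, so the token limit you hit earlier never applies.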