QueryGPT - NodeJS QnA chatbot trained on local files using the embedding and completion endpoints

QueryGPT is a NodeJS application that learns from your local text files. There is no need to convert them into questions; it works fine with raw text. It answers any question whose answer can be deduced from the given context. Give it your business’s product details, books, or scientific papers, and it will do wonders.

I used the embeddings and completions endpoints to do this. The complete code and documentation can be found in QueryGPT | Github. I looked for a NodeJS implementation of this but couldn’t find one, so I wrote this.
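The embed, rank, then complete flow described above can be sketched in TypeScript as below. This is not QueryGPT’s actual code: it assumes every paragraph has already been sent to the embeddings endpoint, uses toy 3-dimensional vectors in place of real embeddings, and the helper names (`topParagraphs`, `buildPrompt`) and the prompt wording are hypothetical. The resulting prompt is what would be sent to the completion endpoint.

```typescript
// Sketch of ranking paragraphs by cosine similarity and building a completion prompt.
// Assumption: the embeddings were already fetched from the embeddings endpoint;
// the toy vectors below stand in for them.

interface EmbeddedParagraph {
  text: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every paragraph against the question embedding and keep the k best.
function topParagraphs(
  questionEmbedding: number[],
  paragraphs: EmbeddedParagraph[],
  k: number,
): EmbeddedParagraph[] {
  return paragraphs
    .map((p) => ({ p, score: cosineSimilarity(questionEmbedding, p.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.p);
}

// Hypothetical prompt template for the completion endpoint.
function buildPrompt(question: string, context: EmbeddedParagraph[]): string {
  const contextText = context.map((p) => p.text).join("\n\n");
  return (
    "Answer the question using only the context below. " +
    'If the answer is not in the context, say "I don\'t know".\n\n' +
    `Context:\n${contextText}\n\nQuestion: ${question}\nAnswer:`
  );
}

// Usage with toy vectors (real embeddings have far more dimensions).
const paragraphs: EmbeddedParagraph[] = [
  { text: "The library opens at 9 am.", embedding: [0.9, 0.1, 0.0] },
  { text: "Exams are held in May.", embedding: [0.0, 0.2, 0.9] },
  { text: "The library closes at 8 pm.", embedding: [0.8, 0.3, 0.1] },
];
const best = topParagraphs([1, 0.2, 0], paragraphs, 2);
const completionPrompt = buildPrompt("When is the library open?", best);
console.log(completionPrompt);
```

Only the top-k paragraphs go into the prompt, which keeps it within the model’s context limit no matter how long the source file is.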

A chatbot trained on my university’s data is running on Telegram as DU_GPT.

Let me know what you think about this.


Great bot! Even with very little coding experience, it’s quite easy to follow your documentation. Do you think this system works with very large texts? I am planning to use this as an alternative to customgpt, so basically I plan to put an entire sitemap (converted into text, of course) in there. But I guess the “ranking system”, where it goes through every paragraph, would struggle with that amount of data. What do you think? Also, is it possible to tweak it to tell where the data it used is located?
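One way to address both questions is to split each page into chunks before embedding and keep the source next to every chunk, so an answer can name where it came from. A minimal sketch, assuming hypothetical field names (`source`, `chunkIndex`), a hypothetical helper `citeSources`, and a placeholder `example.edu` URL; the 45-character limit in the usage line only keeps the example small.

```typescript
// Sketch: chunk a document and keep its source with every chunk.
// Field names and the default 1,000-character limit are hypothetical choices.

interface Chunk {
  source: string; // e.g. the page URL or local file path
  chunkIndex: number; // position of the chunk within that source
  text: string;
}

// Split on blank lines, then pack consecutive paragraphs into chunks of at most
// maxChars characters (a single longer paragraph becomes its own chunk).
function chunkDocument(source: string, text: string, maxChars = 1000): Chunk[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: Chunk[] = [];
  let current = "";
  for (const p of paragraphs) {
    // +2 accounts for the "\n\n" separator that joining would add.
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push({ source, chunkIndex: chunks.length, text: current });
      current = "";
    }
    current = current ? `${current}\n\n${p}` : p;
  }
  if (current) {
    chunks.push({ source, chunkIndex: chunks.length, text: current });
  }
  return chunks;
}

// List the distinct sources of the chunks that were used to answer.
function citeSources(chunks: Chunk[]): string {
  const unique = Array.from(new Set(chunks.map((c) => c.source)));
  return "Sources:\n" + unique.map((s) => `- ${s}`).join("\n");
}

const chunks = chunkDocument(
  "https://example.edu/registration",
  "Register online.\n\nDeadlines are in March.\n\nContact the office for help.",
  45,
);
console.log(chunks);
console.log(citeSources(chunks));
```

Because the source travels with each chunk, the best-matching chunks can be both passed to the prompt and listed under the answer.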

Actually, I plan something very similar to your Telegram bot for my own university. It should answer students’ questions about study planning, like where to register, how to apply for courses, etc. The explanations for all of this are publicly available on their website, and as they have a sitemap, it should be easy to crawl all the data. Of course, it would be sufficient to crawl the data once rather than keep it always up to date (I would scrape it, transform it into text and feed it into the OpenAI embeddings endpoint once). As it’s only a hobby project, I don’t like the idea of paying $50+ for customgpt; I would rather pay on demand from my OpenAI credits. I have little coding experience, but I’m eager to read up on it. Do you think such an implementation is possible in an easy way?
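The one-time crawl could start from the sitemap. A rough sketch, assuming a standard sitemap.xml with page URLs in `<loc>` elements; the regular expressions keep it dependency-free but are no substitute for a real XML/HTML parser, fetching each page (for example with the `fetch` built into Node 18+) is left out so the example runs offline, and the `example.edu` URLs are placeholders.

```typescript
// Rough, dependency-free helpers for a one-time sitemap crawl.
// A real crawler should use a proper XML/HTML parser instead of regexes.

// Pull every <loc>...</loc> URL out of sitemap XML.
function extractSitemapUrls(xml: string): string[] {
  const urls: string[] = [];
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    // Sitemap URLs are XML-escaped, so turn &amp; back into &.
    urls.push(match[1].replace(/&amp;/g, "&"));
  }
  return urls;
}

// Crude HTML-to-text: drop scripts/styles, turn block-level closing tags into
// blank lines, strip the remaining tags, decode two common entities.
function htmlToText(html: string): string {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, "")
    .replace(/<\/(p|div|h[1-6]|li)>/gi, "\n\n")
    .replace(/<br\s*\/?>/gi, "\n")
    .replace(/<[^>]+>/g, "")
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&")
    .replace(/[ \t]+/g, " ")
    .replace(/\n\s*\n\s*/g, "\n\n")
    .trim();
}

const sitemap =
  '<?xml version="1.0"?><urlset>' +
  "<url><loc>https://example.edu/apply</loc></url>" +
  "<url><loc> https://example.edu/courses?a=1&amp;b=2 </loc></url>" +
  "</urlset>";
const urls = extractSitemapUrls(sitemap);

const page =
  "<html><head><style>p{}</style></head><body>" +
  "<h1>Apply</h1><p>Apply via the portal &amp; pay the fee.</p>" +
  "<script>x()</script></body></html>";
const text = htmlToText(page);
console.log(urls, text);
```

The blank lines kept between blocks let the page text be split into paragraphs before it is embedded.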

Hi, yes, the requirements mentioned above can be met. The demo I gave is just for educational demonstration purposes. In a production environment, however, I would use a vector store like Elasticsearch to hold the embeddings, use its built-in cosine similarity to score the paragraphs, and pull out the most relevant ones.
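In Elasticsearch this corresponds to a `dense_vector` field plus a `script_score` query that calls the built-in `cosineSimilarity` script function. A sketch of the two request bodies as plain objects, assuming field names `text`, `source` and `embedding` and 1536-dimensional vectors (the size of OpenAI’s `text-embedding-ada-002` embeddings); check the mapping options against your Elasticsearch version.

```typescript
// Request bodies for Elasticsearch, built as plain objects so they can be sent
// with whichever HTTP client or Elasticsearch client you already use.
// Assumptions: fields text/source/embedding and 1536-dimensional vectors.

const EMBEDDING_DIMS = 1536;

// Body for `PUT /<index>`: stores each paragraph next to its embedding.
function indexMapping(dims: number = EMBEDDING_DIMS) {
  return {
    mappings: {
      properties: {
        text: { type: "text" },
        source: { type: "keyword" },
        embedding: { type: "dense_vector", dims },
      },
    },
  };
}

// Body for `POST /<index>/_search`: scores documents with cosineSimilarity.
// Elasticsearch rejects negative scores, so 1.0 is added to shift the
// range from [-1, 1] to [0, 2].
function cosineSearchBody(queryVector: number[], size = 3) {
  return {
    size,
    _source: ["text", "source"],
    query: {
      script_score: {
        query: { match_all: {} },
        script: {
          source: "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
          params: { query_vector: queryVector },
        },
      },
    },
  };
}

console.log(JSON.stringify(cosineSearchBody([0.1, 0.2, 0.3]), null, 2));
```

A `script_score` over `match_all` still scores every document in the index; Elasticsearch 8.x also offers approximate kNN search for larger indexes.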

I now work on projects like these professionally through online marketplaces. In my latest project we had a lot of incoming data, so we distributed it across multiple Elasticsearch indexes according to its context, and searched only the index for the relevant topic. So yes, what you describe can be implemented.
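Routing each question to a single topic index could look like the sketch below. The topics, index names and keyword lists are hypothetical, and keyword matching is only one option; a classifier or a comparison against per-topic embeddings would also work.

```typescript
// Sketch of per-topic index routing with hypothetical index names and keywords.

interface TopicRoute {
  index: string;
  keywords: string[];
}

const ROUTES: TopicRoute[] = [
  { index: "kb-admissions", keywords: ["apply", "application", "admission", "register"] },
  { index: "kb-courses", keywords: ["course", "credit", "exam", "semester"] },
  { index: "kb-housing", keywords: ["dorm", "housing", "rent", "accommodation"] },
];

const FALLBACK_INDEX = "kb-general";

// Pick the index whose keywords appear most often in the question (exact word
// match); on a tie the earlier route wins; fall back when nothing matches.
function chooseIndex(question: string): string {
  const words = question.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  let bestIndex = FALLBACK_INDEX;
  let bestHits = 0;
  for (const route of ROUTES) {
    const hits = words.filter((w) => route.keywords.includes(w)).length;
    if (hits > bestHits) {
      bestHits = hits;
      bestIndex = route.index;
    }
  }
  return bestIndex;
}

// The chosen name goes into the search URL, e.g. POST /kb-courses/_search.
console.log(chooseIndex("Which exam counts for course credit?"));
```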

We also have a UI, and we used telegraf.js to implement the Telegram bot. I think this is the easiest and most efficient implementation. I wish you the best in implementing this, and if you need professional help with the project, you can reach out to me.

Hello Talha,
Very nice project you are working on. I am working on a similar one: I want to train GPT on customer-support emails, so that if a similar question was asked in a past customer email, a bot can find the answer and give me a quick response. I think embeddings, as presented in your project, would be the solution for that. However, I want to go further and train the bot on a limited amount of company data, which is then used to make the bot reply to emails as well (the same way every time). Would you suggest combining embeddings and fine-tuning in this project (embeddings for the first task, fine-tuning for the second)?
Thank you
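For the second task, fine-tuning needs example conversations. A minimal sketch of turning past email/reply pairs into a training file, assuming the chat-style JSONL format that OpenAI’s fine-tuning endpoint accepts for chat models (one `{"messages": [...]}` object per line; older completion models used `prompt`/`completion` pairs instead). The `SupportExchange` shape and the system prompt are hypothetical.

```typescript
// Sketch: build chat-style fine-tuning JSONL from past support emails.
// SupportExchange and SYSTEM_PROMPT are hypothetical.

interface SupportExchange {
  customerEmail: string;
  agentReply: string;
}

// Hypothetical instruction that fixes the tone and structure of replies.
const SYSTEM_PROMPT =
  "You are a customer-support assistant. Reply politely, in the company's standard format.";

function toFineTuneLine(ex: SupportExchange): string {
  return JSON.stringify({
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: ex.customerEmail.trim() },
      { role: "assistant", content: ex.agentReply.trim() },
    ],
  });
}

// Skip pairs with an empty side, then emit one JSON object per line (JSONL).
function buildTrainingFile(exchanges: SupportExchange[]): string {
  return exchanges
    .filter((ex) => ex.customerEmail.trim() && ex.agentReply.trim())
    .map(toFineTuneLine)
    .join("\n");
}

const jsonl = buildTrainingFile([
  { customerEmail: "Where is my order #123?", agentReply: "Hello, your order ships tomorrow." },
  { customerEmail: "   ", agentReply: "Unused reply" },
]);
console.log(jsonl);
```

A common pattern is to let fine-tuning shape the tone and format of replies while the facts still come from an embedding search over past emails, whose closest matches are placed in the user message when the fine-tuned model is called.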