Using my own knowledge base with GPT-4

I am building a chatbot for IT at my college. It will be a live chat where clients can ask questions. We have a college knowledge base, and I was wondering how I could integrate that with the GPT-4 engine?

Welcome to the OpenAI Community.

When it comes to answering questions from large knowledge bases, embeddings + completions (GPT-4) is a possible solution.
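The basic pattern can be sketched in a few lines of Python. Everything below is made up for illustration: the two-entry knowledge base, the tiny 3-dimensional vectors (real embeddings have hundreds of dimensions and would come from an embeddings API call), and the prompt wording.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, kb, k=2):
    """Return the k knowledge-base entries most similar to the query vector."""
    ranked = sorted(kb, key=lambda item: cosine_similarity(query_vec, item["vec"]),
                    reverse=True)
    return ranked[:k]

# Toy vectors stand in for real embedding-API output.
kb = [
    {"text": "Reset your password at portal.example.edu", "vec": [0.9, 0.1, 0.0]},
    {"text": "The library is open 8am to 10pm",           "vec": [0.0, 0.2, 0.9]},
]
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "How do I reset my password?"

context = "\n".join(item["text"] for item in retrieve(query_vec, kb, k=1))
# The retrieved context then gets pasted into the completion prompt, e.g.:
# prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

At query time you embed the question, rank stored chunks by cosine similarity, and hand only the top matches to the completion model as context.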


I am looking for something similar for my organization. We are trying to figure out how to take our organization’s constant updates and communications and incorporate them into a chatbot anyone could talk with. The main thing we would be looking to build, though, is an easy website for our communications team where they can consistently and easily upload content that adds to the model’s knowledge base.

Same here. From what I understand, though, it would cost a lot of money to have an AI engineer set up the custom integration, and then you have to pay a whole bunch of people to rate the feedback until the model is accurate. I’m not technical, so maybe I’m completely wrong. But my understanding is that we are a fair bit off from being able to create bespoke chat agents with our own data using ChatGPT.

Can anyone add some insight?

The approach you might take will depend on the shape of the knowledge. A Q&A type knowledge base is a lot like an FAQ, and I managed to create one using two different approaches. I’m currently testing both to see which is best.

I used embeddings as mentioned by @sps and completions. This post describes how I built it.

This is possible and likely achieved with embeddings and completions as well. One platform you might want to examine is CustomGPT. It makes it very easy to upload documents of any type to create a chatbot that can be embedded in a website or used through its own API to create a seamless chat UI like I did here.

Untrue. There are many ways to create conversational experiences using OpenAI models without retraining a separate model.

Untrue again. :wink: I’ve used two pathways to do this for many projects. You can create and deploy a chatbot based on your own data within a few minutes. Give CustomGPT a trial run. It supports 90+ document formats and it provides a surprisingly straightforward approach to creating AI solutions.

I have also succeeded in using everyday Google sheets, slides, and documents to create custom interfaces that are integrated into these document types. This article explains one of the approaches I used.

Here’s a screenshot of an example chatbot built into a Google Document. It uses embeddings and GPT-3 in Google Apps Script to create a very simple and low cost approach to using LLMs for any content.

It’s also possible to do this across Google Drive documents, or even mix database information with drive files to create a unified chat system.


Using your own knowledge base in a web app is not so straightforward.
The chat part is easy, but you also need knowledge management, which usually means third-party services.

And GPT-4 used in a web app is expensive compared with the free/paid ChatGPT.
But GPT-3.5 is decent for this purpose too.

What I find to be an obstacle is the time-consuming process of learning the APIs.

Probably there are tutorials on how to do this.


Thanks @bill.french. You’ve given me some direction on where to look next. Appreciate it.


I agree. This part is best described as everyday software engineering. :wink:

This depends on your skills. The APIs (for many) are not challenging. Engineering a process that successfully wraps AI in the process to achieve a reliable and financially practical outcome is generally the bigger challenge.

Embeddings and cosine similarity are not as complex as they sound, and you can even learn how to build code that does this in Google Apps Script, a script runner that comes free inside every Google spreadsheet container.
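To show just how small this really is, here is the whole cosine-similarity computation in plain Python; the same few lines of arithmetic port almost directly to JavaScript in Google Apps Script:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 5.0]))  # 0.0
```

That one function, applied between a question’s embedding and each stored chunk’s embedding, is the whole "search" step.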

This is why embeddings are so important. They serve to give you very powerful inferencing at 1/600th the cost of GPT-3 inferencing. And to be clear, most AI applications do not need GPT-4 or any chat-like behaviors. Far simpler models can provide very powerful solutions at near-free inferencing costs if you put the time into designing the solution’s approach.


ElasticSearch is a wonderful way to start. It has its own built-in scraper and a UI to organize the documents. GPTIndex is also a powerful tool.

I’ve been tinkering with a synergy between graph and vector databases for a week or so now, using the vector database results as pointers. I still know very little, so if anyone else has more experience, I’d love to hear about it.
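For what it’s worth, here is a toy sketch of that pointer idea. Every id, vector, and edge below is invented for illustration, and plain dicts stand in for real vector and graph databases:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Hypothetical mini "vector store": node id -> embedding.
vectors = {
    "pw-reset":  [0.9, 0.1],
    "vpn-setup": [0.1, 0.9],
}

# Hypothetical mini "graph store": node id -> related node ids.
graph = {
    "pw-reset":  ["mfa-enroll", "account-lockout"],
    "vpn-setup": ["campus-wifi"],
}

def hybrid_lookup(query_vec):
    # Step 1: vector search yields the best-matching node id (the "pointer").
    best = max(vectors, key=lambda nid: cosine(query_vec, vectors[nid]))
    # Step 2: follow graph edges from that node to pull in related context.
    return best, graph.get(best, [])

best, related = hybrid_lookup([0.8, 0.2])
# best is the nearest node; related is its graph neighborhood.
```

The vector hit supplies semantic relevance, and the graph hop supplies structured context the embedding alone would miss.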


Great topic! Thank you for sharing!