"Feed" ChatGPT with Knowledge Base and Data from Business to answer questions

Hi everybody!

I am a product manager at a digitally supported real estate agency. Our aim is to provide information to ChatGPT through the API and “feed” it the following types of information:

  • Knowledge base (can be provided in any format like .txt, .pdf, etc.)
  • Classic data (for example, information about our properties, including addresses; this can also be provided in various formats)
  • Previous conversations of the users (or their background info)

The knowledge base is small at the moment but expected to grow; we may want to upload a lot of new data to it in the future. It therefore needs to be scalable.

I saw that there is a section in the Playground called Storage, which lets you upload Files and create Vector Stores.

I also found the API documentation for these features and think we will be able to manage with it without any issues.

However, my questions are the following:

  1. What exactly is the difference between Files and Vector Stores?
  2. Which of the two is better, or do you need both anyway?
  3. If vector stores are needed, is there any way to have that handled by a third party? Can you suggest any you have experience with?
  4. Which option (files or vector stores) is cheaper? Are uploaded files counted as tokens once, or what would you recommend?
  5. Is this the right approach? Or should I tackle this project completely differently because it might not be scalable?

If you know of any good guide, tutorial, or similar resource (for a beginner/amateur), that would be highly appreciated.

Thanks for your help!

Hi there,

The easiest way to get started is to:

  1. Create a vector store and upload all your files to it.
  2. Create an assistant and link your vector store to it.
  3. Use the Assistants playground to test how it works (or script it, as in the sketch below).
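
If you prefer to script those steps, a minimal sketch with the openai Python SDK (v1.x) could look like this; the file names, model, and instructions are placeholders for your own:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Create a vector store and upload files to it in one batch.
vector_store = client.beta.vector_stores.create(name="agency-knowledge-base")
client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id=vector_store.id,
    files=[open("knowledge_base.pdf", "rb"), open("properties.txt", "rb")],
)

# 2. Create an assistant and link the vector store via the file_search tool.
assistant = client.beta.assistants.create(
    name="Real Estate Helper",
    model="gpt-4o",
    instructions="Answer questions using the agency's knowledge base.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```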

A vector store is basically a pool for all your files that makes them searchable; single files on their own will not help you much.

Finally, you can attach a custom app to your assistant to make it accessible within your company.
You can also call the assistant through the API, but that requires some actual coding. Everything mentioned above can be done in the OpenAI web interface.
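
For reference, querying the assistant from code looks roughly like this. It continues the sketch above and is equally hedged, since the Assistants API is in beta:

```python
# Ask the assistant a question through a thread and a run.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Which of our listed properties have three bedrooms?",
)
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)
if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    print(messages.data[0].content[0].text.value)  # newest message comes first
```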

Data uploaded as files is chunked, embedded, and stored as vector data. Multiple uploaded files are combined into a single vector store, which can be used with the Assistants API.
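
If the default chunking doesn't suit your documents, you can set the strategy explicitly when attaching a file to a vector store. A small sketch, assuming the openai Python SDK; the token values shown match the documented defaults, and "vs_..." is a placeholder for a real store id:

```python
from openai import OpenAI

client = OpenAI()

# Upload a raw file (purpose="assistants"), then attach it to an existing
# vector store with an explicit chunking strategy.
uploaded_file = client.files.create(
    file=open("knowledge_base.pdf", "rb"), purpose="assistants"
)
client.beta.vector_stores.files.create(
    vector_store_id="vs_...",  # id of a vector store created earlier
    file_id=uploaded_file.id,
    chunking_strategy={
        "type": "static",
        "static": {"max_chunk_size_tokens": 800, "chunk_overlap_tokens": 400},
    },
)
```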

Using the Assistants API is a good way to feed a language model a knowledge base.

However, since the Assistants API is currently in beta, its specification may change in the future, and it may not be flexible in some respects.

The Assistants API may be sufficient at first, but you may eventually want to use a separate vector database without the Assistants API; the sketch below shows the general pattern.
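
To make that concrete, the do-it-yourself pattern is: split your documents into chunks, embed the chunks with the embeddings endpoint, and search them yourself. A minimal sketch, in which brute-force numpy search stands in for a real vector database and the sample chunks are invented:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    # Embed a list of strings with the embeddings endpoint.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# In a real system these chunks would come from your own document splitter.
chunks = [
    "Property at 12 Main St: 3 bedrooms, garden, built 1998.",
    "Property at 4 Elm Ave: 2 bedrooms, balcony, built 2010.",
]
chunk_vectors = embed(chunks)

query_vector = embed(["Which property has a garden?"])[0]

# OpenAI embeddings are unit-length, so a dot product is cosine similarity.
scores = chunk_vectors @ query_vector
print(chunks[int(np.argmax(scores))])
```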


Hiya, welcome.

“Storage” is just like traditional cloud storage: it’s where you save your files so the models that will use them have ready access. I believe the first 100 GB are offered for free.

“Vector Stores” are built out of the files in Storage in an additional step. They automatically do some neat things with the data, like adding custom meta fields and embeddings. They persist as long as you want them to, and can relate to your company in general or to a specific thread. They enable the AI to understand your files better in context. Their costs are based on usage, in gigabytes stored per day.
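
As a concrete example of that lifetime control, a store can be given an expiration policy when it is created. A sketch, assuming the openai Python SDK; the seven-day window is arbitrary:

```python
from openai import OpenAI

client = OpenAI()

# A vector store that is deleted automatically seven days after it was
# last used, e.g. for files a user uploads in a single conversation.
scratch_store = client.beta.vector_stores.create(
    name="per-thread-scratch",
    expires_after={"anchor": "last_active_at", "days": 7},
)
print(scratch_store.id)
```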

These added steps will enable the AI to more accurately relate information between your structured and unstructured data. You will furthermore want to ensure that all of the data you feed your Assistant has been structured appropriately for both human and machine consumption.

A token is about four English characters. (That sentence was about 9 tokens long.)
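
You can check counts like that yourself with the tiktoken library. A small sketch; cl100k_base is the encoding used by GPT-4-era models:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
sentence = "A token is about four English characters."
tokens = enc.encode(sentence)
print(f"{len(sentence)} characters -> {len(tokens)} tokens")
```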

When you have a model use a file as context, it basically scans and summarizes the file for itself. This is why reading a file that has not previously been optimized or related in a vector store can produce so many errors (hallucinations, i.e. inaccurate or made-up data). The more completely you want the model to understand the file, the more completely it must be optimized. This process only reduces errors rather than eliminating them, and must therefore be closely supervised by people who understand your business.

When read, the file information counts against the context window: 128k tokens per thread, and something like 8k per response. You must be very targeted and efficient with your information in order to maintain accuracy.

All of this is still in beta and completely subject to change.

Yes, this is scalable. Yes, you should seek third-party assistance. There is advanced data science and development work happening in the background, and it is best to have professional human advice for it.

I recommend LinkedIn Learning as your starting point for guides.

You’re on the right path with feeding your data into ChatGPT. Here’s a simpler breakdown:

Files vs. Vector Stores: Files are just your raw data, while vector stores organize that data into chunks and add helpful tags, making it easier for ChatGPT to search and understand.

Which is better?: Vector stores are usually better if you want to grow and search your data easily. Just uploading files won’t give you the best results.

Need help?: Yes, getting third-party help is a good idea, especially if you’re not familiar with setting up vector stores.

Cost & Tokens: Vector stores can be cheaper in the long run, but you’ll need to watch token usage to keep costs down.

Check out LinkedIn Learning for tutorials, and yes, this method is scalable!

Now, as much as I love the tech side of things, let me share my personal struggle with this. I run a website, Find Your Local Agent, and it’s all about making that perfect connection between people and real estate agents. Sounds simple, right? Wrong. Because finding an agent is not like shopping for shoes—where you pick the right size, check if it looks good in the mirror, and you’re done. No, it’s more like finding a teammate for life’s biggest game: the real estate game.

You need someone who’s not only on your team but gets you, knows your goals, and doesn’t just nod while secretly trying to steer you into buying that house with the pink flamingo wallpaper. That’s where I’m facing challenges on my website. I want to make sure that when people search for an area agent, they’re not just seeing a list of names; they’re finding someone who understands what they’re really looking for.

And guess what? Organizing my website’s data, which has been like a pile of loose papers shoved in a drawer, hasn’t exactly been smooth sailing. That’s why I need to feed ChatGPT the right data, so it can understand these subtle nuances. Vector stores? They’re the magic I need to make sure that when someone types in “I need an agent who knows how to handle a stubborn seller” or “find me an agent who understands fixer-uppers,” ChatGPT delivers the right person, just like the dream teammate we all deserve in this home-buying marathon.

Trust me, I’m in the trenches with you on this. The tech side will help, but ultimately, it’s about creating a relationship where your agent is on your side, understanding your vision, and making it happen.