Hello, first I will be vague because I do not want to publicly expose my real purpose.
I would like to retrieve a specific set of characteristics for different objects. These characteristics may derive either from the training data of the LLM or from the object manufacturer’s manual. Sometimes I have the latter, and sometimes I do not. I have built a local database using the assistant API, asking it to extract the desired characteristics from the manufacturer’s manual. Thus, I have a local database containing sets of characteristics for certain objects.
I want to be able to query an LLM regarding these characteristics, with the information coming either from the LLM’s data or my local database. Additionally, I would like the model to indicate whether the information is sourced from the database or its own knowledge. Furthermore, the model should be capable of providing a reliability score for the information. For instance, if multiple sources are consistent, it should give a high reliability score; conversely, it should assign a low score if there are discrepancies.
I have managed to get past the first step by creating my local database, but I have no idea how to do the rest.
I intended to export my local database as JSON, put it inside a vector store, and pass that store to my assistants, but is this the right way?
When passing a file ID, does the assistant use information exclusively from the file, or does it also draw on its training data?
Is the LLM able to provide the source of its information?
Thanks for your help
Hi @michel4
Here’s how I would approach this:
Start with a combination of application logic and LLM capabilities. The idea is to query your local database first and, if something is found, mark that data as coming from the database. If the database doesn’t have the answer, then you query the LLM, asking it to provide the characteristics based on its own general knowledge. In this case, you mark the response as “from LLM.”
The key is to handle this internally in your application workflow rather than returning everything directly to the user. You can later use these marked results however you need, knowing which source each piece of information came from.
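As a minimal sketch of that workflow, assuming the local database can be represented as a simple dictionary and that `ask_llm` is a hypothetical wrapper around your model call (both names are placeholders, not part of any API):

```python
def get_characteristic(obj_id, field, local_db, ask_llm):
    """Return (value, source), where source is 'database' or 'llm'."""
    record = local_db.get(obj_id, {})
    if field in record:
        # Found in the local DB built from the manufacturer's manual
        return record[field], "database"
    # Not in the DB: fall back to the model's general knowledge
    return ask_llm(f"What is the {field} of {obj_id}?"), "llm"

# Example usage with a stubbed LLM call:
db = {"pump-x1": {"max_pressure": "10 bar"}}
stub_llm = lambda prompt: "answer from model knowledge"

print(get_characteristic("pump-x1", "max_pressure", db, stub_llm))
# ('10 bar', 'database')
print(get_characteristic("pump-x1", "weight", db, stub_llm))
# ('answer from model knowledge', 'llm')
```

Because the source tag is attached by your application logic, not by the model, it is always trustworthy — the LLM itself cannot reliably report where its training-data answers come from.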
Regarding your question about storing the data — instead of using a VectorStore, I would recommend using a native vector database, like Weaviate or similar. VectorStores are fine for smaller tasks, but if your knowledge base grows, a native vector database would be much more efficient, scalable, and cost-effective.
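Whichever store you pick, the core operation is the same: embed the query and return the nearest stored record by cosine similarity. Here's a tiny in-memory illustration of that lookup using NumPy (toy vectors stand in for real embeddings; a vector database just does this at scale with indexing):

```python
import numpy as np

def top_match(query_vec, vectors, keys):
    """Return the key whose vector is most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    m = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return keys[int(np.argmax(m @ q))]

# Toy 2-D "embeddings" for three objects:
keys = ["pump-x1", "valve-a2", "motor-b3"]
vectors = np.array([[1.0, 0.1], [0.0, 1.0], [0.7, 0.7]])

print(top_match(np.array([0.9, 0.2]), vectors, keys))  # 'pump-x1'
```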
Lastly, I’d suggest implementing a simple reliability scoring mechanism. For example, if multiple sources provide the same characteristic, you can increase the confidence score. If there are discrepancies, reduce it. This would give you a practical way to assess the accuracy of the information without overcomplicating the process.
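A simple way to implement that scoring, assuming you collect the same characteristic from several sources: take the majority value and score it by the fraction of sources that agree (a sketch, not the only possible metric):

```python
from collections import Counter

def reliability(values):
    """Return (majority_value, score) where score is the fraction of
    sources agreeing with the majority value."""
    if not values:
        return None, 0.0
    counts = Counter(values)
    value, n = counts.most_common(1)[0]
    return value, n / len(values)

# All sources consistent -> high score
print(reliability(["10 bar", "10 bar", "10 bar"]))  # ('10 bar', 1.0)
# One discrepancy -> lower score
print(reliability(["10 bar", "12 bar", "10 bar"]))  # ('10 bar', 0.666...)
```

You could refine this later (e.g. weighting the manufacturer's manual higher than model knowledge), but a plain agreement ratio is a good starting point.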
Hope this helps!
Just a side note: in 2025 an idea in itself is not really worth as much as it was ten years ago (if anything at all). What's worth gold is the most effective way to implement that idea. But then, I'm sure there are exceptions to this rule.