Thank you for pointing that out! You’ve highlighted a crucial gap in my explanation – at present, our “project” primarily consists of the custom GPT, and we’re in the early stages of figuring out how best to structure and host the supporting architecture.
The idea of incorporating a graph database is intriguing and might be a direction worth exploring. I’m open to any suggestions or insights on setting up a simple, effective stack for this purpose.
Some team members have proposed using a data warehouse, which, while comprehensive, seems more complex and resource-intensive than what we’re ready to commit to. That approach would mean establishing a full database and developing API endpoints for data interaction, which my team considers a significant undertaking.
If you, or anyone else here, could share resources or recommend services that simplify the setup of such systems – making them more accessible for teams with limited experience in this area – I’d be extremely grateful. Any advice on starting points, particularly those that are beginner-friendly, would be immensely valuable to us.
For the record, I do have a different project that uses Python on an Azure Function App with the OpenAI Assistants API. But again, I haven’t figured out a good way to host and reference supplemental data for the Assistant in that project either.
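For context, here’s roughly the shape of that project, reduced to a sketch. The route name and the ASSISTANT_ID setting are placeholders I made up, and I’m assuming the Functions v2 Python programming model plus a recent openai SDK that includes runs.create_and_poll – so treat this as an illustration rather than the actual code:

```python
import os
import azure.functions as func
from openai import OpenAI

app = func.FunctionApp()

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
ASSISTANT_ID = os.environ["ASSISTANT_ID"]  # hypothetical app setting

@app.route(route="ask", auth_level=func.AuthLevel.FUNCTION)
def ask(req: func.HttpRequest) -> func.HttpResponse:
    question = req.params.get("q", "")
    if not question:
        return func.HttpResponse("Missing 'q' parameter", status_code=400)

    # Start a fresh thread with the user's question and run the assistant on it.
    thread = client.beta.threads.create(
        messages=[{"role": "user", "content": question}]
    )
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=ASSISTANT_ID
    )
    if run.status != "completed":
        return func.HttpResponse(f"Run ended with status {run.status}", status_code=500)

    # Messages come back newest-first, so the first entry is the assistant's reply.
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    answer = messages.data[0].content[0].text.value
    return func.HttpResponse(answer, mimetype="text/plain")
```

The open question for me is still where the supplemental data that the Assistant should reference ought to live.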
Very interesting.
I’m new to Drupal, but it sounds like it offers robust API endpoints that could potentially simplify the way we pass data to our custom GPT. Could you share a bit more about how you’re using Drupal in this context? Specifically:
- Hosting: Do you host your Drupal setup locally, or do you use a cloud service? I’m exploring options that would allow for either direct data storage or a way to efficiently manage data queries and pagination with an external service, minimizing the load on our custom GPT.
- Handling Data and Pagination: How does Drupal’s REST API handle pagination and data querying? I’m particularly interested in solutions that could offload these tasks from the GPT, allowing it to focus on processing the data rather than managing it (see the sketch after this list for the kind of client I’m imagining).
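To be concrete about what I mean by offloading: if Drupal’s JSON:API module is in play, I’m picturing a small client along these lines. The site URL, the invoice content type, and the field_tag field are placeholders I made up, and I may be off on the exact filtering syntax:

```python
import requests

# Hypothetical Drupal site and content type; adjust to the real setup.
BASE_URL = "https://example-drupal-site.com/jsonapi/node/invoice"
PAGE_SIZE = 50  # passed as JSON:API's page[limit]

def fetch_invoices(tag: str):
    """Yield invoice records matching a tag, one JSON:API page at a time."""
    params = {
        "filter[field_tag]": tag,   # assumes a 'field_tag' field on the node
        "page[limit]": PAGE_SIZE,
        "page[offset]": 0,
    }
    url = BASE_URL
    while url:
        resp = requests.get(url, params=params, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        yield from payload.get("data", [])
        # JSON:API exposes the next page (if any) under links.next.href.
        url = payload.get("links", {}).get("next", {}).get("href")
        params = None  # the next link already carries the query string

for invoice in fetch_invoices("search_tag"):
    print(invoice["id"], invoice["attributes"].get("title"))
```

The point is that filtering and paging would happen outside the GPT, which would only ever see the records it asked for.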
Thank you for the suggestion! Unfortunately, our situation is a bit challenging due to the limitations of the third-party service we’re using. Here’s a bit more detail on what we’re dealing with:
Our third-party service provides an API for invoice retrieval but lacks the capability for more granular searches. Ideally, I’d like to query invoices based on specific tags like this:
GET https://someservice.com/invoices?tag=search_tag
However, the API only allows for a broad retrieval of all invoices, without the option to filter by tags or other criteria directly in the query:
GET https://someservice.com/invoices
Given this, we’re required to download the entire dataset of over 5,000 invoices and then manually filter them to find the ones we’re interested in. This process is not only inefficient but also puts a significant strain on our custom GPT, which is not ideal for handling such a large volume of data directly.
I’m looking for a workaround that might help us manage this data more effectively, reducing the load on our GPT. Whether it’s through direct database connection, intermediate processing, or any other strategy, I’d really appreciate any insights or recommendations you might have.
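To make “intermediate processing” concrete, here’s the rough shape of what I’m picturing: a thin service that downloads the full invoice list periodically, caches it, and exposes a filtered, paginated endpoint the custom GPT can call. The auth header and the tags field are assumptions about the third-party payload, and FastAPI is just one option for the wrapper:

```python
import os
import time
import requests
from fastapi import FastAPI

app = FastAPI()

SERVICE_URL = "https://someservice.com/invoices"      # the third-party endpoint
API_KEY = os.environ.get("SOMESERVICE_API_KEY", "")   # hypothetical auth scheme
CACHE_TTL = 15 * 60                                   # refresh the full download every 15 minutes

_cache = {"fetched_at": 0.0, "invoices": []}

def get_all_invoices() -> list[dict]:
    """Download the full invoice list, but only when the cache is stale."""
    if time.time() - _cache["fetched_at"] > CACHE_TTL:
        resp = requests.get(
            SERVICE_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=60,
        )
        resp.raise_for_status()
        _cache["invoices"] = resp.json()  # assumes a JSON list of invoice objects
        _cache["fetched_at"] = time.time()
    return _cache["invoices"]

@app.get("/invoices")
def invoices(tag: str, limit: int = 20, offset: int = 0):
    """The endpoint the custom GPT action would call: already filtered and paginated."""
    matching = [inv for inv in get_all_invoices() if tag in inv.get("tags", [])]
    return {
        "total": len(matching),
        "items": matching[offset : offset + limit],
    }
```

The GPT action would then call /invoices?tag=search_tag&limit=20 instead of the raw third-party endpoint, so only a small, pre-filtered slice of the 5,000+ invoices ever reaches the model.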
TL;DR:
Thanks, @Macha and @SomebodySysop, for your insights. Exploring simpler solutions for our GPT’s architecture, like graph databases or Drupal’s REST API, sounds promising. We’re challenged by a third-party service’s limited API, which complicates data management for over 5,000 invoices. Looking for efficient ways to host, manage, and filter data to ease the load on our GPT. Open to any suggestions or tools that could help streamline this process.