Project ideas guidance - using GPT to answer questions based on private data

james.jharrison · May 24, 2024, 12:59pm

Hey all,

Not sure if this thread is in the correct category, but I’m looking for a bit of guidance nonetheless.

As a .NET engineer looking to learn a bit of Python, I’d like to begin work on a side project that incorporates OpenAI GPT.

As a keen runner, I plan to pull in data from the Strava API and store it in a database of my own. I would then like to use this data as a private source for GPT to answer questions such as:

What is my fastest 5km time?
What is my average distance per run over the last year?
How many runs have been over 10km in the last year?

It’s a simple idea, but my question is: how would you suggest I use the data to answer my example questions? Would I connect GPT directly to my database, or would I read from the DB every time a question is submitted, format the data, and pass it to OpenAI as a data source?
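For reference, the second option (read from the DB on every question, format the rows, and paste them into the prompt) can be sketched as below. The row keys start_date, distance_m and moving_time_s are hypothetical. The main thing it shows is that the prompt grows with every run you store, which is the approach's main drawback.

```python
"""Sketch of the 'read the DB and paste it into the prompt' option.

Assumes hypothetical activity dicts with keys start_date (ISO date),
distance_m (metres) and moving_time_s (seconds).
"""
from typing import Iterable, Mapping


def rows_to_context(rows: Iterable[Mapping]) -> str:
    """Render activities as a compact CSV-like block for a prompt."""
    lines = ["date,distance_km,moving_time_s"]
    for r in rows:
        lines.append(f"{r['start_date']},{r['distance_m'] / 1000:.2f},{r['moving_time_s']}")
    return "\n".join(lines)


def build_prompt(question: str, rows: Iterable[Mapping]) -> str:
    """Combine the question with every row; prompt size grows with the number of runs."""
    return (
        "Answer the question using only the running data below.\n\n"
        f"{rows_to_context(rows)}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    runs = [
        {"start_date": "2024-05-01", "distance_m": 5012.0, "moving_time_s": 1495},
        {"start_date": "2024-05-04", "distance_m": 10230.0, "moving_time_s": 3120},
    ]
    print(build_prompt("What is my fastest 5km time?", runs))
```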

Thanks in advance

James

ewancameron · May 28, 2024, 5:19pm

Am a keen runner too with Strava API access…did you have any luck with this?

jeffinbournemouth · May 31, 2024, 10:15am

I would probably approach this by creating a prompt that turns user queries into Strava API requests and then uses the API response to generate natural-language answers.

To achieve this, you can simply provide the API docs within your prompt and then add a few examples of how you want a user query to be converted into a valid API call, and how the result should then be converted into a natural-language answer.

You can extract the API call from the prompt output and send it to the Strava API.

For response handling, you pass the API response back to the prompt, which then outputs the result as a natural-language answer to the user’s query.
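A minimal sketch of the "extract the API call from the prompt output" step, assuming the prompt tells the model to reply with a JSON object like {"endpoint": ..., "params": ...}. That output format is an assumption of this sketch, not something Strava or OpenAI defines. GET /athlete/activities and its before/after/page/per_page parameters are part of Strava's v3 API; the allowlist stops the model from requesting anything else. The real request would still need your OAuth access token in an Authorization: Bearer header.

```python
import json
import re
from urllib.parse import urlencode

BASE_URL = "https://www.strava.com/api/v3"
# Endpoints and query parameters the model is allowed to request (hypothetical policy).
ALLOWED = {"/athlete/activities": {"before", "after", "page", "per_page"}}


def extract_api_call(model_output: str) -> str:
    """Turn the JSON object in the model's reply into a Strava URL.

    Raises ValueError (json.JSONDecodeError is a subclass) if the reply has
    no valid JSON or asks for an endpoint/parameter outside the allowlist.
    """
    # Greedy match: from the first '{' to the last '}' in the reply.
    match = re.search(r"\{.*\}", model_output, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    call = json.loads(match.group(0))
    endpoint = call.get("endpoint")
    params = call.get("params", {})
    if endpoint not in ALLOWED:
        raise ValueError(f"endpoint not allowed: {endpoint!r}")
    unknown = set(params) - ALLOWED[endpoint]
    if unknown:
        raise ValueError(f"unexpected parameters: {sorted(unknown)}")
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}{endpoint}{query}"


if __name__ == "__main__":
    reply = 'Call: {"endpoint": "/athlete/activities", "params": {"after": 1685577600, "per_page": 200}}'
    print(extract_api_call(reply))
    # https://www.strava.com/api/v3/athlete/activities?after=1685577600&per_page=200
```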

james.jharrison · June 4, 2024, 6:40pm

Okay, great, thank you.

I’m thinking I may extract my Strava data in its entirety to a document DB so that I can display each activity in some sort of list, with a prompt input below it that would query the data stored in the DB.

I’m just not sure whether I should pad the prompt with the full data set every time I ask the AI a question, or whether I can set the DB as a data source so there’s no need to pad the prompt.

vb · June 4, 2024, 6:45pm

Yes, you can look into functions/tools and see whether the model can query your DB for the requested data.
Obviously you wouldn’t need an LLM for this task; it’s more of a data presenter, maybe?
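To illustrate that the three example questions don't strictly need an LLM, here is a sketch using Python's built-in sqlite3 and a hypothetical runs table. "Fastest 5km" is approximated as the quickest activity between 4.9 and 5.1 km, which ignores 5 km splits inside longer runs. Dates are stored as ISO 8601 strings, so comparing them as text gives the right order.

```python
import sqlite3

# Hypothetical schema for activities pulled from Strava.
SCHEMA = """
CREATE TABLE runs (
    id INTEGER PRIMARY KEY,
    start_date TEXT NOT NULL,      -- ISO 8601 date, e.g. '2024-05-01'
    distance_m REAL NOT NULL,      -- metres
    moving_time_s INTEGER NOT NULL -- seconds
);
"""

# The three example questions as plain queries; :since is an ISO date string.
FASTEST_5K = "SELECT MIN(moving_time_s) FROM runs WHERE distance_m BETWEEN 4900 AND 5100"
AVG_DISTANCE_SINCE = "SELECT AVG(distance_m) FROM runs WHERE start_date >= :since"
RUNS_OVER_10K_SINCE = "SELECT COUNT(*) FROM runs WHERE distance_m > 10000 AND start_date >= :since"


def answer_all(conn: sqlite3.Connection, since: str) -> dict:
    """Answer all three questions directly from the DB."""
    return {
        "fastest_5k_s": conn.execute(FASTEST_5K).fetchone()[0],
        "avg_distance_m": conn.execute(AVG_DISTANCE_SINCE, {"since": since}).fetchone()[0],
        "runs_over_10k": conn.execute(RUNS_OVER_10K_SINCE, {"since": since}).fetchone()[0],
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO runs (start_date, distance_m, moving_time_s) VALUES (?, ?, ?)",
        [("2024-03-02", 5000.0, 1500), ("2024-04-10", 12000.0, 3900), ("2023-01-05", 5050.0, 1450)],
    )
    print(answer_all(conn, since="2023-06-01"))
    # {'fastest_5k_s': 1450, 'avg_distance_m': 8500.0, 'runs_over_10k': 1}
```

An LLM only becomes useful here if you want free-form questions mapped onto queries like these, or the numbers phrased as prose.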

jeffinbournemouth · June 4, 2024, 8:52pm

It would be inefficient and slow/expensive to include the full DB in the prompt.

Much better to interact with db using API/SQL/functions/tools/actions - whatever is easiest to extract the info you need.

The LLM can format the correct requests each time if you provide it with an example and instructions. Then you will only need to extract the data you actually need each time.
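One way the functions/tools route could look, using OpenAI's Chat Completions tool format. The tool name get_run_stats, its parameters and the in-memory sample data are hypothetical. With the openai Python SDK you would pass tools=[RUN_STATS_TOOL] to client.chat.completions.create(...); when the reply contains message.tool_calls, call dispatch(call.function.name, call.function.arguments), send the result back as a message with role "tool" and the matching tool_call_id, and let the model write the final answer. Only the small summary, not your whole history, goes into the prompt.

```python
import json
from datetime import date

# Tool definition in the Chat Completions "tools" format (names are hypothetical).
RUN_STATS_TOOL = {
    "type": "function",
    "function": {
        "name": "get_run_stats",
        "description": "Summary statistics for runs on or after a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "since": {"type": "string", "description": "ISO date, e.g. 2023-06-01"},
                "min_distance_km": {"type": "number", "description": "Only count runs at least this long"},
            },
            "required": ["since"],
        },
    },
}

# Hypothetical data standing in for your DB.
RUNS = [
    {"start_date": "2024-03-02", "distance_m": 5000.0},
    {"start_date": "2024-04-10", "distance_m": 12000.0},
    {"start_date": "2023-01-05", "distance_m": 5050.0},
]


def get_run_stats(since: str, min_distance_km: float = 0.0) -> dict:
    """Count, total and average distance (km) of runs since a date."""
    cutoff = date.fromisoformat(since)
    selected = [
        r for r in RUNS
        if date.fromisoformat(r["start_date"]) >= cutoff and r["distance_m"] >= min_distance_km * 1000
    ]
    total_km = sum(r["distance_m"] for r in selected) / 1000
    return {
        "count": len(selected),
        "total_km": total_km,
        "avg_km": total_km / len(selected) if selected else None,
    }


HANDLERS = {"get_run_stats": get_run_stats}


def dispatch(name: str, arguments_json: str) -> str:
    """Run the tool the model asked for and return a JSON string to send back to it."""
    if name not in HANDLERS:
        return json.dumps({"error": f"unknown tool {name}"})
    return json.dumps(HANDLERS[name](**json.loads(arguments_json)))
```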

james.jharrison · June 4, 2024, 9:08pm

I think I’m missing some understanding here, then. For example, to be able to ask in an AI dialog ‘What is my fastest 5k run time?’ or ‘What is my average run distance over the past year?’, would I need to provide the full data source each time?

I’m not sure how I’d use these questions to generate meaningful SQL, for example, which would in turn return data to use in a prompt.

jeffinbournemouth · June 4, 2024, 9:50pm

You can instruct the LLM to write the SQL or API call to query a data source.

The returned data can then be used by the LLM to create the output.
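A sketch of the "instruct the LLM to write the SQL" step: give the model the table schema plus a couple of question-to-SQL examples, and ask for SQL only. The runs schema and the examples are hypothetical; date('now', '-1 year') is SQLite syntax, so adjust it for whichever database you use. The returned string is what you would send as the user message.

```python
# Hypothetical table the model should write queries against.
SCHEMA_DDL = """CREATE TABLE runs (
    id INTEGER PRIMARY KEY,
    start_date TEXT NOT NULL,   -- ISO 8601 date
    distance_m REAL NOT NULL,   -- metres
    moving_time_s INTEGER NOT NULL
);"""

# Few-shot examples showing the expected question -> SQL mapping.
EXAMPLES = [
    ("How many runs have been over 10km in the last year?",
     "SELECT COUNT(*) FROM runs WHERE distance_m > 10000 AND start_date >= date('now', '-1 year');"),
    ("What is my average distance per run over the last year?",
     "SELECT AVG(distance_m) / 1000.0 FROM runs WHERE start_date >= date('now', '-1 year');"),
]


def build_sql_prompt(question: str) -> str:
    """Schema + examples + the new question, ending with 'SQL:' for the model to complete."""
    parts = [
        "You translate questions about my running data into a single SQLite SELECT statement.",
        "Reply with SQL only, no explanation.",
        "",
        "Schema:",
        SCHEMA_DDL,
        "",
    ]
    for q, sql in EXAMPLES:
        parts += [f"Q: {q}", f"SQL: {sql}", ""]
    parts += [f"Q: {question}", "SQL:"]
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_sql_prompt("What is my longest run in the past year?"))
```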

You can also do all of this with a Custom GPT

james.jharrison · June 5, 2024, 9:15am

Hey Jeff,

I understand that I could generate SQL using the LLM. This seems like a separate problem to solve, though, rather than just feeding the LLM data to create an output.

I think what you’re saying is, just so I’m clear, with the following question as an example:

“What is my longest run in the past year?”

Step 1: Use the LLM to generate SQL (or a document DB query) such as "SELECT * FROM runs WHERE Date > @oneYearAgo", where @oneYearAgo is DateTime.Now.AddYears(-1).

Step 2: Use the LLM to answer the question from the returned data and create an output…

For now it seems Step 1 would be an enhancement, but for a side project it may not be needed.
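The two-step flow can be sketched as a small pipeline. generate_sql and write_answer are hypothetical stand-ins for the two LLM calls, passed in as plain functions so the flow runs without an API key; if you skip Step 1 for now, generate_sql can simply return a fixed query. In SQLite, the one-year filter from Step 1 could be written as start_date >= date('now', '-1 year').

```python
import sqlite3
from typing import Callable


def answer_question(
    question: str,
    conn: sqlite3.Connection,
    generate_sql: Callable[[str], str],
    write_answer: Callable[[str, list], str],
) -> str:
    sql = generate_sql(question)          # Step 1: question -> query (LLM or hard-coded)
    rows = conn.execute(sql).fetchall()   # run it against your own DB
    return write_answer(question, rows)   # Step 2: rows -> natural-language answer


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (start_date TEXT, distance_m REAL)")
    conn.executemany("INSERT INTO runs VALUES (?, ?)", [("2024-03-02", 5000.0), ("2024-04-10", 21100.0)])

    # Stubs in place of real LLM calls.
    def fake_sql(question: str) -> str:
        return "SELECT MAX(distance_m) FROM runs"

    def fake_answer(question: str, rows: list) -> str:
        return f"Your longest run was {rows[0][0] / 1000:.1f} km."

    print(answer_question("What is my longest run in the past year?", conn, fake_sql, fake_answer))
    # Your longest run was 21.1 km.
```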

jeffinbournemouth · June 5, 2024, 4:35pm

If it’s just a side project, you can simply build a Custom GPT and create an action to query your data; the GPT will handle the rest.
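A Custom GPT action is described by an OpenAPI schema that you paste into the action editor. The sketch below builds a minimal one in Python and prints it as JSON. The server URL, the /runs/stats path, the since parameter and the operationId are placeholders for whatever endpoint you actually expose.

```python
import json

# Minimal OpenAPI 3.1 description of a hypothetical read-only endpoint.
openapi_spec = {
    "openapi": "3.1.0",
    "info": {"title": "My running data", "version": "0.1.0"},
    "servers": [{"url": "https://example.com"}],  # placeholder: your API's base URL
    "paths": {
        "/runs/stats": {
            "get": {
                "operationId": "getRunStats",
                "summary": "Summary statistics for runs on or after a date",
                "parameters": [
                    {
                        "name": "since",
                        "in": "query",
                        "required": True,
                        "schema": {"type": "string", "format": "date"},
                    }
                ],
                "responses": {
                    "200": {
                        "description": "Run statistics",
                        "content": {"application/json": {"schema": {"type": "object"}}},
                    }
                },
            }
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(openapi_spec, indent=2))
```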

thinktank · June 7, 2024, 4:14pm

How you get AI to access your data to answer this question really depends on the scale of the project.

With a simple prompt, yes, you’d have to include the full data source every time. Not delightful.

@jeffinbournemouth is correct, if you just want to explore personal data and analysis over the last year, use a CustomGPT. Here’s a case study.

Download your data as spreadsheets, name them, and reference them in the cGPT’s Instructions and Knowledge Base with Code Interpreter enabled. The GPT can then analyse the files itself, so you don’t have to paste the data into every prompt.
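If you go the spreadsheet route, exporting your stored activities to CSV is enough for Code Interpreter to load them. A minimal sketch with the standard csv module; the column names and values are hypothetical. Save the printed text as a .csv file and upload it to the GPT's Knowledge.

```python
import csv
import io

# Hypothetical activity records pulled from your DB or the Strava API.
activities = [
    {"date": "2024-03-02", "distance_km": 5.0, "moving_time_min": 25.0},
    {"date": "2024-04-10", "distance_km": 12.0, "moving_time_min": 65.0},
]


def to_csv(rows: list[dict]) -> str:
    """Serialise rows to CSV text with a header line taken from the first row's keys."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(activities), end="")
```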

If you have something more robust in mind than just a personal running coach, you’ll want to look into connecting the GPT/Assistant to your data via an API. Solutions scale up from there depending on how much information the model needs to make decisions.

wclayf · June 7, 2024, 5:12pm

See the create_tool_calling_executor in this project (of mine)

github.com/Clay-Ferguson/QuantaAgent/blob/master/agent/app_ai.py

"""Makes a query to AI API and writes the response to a file."""

import argparse
import os
from typing import Dict, List
from langchain.schema import HumanMessage, AIMessage, BaseMessage, SystemMessage
from langchain.chat_models.base import BaseChatModel
from langgraph.prebuilt import chat_agent_executor


from agent.app_config import AppConfig
from agent.models import TextBlock
from agent.utils import RefactorMode, Utils
from agent.tools.refactoring_tools import (
    UpdateBlockTool,
    CreateFileTool,
    UpdateFileTool,
    update_block,
    create_file,
    update_file,

(This file has been truncated.)

This is the best “tech stack” for what you want, if you ask me: Streamlit + Langchain + Langchain Agents. You would define some “tools” that “wrap” access to your data so that an intelligent agent can answer questions through the API without having “direct” access to the data. Of course, one “tool” could be “run SQL command”: you let the AI know that this tool takes an SQL string and returns the data.
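A sketch of a "run SQL command" tool body with a few guardrails, written with plain sqlite3 so it runs without LangChain (in LangChain you would typically wrap a function like this as a tool, for example with the @tool decorator from langchain_core.tools). The function name and row limit are hypothetical. PRAGMA query_only is a real SQLite setting that makes write statements fail on that connection; it backs up the prefix check, since a WITH query can still end in a DELETE.

```python
import sqlite3


def run_sql(conn: sqlite3.Connection, sql: str, max_rows: int = 200) -> list[tuple]:
    """Execute one model-supplied read query and return at most max_rows rows.

    Guardrails (a sketch, not a complete security boundary):
    - a single statement only (any ';' other than a trailing one is rejected),
    - it must start with SELECT or WITH,
    - the connection is switched to query_only, so writes raise an error,
    - results are capped so a broad query cannot flood the prompt.
    """
    statement = sql.strip().rstrip(";")
    if ";" in statement:
        raise ValueError("only one statement allowed")
    if not statement.lower().startswith(("select", "with")):
        raise ValueError("only SELECT queries allowed")
    conn.execute("PRAGMA query_only = ON")
    return conn.execute(statement).fetchmany(max_rows)
```

Note that query_only stays on for that connection afterwards, so keep a separate connection for your own writes (such as syncing new Strava activities).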

EDIT: I need to create a generic template version of that project that has all the plumbing but nothing too specific, so people can use it as a starter project for doing Langchain stuff with Langchain Tools.

eric.zhou0815 · March 17, 2025, 5:23am

Right now you can just create a normal REST API over your private data, then use the tool Interlify to configure your API as function tools that ChatGPT can call. You then get a chatbot linked directly to your own service and your own data.
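For the REST API route, a minimal standard-library sketch (wsgiref) of one read-only endpoint over the running data. The /runs/stats path, the since parameter and the in-memory RUNS list are hypothetical. How you then register the endpoint with Interlify or a Custom GPT action depends on that tool and isn't shown here. Call serve() to run it locally on port 8000; anything public would need a proper framework and authentication.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Hypothetical data; in practice this would come from your DB.
RUNS = [
    {"start_date": "2024-03-02", "distance_m": 5000.0},
    {"start_date": "2024-04-10", "distance_m": 12000.0},
    {"start_date": "2023-01-05", "distance_m": 5050.0},
]


def app(environ, start_response):
    """WSGI app: GET /runs/stats?since=YYYY-MM-DD returns a JSON summary."""
    if environ.get("REQUEST_METHOD") != "GET" or environ.get("PATH_INFO") != "/runs/stats":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    query = parse_qs(environ.get("QUERY_STRING", ""))
    since = query.get("since", ["0000-01-01"])[0]
    # ISO dates compare correctly as strings.
    selected = [r for r in RUNS if r["start_date"] >= since]
    body = json.dumps({
        "count": len(selected),
        "total_km": sum(r["distance_m"] for r in selected) / 1000,
    }).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(port: int = 8000) -> None:
    """Serve the app locally until interrupted."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```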
