Hi everyone,
I’m new to training custom GPT models and would appreciate some guidance on the best approach for my project.
Problem: I’m working with e-commerce data and planning a pipeline that ingests data from a relational database into BigQuery. My intention is to denormalize all of the tables into a single wide table that consolidates everything. The pipeline will run on a daily schedule, fetching incremental updates and loading them into BigQuery; a rough sketch of what I have in mind is below.
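Roughly, I’m picturing the daily load as a BigQuery MERGE run from Python. Everything here (project, dataset, table and column names, and the `updated_at` change-detection column) is made up just to illustrate the idea:

```python
# Sketch of the daily incremental load. All names are placeholders; it
# assumes the denormalized table has exactly the columns produced by the
# SELECT below and that source rows carry an `updated_at` timestamp.
from google.cloud import bigquery

client = bigquery.Client(project="my_project")

merge_sql = """
MERGE `my_project.ecommerce.sales_denormalized` AS target
USING (
  -- Join the normalized source tables into one wide row per order line,
  -- keeping only rows that changed since the previous daily run.
  SELECT
    o.order_id, i.product_id, o.order_date, c.customer_name,
    p.product_name, i.quantity, i.unit_price,
    i.quantity * i.unit_price AS line_total, o.updated_at
  FROM `my_project.staging.orders` AS o
  JOIN `my_project.staging.order_items` AS i ON i.order_id = o.order_id
  JOIN `my_project.staging.products` AS p ON p.product_id = i.product_id
  JOIN `my_project.staging.customers` AS c ON c.customer_id = o.customer_id
  WHERE o.updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
) AS source
ON target.order_id = source.order_id AND target.product_id = source.product_id
WHEN MATCHED THEN UPDATE SET
  quantity = source.quantity,
  unit_price = source.unit_price,
  line_total = source.line_total,
  updated_at = source.updated_at
WHEN NOT MATCHED THEN INSERT ROW
"""

# A scheduler (Cloud Scheduler, Airflow, cron) would run this once a day.
client.query(merge_sql).result()
```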
Goal: I want to build an interface where business owners can ask questions in natural language, and a GPT model will use the BigQuery dataset to provide answers. Questions might include “What were the total sales of ‘hairclips’ yesterday?” or “Show me the trend in total sales value for the last 12 months.”
Proposed Approach: Currently, I’m considering training the GPT model to generate SQL queries from the questions it receives: given a question, the model would produce a SQL query tailored to the denormalized table in BigQuery; I would then execute that query and return the result. A sketch of the loop I have in mind follows.
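This is only how I imagine it, not something I’ve built: the model name, project/table names, and the hand-written schema string are all placeholders, and a real version would need to validate the generated SQL before running it.

```python
# Rough sketch of the question -> SQL -> BigQuery -> answer loop.
# Project/table/model names and the schema text are placeholders.
from google.cloud import bigquery
from openai import OpenAI

bq = bigquery.Client(project="my_project")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

TABLE = "my_project.ecommerce.sales_denormalized"
TABLE_SCHEMA = (
    "order_id STRING, product_id STRING, order_date DATE, customer_name STRING, "
    "product_name STRING, quantity INT64, unit_price NUMERIC, line_total NUMERIC"
)

def answer_question(question: str) -> str:
    # Give the model the table name and columns so it knows what to query;
    # it never sees the data itself, only the schema and the question.
    prompt = (
        f"You write BigQuery Standard SQL. The table `{TABLE}` has columns:\n"
        f"{TABLE_SCHEMA}\n"
        f"Return one SELECT statement, with no commentary, that answers: {question}"
    )
    response = llm.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )

    # Strip possible code fences; a real version should also validate the SQL.
    sql = response.choices[0].message.content.strip()
    sql = sql.removeprefix("```sql").removeprefix("```").removesuffix("```").strip()

    # BigQuery does the actual aggregation; only the small result comes back.
    rows = bq.query(sql).result()
    return "\n".join(str(dict(row.items())) for row in rows)

print(answer_question("What were the total sales of 'hairclips' yesterday?"))
```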
I’m unsure whether this is the most effective approach. Ideally, I’d like the GPT model to analyze the data directly and answer without generating SQL against a table it has no knowledge of. However, providing the entire dataset in the prompt isn’t feasible given token limits and the size of the data.
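So I assume I’d have to give the model the table’s schema rather than the data itself, since the schema stays tiny no matter how big the table gets. Something like this is what I mean (placeholder table name):

```python
# Fetch the denormalized table's schema from BigQuery and turn it into a
# short text block for the prompt. The table name is a placeholder.
from google.cloud import bigquery

client = bigquery.Client(project="my_project")
table = client.get_table("my_project.ecommerce.sales_denormalized")

schema_text = "\n".join(
    f"- {field.name} ({field.field_type}): {field.description or 'no description'}"
    for field in table.schema
)
print(schema_text)  # a few hundred tokens at most, regardless of row count
```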
Could someone please advise on the best approach for this kind of use case? Specifically, is a single denormalized table in BigQuery a good design, and what is the best way to train the GPT model for it?
Thank you!