Assistance required: improving GPT's accuracy and consistency when answering queries over structured tabular data

Hi everyone,

We’re working to improve GPT’s accuracy and consistency when answering queries over structured tabular data. The objective is to process these files reliably so that records are imported into our database accurately.

We would appreciate any suggestions, best practices, or insights to optimize GPT’s performance in this area. Your expertise and feedback would be invaluable!

We’ve been working with a dataset containing fields for events, sessions, and locations, but GPT often struggles to generate accurate responses, especially for queries that involve multiple rows or complex relationships. Even after iterating on our prompts and retrying multiple times, the results remain inconsistent, and getting accurate output is time-consuming.

Here’s a sample record from our dataset to provide some context:

| charity_id                        	| Events.id | Events.title  	| Events.brief   | Events.description     	| Events.start_date       	| Events.end_date | Events.type | Events.createdAt | Events.updatedAt | Events.kind | Events.status | Events.publishState | Events.timezone | Events.visibility | Events.contactor                                                                                                                                             	| Sessions.id | Sessions.startTime   	| Sessions.endTime     	| Sessions.createdAt | Sessions.updatedAt | Sessions.eventId | Sessions.status | Sessions.hasDate | Sessions.hasWaitList | Sessions.minimumAvailability | Sessions.maximumAvailability | Sessions.registrationDeadline | Sessions.isAvailabilityLimited | Sessions.approxWeekRequired | Sessions.approxHoursOfCommitment | Sessions.name | Sessions.timezone | Cause_Events.id | Cause_Events.eventId | Cause_Events.cause_id | Cause_Events.createdAt | Cause_Events.updatedAt | DevelopmentGoal_Events.id | DevelopmentGoal_Events.eventId | DevelopmentGoal_Events.developmentGoalId | DevelopmentGoal_Events.createdAt | DevelopmentGoal_Events.updatedAt | Locations.id | Locations.fullAddress | Locations.manuallyEnteredCityName | Locations.manuallyEnteredState | Locations.manuallyEnteredCountryName | Locations.isManual | Locations.type | Locations.linkType | Locations.eventId | Locations.createdAt | Locations.updatedAt |
|---------------------------------------|-----------|-------------------|----------------|----------------------------|-----------------------------|----------------|-------------|------------------|------------------|-------------|---------------|--------------------|----------------|------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------|-------------------------|-------------------------|------------------|------------------|----------------|---------------|----------------|-------------------|---------------------------|---------------------------|---------------------------|----------------------------|---------------------------|--------------------------------|--------------|-----------------|----------------|-------------------|-------------------|---------------------|---------------------|--------------------------|--------------------------|--------------------------------|-----------------------------|-----------------------------|-------------|---------------------|-------------------------------------|----------------------------|--------------------------------|----------------|---------------|----------------|----------------|----------------|----------------|
| 58db463f-edc2-4f8a-af5a-d68a733bdd29 |       	| Test Event Title | Short Summary | Giving Activity Description | 2025-01-08 12:26:49.994+00 |            	| 0       	|              	|              	| 0       	| 1         	| APPROVED      	| Asia/Manila   | 0            	| {"name":"Renel Nuestro","email":"renel@catalyser.com","number":"","useDefaultContact":true,"overwriteWithTeamLeaderDetails":true} |         	| 2025-01-30 17:00:00+00 | 2025-01-30 20:00:00+00 |              	|              	| 1          	| TRUE      	| TRUE       	| 1             	| 10                    	| 2025-01-30 17:00:00+00	| TRUE                   	|                       	|                           	| Session 1	| Asia/Manila 	|            	|               	|               	|                 	|                 	|                      	|                      	|                            	|                         	|                         	|         	| 156 General Luna 	| Malabon                         	| Metro Manila              	| PH                         	| TRUE       	| 0         	| -1         	|            	|              	|              	|

Our goals:

  • Enable GPT to process such data efficiently.
  • Generate structured and accurate responses consistently.

Challenges:

  • GPT often misses the mark on the first attempt, particularly for multi-row queries.
  • Responses require multiple retries, making the process time-consuming.
  • Gemini performs better in similar tasks, indicating possible limitations or configuration needs for GPT.

Questions for the community:

  • Are there best practices or prompt techniques for working with GPT and tabular data?
  • Has anyone faced similar challenges, and how did you overcome them?
  • Would fine-tuning or specific configurations improve GPT’s performance here?

Any guidance, tips, or resources would be highly appreciated.

Thanks in advance!
Catalyser team


yes

yes - parallel processing might help
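
For what it’s worth, here is a minimal sketch of that idea, assuming a CSV export of the table and the Python `openai` client. The chunk size, model name, file name, and the `extract_events` helper are placeholders, not anything from the original post:

```python
# Rough sketch: split the export into small chunks and query them concurrently,
# instead of handing the model the entire table in one prompt.
import csv
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()
CHUNK_SIZE = 20  # rows per request; keeps each prompt small and focused

def extract_events(rows: list[dict]) -> str:
    """Ask the model about one small slice of the table (illustrative helper)."""
    table = "\n".join(str(r) for r in rows)
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": "You convert event/session rows into clean JSON records."},
            {"role": "user", "content": f"Rows:\n{table}\n\nReturn one JSON object per row."},
        ],
    )
    return resp.choices[0].message.content

# "events_export.csv" is an assumed file name for the export described above.
with open("events_export.csv", newline="") as f:
    rows = list(csv.DictReader(f))

chunks = [rows[i:i + CHUNK_SIZE] for i in range(0, len(rows), CHUNK_SIZE)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(extract_events, chunks))
```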

Who? You mean that %%§$%#+§$%§$ model that Google brought out? I never got a single useful answer from it. Not even close… not by far! I’d rather ask a monkey to type random characters and expect better results…

yes

yes

Not needed - it’s possible to reach that goal by enriching the prompt.
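
To make the “enrich the prompt” suggestion concrete, here is one rough way it could look. The schema notes, column quirks, and output format below are assumptions about the export, not something stated in the thread:

```python
# Sketch of prompt enrichment: spell out the schema, the quirks of the export,
# and the exact output shape, instead of relying on the model to guess them.
SCHEMA_NOTES = """
Columns are grouped by source table (Events.*, Sessions.*, Locations.*).
- Timestamps look like 'YYYY-MM-DD HH:MM:SS+00' (UTC); Events.timezone / Sessions.timezone give the local zone.
- Events.contactor is an embedded JSON object.
- Empty cells mean the value is not set, not zero.
"""  # assumed description of the export, adjust to the real file

def build_messages(record_rows: str) -> list[dict]:
    """Build an enriched prompt for one batch of rows (illustrative)."""
    return [
        {"role": "system",
         "content": "You import charity event exports. Answer only from the rows provided; "
                    "if a value is missing, say 'not set'. Return valid JSON."},
        {"role": "user",
         "content": f"Schema notes:\n{SCHEMA_NOTES}\nRows:\n{record_rows}\n"
                    "Task: produce one JSON object per event with nested 'sessions' and 'locations' arrays."},
    ]
```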

No! If you rely on that data, then don’t rely solely on GPT models. They are not meant for that use case, even if it looks like they are.

They are meant to make suggestions that humans have to approve.

But there are mechanisms to approve it automatically…
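
As a sketch of what such a mechanism could look like (this is an assumption, not something described above): validate the model’s output against hard rules and only auto-approve records that pass every check, sending the rest to a human.

```python
# One possible "automatic approval" gate: deterministic checks on the model's
# JSON before anything touches the database. Field names are illustrative.
import json
from datetime import datetime

REQUIRED = {"charity_id", "title", "start_date", "timezone"}  # assumed required fields

def auto_approve(model_output: str) -> tuple[bool, list[str]]:
    problems: list[str] = []
    try:
        record = json.loads(model_output)
    except json.JSONDecodeError as exc:
        return False, [f"not valid JSON: {exc}"]

    missing = REQUIRED - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    try:
        datetime.fromisoformat(record.get("start_date", ""))
    except ValueError:
        problems.append("start_date is not an ISO timestamp")

    # Anything that fails a check goes to human review instead of auto-import.
    return (not problems), problems
```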

Welcome to the community!

Have you tried using the models to generate SQL (or similar) queries that can isolate the relevant data before providing a response?
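
For illustration, a minimal version of that flow could look like the sketch below, assuming the export is a CSV loaded into SQLite and using the Python `openai` client; the model name, file name, and prompts are placeholders:

```python
# Sketch: load the export into SQLite, have the model write a query against the
# real schema, run it, and only then ask for the final answer from the result.
import sqlite3
import pandas as pd
from openai import OpenAI

client = OpenAI()
conn = sqlite3.connect(":memory:")
pd.read_csv("events_export.csv").to_sql("events_export", conn, index=False)  # assumed file name

schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name='events_export'"
).fetchone()[0]

def answer(question: str) -> str:
    # Step 1: ask the model for a query (assumes it returns bare SQL as instructed).
    sql = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": "Write a single SQLite SELECT statement. Return only SQL."},
            {"role": "user", "content": f"Schema:\n{schema}\n\nQuestion: {question}"},
        ],
    ).choices[0].message.content.strip()

    # Step 2: the query isolates only the relevant rows.
    rows = conn.execute(sql).fetchall()

    # Step 3: answer from the small result set instead of the whole table.
    final = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": f"Question: {question}\nQuery result: {rows}\nAnswer from the result only."}],
    )
    return final.choices[0].message.content
```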

That generally falls within the “functions” or “tool use” pattern - the idea is to give the model a pickaxe for the job instead of having it scrape through the rock with its fingernails.
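
Roughly the same idea expressed with the Chat Completions `tools` parameter, where the model asks for the query instead of the pipeline forcing one. The `run_sql` tool, the database file, the question, and the model name are illustrative assumptions, not part of the thread:

```python
# Sketch of tool use: hand the model a read-only SQL tool it can call itself.
import json
import sqlite3
from openai import OpenAI

client = OpenAI()
conn = sqlite3.connect("events.db")  # assumed database holding the imported export

def run_sql(query: str) -> list:
    """Execute the model's query against the local database (illustrative)."""
    return conn.execute(query).fetchall()

tools = [{
    "type": "function",
    "function": {
        "name": "run_sql",
        "description": "Run a read-only SQLite query against the imported event export.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "A single SELECT statement."}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "How many sessions start after 2025-01-30 in Asia/Manila events?"}]
first = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)

# This sketch assumes the model chose to call the tool.
call = first.choices[0].message.tool_calls[0]
result = run_sql(json.loads(call.function.arguments)["query"])

messages += [first.choices[0].message,
             {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}]
final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
print(final.choices[0].message.content)
```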
