Okay - and are the prompts unchanged from your earlier tests that proclaimed ChatGPT a failure when multiple tables are involved? Saying the “exact same prompt strategy” is not necessarily the same as saying the “exact same prompts,” right? I just want to understand what changed between your first experiments and the latest video.
Also, I have never seen a single successful SQL inference come in under 256 tokens. Why did you fix the limit at 150? More to the point, have you benchmarked outcomes at higher and lower values?
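For context, here is a minimal sketch of the setting I am asking about, assuming the standard OpenAI Python client; the model name and prompt text are placeholders rather than anything from your video, and 256 is simply the floor I have found workable.

```python
# Minimal sketch of where the token cap sits in a completion call (assumed
# OpenAI Python client; model and prompt text are placeholders, not yours).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[
        {"role": "system", "content": "Translate the request into a SQL query."},
        {"role": "user", "content": "stocks in industry Packaged Foods"},
    ],
    max_tokens=256,  # the cap in question; I have not seen reliable SQL below 256
    temperature=0,   # deterministic output makes repeated benchmarking easier
)
print(response.choices[0].message.content)
```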
Help me understand - are these queries the actual prompts you are sending to ChatGPT (e.g., “stocks in industry Packaged Foods”)? If not, could you share the prompt that would be generated from that query? I also notice mixed case in the queries; why the inconsistency?
And how do you actually know this is the case?
When I test outcomes, I run hundreds of them in an automated process that I can easily repeat. Each outcome is validated by a human and ranked for accuracy and performance. I’ve learned that evaluating success one result at a time, without a formal test protocol or metrics, is unreliable and generally biased.
Most importantly, any change to the prompts or other settings requires a full test battery and reassessment. This approach tells me how each prompt version compares with the others; we then rank the versions to find the most successful one. A single prompt-development cycle might involve dozens of versions and thousands of tests.
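For what it’s worth, below is a stripped-down sketch of the kind of harness I mean, assuming hypothetical helper names (run_prompt, load_cases) and a local CSV of human-verified test cases; it only illustrates the shape of the loop (run everything, score, rank), not my actual tooling.

```python
# Sketch of an automated prompt-evaluation loop. run_prompt and load_cases are
# hypothetical placeholders; exact-match scoring stands in for human review.
import csv

def run_prompt(prompt_version: str, question: str) -> str:
    """Placeholder: send `question` through the given prompt version, return SQL."""
    return ""  # replace with the real model call

def load_cases(path: str) -> list[dict]:
    """Each CSV row pairs a natural-language question with its verified SQL."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def evaluate(prompt_versions: list[str], cases: list[dict]) -> dict[str, float]:
    """Run every case against every prompt version and score each version."""
    scores: dict[str, float] = {}
    for version in prompt_versions:
        hits = sum(
            run_prompt(version, case["question"]).strip().lower()
            == case["expected_sql"].strip().lower()
            for case in cases
        )
        scores[version] = hits / len(cases)
    return scores

if __name__ == "__main__":
    cases = load_cases("test_cases.csv")  # hypothetical file name
    results = evaluate(["v1", "v2", "v3"], cases)
    # Rank prompt versions from best to worst, as described above.
    for version, accuracy in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{version}: {accuracy:.1%}")
```

In practice the human reviewers also score latency and rank each result for accuracy, but the point of the sketch is the rank-ordering of prompt versions at the end: that ordering, not any single impressive result, is what tells you whether a prompt change actually helped.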