I found a document on the sample size needed to detect a given difference: https://platform.openai.com/docs/guides/prompt-engineering/strategy-test-changes-systematically

Does anyone know how these numbers are calculated?


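The numbers come from basic statistics, specifically the standard sample-size formula for estimating a proportion to within a given margin of error at 95% confidence: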

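$$n = \frac{Z_{0.975}^2 \times p(1 - p)}{MOE^2}$$

Here $Z_{0.975} \approx 1.96$ is the 97.5th percentile of the standard normal distribution (a two-sided 95% confidence level). We choose $p = 0.5$ as the worst-case scenario, since $p(1 - p)$ is largest there. The margin of error $MOE$ is the difference we are trying to detect: 0.3, 0.1, 0.03, and 0.01 in the examples given.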

Plugging in those numbers and rounding up, we get:

- 11
- 97
- 1068
- 9604
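
If you want to check the arithmetic yourself, here is a minimal Python sketch of the formula above. The function name `sample_size` and its parameters are my own, not anything from the linked guide, and I take the z-value from the standard library's `NormalDist` rather than hard-coding 1.96. It should reproduce the four values listed above.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+


def sample_size(moe, p=0.5, confidence=0.95):
    """Hypothetical helper: sample size for a proportion at a given margin of error.

    Assumes a two-sided interval and the worst case p = 0.5 by default.
    """
    # Two-sided critical value, e.g. the 0.975 quantile for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # n = z^2 * p(1 - p) / MOE^2, rounded up to a whole sample.
    return math.ceil(z**2 * p * (1 - p) / moe**2)


for moe in (0.3, 0.1, 0.03, 0.01):
    print(moe, sample_size(moe))
```

Passing a different `p` or `confidence` shows how much the worst-case assumption and the confidence level drive the result.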
