I found a document on the sample size you need to detect a difference: https://platform.openai.com/docs/guides/prompt-engineering/strategy-test-changes-systematically
Does anyone know how these numbers are calculated?
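The numbers come from basic statistics, specifically the standard sample-size formula for estimating a proportion to within a given margin of error at 95% confidence: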
$$
n = \frac{Z^2_{0.975} \times p(1 - p)}{\mathrm{MOE}^2}
$$
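Here $Z_{0.975} \approx 1.96$ is the 97.5th percentile of the standard normal distribution, which gives a two-sided 95% confidence level. We chose $p = 0.5$ as the worst case, because $p(1 - p)$ is largest there. The margin of error $\mathrm{MOE}$ is the difference we are trying to detect, which in the guide's examples is 0.3, 0.1, 0.03, and 0.01.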
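To see why $p = 0.5$ is the worst case, and to check one of the guide's examples by hand, complete the square and then plug in $\mathrm{MOE} = 0.1$:

$$
p(1 - p) = \tfrac{1}{4} - \left(p - \tfrac{1}{2}\right)^2 \le \tfrac{1}{4}, \quad \text{with equality at } p = \tfrac{1}{2}
$$

$$
n = \frac{1.96^2 \times 0.25}{0.1^2} = \frac{0.9604}{0.01} = 96.04 \;\Rightarrow\; n = 97 \text{ after rounding up}
$$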
Plugging in those values and rounding up gives:
- MOE = 0.3: n = 11
- MOE = 0.1: n = 97
- MOE = 0.03: n = 1068
- MOE = 0.01: n = 9604
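If you want to reproduce these numbers yourself, here is a minimal Python sketch. It assumes, as in the explanation above, $Z_{0.975}$ rounded to 1.96 and the worst case $p = 0.5$. `sample_size` is just a hypothetical helper name, not anything from the guide. I used exact fractions instead of floats so that floating-point error can't nudge an exactly-integer result like 9604 across an integer boundary before rounding up.

```python
import math
from fractions import Fraction

# Two-sided 95% confidence: Z_{0.975} rounded to 1.96 (an assumption matching the post).
Z = Fraction("1.96")
# Worst case: p(1 - p) is largest at p = 0.5.
P = Fraction(1, 2)


def sample_size(moe: str, z: Fraction = Z, p: Fraction = P) -> int:
    """Smallest integer n with z^2 * p(1 - p) / n <= moe^2.

    This is n = z^2 * p(1 - p) / moe^2 rounded up. The margin of error
    is passed as a string so Fraction parses it exactly (e.g. "0.03" -> 3/100).
    """
    e = Fraction(moe)
    n = z**2 * p * (1 - p) / e**2
    return math.ceil(n)  # Fraction supports math.ceil exactly


if __name__ == "__main__":
    for moe in ["0.3", "0.1", "0.03", "0.01"]:
        print(moe, sample_size(moe))
```

Swapping in a more precise $Z_{0.975}$ (about 1.959964) would shift the results only slightly.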