For which classification tasks would zero-shot GPT-3 outperform fine-tuned BERT?

This is largely a theoretical question. Suppose we have either 1,000 or 10,000 labeled examples for fine-tuning BERT.