Great news! Devs have been waiting patiently for our Shipmas present!
Now that o1 has vision capabilities (which I presume will be released in the API in the future), how should we be thinking about using it for use cases that typically require in-context learning?
Is it effective to give the model a multi-shot prompt where each example contains the ~5 images it would encounter in the real world, or is a different strategy recommended?
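
For concreteness, here's a rough sketch of the multi-shot structure I'm imagining. It uses the current chat completions vision content format (as it works with gpt-4o today) and assumes o1 will accept the same message structure once image inputs land in the API; the model name, URLs, task, and labels below are all placeholders, not anything OpenAI has documented for o1:

```python
from openai import OpenAI

client = OpenAI()

def image_part(url: str) -> dict:
    # One image input in the chat-completions vision content format
    return {"type": "image_url", "image_url": {"url": url}}

# Placeholder few-shot examples: each pairs ~5 example images with the
# answer we'd want the model to produce for that case.
examples = [
    {
        "images": [f"https://example.com/shot1_img{i}.jpg" for i in range(1, 6)],
        "answer": "Defect present on panel 3; others pass inspection.",
    },
    {
        "images": [f"https://example.com/shot2_img{i}.jpg" for i in range(1, 6)],
        "answer": "All five panels pass inspection.",
    },
]

# The real case we want evaluated.
query_images = [f"https://example.com/query_img{i}.jpg" for i in range(1, 6)]

messages = [
    {
        "role": "user",
        "content": [{"type": "text", "text": "You will see 5 images per case. Classify each case as shown in the examples."}],
    }
]

# Encode each example as a user turn (text + ~5 images) followed by an
# assistant turn containing the expected answer.
for ex in examples:
    messages.append({
        "role": "user",
        "content": [{"type": "text", "text": "Example case:"}] + [image_part(u) for u in ex["images"]],
    })
    messages.append({"role": "assistant", "content": ex["answer"]})

# Finally, the new case to classify.
messages.append({
    "role": "user",
    "content": [{"type": "text", "text": "New case:"}] + [image_part(u) for u in query_images],
})

response = client.chat.completions.create(
    model="o1",  # placeholder: assumes image inputs are accepted once released
    messages=messages,
)
print(response.choices[0].message.content)
```

Is something along these lines the right way to go, or does a reasoning model like o1 benefit less from multi-shot image examples than gpt-4o does?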