And so now we have dall-e 3 and vision, that’s awesome, huge, great, super!! But we need some glue for all that stuff, don’t you think?
An image embedding model can be that special sauce, just can’t wait…
(off topic: TTS model is also great, congratz for the wise choice of Brazilian Portuguese as the PT standard. but still sounds like an us-person speaking portuguese (a fluent as fu** in portuguese, but still an us-person. So… any plans for improve the multilingual side of the model?))