Best current alternative for multimodal image review?

I understand we should expect multimodal GPT-4 in 2024. I want to prototype some use cases at my company to get a sense of how it would fit into our product.

Is there a multimodal foundation model available now that I can use to approximate working with GPT-4 once its image capabilities are available?

As far as I know, there is no single multimodal foundation model available right now that closely approximates working with multimodal GPT-4. In the meantime, you can chain separate models: use an image model (for example captioning, OCR, or object detection) to turn the image into text, then pass that text along with your question to a language model. If you keep both stages behind one interface in your prototype, you can get a sense of how multimodal capabilities fit into your product now and swap in multimodal GPT-4 once it is released.
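To make the "separate models, integrated together" idea concrete, here is a minimal Python sketch. It assumes nothing about which models you end up using: `TwoStageReviewer`, `fake_describer`, and `fake_text_model` are hypothetical names made up for illustration, and the two fakes are placeholders where you would plug in a real image model and a real text model.

```python
from dataclasses import dataclass
from typing import Callable

# Both stages are plain callables, so any real model can be plugged in later.
ImageDescriber = Callable[[bytes], str]  # image bytes -> text description
TextModel = Callable[[str], str]         # prompt -> completion


@dataclass
class TwoStageReviewer:
    """Approximates a multimodal model by chaining an image model and a text model."""

    describe_image: ImageDescriber
    complete_text: TextModel

    def build_prompt(self, description: str, question: str) -> str:
        # The text model never sees the image, only the description of it.
        return (
            "Image description:\n"
            f"{description}\n\n"
            f"Question: {question}\n"
            "Answer:"
        )

    def review(self, image: bytes, question: str) -> str:
        description = self.describe_image(image)
        return self.complete_text(self.build_prompt(description, question))


# Hypothetical stand-ins so the sketch runs without any model or network access.
def fake_describer(image: bytes) -> str:
    return f"an image of {len(image)} bytes"


def fake_text_model(prompt: str) -> str:
    return f"[text model received {len(prompt)} characters]"


if __name__ == "__main__":
    reviewer = TwoStageReviewer(fake_describer, fake_text_model)
    print(reviewer.review(b"\x89PNG", "Is the product label legible?"))
```

The point of the design is that the rest of your prototype only calls `review(image, question)`. When a real multimodal model is available, you can replace the two-stage internals with a single call and leave the calling code unchanged. Keep in mind the main limitation of this approach: anything the image model leaves out of its description is invisible to the text model.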