Plugin Help - Image Analysis

Hello everyone. I thought I would ask this question because it may be a bit advanced (at least for me). I have already built an AI chatbot successfully, and I understand that side inside out. My next questions are about photo analysis capabilities.

I just got access to plugins. I have been waiting for this for a while now, ever since seeing the photo analysis capabilities of GPT-4. So this is my question for people who have explored that side of things. I apologize if this is already well known or easy. I haven’t had a chance to dive into these aspects yet because I have been tied up with health issues.

Is it possible to send an image to the API along with a bit of text as a prompt, and receive a response from that? I am looking for a simple summary or analysis, that is it. If that is possible, then I am sure I’d be able to tweak the prompt to get a response that fits my needs, so I am not too worried about that. I am more curious whether this is actually possible now.

Looking forward to any help on this one! It is my primary reason for this right now and I have been waiting for this for many months now.

Thank you!

I don’t think GPT-4’s multimodal (image input) capabilities have been released yet. Not sure if it’s an infrastructure issue (could very well be) or if the demo was a marketing stunt at the time.

One popular, freely accessible model is LLaVA (Large Language and Vision Assistant), but it’s built on LLaMA, not on an OpenAI model.
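If it helps while you experiment, here is a rough sketch of the "image plus a bit of text" request you described, written in Python. It only builds the request body and does not call any service. The function name and the JSON field names ("prompt", "image") are made up for illustration; whatever server you end up using (a self-hosted LLaVA, for example) will define its own schema, so check its docs before relying on this shape.

```python
import base64
import json


def build_image_prompt_payload(image_bytes: bytes, prompt: str,
                               mime_type: str = "image/png") -> str:
    """Bundle an image and a text prompt into one JSON request body.

    NOTE: the field names below are hypothetical, not any real API's schema.
    """
    # Base64-encode the raw bytes so the image can travel inside JSON text.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "prompt": prompt,
        # A data URI keeps the MIME type next to the encoded image.
        "image": f"data:{mime_type};base64,{encoded}",
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Placeholder bytes only; in practice read them from a real image file.
    fake_image = b"\x89PNG\r\n\x1a\n"
    print(build_image_prompt_payload(fake_image, "Summarize this photo."))
```

The idea is just that the image is encoded as text and sent alongside the prompt in a single request; tweaking the prompt string is then all you need to change to steer the summary.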

Thank you for the info. I am going to check out LLaVA!