Image Analysis Options: GPT-4 Too Expensive

Would this work? Unfortunately, when parsing things such as gameplay, I don’t think there would be audio to help identify what’s happening on-screen.