I want to use customized gpt-4-vision to process documents such as pdf, ppt, and docx. What is the shortest way to achieve this. As far I know gpt-4-vision currently supports PNG (.png), JPEG (.jpeg and .jpg), WEBP (.webp), and non-animated GIF (.gif), so how to process big files using this model?
Does “gpt-4-vision” refer to a vision feature that can be accessed through the API?
If so, the ChatGPT tag may not be appropriate.
Also, the gpt-4-vision feature accessible through the API cannot be customized.
On the other hand, ChatGPT has the ability to read PDF and DOCX files as a feature.
1 Like