vb
There is an arXiv paper from Microsoft that explores the capabilities of the GPT-4V model in depth.
Here is the link to the arXiv pre-print:
https://arxiv.org/abs/2309.17421
For those looking for a TL;DR, here is the video from AI Explained:
foadgr
Looking forward to the API access for this feature.
If you have time this weekend, can you feed it a page or two of this? I’m curious as to what it “sees”…
N2U
Actually a good idea; it would be quite interesting to see what happens.
After asking, someone on Discord fed it a page, and it identified it, but they didn’t try to translate it …
N2U
Alright, it was a long shot 
Just got done reading the research paper, and I’m really impressed so far; it’s much better than I expected.
vb
After giving it a shot I can confirm that the results are not spectacular.
It mostly goes on and on about the document being medieval and takes a stab at the style (Gothic, 12th–15th century), but never really makes any interesting statements.
PS. Sharing links with images is not yet supported. @PaulBellow
No worries. Thanks for taking a stab at it! Was curious what it would “guess”… there’s been a lot of theories over the years.
ETA: Saw another screenshot on Discord, but it wouldn’t guess… It seems like it’s relying on textual cues rather than the image itself, or drawing on what it “knows” about the image (“it’s the Voynich document”) while just gathering vectorized data about the “image”? Hrm…
_j
It does make one wonder exactly how prompting and image context work in GPT-4. Can it be trained by example images? Does it have the context required to hold image data, or is this processed by a different sub-model of the architecture that only returns language?
I thought it would be rolled out together with DALL·E 3, until I failed to find it in my ChatGPT UI. No?
Yeah, perhaps one needs to prod GPT-4 with specific prompts rather than just asking it to describe the document. Ask it to find patterns or whatever, and maybe it can tell us more. It’s like with us humans: show someone a picture of the Mona Lisa and they will just tell you it’s the Mona Lisa. But point out the smile, the scenery, etc., and they might give their own interpretation.
It’s something worth trying, and if it locates Wally, I will be impressed.
N2U
Bonus question: ask for the precise bounding box, then feed the results and the image to the advanced data analysis tool and ask it to draw it 
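For the drawing half of that idea, here is a minimal sketch of what the data analysis tool might run, assuming the model reports the box as pixel coordinates in a `(left, top, right, bottom)` tuple (the coordinates in the usage example below are hypothetical placeholders), using Pillow:

```python
# Sketch: overlay a model-reported bounding box onto an image with Pillow.
# The box coordinates are assumed to come back from GPT-4V as pixels.
from PIL import Image, ImageDraw


def draw_box(image_path: str, box: tuple, out_path: str) -> None:
    """Draw a red rectangle given as (left, top, right, bottom) pixels."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    draw.rectangle(box, outline="red", width=4)
    img.save(out_path)


# Hypothetical usage, with placeholder coordinates:
# draw_box("waldo_page.png", (120, 340, 180, 420), "waldo_boxed.png")
```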
I wonder if there’s any way to generate an attention heatmap of the image and, if so, how well that would correlate to locating Waldo?
Still waiting here. Anything else cool you’ve tried? Give it some history stuff?
I’m also waiting; that screenshot was appropriated from Twitter.
Ah, thought it looked familiar!
Hope your weekend is going okay.
Yup, all good. Just watched Sam Altman chatting with Joe Rogan, was a fun interview.