In your log, you see a stop token 100260. You will also see 265 in use. It gives you hints of how the AI has been trained on containers for output other than just text… that's about the only interesting thing to be learned from communication dumps.

As you can see, the endpoint is no longer encoding AI-produced text into tokens; it has to emit them directly. Maybe that's because someone trained the AI to produce them, even though they are filtered from input…

1 Like

Has anyone tried ChatGPT Lego?

Take a picture of random Lego pieces and ask ChatGPT to suggest something to build with them.

2 Likes

I was wondering why, in my previous tests, the numbers from the screenshot were read incorrectly, even though it was a high-quality picture taken from Chrome developer tools. On the same day, I watched some YouTube videos that displayed use cases where the model essentially excelled at this task.

As it turns out, large image dimensions have a detrimental effect on the quality of the readings. When given a screenshot from Chrome developer tools at 3840x2160 (4K) and asked for a number, the model can identify which number is being referred to but cannot read its exact value. However, when given the equivalent screenshot at 1920x1080, the model reads the number correctly. The overall amount of information in the image also plays a role: after cropping the image to the section containing the relevant information and then scaling it back up to 4K, the model can still read the number even though the image quality is reduced.

Intuitively, this makes sense. For the best results, include only the relevant content, and improve things further by reducing the image dimensions.
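If anyone wants to automate that preprocessing, here's a minimal Pillow sketch of the idea; the filenames and the crop box are made-up placeholders, so swap in whatever region of your own screenshot actually contains the number:

```python
from PIL import Image

# Hypothetical input file; use your own 4K devtools screenshot.
img = Image.open("devtools_4k_screenshot.png")

# 1. Crop to just the region that contains the relevant information.
#    (left, upper, right, lower) in pixels -- placeholder coordinates.
roi = img.crop((1200, 300, 2400, 900))

# 2. Downscale so the longest side is at most 1920 px before uploading.
roi.thumbnail((1920, 1920), Image.Resampling.LANCZOS)

roi.save("devtools_cropped.png")
```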

1 Like

There is an arXiv paper from Microsoft in which they explore the capabilities of the GPT-4V model in depth.

Here is the link to the arxiv pre-print:
https://arxiv.org/abs/2309.17421

For those who are looking for a tl;dr, here is the video from AI Explained:

3 Likes

Looking forward to the API access for this feature.

2 Likes

If you have time this weekend, can you feed it a page or two of this? I’m curious as to what it “sees”…

1 Like

Actually a good idea, it would be quite interesting to see what happens :thinking:

1 Like

After I asked, someone on Discord fed it a page, and it identified it, but they didn't try to translate it…

2 Likes

Alright, it was a long shot :laughing:

Just got done reading the research paper, and I'm really impressed so far; it's much better than I expected.

1 Like

After giving it a shot, I can confirm that the results are not spectacular.
It mostly goes on and on about the document being medieval and takes a stab at the style (Gothic, 12th-15th century), but never really makes any interesting statements.

PS. Sharing links with images is not yet supported. @PaulBellow

1 Like

No worries. Thanks for taking a stab at it! Was curious what it would “guess”… there’s been a lot of theories over the years.

ETA: Saw another screenshot on Discord, but it wouldn't guess… It seems like it's relying on textual cues rather than the image itself… or it's taking what it "knows" about the image ("it's the Voynich manuscript") but just gathering vectorized data about the "image"? hrm…

1 Like

It does make one wonder exactly how the prompting and context of images work in GPT-4. Can it be trained by example images? Does it have the context required to hold image data, or is this processed by a different sub-model of the architecture that only returns language?

I thought it would be rolled out together with DALL·E 3, but I can't find it in my ChatGPT UI. No?

Yeah, perhaps one needs to prod GPT-4 with some prompts and not just ask it to describe what the document is. Ask it to find patterns or whatever, and maybe it can tell us more. I think it's like with us humans: show someone a picture of the Mona Lisa and they will just tell you it's the Mona Lisa. But if you ask them about the smile, the scenery, etc., they might give their own interpretation.

1 Like

It’s something worth trying and if it locates Wally, I will be impressed.

2 Likes

Bonus question: ask for the precise bounding box, then feed the results and the image to the advanced data analysis tool and ask it to draw it :cowboy_hat_face:
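Something like this is roughly what I'd expect that last step to boil down to (a sketch only; the box values are hypothetical and would come from whatever the model actually answers):

```python
from PIL import Image, ImageDraw

# Hypothetical bounding box -- replace with the values the model returns.
box = {"x": 1340, "y": 620, "w": 90, "h": 130}

img = Image.open("wheres_wally.png").convert("RGB")
draw = ImageDraw.Draw(img)
draw.rectangle(
    (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"]),
    outline="red",
    width=5,
)
img.save("wheres_wally_annotated.png")
```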

2 Likes

I wonder if there’s any way to generate an attention heatmap of the image and, if so, how well that would correlate to locating Waldo?
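GPT-4V doesn't expose its attention weights, so a true heatmap isn't possible from the outside. As a very rough proxy, though, you could slide a window over the picture and score each crop against a text query with an open model such as CLIP; the filename and query below are just placeholders:

```python
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("wheres_waldo.png").convert("RGB")  # placeholder file
query = "Waldo in a red and white striped shirt and hat"

# Slide a window over the image and score each crop against the text query.
win, stride = 224, 112
w, h = image.size
heat = np.zeros(((h - win) // stride + 1, (w - win) // stride + 1))

with torch.no_grad():
    text_inputs = processor(text=[query], return_tensors="pt", padding=True)
    text_emb = model.get_text_features(**text_inputs)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

    for i, top in enumerate(range(0, h - win + 1, stride)):
        for j, left in enumerate(range(0, w - win + 1, stride)):
            crop = image.crop((left, top, left + win, top + win))
            img_inputs = processor(images=crop, return_tensors="pt")
            img_emb = model.get_image_features(**img_inputs)
            img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
            heat[i, j] = (img_emb @ text_emb.T).item()

# The peak of the coarse heatmap is the window CLIP finds most "Waldo-like".
print("best window (row, col):", np.unravel_index(heat.argmax(), heat.shape))
```

Whether that proxy would correlate with where GPT-4V is "looking" is anyone's guess, but it would at least give a heatmap to compare against.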

4 Likes

Still waiting here. Anything else cool you’ve tried? Give it some history stuff?

2 Likes

I'm also waiting; that was appropriated from Twitter.

2 Likes