ah thank you, I did miss it
Ooh my, video to avm <3
ok, I’ve updated the app but I still don’t see the video functionality. I did see the santa voice though hahah, pretty cool
ah, can’t wait to test out video on AVM, doesn’t seem to be out on either Android or iOS yet
Near the end of the video they said it’s coming out very soon for Teams, then a little later for Plus and Pro, and then later in 2025 for Edu and Enterprise (paraphrased a bit)
ah, that makes sense! can’t wait to see it on the pro plan
man, this is amazing! I feel like I can become a better cook with this, might need to invest in getting a tripod or something like that hahah
woah, screen sharing is going to be a thing too! amazing!! I really hope it also works on PC, I’m 100% going to use it way more on a PC rather than on the cellphone
I’m very interested in seeing if it’s actually retaining frames without instruction (from the introduction part it definitely does, I just wonder how explicit it needs to be). I already have some tests I’d like to try:
- Nonchalantly scan the room while communicating something irrelevant to the visuals and then ask what was recently shown
- Roll a ball across the screen in numerous directions and see if the model is able to synthesize snapshots
ok ok, i got to the end
so in the next week most pro and plus users will be getting this feature?!!
what a time to be alive!!! lets gooo!! I’m so happy for this, this demo was amazing
Like Daniel Simons’ Invisible Gorilla?
That would be a fun test.
I’m more interested in understanding how they implemented it. The closest thing I can think of is something I’ve been dabbling in lately:
https://arxiv.org/pdf/2410.17434
There’s an online demo where it can be tried out. It’s capable of understanding very long videos (hours long!). It’s not live, but it’s very fast
x-mas joke of the day:
what’s every elf’s favorite music?
wrap music
was waiting for this feature, super happy about it!
LongVU failed
When explicitly asked, it does take note of the gorilla though. So… ehhh… idk, it’s a tough one to call because the video does talk about the gorilla at the end. It does correctly guess the walking direction though.
Gorilla wearing a gorilla suit? Gorilla-ception?
Will definitely need to try with cGPT when we can.
@anon10827405 about the room scanning, I’d love to have it working with multiple cameras… can you imagine losing something and asking “where did I leave my missing item?” that would be cool.
It would also be nice to use it as a security camera: if someone breaks in, turn all the lights red, start playing the Doom theme at a -8.0 pitch, and convince the person to reevaluate their actions in a santa voice
also, I’m super surprised how fast it is… must be using the realtime endpoint… because technically you can break a video down into frames and send them to a vision model… but that would take so much longer
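For anyone curious, the slow approach I mean would look roughly like this. A minimal sketch, assuming OpenCV for frame grabbing and the standard chat completions vision input (the model name and the 1-second sampling interval are just placeholders I picked):

```python
import base64
import cv2  # pip install opencv-python
from openai import OpenAI

client = OpenAI()

def sample_frames(path: str, every_n_seconds: float = 1.0) -> list[str]:
    """Grab one JPEG-encoded frame every N seconds from a video file."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    step = int(fps * every_n_seconds)
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % step == 0:
            ok, buf = cv2.imencode(".jpg", frame)
            if ok:
                frames.append(base64.b64encode(buf).decode())
        i += 1
    cap.release()
    return frames

frames = sample_frames("kitchen.mp4")  # hypothetical input file
response = client.chat.completions.create(
    model="gpt-4o-mini",  # any vision-capable model
    messages=[{
        "role": "user",
        "content": [{"type": "text", "text": "What happened in this clip?"}]
            + [{"type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{f}"}}
               for f in frames],
    }],
)
print(response.choices[0].message.content)
```

Works, but you’re uploading every frame and waiting on one big request, which is exactly why the live version being this fast surprises me.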
Dude. Yes! I am always leaving my keys in the dumbest places.
Imagine this bad boy hooked up with smart-glasses? Then AR? Watch-dogs when?
oh my lord. Then, smoke comes out of a room and the Mortal Kombat theme song starts to play with santa claus noises becoming louder.
I’m really hoping OAI releases some technical details. LongVU’s strategy is super cool and works great for (non-live) processing of visual content by running it through numerous models (3, plus an aggregation strategy). But, yeah, it’s very fast, and I’m hoping it’ll be just as fast for us plebs with somewhat poor & intermittently janky connections
The sample rate is about 1 image a second. These frames are then utilized as context when a generation is triggered by post-speech silence. I expect there is a limit to how long a buffer of images is used as input, both within the current narrative and as part of past chat turns.
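If that’s right, the plumbing could be as simple as this toy sketch (the 30-frame cap and the callback names are my own made-up illustration, not anything confirmed):

```python
import time
from collections import deque

MAX_FRAMES = 30          # assumed buffer cap, not confirmed
SAMPLE_INTERVAL = 1.0    # ~1 image per second, per the estimate above

# Rolling buffer: old frames silently fall off the left end.
frame_buffer: deque = deque(maxlen=MAX_FRAMES)
last_sample = 0.0

def on_camera_frame(frame: bytes) -> None:
    """Called for every camera frame; keeps roughly one per second."""
    global last_sample
    now = time.monotonic()
    if now - last_sample >= SAMPLE_INTERVAL:
        frame_buffer.append(frame)
        last_sample = now

def on_speech_silence(transcript: str) -> None:
    """Post-speech silence triggers a generation using the buffered frames."""
    context = list(frame_buffer)  # the most recent ~30 seconds of vision
    # ...hand `transcript` + `context` to the model here...
```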
How about a refrigerator… LOL
Never happened to me! Honest!
@_j already on the case dropping knowledge!
I love this place…
I’d imagine it uses some sort of compression scheme, similar to how ChatGPT compresses a conversation after a certain point
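Pure speculation, but a crude version of that kind of compression could look like this: once the buffer grows past a threshold, collapse the oldest frames into a short text summary and drop the pixels (the summarize() helper and the threshold are hypothetical):

```python
from collections import deque

MAX_RAW_FRAMES = 60  # made-up threshold

raw_frames: deque = deque()
summaries: list[str] = []

def summarize(frames: list[bytes]) -> str:
    """Hypothetical helper: ask a vision model to describe old frames."""
    return f"<summary of {len(frames)} earlier frames>"

def add_frame(frame: bytes) -> None:
    raw_frames.append(frame)
    if len(raw_frames) > MAX_RAW_FRAMES:
        # Compress the oldest half into one line of text, drop the pixels.
        old = [raw_frames.popleft() for _ in range(MAX_RAW_FRAMES // 2)]
        summaries.append(summarize(old))
```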
I want it everywhere too, in my microscopes and telescopes as well, 2 years from now will be an epic time to be alive
Untiled base images consume 85 tokens or even less. At ~1 frame per second, 100 seconds → 8,500 tokens of a 30k ChatGPT input context. OpenAI has the ability, as did a particular input format of gpt-4-vision-preview, to use larger untiled image inputs in its proprietary products, and to reserve such features for its own products.
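The back-of-the-envelope math, for anyone following along (the 1 fps rate comes from the estimate above; the rest is from this post):

```python
TOKENS_PER_FRAME = 85      # untiled base image cost
FPS = 1                    # ~1 sampled frame per second
CONTEXT = 30_000           # assumed ChatGPT input context

seconds = 100
frame_tokens = seconds * FPS * TOKENS_PER_FRAME   # 8,500 tokens
print(f"{frame_tokens} tokens = {frame_tokens / CONTEXT:.0%} of context")
# -> 8500 tokens = 28% of context
```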
Release updates log of previous days:
https://help.openai.com/en/articles/10271060-12-days-of-openai-release-updates
Oooh… Makes sense. LongVU does the same.
My biggest issue with this, and a reason why I increased the FPS, is that sometimes a single frame is blurry (from being live). I’m guessing that’s why they had to hold things steady for several seconds. Hopefully the API version will give us controls over this.
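In the meantime, a client-side workaround for the blur problem I’ve been toying with: score each frame’s sharpness with the variance of the Laplacian (a common blur heuristic in OpenCV) and keep only the sharpest frame from each capture window:

```python
import cv2  # pip install opencv-python

def sharpness(frame) -> float:
    """Variance of the Laplacian: higher = sharper, lower = blurrier."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def best_frame(frames: list):
    """Pick the least blurry frame out of a short capture window."""
    return max(frames, key=sharpness)
```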
Where did you find this info?
It was discussed after the spring update demo by a more official source than just my imaginings, a factoid that I bookmarked in my wetware for just this moment.