Is there a pattern for lengthy actions?

ChatGPT can display GIF images inline.
This allows GPTs to have actions that generate short videos.
For example, I have a GPT for learning chemistry. It displays molecule diagrams. It could show 3D molecules spinning around so the user could see them “in 3D”.
It takes about 30 seconds to generate a short video and export it in GIF format, which is too long for a GPT action.
Is there a design pattern to follow in this case?
All I could think of is to generate the commonly requested ones ahead of time, store them on a web server, and just return the link.

Just thinking out loud…

I don’t believe GPTs have hard time-outs as long as each step responds in time. For that reason, wouldn’t the best approach be to request each image frame in a separate call, so that every call responds within the timeout limit?

Then a separate call combines the frames into an animated GIF (this takes very little time, as it doesn’t require generative AI).

For example, the GPT/assistant might be written something like:

  1. List the stages involved
  2. For each stage, call function: createFrame()
  3. Enumerate the directory, submit to combineFrames()
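
A minimal sketch of those three steps, just to show the shape of the calls. `create_frame` and `combine_frames` are hypothetical Python stand-ins for createFrame() and combineFrames(), the stage names are made up, and the rendering is stubbed out with placeholder text files. A real action would draw the molecule in each frame and hand the ordered image files to a GIF encoder.

```python
import tempfile
from pathlib import Path


def create_frame(frame_dir: Path, index: int, stage: str) -> Path:
    """Render one frame; kept small so each call can return quickly.

    Placeholder: writes the stage name to a file instead of drawing an image.
    """
    path = frame_dir / f"frame_{index:03d}.txt"
    path.write_text(stage)
    return path


def combine_frames(frame_dir: Path) -> list[str]:
    """Enumerate the directory in frame order.

    Placeholder: returns the ordered stage names; a real version would
    pass the ordered image files to a GIF encoder.
    """
    return [p.read_text() for p in sorted(frame_dir.glob("frame_*.txt"))]


# 1. List the stages involved (hypothetical example stages).
stages = ["front view", "rotated 90 degrees", "rotated 180 degrees", "rotated 270 degrees"]

with tempfile.TemporaryDirectory() as tmp:
    frame_dir = Path(tmp)
    # 2. One short call per stage.
    for i, stage in enumerate(stages):
        create_frame(frame_dir, i, stage)
    # 3. One final call that assembles everything.
    ordered = combine_frames(frame_dir)

print(ordered)
```

The zero-padded file names keep the frames in the right order when the directory is sorted.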

I understand that you’re looking to create animation and that this will initially produce something more akin to slides; however, I generally build complex processes iteratively, starting with very simple, fundamental functionality. My hope is that this foundation/approach helps you get to the next step :slight_smile:

I think that would add a lot of lag time. You need about 30 frames per second, so even a short 3-4 second animation would take 100+ calls.
Maybe you can have one action that kicks off the generation and returns a UUID, and then another action to retrieve the result. Meanwhile the GPT can output some text, which takes a while anyway.
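
Something like this is what I have in mind. It is only a sketch: `start_render`, `get_result`, `render_gif` and the example.com URL are all made-up names, the jobs live in an in-memory dict (a real backend would need persistent storage), and the 30 second render is shortened to a fraction of a second so the example runs quickly.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
jobs: dict[str, Future] = {}  # in-memory only; an assumption for this sketch


def render_gif(molecule: str) -> str:
    """Stand-in for the slow render; returns a made-up URL."""
    time.sleep(0.1)  # shortened so the sketch runs quickly
    return f"https://example.com/gifs/{molecule}.gif"


def start_render(molecule: str) -> dict:
    """Action 1: kick off the job and return immediately with its ID."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(render_gif, molecule)
    return {"job_id": job_id, "status": "pending"}


def get_result(job_id: str) -> dict:
    """Action 2: report whether the job has finished yet."""
    future = jobs.get(job_id)
    if future is None:
        return {"status": "unknown"}
    if not future.done():
        return {"status": "pending"}
    return {"status": "done", "url": future.result()}


started = start_render("caffeine")
first_poll = get_result(started["job_id"])   # most likely still pending
time.sleep(0.3)                              # the GPT writes its text meanwhile
second_poll = get_result(started["job_id"])
print(first_poll, second_poll)
executor.shutdown()
```

The GPT instructions would then tell the model to call the second action again if it gets back a pending status.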

During the interaction, if a commonly requested action occurs, provide a pre-rendered GIF link instead of creating it on-the-fly.
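
A rough sketch of that lookup, assuming a hypothetical `PRERENDERED` table and made-up example.com links. The action returns a link right away for common requests and returns `None` when the GIF would have to be rendered on demand.

```python
# Hypothetical table of GIFs rendered ahead of time and hosted on a web server.
PRERENDERED = {
    "water": "https://example.com/gifs/water.gif",
    "methane": "https://example.com/gifs/methane.gif",
}


def normalize(name: str) -> str:
    """Lower-case the name and collapse whitespace so lookups are forgiving."""
    return " ".join(name.lower().split())


def lookup_gif(molecule: str):
    """Return a link for common requests, or None if nothing is pre-rendered."""
    return PRERENDERED.get(normalize(molecule))


print(lookup_gif("  Water "))
print(lookup_gif("benzene"))
```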

Why not generate all the GIFs in advance?
As far as I can tell, producing a single GIF on the fly is costly and time-intensive. Plus you have to deal with the model’s hiccups, whatever they may be in this particular case.

If the number of GIFs is very large, you can compare the cost of creating them all via a script to the cost of creating them on the fly, and then calculate whether this approach is cost-efficient. From a time and maintenance perspective, the answer is already clear.
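
As a back-of-the-envelope version of that comparison, here is a small sketch. Every number in it is invented for illustration, and it assumes on-the-fly results are not cached, so each request pays for a fresh render.

```python
def prerender_cost(num_gifs: int, cost_per_batch_render: float) -> float:
    """Cost of rendering the whole catalogue once via a script."""
    return num_gifs * cost_per_batch_render


def on_the_fly_cost(expected_requests: int, cost_per_live_render: float) -> float:
    """Cost of rendering on every request (no caching assumed)."""
    return expected_requests * cost_per_live_render


def prerender_is_cheaper(num_gifs, cost_per_batch_render,
                         expected_requests, cost_per_live_render) -> bool:
    return (prerender_cost(num_gifs, cost_per_batch_render)
            < on_the_fly_cost(expected_requests, cost_per_live_render))


# Invented figures: 5,000 possible GIFs, 20,000 expected requests.
print(prerender_is_cheaper(5_000, 0.002, 20_000, 0.01))
```

If only a handful of the possible GIFs are ever requested, the comparison can flip, which is when pre-rendering just the popular ones makes more sense.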

@tamas.simon wrote: “you need about 30 frames per second so even for a short 3-4 second animation it would be 100+ calls.”

This is helpful because I think I see a misunderstanding. If you are looking for 24+ FPS that resembles “video” (animated GIFs rarely run higher than the cinematic standard of about 24 FPS, which is roughly 41.7 ms between frames), you are better off using a generative AI video service, for example Parrot.

I don’t know if I’m allowed to link to other AI engines in this forum, so to be safe, it’s best to search for Parrot AI rather than risk breaking a rule.

However, for illustrative/educational GIFs like the ones you mention, it’s very common to run them at about one frame per second, give or take. For example, you can search Google Images for “animated combustion engine” and you’ll see a large number of illustrations in the style you’re interested in offering, albeit on a different topic.

In short, I don’t believe OpenAI has a product that does full incremental fades and creates a high-frame-rate “video”; however, there are plenty of other options. If you are interested in something a bit simpler and more educationally illustrative, there is no need to increase the frame rate. Remember that each frame of an animated GIF can have its own delay (e.g., hold one frame for 1 second, the next for 500 ms, etc.).
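
To make the per-frame timing concrete: the GIF format stores each frame's delay in hundredths of a second, so millisecond values get rounded to the nearest 10 ms. A small sketch with made-up delays (the helper name `to_centiseconds` is mine, not from any library):

```python
def to_centiseconds(delays_ms):
    """Convert per-frame delays from milliseconds to GIF's 1/100 s units."""
    return [round(ms / 10) for ms in delays_ms]


# Made-up schedule: hold frame 1 for 1 s, frame 2 for 500 ms, and so on.
delays_ms = [1000, 500, 1000, 2000]
delays_cs = to_centiseconds(delays_ms)

total_seconds = sum(delays_cs) / 100
print(delays_cs)       # [100, 50, 100, 200]
print(total_seconds)   # 4.5
```

So four frames are enough for a 4.5 second illustrative loop, instead of 100+ frames at video rates.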

I hope some part of that may be useful :slight_smile: