Computational Storyboarding

Introduction

Computational storyboarding builds upon traditional storyboarding techniques, combining elements from screenplays, storyboards, functions, diagrams, and animation.

Computational storyboards are intended to serve as input for generative artificial-intelligence systems that create longer-form output video.

A motivating use case is simplifying the creation of educational videos, e.g., lecture videos. With computational storyboards, content creators could describe single-character stories in which the main character is a tutor instructing an audience about provided subject matter, using boards or screens that display synchronized multimedia content from textbooks, encyclopedia articles, or slideshow presentations.

Screenplays

A screenplay is a form of narration in which the movements, actions, expressions, and dialogue of characters are described in a particular format. Visual and cinematographic cues may also be given, along with scene descriptions and scene changes.

Storyboards

A storyboard is an organizational technique consisting of illustrations or images, thumbnails, traditionally displayed in sequence. Storyboards have traditionally been used to pre-visualize motion pictures, animations, motion graphics, and interactive media sequences.

Storyboards’ thumbnails have traditionally provided information about content layering, audio and sound effects, camera shots, character shots, transitions between scenes, and more.

Web of Computational Storyboards

In theory, nodes in diagrammatic computational storyboards could refer to other diagrams by URLs, weaving webs of interconnected diagrams. End-users could click on these referring nodes to expand them, loading the referenced content from URL-addressable resources into the current diagram.
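
As a minimal sketch of how such a referring node might work, assuming a hypothetical JSON representation for diagrams at the referenced URLs:

```python
import json
import urllib.request
from dataclasses import dataclass

@dataclass
class ReferenceNode:
    """A diagram node that refers to another diagram by URL (hypothetical schema)."""
    url: str
    expanded: dict | None = None

    def expand(self) -> dict:
        """Load the referenced diagram, e.g., when an end-user clicks the node."""
        if self.expanded is None:
            with urllib.request.urlopen(self.url) as response:
                self.expanded = json.load(response)
        return self.expanded
```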

Wiki of Computational Storyboards

Computational storyboard diagrams could be collaboratively editable, enabling wiki-like platforms.

Functions

Functions would enable modularity and the reuse of storyboard content. Beyond referring to other diagrams by URLs, function-calling nodes in computational storyboard diagrams could refer to function-like diagrams by URLs while invoking and passing arguments to them.

With computational storyboarding functions, scenes’ characters, settings, props, actions, dialogue, and properties of these could all be parameterized.

Arguments provided to invoked functions could be in the form of multimedia content, structured objects, or text. Arguments and variables in functions could be used to create the prompts provided to generative artificial-intelligence systems, including the prompts with which to generate thumbnails’ images.
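
A minimal sketch of such a function, with hypothetical parameter names; the bound arguments are formatted into a prompt for a generative system:

```python
from dataclasses import dataclass

@dataclass
class StoryboardFunction:
    """A function-like diagram: named parameters plus a prompt template (hypothetical)."""
    name: str
    parameters: list[str]
    prompt_template: str

    def invoke(self, **arguments) -> str:
        missing = set(self.parameters) - set(arguments)
        if missing:
            raise TypeError(f"{self.name} missing arguments: {missing}")
        # Bind the arguments into the prompt sent to a generative system.
        return self.prompt_template.format(**arguments)

lecture_shot = StoryboardFunction(
    name="lecture_shot",
    parameters=["tutor", "topic", "setting"],
    prompt_template="{tutor} stands in {setting}, explaining {topic} to the camera.",
)
print(lecture_shot.invoke(tutor="Alice", topic="photosynthesis", setting="a classroom"))
```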

Markers, resembling keykodes or timecodes, could be placed between thumbnails in computational storyboard diagrams. Alternatively, some or all of the thumbnails could be selected to serve as referenceable markers, keykodes, or timecodes in resultant video. With markers, content creators could refer to instants or intervals of video generated from invoked functions.
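
A sketch of the first option, with markers placed between thumbnails; once each thumbnail’s generated clip duration is known, a marker resolves to a timecode and a pair of markers to an interval (all names hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Marker:
    """A named marker placed between thumbnails, recorded by the index of the
    thumbnail it follows (hypothetical representation)."""
    name: str
    after_thumbnail: int

def resolve_interval(markers: list[Marker], start: str, end: str,
                     durations: list[float]) -> tuple[float, float]:
    """Resolve the interval between two named markers to seconds of generated
    video, given each thumbnail's generated clip duration in seconds."""
    position = {m.name: m.after_thumbnail for m in markers}
    t0 = sum(durations[: position[start] + 1])
    t1 = sum(durations[: position[end] + 1])
    return (t0, t1)
```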

Metadata

Components in computational storyboard diagrams could be annotated with metadata.

Functions, for instance, could be annotated with metadata describing one or more sample argument sequences. In this way, content creators would have ready-made options for generating thumbnails’ images while designing.
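
For instance, such metadata might attach sample argument sequences to a function; a hypothetical shape, reusing the StoryboardFunction sketch above:

```python
# Hypothetical metadata attaching sample argument sequences to a function, so a
# design tool can offer ready-made options for previewing thumbnail images.
lecture_shot_metadata = {
    "function": "lecture_shot",
    "sample_arguments": [
        {"tutor": "Alice", "topic": "photosynthesis", "setting": "a classroom"},
        {"tutor": "Bob", "topic": "prime numbers", "setting": "a lecture hall"},
    ],
}

def preview_prompts(function, metadata: dict) -> list[str]:
    """One preview prompt per sample argument sequence."""
    return [function.invoke(**args) for args in metadata["sample_arguments"]]
```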

Control Flow

With respect to computational storyboarding functions and their diagrams, there are two varieties of control-flow constructs to consider.

A first variety of control-flow construct would route execution at runtime to paths of subsequent thumbnails. Such branching could occur either based upon the evaluation of expressions involving input arguments and variables or upon asking questions of interoperating artificial-intelligence systems.
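
A minimal sketch of this first variety, where the routing condition is either an expression over the current scope or a question posed to an AI system; `ask_model` is a hypothetical client, not a real API:

```python
from dataclasses import dataclass
from typing import Callable

def ask_model(question: str) -> str:
    """Hypothetical client for an interoperating AI system; returns 'yes' or 'no'."""
    raise NotImplementedError

@dataclass
class BranchNode:
    """Routes execution at runtime to one of two paths of subsequent thumbnails."""
    condition: Callable[[dict], bool]  # evaluated over input arguments and variables
    if_true: str   # id of the next node when the condition holds
    if_false: str  # id of the next node otherwise

    def route(self, scope: dict) -> str:
        return self.if_true if self.condition(scope) else self.if_false

# Branching on an expression over the current scope:
by_level = BranchNode(
    condition=lambda scope: scope["audience_level"] == "beginner",
    if_true="gentle_intro",
    if_false="formal_definition",
)

# Branching by asking a question of an AI system:
by_model = BranchNode(
    condition=lambda scope: ask_model(
        f"Would {scope['topic']} likely be new to this audience?") == "yes",
    if_true="extended_background",
    if_false="brief_recap",
)
```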

A second variety of control-flow construct would result in branching or interactive video output, with routes or paths selected by viewers during playback. Generated interactive video could interface with playback environments, e.g., Web browsers, to provide viewers with features such as navigational menus.
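
By contrast, a sketch of the second variety: the branch is not resolved at generation time but emitted into the output for the playback environment to present (hypothetical shape):

```python
from dataclasses import dataclass

@dataclass
class ViewerChoiceNode:
    """A branch resolved by the viewer during playback, e.g., as a navigational
    menu; emitted into the output video's metadata rather than routed at
    generation time (hypothetical shape)."""
    prompt: str
    options: dict[str, str]  # menu label -> id of the segment to jump to

    def to_metadata(self) -> dict:
        return {"type": "menu", "prompt": self.prompt, "options": self.options}

chapter_menu = ViewerChoiceNode(
    prompt="Where to next?",
    options={"Review the basics": "basics", "Continue to examples": "examples"},
)
```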

Execution Contexts

As computational storyboards were executed to generate video, execution contexts, building on the concept of call stacks, could be utilized. Each execution context would comprise nested frames, building on the concept of stack frames, each of which would record the active node in a function’s diagram and the values of its input arguments and variables.
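
A minimal sketch of such execution contexts, assuming functions are addressed by URL and nodes by id:

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """One stack frame: the function being executed, its active node, and bindings."""
    function_url: str
    active_node: str
    arguments: dict
    variables: dict = field(default_factory=dict)

@dataclass
class ExecutionContext:
    """A call stack of frames, pushed and popped as function-like diagrams
    invoke one another during video generation."""
    frames: list[Frame] = field(default_factory=list)

    def call(self, function_url: str, entry_node: str, arguments: dict) -> None:
        self.frames.append(Frame(function_url, entry_node, arguments))

    def ret(self) -> Frame:
        return self.frames.pop()

    @property
    def current(self) -> Frame:
        return self.frames[-1]
```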

Variation

In addition to providing their diagrammatic contents with input arguments and variables, computational storyboards’ functions could contain nodes for obtaining “random values” from specified numerical intervals or, perhaps, for randomly selecting among nodes in containers.

Content creators could, optionally, use such random variation to vary the resultant video.
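
A sketch of the two kinds of nodes; seeding the generator would let a creator reproduce a particular variation:

```python
import random

def random_value_node(interval: tuple[float, float], rng: random.Random) -> float:
    """Obtain a 'random value' from a specified numerical interval."""
    low, high = interval
    return rng.uniform(low, high)

def random_choice_node(container: list, rng: random.Random):
    """Randomly select from the nodes in a container."""
    return rng.choice(container)

# Seeding the generator makes a particular variation reproducible.
rng = random.Random(42)
zoom_speed = random_value_node((0.5, 2.0), rng)
opening_shot = random_choice_node(["wide_shot", "medium_shot", "close_up"], rng)
```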

Optimization

In theory, beyond using “random values” simply to vary generated video, diagram nodes providing “automatic values” could supply values, either scalars from intervals or selections among nodes in containers, intended to be optimized across multiple executions or runs as observations and data were collected.

As envisioned, developing and providing these components for computational storyboarding diagrams would simplify A/B testing and related techniques for content creators.
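
The text does not prescribe an optimization algorithm; as one possible strategy, an “automatic value” node over a set of options could be backed by a simple epsilon-greedy bandit:

```python
import random
from collections import defaultdict

class AutomaticChoice:
    """An 'automatic value' node over a set of options, optimized across runs.
    A simple epsilon-greedy bandit is used here as one possible strategy."""

    def __init__(self, options, epsilon=0.1, seed=None):
        self.options = list(options)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.totals = defaultdict(float)  # option -> summed observations
        self.counts = defaultdict(int)    # option -> number of observations

    def pick(self):
        if self.rng.random() < self.epsilon or not self.counts:
            return self.rng.choice(self.options)  # explore
        # Exploit: choose the option with the best average observation so far.
        return max(self.options, key=lambda o: self.totals[o] / max(self.counts[o], 1))

    def observe(self, option, reward: float):
        """Record data collected from one execution, e.g., viewer watch time."""
        self.totals[option] += reward
        self.counts[option] += 1
```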

Generating Thumbnail Images

As envisioned, at least some of a computational storyboard’s thumbnails would have their images created by generative artificial-intelligence systems. The multimodal prompts involved could be varied, including by using functions’ input arguments and variables.
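
A sketch of how such a multimodal prompt might be assembled from a function’s current scope; the field names and scope keys are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class MultimodalPrompt:
    """A prompt for a thumbnail-image generator: text built from a function's
    arguments and variables, plus reference media (hypothetical shape)."""
    text: str
    reference_images: list[str] = field(default_factory=list)  # e.g., character sheets

def thumbnail_prompt(scope: dict) -> MultimodalPrompt:
    return MultimodalPrompt(
        text=f"{scope['tutor']} gestures toward a board showing {scope['slide_title']}",
        reference_images=scope.get("character_sheets", []),
    )
```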

Generating Video

A goal for computational storyboards is that generative artificial-intelligence systems could process them into longer-form video content.

Towards this goal, computational storyboards could provide generative artificial-intelligence systems with materials beyond extensible thumbnails. Notes about directing, cinematography, and characters or acting could be provided, as could multimedia materials concerning characters, settings, props, and style. Content intended to be synchronized and placed onto one or more display surfaces in the generated video could also be provided.
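
These materials might be bundled into a single request object for such a system; a hypothetical shape:

```python
from dataclasses import dataclass, field

@dataclass
class GenerationRequest:
    """Materials bundled for a video-generation system, beyond the thumbnails
    themselves (hypothetical shape)."""
    thumbnails: list
    directing_notes: str = ""
    cinematography_notes: str = ""
    acting_notes: str = ""
    reference_media: list[str] = field(default_factory=list)  # characters, settings, props, style
    display_surface_content: dict[str, list[str]] = field(default_factory=dict)
    # e.g., {"board": ["slide_01.png", "slide_02.png"]}, synchronized via markers
```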

Generated videos could utilize one or more tracks to enable features in playback environments. Transcripts or captions, for instance, alongside accompanying metadata track items, could be sent to viewers’ artificial-intelligence assistants so that these systems could answer questions about videos’ contents.
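
As one possible encoding, such a metadata track could be WebVTT whose cue payloads are JSON objects, which Web playback environments expose to scripts via text tracks; a minimal sketch:

```python
import json

def webvtt_metadata_track(items) -> str:
    """Serialize (start, end, payload) items as a WebVTT track whose cue
    payloads are JSON objects, one possible encoding for such metadata."""
    def ts(seconds: float) -> str:
        h, rem = divmod(seconds, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

    cues = [f"{ts(a)} --> {ts(b)}\n{json.dumps(payload)}" for a, b, payload in items]
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"

print(webvtt_metadata_track([
    (5.0, 12.0, {"type": "caption-context", "topic": "photosynthesis", "slide": 3}),
]))
```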

Debugging and Revision

With respect to generating video from computational storyboards, there could exist a “debugging” mode. Video generated in this mode would contain extra metadata tracks providing objects with which content creators could jump from points of interest in the generated video back into their computational storyboards, resumed to the appropriate execution contexts.
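
A sketch of what one debug-track item might contain, reusing the frame fields from the execution-context sketch above; the shape is hypothetical:

```python
import json

def debug_cue(start: float, end: float, frames: list[dict]) -> dict:
    """A debug-track item: a serialized execution-context snapshot for an
    interval of generated video, from which an editor could resume the
    storyboard at that point."""
    return {"start": start, "end": end, "execution_context": frames}

print(json.dumps(debug_cue(12.0, 18.5, [{
    "function_url": "https://example.org/diagrams/lesson.json",
    "active_node": "board_closeup",
    "arguments": {"tutor": "Alice", "topic": "photosynthesis"},
}]), indent=2))
```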

Processing Video

In theory, existing video content could be processed into computational storyboards.

Conclusion

Envisioned computational storyboards build on traditional storyboarding techniques and are intended to enable generative artificial-intelligence systems to create longer-form output video, e.g., educational video.

Happy holidays! Any thoughts on these ideas?

Sir, I am a bit ignorant about the topic you are discussing, but is it something like creating small pieces of what would be a video in small windows so that the AI can learn to sequence the images in order to generate long-duration videos while maintaining sequential logic?

Hello, yes, I think you understand and, as far as I know, some of the described ideas are new.

While traditional storyboards are sequences of thumbnails, each having image and possibly text content, computational storyboarding blends in concepts like modular, reusable functions and control flow constructs.

As envisioned, content creators would be able to define a computational storyboard sequence, in a reusable function, using one tutor, Alice, and that sequence would work with other interchangeable tutors, Bob or Charlie.

Input parameters, e.g., the character/tutor, actions, spoken content, setting, camera instructions, visual style, and so forth, could all be varied as the portion of content was (re)utilized.

For longer-form content, creators would be able to (visually) build sequences which invoked reusable subsequences. Alternatively, eventually, computational storyboards could be automatically generated with key inputs being the content: a selection of a textbook, an encyclopedia article, or a slideshow presentation.

Towards a concrete example, a computational-storyboard sequence (invoking reusable functions) might involve a tutor: (1) speaking for a bit, looking at the camera, with the camera gradually zooming in (for visual interestingness) followed by (2) that tutor looking to and pointing to specific content on a board behind them while speaking, followed by (3) a visual transition to the board’s slide or contents being fullscreen while that tutor continued to speak.
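
Here is that sequence sketched as one reusable function invoking three smaller ones; every name is hypothetical, and each call stands in for generating a video segment:

```python
def speak_to_camera(tutor, topic, camera):
    return f"{tutor} speaks about {topic}, looking at the camera; {camera}."

def point_at_board(tutor, board_content, camera):
    return f"{tutor} looks and points to {board_content} on the board behind them; {camera}."

def fullscreen_board(board_content, voiceover):
    return f"Transition: {board_content} fills the screen while {voiceover} continues speaking."

def tutor_lesson(tutor, topic, slide):
    """The three-shot sequence described above, as one reusable function."""
    return [
        speak_to_camera(tutor, topic, camera="gradual zoom in"),
        point_at_board(tutor, slide, camera="medium shot"),
        fullscreen_board(slide, voiceover=tutor),
    ]

# Interchangeable tutors: the same sequence works with Alice, Bob, or Charlie.
for prompt in tutor_lesson("Alice", "photosynthesis", "the light-reactions slide"):
    print(prompt)
```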

I’m glad to understand it, even minimally. I believe your system is very versatile and could be useful not only for video generation but also for something Mitchell envisioned, to which I added a small version. It could be implemented in a real-time interactive image and audio search engine. Using its structural base, it would be possible to interact with the machine by navigating through visual images and mini video compositions. For example, if I mention a country during a conversation, I could be shown images of its flag, cities, architecture, culture, etc., while the LLM speaks to me. This would represent a new way of navigating the Internet.

This sounds like a project I put up here, if you are interested: Phas. The idea of the panes on Phas is that they can contain ‘any’ data type. Also, as you suggested, the design has coding constructs (i.e., conditionals/loops) integrated into every branch.

Here is a creative example showing a single prompt generating multiple multi-modal panes of information, which could then be accessed/edited by authorised users/AIs.

Primes

I consider that we are at the event horizon of AI… There is a huge space ahead of us and everyone’s starting to realise that their footprint online is effectively their own ‘Prompt’, the small piece of intelligent direction we have carved out for ourselves. All these Prompts will form the basis of the stories we tell our children.

Hello. These ideas for computational storyboarding arose while I was exploring interactive diagramming, e.g., BPMN, workflow diagrams, and flowcharts, on the one hand, and thinking about narrative and storyboarding, on the other.

These ideas resemble a conceptual blend between visual programming languages and storyboards.

As computer programs invoke functions to execute sequences of instructions, computational storyboards would invoke functions to produce sequences of thumbnails.

A goal is to enable content creators to be able to utilize generative artificial-intelligence systems to produce images for storyboard thumbnails and to generate longer-form video from these storyboards.

Yes, I’m interested in learning more about Phas and panes. Computational storyboarding’s thumbnails are, similarly, intended to be extensible, and each produces a prompt with which to generate a thumbnail image in its storyboarding context.
