Integrated Multimodal SOP Generation from Live Tasks Using Voice and Camera Input

Overview

I’d like to request a feature that lets users generate detailed, step-by-step Standard Operating Procedures (SOPs) directly from real-world tasks by combining a live camera feed with live voice input in ChatGPT (Live Voice or equivalent). Users could narrate a task while showing the work on camera, and ChatGPT would automatically synthesize the session into a well-structured SOP with text instructions and annotated images.

Use Case

For example, as a Facilities Manager, I perform recurring tasks such as cleaning ERV and mini-split filters in our barns. If ChatGPT could watch and listen while I do this, as I show the tools and steps and describe what I am doing, I would like it to automatically:

  • Capture still images from the live feed at key moments (either automatically or on a verbal cue such as “take photo”).
  • Transcribe my explanations into step-by-step text instructions.
  • Compile the captured images and generated text into a draft SOP document.

Desired Features

  • Live multimodal processing: Process live video and audio narration simultaneously.
  • Smart image capture: Automatically or manually capture still images from the video feed during the task.
  • Speech-to-text transcription: Transcribe spoken instructions into text.
  • SOP formatting: Organize the text into steps, optionally with timestamps or linked images.
  • Output as a document: Provide a formatted document (Markdown, DOCX, or PDF) ready for review and sharing.
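As a rough illustration of how transcription, cue-based capture, and formatting could fit together, here is a minimal sketch. It assumes the session has already been transcribed into timestamped utterances and that each spoken “take photo” cue triggers a frame grab at that moment; the `Segment` type, the cue wording, and the `frame_at_<seconds>s.jpg` file naming are all hypothetical. It only shows one way to turn narration into numbered steps with linked images in a Markdown draft.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed utterance (hypothetical input format)."""
    start: float  # seconds from the start of the recording
    text: str


# Assumed verbal cue for grabbing a still frame: "take photo" or "take a photo".
CUE = re.compile(r"\btake (?:a )?photo\b[.!,]?", re.IGNORECASE)


def fmt_time(seconds: float) -> str:
    """Format elapsed seconds as MM:SS."""
    mm, ss = divmod(int(seconds), 60)
    return f"{mm:02d}:{ss:02d}"


def build_sop(title: str, segments: list[Segment]) -> str:
    """Turn timestamped narration into a Markdown SOP draft.

    Each utterance that still has text after removing the cue becomes a
    numbered step. Where the cue was spoken, a link to a frame assumed to be
    captured at that timestamp is nested under the current step.
    """
    lines = [f"# {title}", ""]
    step = 0
    for seg in segments:
        has_cue = CUE.search(seg.text) is not None
        # Drop the cue phrase and collapse leftover whitespace.
        text = " ".join(CUE.sub("", seg.text).split())
        if text:
            step += 1
            lines.append(f"{step}. [{fmt_time(seg.start)}] {text}")
        if has_cue:
            indent = " " * len(f"{step}. ")  # align the image under the step text
            # Hypothetical file name for the still frame grabbed at this moment.
            lines.append(
                f"{indent}![Photo at {fmt_time(seg.start)}](frame_at_{int(seg.start)}s.jpg)"
            )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative narration only, not a real maintenance procedure.
    demo = [
        Segment(0.0, "Turn off the unit."),
        Segment(12.5, "Remove the filter from the housing. Take photo."),
        Segment(75.0, "Reinstall the clean filter."),
    ]
    print(build_sop("Cleaning Mini-Split Filters", demo))
```

Running it prints a three-step Markdown list with the photo link nested under step 2, which could then be converted to DOCX or PDF for review.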

Benefits

  • Saves educators, facilities teams, farms, and businesses significant time when documenting recurring procedures.
  • Lowers the barrier to creating clear, visual instructions, especially for team or staff training.
  • Helps preserve institutional knowledge and standardize training materials across organizations.

Optional/Nice to Have

  • Ability to annotate images with labels or arrows via voice command during recording (“Label this ‘filter housing’”).
  • Ability to review and edit the generated SOP in a user-friendly interface before final export.
  • Integration with project management or documentation systems (Notion, Google Docs, Confluence, etc.).
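The voice-annotation idea could start from simple pattern matching on the transcript. The sketch below is hypothetical: it assumes a command phrased as “Label this …”, with the label text running to a closing quote or the end of the sentence. Actually placing the label or arrow on the image would still require locating the object in the frame, which this does not attempt.

```python
from __future__ import annotations

import re

# Hypothetical command grammar: "label this" followed by the label text,
# optionally in straight or curly quotes. The label ends at a quote mark or
# sentence punctuation, so labels containing apostrophes are cut short.
LABEL_CMD = re.compile(
    r"\blabel this\s+[\"'‘“]?(?P<label>[^\"'’”.!?]+)", re.IGNORECASE
)


def extract_labels(utterance: str) -> list[str]:
    """Return every label requested in one transcribed utterance."""
    return [m.group("label").strip() for m in LABEL_CMD.finditer(utterance)]


if __name__ == "__main__":
    print(extract_labels("Label this ‘filter housing’"))
    print(extract_labels("Okay, label this intake grille. Next step."))
```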

Thanks, and keep up the good work!