Seeking Advice: Extracting Text from Keynote and MP4 Files for RAG Implementation

An MP4 file is a container that holds separate audio and video streams, and those streams can be demuxed (split apart). You can have a clever AI write you a script. Or a clever human can find his links…

You will then want to re-encode the audio with a tool that handles almost any input, like ffmpeg, because the demuxed stream can still be a bloated multichannel file, or be in a codec that the API does not support.
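The demux-and-re-encode step can be sketched in Python by shelling out to ffmpeg. This is only a sketch under assumptions: ffmpeg is installed and on your PATH, your ffmpeg build includes the libmp3lame encoder, and mono 16 kHz MP3 at 64k is an acceptable target for whatever API you send the audio to. The function names `build_ffmpeg_cmd` and `extract_audio` and the file names are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src, dst, sample_rate=16000, bitrate="64k"):
    """Build an ffmpeg command that drops video and re-encodes audio to mono MP3.

    The sample rate and bitrate defaults are assumptions, not API requirements.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(src),          # input container (e.g. an .mp4)
        "-vn",                   # discard the video stream
        "-ac", "1",              # downmix to a single audio channel
        "-ar", str(sample_rate), # resample the audio
        "-c:a", "libmp3lame",    # re-encode with the MP3 encoder
        "-b:a", bitrate,         # target audio bitrate
        str(dst),
    ]


def extract_audio(src, dst):
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
    return Path(dst)


if __name__ == "__main__":
    # Hypothetical file names; replace with your own.
    extract_audio("talk.mp4", "talk.mp3")
```

Downmixing to mono and lowering the sample rate is what shrinks the file; if your target accepts other formats, change the codec and extension to match.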

I didn’t know what “Keynote files” were, so I looked it up: Keynote is Apple’s presentation software. Its formats are described as: “Exports to: PDF, QuickTime, JPEG, TIFF, PNG, HTML (with JPEG images) and PowerPoint. Keynote also uses .key (presentation files) and .kth (theme files) bundles based on XML.” I would focus on one of these export formats, like HTML, then process the result with a tag stripper.
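A tag stripper for the HTML export could look something like this, using only the Python standard library. It is a rough sketch: I have not checked what Keynote's HTML export actually contains, so the assumption here is that the slide text appears as ordinary text nodes. The class and function names (`TextExtractor`, `strip_tags`) are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def strip_tags(html):
    """Return the text content of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<html><body><h1>Title</h1><p>Hello &amp; bye</p></body></html>"
    print(strip_tags(sample))
```

The resulting plain text can then be chunked and embedded like any other document in the RAG pipeline.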