🎥 Surveillance Video Summarizer: AI-Powered Video Analysis and Summarization

Hey everyone!

I’ve been working on a VLM-driven system that processes surveillance videos, extracts frames, and generates detailed annotations highlighting notable events, actions, and objects. The system is powered by a Florence-2 Vision-Language Model (VLM) that I fine-tuned on the SPHAR dataset. It also uses the OpenAI API to summarize the annotations and extract the most relevant content, giving a coherent overview of the surveillance footage.

📣 How it Works:

  • Frame Extraction: Extracts frames from video files at regular intervals using OpenCV.
  • AI-Powered Annotation: Each frame is analyzed by the fine-tuned Florence-2 model, generating accurate annotations of the scene.
  • Data Storage: Annotations and frame data are stored in a SQLite database for easy retrieval and future analysis.
  • Gradio-Powered Interface: Interact with the system through a Gradio-based web interface. Specify a time range to retrieve the detailed logs for that period. The interface then uses the OpenAI API to summarize the video content, working through the frame annotations in sequence so the summary stays temporally coherent and reflects the order of events in the footage.
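
To make the sampling, storage, and time-range steps more concrete, here is a minimal sketch using only Python's standard library. It is an illustration under assumptions, not the project's actual code: the function names, the `annotations` table, and its columns are all hypothetical, and the annotation text is a placeholder for what the Florence-2 model would produce. In a real pipeline the sampled indices would be read from the video with OpenCV.

```python
import sqlite3


def sample_frame_indices(total_frames, fps, interval_s):
    """Return the frame indices to keep when taking one frame every interval_s seconds."""
    step = max(1, round(fps * interval_s))  # never step by less than one frame
    return list(range(0, total_frames, step))


def open_store(path=":memory:"):
    """Open (or create) a SQLite store with a hypothetical annotations table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotations ("
        " video TEXT NOT NULL,"
        " timestamp_s REAL NOT NULL,"
        " annotation TEXT NOT NULL)"
    )
    return conn


def save_annotation(conn, video, timestamp_s, annotation):
    """Store one frame annotation keyed by video name and timestamp in seconds."""
    conn.execute(
        "INSERT INTO annotations (video, timestamp_s, annotation) VALUES (?, ?, ?)",
        (video, timestamp_s, annotation),
    )
    conn.commit()


def annotations_between(conn, video, start_s, end_s):
    """Fetch annotations inside a time range, ordered by time."""
    rows = conn.execute(
        "SELECT timestamp_s, annotation FROM annotations"
        " WHERE video = ? AND timestamp_s BETWEEN ? AND ?"
        " ORDER BY timestamp_s",
        (video, start_s, end_s),
    )
    return rows.fetchall()
```

Returning the rows in timestamp order matters for the summarization step: the annotations for a requested range can be joined in sequence and passed to the summarizer, so the order of events is preserved.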

Interesting.
I am new to all this but really interested in this topic (VLM and security/safety).
Is this project a homegrown activity or part of a bigger project?
Thanks,
-Afshin

Hi,
Thank you for the response.

This project is a homegrown activity.