Hi,
I am trying to analyze a video that I made with my drone.
The video has no speech; it only shows activity.
What I am trying to get is a summary of what happened in the video.
Example:
08:01 Person leaves house and walks to garden shed
08:05 Person takes equipment out of the garden shed and walks into garden area
08:10 Person does some work in the garden
etc.
Is it feasible to get this information?
Here’s a cookbook example. It is overly optimistic about current vision models and how many images they can accept, and it uses a method that has been retired in the newest models, but it gives an overview of extracting frames and asking a model about them.
Many more frames need to be discarded to fit a budget, and timestamping is not a built-in feature: the only timestamps you get are the ones you already know for the segment of frames you sent for analysis.
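To make the budgeting and timestamp point concrete, here is a minimal sketch in plain Python. It only plans which frames to keep and what time label each one carries; it does not read the video. The function names `plan_frames` and `format_clock`, and the `start_s` offset parameter, are hypothetical names for this illustration. The actual frame extraction (for example with OpenCV or ffmpeg) and the call to a vision model are left out.

```python
def format_clock(seconds: int) -> str:
    """Format a number of seconds as HH:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def plan_frames(duration_s: float, fps: float, max_frames: int,
                start_s: int = 0) -> list[tuple[int, str]]:
    """Pick at most max_frames evenly spaced frames.

    Returns (frame_index, time_label) pairs. start_s is an assumed
    offset for when the recording started, so labels can match clock time.
    """
    total = int(duration_s * fps)
    if total <= 0 or max_frames <= 0:
        return []
    n = min(max_frames, total)
    stride = total / n  # frames skipped between kept frames
    plan = []
    for i in range(n):
        idx = int(i * stride)
        # The label comes from our own arithmetic, not from the model.
        plan.append((idx, format_clock(start_s + int(idx / fps))))
    return plan


# A 10-minute clip at 30 fps, reduced to 4 frames.
for idx, label in plan_frames(600, 30, 4):
    print(idx, label)
```

Each time label would then be sent along with its frame (or written into the prompt text), since the model cannot know when a frame was taken unless you tell it.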
A second round of AI processing may be needed to remove redundancy in what is reported for each image, because the model has no actual long-term view of a video, only whatever images you choose to provide.
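As a rough illustration of that redundancy problem, the sketch below collapses consecutive frame descriptions that say the same thing. This is a plain string comparison standing in for the second pass, not an AI step: real model output will vary in wording from frame to frame, so a model-based merge would likely still be needed. The function name `merge_timeline` and the sample captions are made up for this example.

```python
def merge_timeline(captions: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Collapse consecutive captions with the same normalized text.

    captions is a list of (time_label, text) pairs in time order.
    Only the first entry of each run is kept.
    """
    merged = []
    last_key = None
    for label, text in captions:
        # Normalize case and whitespace before comparing.
        key = " ".join(text.lower().split())
        if key != last_key:
            merged.append((label, text.strip()))
            last_key = key
    return merged


per_frame = [
    ("08:01", "Person walks to garden shed"),
    ("08:02", "person walks to  garden shed"),
    ("08:05", "Person takes equipment out of the shed"),
    ("08:06", "Person takes equipment out of the shed"),
    ("08:10", "Person works in the garden"),
]
for label, text in merge_timeline(per_frame):
    print(label, text)
```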
Many thanks for the answer. It seems that I am asking for something that is not easily achievable at the moment. Let’s wait a couple of months and then see what is possible.
@lreinhard7 Hi. https://videodb.io/ might have what you’re looking for.