For example, when we upload a 1 GB movie to a vector database, is the entire movie stored in the database? Or is only some metadata (i.e., the vector) stored in the database, with a reference from it to the original file?
Hey there, and welcome to the community!
Which database are you using?
I don't know exactly; Chroma or Milvus.
Which one is better?
Hi!
The original data has to be stored somewhere so that you know what the vectors actually refer to. You can either add it as metadata to the embedding in the vector database,
or
store the original data in another database and add a reference to it as metadata in the vector DB.
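To make the two options concrete, here is a minimal sketch using a hypothetical in-memory `ToyVectorStore` class. It is invented for illustration and is not the Chroma or Milvus API; the vectors and the `s3://my-bucket/...` URI are made-up placeholders. Small data such as a text snippet can sit directly in the metadata, while a large file like a movie stays in external storage and only a reference to it is stored next to the vector.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Record:
    """One entry: an ID, its embedding vector, and arbitrary metadata."""
    record_id: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector DB (not the Chroma or Milvus API)."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def add(self, record_id: str, vector: list[float], metadata: dict) -> None:
        self.records.append(Record(record_id, vector, metadata))

    def query(self, vector: list[float], k: int = 1) -> list[Record]:
        # Brute-force ranking by cosine similarity; real databases typically use ANN indexes.
        return sorted(self.records, key=lambda r: cosine(vector, r.vector), reverse=True)[:k]


store = ToyVectorStore()

# Option 1: small original data stored directly as metadata next to its vector.
store.add("doc-1", [0.9, 0.1, 0.0], {"text": "Chroma and Milvus are vector databases."})

# Option 2: large data kept elsewhere (e.g. object storage); only a reference is stored.
# The URI is a made-up placeholder.
store.add("movie-1", [0.1, 0.8, 0.3], {"uri": "s3://my-bucket/movies/movie-1.mp4"})

hit = store.query([0.0, 1.0, 0.2], k=1)[0]
print(hit.record_id, hit.metadata)
```

The query only returns the record's metadata; the application then either uses the stored text directly or fetches the movie from the URI.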
Hi!
So you're saying that, for example, a 1 GB movie is not stored in the database?
Rather, it is stored somewhere like object storage, and the vector entry holds a reference to it?
I read somewhere that data (for example, a movie used as input to an AI system) is broken into chunks and stored in the database.
Converting video to vector data is not commonly done. In theory, you could extract each frame of a video as an image and then vectorize it, but this would require a large amount of computation.
Vectorizing video efficiently is also very difficult.
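To give a rough sense of the scale, the sketch below counts how many frames would have to be embedded and how much space the vectors alone would take. The numbers are assumptions for illustration only: a 1-hour video at 30 fps and a hypothetical 512-dimensional float32 embedding per frame. `frame_embedding_cost` is a made-up helper.

```python
def frame_embedding_cost(duration_s: float, fps: float, dim: int, bytes_per_value: int = 4) -> tuple[int, int]:
    """Hypothetical helper: (frames to embed, bytes of vector data) when every frame is embedded.

    bytes_per_value=4 assumes float32 vectors; model compute time is not counted here.
    """
    frames = round(duration_s * fps)  # one image-embedding call per frame
    vector_bytes = frames * dim * bytes_per_value  # raw vector storage, excluding index overhead
    return frames, vector_bytes


# Assumed example: a 1-hour video at 30 fps with 512-dimensional float32 embeddings.
frames, size = frame_embedding_cost(duration_s=3600, fps=30, dim=512)
print(f"{frames} embedding calls, {size / 1e6:.1f} MB of vectors")
```

Under those assumptions that is 108,000 calls to an image embedding model and about 221 MB of vectors, and none of that data lets you play the movie back.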
Even with text, once the original data has been converted to vectors, the original text cannot in general be restored from them.
So the original text needs to be stored separately from the vectors during the embedding process.
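For text, a common pattern is to split the document into chunks, embed each chunk, and store the chunk's original text next to its vector. Here is a minimal sketch, assuming a hypothetical `toy_embed` function (a hashed bag of words standing in for a real embedding model) and simple word-count chunking; real systems use learned models and usually smarter chunking.

```python
import hashlib
import math

DIM = 64  # assumed toy dimension; real embedding models use far more


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def chunk(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


document = (
    "Vector databases store embeddings for similarity search. "
    "The original text has to be kept as metadata or in another store. "
    "An embedding is a lossy summary and cannot be turned back into the text."
)

index = []
for i, piece in enumerate(chunk(document, size=8)):
    # Keep the original chunk text next to its vector; the vector alone cannot give it back.
    index.append({"id": f"chunk-{i}", "vector": toy_embed(piece), "text": piece})


def search(query: str, k: int = 1) -> list[str]:
    """Return the stored text of the k chunks most similar to the query (dot product of unit vectors)."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda e: sum(a * b for a, b in zip(q, e["vector"])), reverse=True)
    return [e["text"] for e in ranked[:k]]


print(search("how is the original text kept"))
```

The search itself runs on vectors, but what it returns to the user is the stored `text` field; without that field the result would just be a list of numbers.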
Likewise, even if a video is converted to vectors and stored in a vector database, that is of little use unless the video itself is also stored separately.
Yes, that’s correct.
The embedding vector is ultimately comparable to a summary of the chunk you embedded.
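One way to see why a vector is only a summary: the crude hashed bag-of-words "embedding" below (hypothetical, not a real model) always returns the same number of values no matter how long the input is, and two different sentences can map to exactly the same vector. Real embedding models are far better, but they also produce a fixed-size vector, so the original content cannot in general be read back out of it.

```python
import hashlib
import math

DIM = 8  # assumed toy dimension


def toy_embed(text: str) -> list[float]:
    """Hypothetical 'embedding': hashed bag of words, L2-normalised. Word order is discarded."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


short = toy_embed("the cat chased the dog")
swapped = toy_embed("the dog chased the cat")
huge = toy_embed("word " * 100_000)  # a much longer input

print(len(short), len(huge))  # 8 8: same size regardless of input length
print(short == swapped)       # True: two different sentences, identical vector
```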
I suggest spending some time learning how the embedding process works. You will then understand the issues with video embeddings.