Can vector database data be stored in Chinese?

mavx · December 5, 2024, 6:17pm

Wondering if it’s possible to ask questions in English about data that is stored in Chinese in a vector database.

_j · December 5, 2024, 11:26pm

A vector database itself doesn’t contain any prohibitions about “asking”. You can embed almost any Unicode text and receive an embedding vector from an AI model.

It is the quality of the AI embeddings model you use that determines how well semantic similarity between topics is discerned when the languages differ.
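As a rough sketch of what "semantic similarity" means in practice: stored texts are ranked by the cosine similarity between their vectors and the query's vector. The three-dimensional vectors below are placeholders invented for illustration, not output from any real model, and `rank` is a hypothetical helper name.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, stored):
    """Return (score, text) pairs sorted from most to least similar."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    return sorted(scored, reverse=True)

# Placeholder vectors, invented for illustration only.
stored = [
    ("猫喜欢睡觉", [0.9, 0.1, 0.0]),      # "Cats like to sleep"
    ("今天股市上涨", [0.0, 0.2, 0.9]),    # "The stock market rose today"
]
# Stands in for the embedding of an English question about cats sleeping.
query_vec = [0.8, 0.3, 0.1]

for score, text in rank(query_vec, stored):
    print(f"{score:.3f}  {text}")
```

With a real multilingual embeddings model, the vectors would come from the model, and how cleanly the Chinese entry ranks first for an English query depends on that model.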

Similarity is the keyword here. The first part of similarity is simply that “data stored” doesn’t exactly look like “questions”, so some transformation of the information is already beneficial.

That aspect of similarity extends to the language being used. Large AI models tend to build an understanding of ideas that carries across languages.

Expect lower similarity scores, and therefore lower usable thresholds, when comparing across languages, English against Chinese for example. In a mixed-language database, entries in English may score as more relevant to an English query than the ideal target in Chinese that actually holds the knowledge. This can also be beneficial, as you might not want Chinese search results ranked high for an English query.
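One way to handle this, sketched under the assumption that you store a language tag as metadata next to each vector, is to filter by language and apply a lower threshold for cross-language matches. The `Hit` class, the `select_hits` helper, and all scores here are hypothetical and made up to illustrate the effect.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    text: str
    language: str   # metadata stored alongside the vector
    score: float    # similarity to the query, higher is closer

def select_hits(hits, languages=None, min_score=0.0, top_k=3):
    """Keep hits in the allowed languages at or above min_score, best first."""
    kept = [
        h for h in hits
        if (languages is None or h.language in languages) and h.score >= min_score
    ]
    return sorted(kept, key=lambda h: h.score, reverse=True)[:top_k]

# Scores are invented for illustration, not measured from a model.
hits = [
    Hit("General article about tea", "en", 0.52),
    Hit("龙井茶的冲泡方法", "zh", 0.41),   # "How to brew Longjing tea"
    Hit("Coffee brewing guide", "en", 0.30),
]

# Unfiltered, the English near-match outranks the Chinese target.
print([h.text for h in select_hits(hits)])

# Restricting to Chinese with a lower threshold surfaces the target.
print([h.text for h in select_hits(hits, languages={"zh"}, min_score=0.35)])
```

Whether you want the filter at all depends on the use case: leaving it off keeps same-language results on top, which may be exactly what you want.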

If you are making liberal use of AI, you can transform or translate the input, or obtain metadata for its embedding, to make it more compatible with the corpus. You can also produce embeddings based on a translation or a question-like summary of your data, so that inputs of broader types match back to the corpus.
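A minimal sketch of that last idea, assuming each document is embedded several times (original text, a translation, a question-like summary) with every vector pointing back to the same document ID. `MultiViewIndex` is a hypothetical class, and `toy_embed` is a deliberately crude stand-in that only counts a few English words; a real embeddings model would replace it.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class MultiViewIndex:
    """Stores several embedded 'views' per document under one doc id."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        self.entries: List[Tuple[str, Vector]] = []  # (doc_id, vector)

    def add(self, doc_id: str, views: List[str]) -> None:
        # views: e.g. original text, a translation, a question-like summary
        for text in views:
            self.entries.append((doc_id, self.embed(text)))

    def search(self, query: str, top_k: int = 3) -> List[Tuple[str, float]]:
        q = self.embed(query)
        best: Dict[str, float] = {}
        for doc_id, vec in self.entries:
            score = cosine(q, vec)
            # Keep only the best-scoring view for each document.
            if score > best.get(doc_id, -1.0):
                best[doc_id] = score
        return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy stand-in for an embeddings model: counts of a few fixed English words.
VOCAB = ["tea", "brew", "stock", "market"]

def toy_embed(text: str) -> List[float]:
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]

index = MultiViewIndex(toy_embed)
index.add("doc-tea", ["龙井茶的冲泡方法", "How to brew Longjing tea", "How do I brew green tea?"])
index.add("doc-stock", ["今天股市上涨", "The stock market rose today"])

print(index.search("How should I brew tea?"))
```

Because the toy embedder cannot read Chinese at all, only the translated and question-like views match here. A multilingual model would also score the original Chinese text, and the extra views would then serve to raise scores rather than to make matching possible.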

So you can ask, and if the data being searched is exclusively in one language, you will likely get top-ranked results that are still relevant (even if only the AI language model you are chatting with can understand them).
