Help with determining if it's less efficient to create embeddings based on JSON

Hi All,

I would like help determining if it is more or less efficient to create embeddings based on JSON.

I would like to populate a vector store like Redis or Milvus with these embeddings and run a similarity search based on an end user's question.
I'm basically creating a Q&A bot, but I want to leverage data I scraped from a web forum like Reddit and create embeddings from it.

I store my data as a single JSON string in the form of [{"question": "my question", "response": "my response"}]. I have over 5K Q/A pairs I'm working with.

Thanks in advance for reading and for any help!

You can, but you will just be embedding the same keys ("question", "response") and JSON punctuation over and over and wasting embedding tokens. Just deserialize the JSON and embed the Q and A text itself.
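A minimal sketch of that deserialization step, assuming the data has the shape described above (a JSON array of objects with "question" and "response" keys). The function name `qa_texts` and the sample Q/A pair are made up for illustration; the embedding call itself is left out because it depends on the model you use.

```python
import json

# Illustrative sample in the shape described in the question (not real data).
raw = '[{"question": "How do I reset my password?", "response": "Use the reset link on the login page."}]'

def qa_texts(raw_json):
    """Parse the JSON string and return one plain-text string per Q/A pair.

    Only the values are kept, so the repeated keys and JSON punctuation
    never reach the embedding model.
    """
    records = json.loads(raw_json)
    return [f"{r['question']}\n{r['response']}" for r in records]

texts = qa_texts(raw)
print(texts[0])
```

Each string in `texts` can then be passed to the embedding model and stored in the vector store. Whether to embed the question alone, or the question and response together as here, is a design choice worth testing against real user queries.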

Got it - I was way overthinking it. Thank you!!

Hey, adding to the question: let's say I have different JSON data like

{
  "a": "some value",
  "b": "some value",
  "c": "some value"
}

I want to convert this to documents so that I can later convert them to embeddings. Is that possible? In other words, how do I convert unstructured data like JSON to a document format, so that I can turn it into embeddings and store them in a vector database? Let me know if you need any additional details.
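One possible approach, sketched under assumptions: flatten each JSON object into "key: value" lines and treat that text as the document, keeping the original object alongside it as metadata. The helper names `flatten` and `json_to_document` and the nested sample record are hypothetical. Unlike the fixed Q/A case above, the key names are kept here on the assumption that they carry meaning when records have different fields.

```python
import json

def flatten(value, prefix=""):
    """Yield (path, leaf_value) pairs for every leaf in a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value

def json_to_document(record):
    """Turn one JSON object into a dict with embeddable text plus metadata."""
    lines = [f"{path}: {leaf}" for path, leaf in flatten(record)]
    return {"text": "\n".join(lines), "metadata": record}

# Hypothetical record with some nesting, for illustration only.
record = json.loads('{"a": "some value", "b": {"c": 1, "d": ["x", "y"]}}')
doc = json_to_document(record)
print(doc["text"])
```

The `text` field is what would be embedded; the `metadata` field can be stored next to the vector so the original record can be returned with a search hit.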

Hi, I am facing a similar challenge. Have you found a solution for it?
I am currently preparing a programming assistant for software. I have prepared 10 sample programs and stored them in a JSON file. Each sample program has hundreds of lines of code and related descriptions. I hope that users can ask questions and receive relevant answers through the chatbot (rather than directly displaying sample programs).