Help with determining if it's less efficient to create embeddings based on JSON

Hi All,

I would like help determining if it is more or less efficient to create embeddings based on JSON.

I would like to populate a vector store like Redis or Milvus with these embeddings and run a similarity search based on an end user’s question.
I’m basically creating a Q&A bot, but I want to leverage data I scraped from a web forum like Reddit and create embeddings from it.

I store my data as a single JSON string in the form of [{"question": "my question", "response": "my response"}]. I have over 5K Q/A pairs I’m working with.

Thanks for reading and any help here in advance!

You can, but you will just be embedding the same key names over and over and wasting embedding tokens. Just deserialize the JSON and embed the Q and A text.
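
A minimal sketch of that deserialization step in Python, assuming the data is a JSON array of objects with `question` and `response` keys as in the original post. The helper name `qa_texts` is hypothetical, and the embedding call itself is left out because it depends on the provider you use.

```python
import json

# Sample input in the shape described in the original post.
raw = '[{"question": "my question", "response": "my response"}]'


def qa_texts(raw_json: str) -> list[str]:
    """Return one plain-text string per Q/A pair, without JSON keys or punctuation.

    Hypothetical helper: assumes every record has "question" and "response" keys.
    """
    records = json.loads(raw_json)
    return [f"{r['question']}\n{r['response']}" for r in records]


texts = qa_texts(raw)
# Each entry in `texts` is what you would pass to your embedding model.
```

Whether to embed the question and the response together or as separate vectors is a design choice. Joining them is just the simplest starting point.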

Got it - I was way overthinking it. Thank you!!

Hey, adding to the question: let's say I have different JSON data like

{
  a: some value,
  b: some value,
  c: some value
}

I want to convert this to documents so that I can later convert them to embeddings. Is it possible? In other words, how do I convert data like JSON into a document format, so that I can turn it into embeddings and store them in a vector database? Let me know if you need any additional details.
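
One possible approach, sketched in Python: flatten each JSON object into a plain-text "document" with one `key: value` line per field, then embed that text. The function name `json_to_document` is hypothetical. Unlike the Q/A case above, the keys are kept here on the assumption that they carry meaning when records have different fields.

```python
import json


def json_to_document(record: dict, prefix: str = "") -> str:
    """Flatten a (possibly nested) JSON object into "key: value" lines.

    Hypothetical helper: nested keys are joined with dots, lists are comma-joined.
    """
    lines = []
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the dotted key path along.
            lines.append(json_to_document(value, path))
        elif isinstance(value, list):
            lines.append(f"{path}: " + ", ".join(str(v) for v in value))
        else:
            lines.append(f"{path}: {value}")
    return "\n".join(lines)


record = json.loads('{"a": "some value", "b": "some value", "c": "some value"}')
doc = json_to_document(record)
# `doc` is the text to embed; the original `record` can be stored
# alongside the vector as metadata in the vector database.
```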

Hi, I am facing a similar challenge. Have you found a solution for it?
I am currently preparing a programming assistant for software. I have prepared 10 sample programs and stored them in a JSON file. Each sample program has hundreds of lines of code and related descriptions. I hope that users can ask questions and receive relevant answers through the chatbot (rather than directly displaying sample programs).