Help with determining if it's less efficient to create embeddings based on JSON

jay039 · June 14, 2023, 6:21pm

Hi All,

I would like help determining if it is more or less efficient to create embeddings based on JSON.

I would like to populate a vector store like Redis or Milvus with these embeddings and run a similarity search based on an end user’s question.
I’m basically creating a Q&A bot, but I want to leverage data I scraped from a web forum like Reddit and create embeddings from it.

I store my data as a single JSON string in the form of [{"question": "my question", "response": "my response"}]. I have over 5K Q/A pairs I’m working with.

Thanks in advance for reading and for any help!

Foxalabs · June 14, 2023, 7:30pm

You can, but you will just be embedding the same JSON keys over and over and wasting embedding tokens. Just deserialize the JSON and embed the Q and A text itself.
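
A minimal sketch of that idea, assuming the data has the shape described in the original post. The function name `texts_to_embed` and the choice to join each question and response into a single string are illustrative assumptions, not a prescribed method; you could equally embed the questions alone and keep the responses as metadata.

```python
import json

# Hypothetical sample in the shape described in the original post.
raw = '[{"question": "my question", "response": "my response"}]'

def texts_to_embed(json_string):
    """Return one plain-text string per Q/A pair, with no JSON syntax or key names."""
    records = json.loads(json_string)
    # Assumption: question and response are embedded together as one text.
    return [f"{r['question']}\n{r['response']}" for r in records]

texts = texts_to_embed(raw)
print(texts)
```

Each string in `texts` can then be passed to whichever embedding model you use and stored in Redis or Milvus alongside the original record.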

jay039 · June 15, 2023, 3:26pm

Got it - I was way overthinking it. Thank you!!

akhilshekkari21 · August 1, 2023, 6:40pm

Hey, adding to the question, let's say I have different JSON data like

{
a: some value,
b: some value,
c: some value

}

I want to convert this to documents so that I can later convert them to embeddings. Is that possible? In other words, how do I convert data like JSON to a document format so that I can turn it into embeddings and store them in a vector database? Let me know if you need any additional details.
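
One possible approach, sketched here under assumptions rather than taken from an answer in this thread: render each object as "key: value" lines and treat that text as the document. The helper `json_to_document` is hypothetical. Unlike the fixed Q/A case above, the keys are kept here because with varied JSON they may carry meaning.

```python
import json

def json_to_document(obj, prefix=""):
    """Flatten a JSON object into 'key: value' lines, using dotted names for nested objects."""
    lines = []
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, e.g. {"c": {"d": 1}} becomes "c.d: 1".
            lines.append(json_to_document(value, prefix=f"{name}."))
        else:
            lines.append(f"{name}: {value}")
    return "\n".join(lines)

# Hypothetical record; real data would be loaded from your JSON file.
record = json.loads('{"a": "some value", "b": "some value", "c": {"d": 1}}')
document = json_to_document(record)
print(document)
```

The resulting string is what you would pass to the embedding model, one document per JSON object.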

hxyair · August 16, 2023, 7:17am

Hi, I am facing a similar challenge, have you found any solution for it?
I am currently preparing a programming assistant for software. I have prepared 10 sample programs and stored them in a JSON file. Each sample program has hundreds of lines of code and related descriptions. I hope that users can ask questions and receive relevant answers through the chatbot (rather than directly displaying sample programs).