Creating Embeddings for documents

saif_p · March 16, 2024, 8:26pm

Hello there,
I’m working on a RAG application using LlamaIndex and OpenAI. I have an API that accepts PDF files and stores them in a folder; that folder is then loaded and embeddings are created so users can chat with the documents.
My requirement now: how do I do the same without storing the documents/PDFs locally?

RonaldGRuckus · March 16, 2024, 8:36pm

Lots of options. I’m on mobile so forgive the lack of links:

Weaviate (great for tinkering and building powerful comprehensive solutions)

Pinecone (has great docs, reasonable free tier)

Qdrant (performance wise amazing and slick)

Pgvector (great if you are comfortable with SQL, ties in with Postgres)

… And many more

saif_p · March 16, 2024, 8:56pm

Hey @RonaldGRuckus ,
I’m using Qdrant to store everything (my service context, storage context, and documents) through LlamaIndex.
My problem is that I’m loading these documents from a folder called “DATA” on my local machine. I want to take this to production, where I cannot have a data folder, so how do I pass the documents to Qdrant without storing them?

RonaldGRuckus · March 16, 2024, 10:53pm

If I understand correctly, you want to use the files from your DATA folder (in whatever format, but NOT embeddings) in your production environment without storing them?

joyasree78 · March 16, 2024, 11:30pm

If you are using AWS, you can use S3 to land your PDFs. Other cloud providers have similar object stores.

saif_p · March 17, 2024, 8:23am

No, I do not have any cloud storage! @joyasree78
Is there any way I can do it locally?

saif_p · March 17, 2024, 8:26am

Yes @RonaldGRuckus ,
My users will upload PDF files via the API, and those PDFs should be loaded into Qdrant as they are uploaded.
This is the line that loads the documents from the folder:
documents = SimpleDirectoryReader("data").load_data()
And this is the snippet that indexes them:
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
)
I just need to get rid of that data folder and keep everything else as it is.

joyasree78 · March 18, 2024, 5:12am

If you are doing it on-premises, please look at MinIO, which provides S3-compatible object storage.
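
If an object store is more than you need, another option for the requirement above (indexing uploaded PDFs without a permanent DATA folder) is to write each upload into a temporary directory, load it, and let the directory be deleted right afterwards. This is only a minimal sketch using the Python standard library: the names load_uploads, uploads, and load_dir are hypothetical, and it assumes your API hands you each file as a filename plus raw bytes.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def load_uploads(uploads: Dict[str, bytes], load_dir: Callable[[str], List]) -> List:
    """Write uploaded files to a throwaway directory, load them, then clean up.

    uploads maps a filename to the raw bytes received by the API (assumed shape).
    load_dir is any function that reads a directory path and returns documents.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        for name, payload in uploads.items():
            # Keep only the base name so a crafted filename cannot escape tmp_dir.
            (Path(tmp_dir) / Path(name).name).write_bytes(payload)
        # The loader must finish reading here, before the directory is removed
        # when the "with" block exits.
        return load_dir(tmp_dir)
```

With LlamaIndex, the loader could be passed as `lambda d: SimpleDirectoryReader(input_dir=d).load_data()`, and the returned documents would go to `VectorStoreIndex.from_documents(...)` exactly as before. The files exist on disk only while they are being parsed, so nothing accumulates on the production machine.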
