Creating Embeddings for documents

Hello there,
I'm working on a RAG model using LlamaIndex and OpenAI. I have an API that takes PDF files and stores them in a folder; that folder is then loaded and embeddings are created so that users can chat with the documents.
Now my requirement is: how do I do the same without storing the documents/PDFs locally?

Lots of options. I’m on mobile so forgive the lack of links:

Weaviate (great for tinkering and building powerful comprehensive solutions)

Pinecone (has great docs, reasonable free tier)

Qdrant (amazing performance-wise, and slick)

Pgvector (great if you are comfortable with SQL, ties in with Postgres)

… And many more

Hey @RonaldGRuckus ,
I'm using Qdrant to store everything, such as my service context, storage context and documents, using LlamaIndex.
My problem is that I'm loading these documents from a folder called "DATA" on my local machine. I want to take this to production, where I cannot have a data folder, so how do I pass the documents to Qdrant without storing them?

If I understand correctly, you want to use the files from your DATA folder (in whatever format, but NOT the embeddings) in your production environment without storing them?

If you are using AWS, you can use S3 to land your PDFs. Other cloud providers have similar object stores.

No, I do not have any cloud storage! @joyasree78
Is there any way I can do it locally?

Yes @RonaldGRuckus ,
My users will upload PDF files via the API, and those PDFs should be loaded into my Qdrant as they are uploaded.
documents = SimpleDirectoryReader("data").load_data()
This is my code snippet that uploads the documents:
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
)
I just have to get rid of that data folder and keep everything else as it is.
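
One way to drop the permanent data folder while keeping the rest of the pipeline unchanged is to write each upload to a throwaway temporary directory, load it, and let Python delete the directory afterwards. The sketch below is only an illustration: `load_uploads`, `uploads` and `load_dir` are hypothetical names, and the loader is passed in as a callable so the helper itself depends only on the standard library.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def load_uploads(
    uploads: Dict[str, bytes],
    load_dir: Callable[[str], List],
) -> List:
    """Write uploaded files to a temporary directory, load them, then clean up.

    uploads  -- hypothetical mapping of filename -> raw bytes from the API request
    load_dir -- callable that turns a directory path into a list of documents
    """
    with tempfile.TemporaryDirectory() as tmp:
        for name, data in uploads.items():
            # Path(name).name drops any directory part a client may have sent.
            (Path(tmp) / Path(name).name).write_bytes(data)
        # Load while the directory still exists; it is deleted on exit.
        return load_dir(tmp)
```

With LlamaIndex this would presumably be called as `documents = load_uploads(files, lambda d: SimpleDirectoryReader(d).load_data())`, after which `VectorStoreIndex.from_documents(...)` works as before. The import path for `SimpleDirectoryReader` depends on the installed version (`llama_index` in older releases, `llama_index.core` in newer ones), so treat that part as an assumption.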

If you are doing it on-premises, please look at MinIO, which is an S3-compatible object store.
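
If the PDFs do not need to be kept at all, another option is to skip storage entirely and build the documents from the uploaded bytes in memory. This is a rough sketch, not a tested recipe: it assumes the `pypdf` package for text extraction and a LlamaIndex version whose `Document` accepts `text` and `metadata`; `page_records` and `pdf_bytes_to_documents` are hypothetical helper names.

```python
import io
from typing import Dict, List


def page_records(filename: str, page_texts: List[str]) -> List[Dict]:
    """Pair each non-empty page text with metadata naming its source."""
    return [
        {"text": text, "metadata": {"file_name": filename, "page_label": str(n)}}
        for n, text in enumerate(page_texts, start=1)
        if text.strip()
    ]


def pdf_bytes_to_documents(filename: str, data: bytes) -> list:
    """Build LlamaIndex Documents from PDF bytes without writing to disk."""
    # Imported lazily so page_records works without these packages installed.
    from pypdf import PdfReader
    from llama_index import Document  # assumption: newer releases use llama_index.core

    reader = PdfReader(io.BytesIO(data))
    texts = [page.extract_text() or "" for page in reader.pages]
    return [
        Document(text=rec["text"], metadata=rec["metadata"])
        for rec in page_records(filename, texts)
    ]
```

The resulting list can be passed to `VectorStoreIndex.from_documents(...)` in place of the output of the directory loader. Scanned PDFs without a text layer will yield empty pages here and would need OCR instead.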