OpenAI Embeddings - Search through ~1000 PDFs

Hey everyone, I’m new to the AI world and I’m a bit unsure if I’m in the right spot, but here goes.

I’ve got this puzzle I’m working on with a pile of small PDFs—like 1000 of them. They’re short, around 5-7 pages each.

What I want to do is use some embedding magic to turn these documents into vectors and stick them in a database. The plan is, when I toss a question into the mix, I can scan this vector database to find the closest match to my query. This extra context should help OpenAI serve up a spot-on answer.

I have two questions regarding this:

  1. I want the response to include the document name or title where the answer was snatched from. (Is this even possible?)

  2. I’m curious if I can cut down on the embedding expenses by creating these vectors only once and saving them in a database. That way, when I fire off a new question, I can simply embed the prompt and hunt for the closest match in my pre-existing vector stash. Can I then pass that match as context along with my question to the LLM?
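
To make the plan concrete, here is a rough sketch of what I have in mind, not a working solution. Everything in it is my own assumption: `toy_embed` is a made-up stand-in for a real embedding API call (it just hashes words into a vector), a JSON file plays the role of the vector database, and the names `build_index`, `search`, and `build_prompt` are hypothetical. It stores the document title next to each vector (question 1) and only embeds the documents when no saved index exists yet (question 2).

```python
import hashlib
import json
import math
import re
from pathlib import Path

DIM = 256  # size of the toy vectors; a real embedding model fixes its own size


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding call: hashed bag of words."""
    vec = [0.0] * DIM
    for word in re.findall(r"\w+", text.lower()):
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    # Normalize to unit length so a dot product equals cosine similarity.
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(docs: dict[str, str], cache_path: Path) -> list[dict]:
    """Embed every document once; later runs load the saved vectors instead."""
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    index = [
        {"title": title, "text": text, "vector": toy_embed(text)}
        for title, text in docs.items()
    ]
    cache_path.write_text(json.dumps(index))
    return index


def search(index: list[dict], question: str, top_k: int = 1) -> list[dict]:
    """Embed only the question and return the top_k most similar entries."""
    q = toy_embed(question)

    def score(entry: dict) -> float:
        return sum(a * b for a, b in zip(q, entry["vector"]))

    return sorted(index, key=score, reverse=True)[:top_k]


def build_prompt(question: str, hits: list[dict]) -> str:
    """Put the matched text and its document title into the LLM prompt."""
    context = "\n\n".join(f"[{hit['title']}]\n{hit['text']}" for hit in hits)
    return (
        "Answer using only the context below and name the document "
        "you used.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

I would call `build_index` once with a dict of `{file name: extracted text}`, then for each question call `search` and send the output of `build_prompt` to the model. Since the title travels with each vector, the answer can point back to the source document. For 5-7 page PDFs I assume each document would be split into smaller chunks first, with every chunk keeping its file name.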
