Can I use Retrieval with Completion API (not Assistant)?

The specific use case I have in mind is to search and summarize esoteric topics found in a large PDF document.

I do not want to use the Assistants API because I want to be able to discard the Retrieval result when its (cosine) similarity is below a certain threshold (which means the topic was not found), so that I do not have to run the result through the LLM and can save on token cost.

If the Retrieval result has a cosine similarity above the threshold, then I will pass the result to the Completion API for further processing.

That is the idea. Can I achieve the above using Retrieval?

(I don’t want to chunk the document and store it in some vector database myself, since I am not sufficiently familiar with the various chunking algorithms. I would much prefer to just use Retrieval to do it all for me.)

The knowledge retrieval within Assistants is very rudimentary: it always injects some file content into the AI context regardless of relevance, and then adds a search function the AI can call. The search powering the tool is likely some Azure engine rather than OpenAI embeddings, and it has no threshold. You also have no control over it, and you cannot observe what has been placed into a thread by this tool, which OpenAI wants to pretend works differently.

So no, there is nothing usable or useful for you outside of running the assistant and paying for the AI to start searching your documents.


Retrieval is a broad concept that includes chunking, vectorization, storing vectorized data into a vector database, and ranking afterwards.
So, using Retrieval necessarily includes the above elements.

Sorry. To be clear, when I say “Retrieval”, I mean the Knowledge Retrieval tool associated with Assistant API.

What I mean to ask is whether there is a separate Retrieval API that I can use with the Completion API (separate from the Assistants API). At a minimum, I would want the Retrieval API to give a cosine similarity (so I can discard the result if it is below a certain threshold) and to generate result(s) which I can then use in a thread or with the Completion API.

Thank you.

That is disappointing.

From my basic testing, I am actually fine with the result produced by the Knowledge Retrieval tool within Assistants.

But like you said, I need to be able to observe the cosine similarity and what is actually retrieved. It sounds like I have to build this myself.

If I just wanted something simple/rudimentary, is there a similar API I can use that will do all that (chunking, vectorization, storing in vector database, query, ranking) in one fell swoop?

How should I go about approaching this? Appreciate you pointing me in the right direction.

The API available through OpenAI is “embeddings”, which simply returns an embedding vector that captures the semantics of an input. That vector can then be compared to other embeddings.
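To make the comparison step concrete, here is a minimal sketch of cosine similarity between two embedding vectors, with a cutoff applied to the score. The function names and the 0.5 default threshold are illustrative assumptions only; a workable cutoff depends on the embedding model and on your documents, and has to be found by testing.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (lists of floats)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as "no similarity"
    return dot / (norm_a * norm_b)


def passes_threshold(query_vec, chunk_vec, threshold=0.5):
    """True if the chunk is similar enough to be worth sending to the model.

    The 0.5 default is a placeholder assumption, not a recommended value.
    """
    return cosine_similarity(query_vec, chunk_vec) >= threshold
```

A chunk that fails `passes_threshold` is simply never sent to the completion call, which is where the token saving comes from.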

There are more developed database products out there that employ semantic search, such as those on Azure. However, it is not terribly difficult to make your own database that only needs to hold a few documents for personal use.

I count around 10 lines of operation to capture embeddings of multiple inputs in this off-hand example I use to demonstrate.
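A minimal sketch of what such a call could look like follows. It is not the poster's original example. It assumes the official `openai` Python package (v1 or later), an `OPENAI_API_KEY` in the environment, and the `text-embedding-3-small` model; the function name `embed_texts` and the optional `client` parameter are hypothetical conveniences.

```python
def embed_texts(texts, client=None, model="text-embedding-3-small"):
    """Return one embedding vector per input string, in input order.

    `client` is any object exposing `embeddings.create(model=..., input=...)`.
    If omitted, an OpenAI client is created (assumes the `openai` package
    is installed and OPENAI_API_KEY is set).
    """
    if client is None:
        from openai import OpenAI  # imported lazily so the sketch loads without it
        client = OpenAI()
    response = client.embeddings.create(model=model, input=texts)
    # Each returned item carries the index of the input it belongs to.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

Passing a list of strings embeds all of them in a single request, so the chunks of a document and the user's query can each be embedded with one call.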

The parts that are missing are the chunking logic and the vector database. With your own chunking, you can customize the size, overlap, and metadata for your application so that it exceeds what someone else’s solution gives you. The vector database, besides providing storage, can have optimized methods for exhaustive search across the entirety of the embeddings to return a top-k.
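The two missing pieces described above can be sketched roughly as follows: a character-based chunker with overlap, and a brute-force top-k search with a similarity cutoff. This is one simple approach under stated assumptions, not a definitive implementation. The names `chunk_text`, `cosine`, and `top_k` and the default size, overlap, and threshold are all hypothetical, and a real vector database would replace the linear scan.

```python
import math


def chunk_text(text, size=800, overlap=200):
    """Split text into character windows of `size` that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        # Keep the offset as simple metadata so a hit can be traced back.
        chunks.append({"start": start, "text": text[start:start + size]})
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_k(query_vec, records, k=3, threshold=0.0):
    """Score every (chunk, vector) pair against the query.

    Returns up to k (score, chunk) tuples with score >= threshold, best first.
    """
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in records]
    hits = [hit for hit in scored if hit[0] >= threshold]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:k]
```

If `top_k` returns an empty list, nothing cleared the threshold, so the topic can be treated as not found and the completion call skipped entirely.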

Great question, @harvey1. I have done a similar implementation, and I thought I’d create a video on how you could do this.

In this video, I explain how you can use Pinecone (Retrieval) together with the Assistants API (which you could just change to completions if you want).