Which is best model for job-to-resume matching project?

Hi -

I’m a developer who has used GPT-4’s API from Python some. I work for an online marketplace site for finding (matching), hiring, onboarding, and managing freelance IT professionals (“Talent”) with those who need to hire them (“Hiring Clients”).

In a nut shell: we envision a Hiring Client pasting their page of Job Description text into a field, our site using either a model’s API to come up with the 5 best matching resumes.

Part of this whole process means getting tens of thousands (later x10 that number) of resumes into our model.

Right now the plan is to create a working proof of concept (“POC”) using only 1000 resumes.

As I plan this POC project, it would benefit us greatly to know the following:

  • Which of OpenAI’s models would best suit our needs to build this? We are confident that we will later need Enterprise but for this quick POC, we want to know if 3.5-Turbo, fine-tuned, would best suit our needs?
  • Does your GPT-4 Enterprise solution NOW allow us to fine-tune and/or add tens of thousands of resumes?
  • Any other ideas or recommendations?

Sincerely,
Scott Swain

Hi Scott,

You could embed your list of resumes in a vector DB and then perform a search on that with a requirements list from the client, that would pull back, lets say the top 30 matching entries, then you could pass each one of those to the GPT-4 model for it to decide on a match, you would have to perform some processing of the resumes so that they can be split into approximately 8000 token blocks and for each block to have a meta header to ensure that related embeddings are kept associated with one another, it will certainly require some problem solving, but it seems like a doable project with some time and effort.

1 Like

Hey Foxabilo -
Thanks for taking the time to share that plan/idea!
With regard to the part where I would search my vector database for the initial 30 resumes, would that part (1) use OpenAI’s API or (2) I would have to write some code to do that matching? If the answer is 2, that wouldn’t meet the requirements I’m constrained by. They want the matching to be purely by AI.

It would be via a call to a vector database retrieval system such as pinecone, weaviate or chromadb (to name but a few), along with calls to OpenAI’s GPT API. There is no way to do this via only AI inferencing. Unfortunately clients do not always know the limitations of the technology.

1 Like

Oh! So when you say “via a call to a vector database retrieval system, along with calls to OpenAI’s GPT API,” you mean using the API to ?manage? that database query?

Foxabilo -
May I ask for (1) Which vector DB you recommend for this kind of thing; and (2) Any resources you can recommend for me to learn to use the method you are talking about? Even if just a github project you know of that I can study.
THANK YOU!

Hi Hows your progress going related to the job to resume matching,. I am working on kind of same thing .can you explain what is your workflow regarding this project?

HI, I am also working on the same idea, it s my first time using openai api can you please tell the steps i need to follow .

Hi, I’m working on the same thing; perhaps we could implement it together. I’m tired of the current recruitment system and algorithms; I think they’re rubbish. My email is [my email]

Any update here.

Currently trying to achieve this using assistant API, by enabling Retrieval, function calling and file(resume in simple json)

But the assistant is not giving correct data.
It creates its own data and gives th result.

Any help here?

Personally: weaviate.io their cloud service in particular, very robust for the small buck (we run it for content suggestion system on 2M/visitors/monthly website beatofhawaii.com and the bill is around 75 USD month for 3k objects storage and 2B data points queried monthly)