Which is the best model for a job-to-resume matching project?

  1. Based on the type of repository, you need an API to access the files within it. It might be Google Cloud or simply a download through a REST API, so what is your repository?
  2. Once the file is downloaded, it should be converted to plain text so that you can start processing it. Personally, at LAWXER.ai we use ConvertAPI for that (you can find it on Google). See the first sketch after this list.
  3. Once the file is converted to plain text, you need to pre-process it with one or several AI models to bring it to a standard format, so that your paragraphs and titles are separated and paragraphs are not broken.
  4. Once you have that, you need to identify the titles among the paragraphs so that you can split the text into chunks on those titles and get the sections of the CV (second sketch below).
  5. Once you have sections, you might use another model to understand what is inside each section and summarize it, or to extract the entities, subjects, and other things, because that is what will improve the quality of your RAG.
  6. Based on the application requirements, you might need to create fields for the object you will be storing in your vector database. For this case you might use something like title, subjects, text, document ID, and section ID. This object will represent a section in your database.
  7. You will also need another class in your vector database for documents, with fields such as document ID, title, name… and other fields that apply to your CV.
  8. Store all the section objects and the document object in your vector database (third sketch below).
  9. When retrieving documents, it is better in your case to run a double search, where one query searches documents and the other searches sections.
  10. Once the results are found, you need to combine them and re-rank them based on their relevance to the query in each of the sub-queries (for example, ranking sections higher than documents, or vice versa). The goal is to come up with document IDs from both queries (fourth sketch below).
  11. The same applies when you want to extract data.
  12. We’ve implemented a mechanism where data is extracted automatically after configuring the extraction tasks once. Basically, it is just a list of questions and queries to pass to the RAG engine and the model, so that data can be extracted in a specified format from the retrieved context. That data is then stored in a separate database, which is used for classification of the documents (fifth sketch below).
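First sketch, for step 2. We use ConvertAPI in production; the open-source pypdf library is swapped in here purely for illustration, assuming the CV is a PDF:

```python
# Sketch of step 2: extract plain text from a CV.
# pypdf stands in for ConvertAPI here; cv.pdf is a hypothetical input file.
from pypdf import PdfReader

def pdf_to_text(path: str) -> str:
    """Concatenate the extracted text of every page in the PDF."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

raw_text = pdf_to_text("cv.pdf")
```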
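Second sketch, for steps 3 and 4. In practice you would use a model to detect titles; the ALL-CAPS heuristic below is only a stand-in assumption to show the chunking logic:

```python
# Sketch of steps 3-4: split the cleaned text into sections on detected titles.
# A real pipeline classifies title lines with a model; this heuristic treats
# short ALL-CAPS lines ("EXPERIENCE", "EDUCATION") as section titles.
def split_into_sections(text: str) -> list[dict]:
    sections, current = [], {"title": "PREAMBLE", "text": []}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and len(stripped) < 40 and stripped.isupper():
            if current["text"]:
                sections.append(current)
            current = {"title": stripped, "text": []}
        elif stripped:
            current["text"].append(stripped)
    if current["text"]:
        sections.append(current)
    return [{"title": s["title"], "text": " ".join(s["text"])} for s in sections]
```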
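Third sketch, for steps 6 to 8. Chroma is used here as a stand-in vector database (any one with metadata filtering works); the field names mirror the ones suggested above, and Chroma's default embedding model is assumed:

```python
# Sketch of steps 6-8: two collections, one for documents and one for sections.
import uuid
import chromadb

client = chromadb.Client()  # in-memory; use a persistent client in production
docs = client.create_collection("documents")
secs = client.create_collection("sections")

doc_id = str(uuid.uuid4())
docs.add(
    ids=[doc_id],
    documents=["Full CV text or a model-written summary of it"],
    metadatas=[{"document_id": doc_id, "title": "Jane Doe - CV", "name": "Jane Doe"}],
)
# raw_text and split_into_sections come from the sketches above.
for i, section in enumerate(split_into_sections(raw_text)):
    secs.add(
        ids=[f"{doc_id}-{i}"],
        documents=[section["text"]],
        metadatas=[{"document_id": doc_id, "section_id": i, "title": section["title"]}],
    )
```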
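Fourth sketch, for steps 9 and 10: run the job description against both collections and merge the hits into one ranked list of document IDs. The 0.4/0.6 weights are arbitrary assumptions; tune them to decide whether sections or documents rank higher:

```python
# Sketch of steps 9-10: double search, then combine and re-rank by document ID.
def match_resumes(job_description: str, n: int = 10) -> list[str]:
    doc_hits = docs.query(query_texts=[job_description], n_results=n)
    sec_hits = secs.query(query_texts=[job_description], n_results=n)

    # Chroma returns distances (smaller = closer), so convert to a similarity.
    scores: dict[str, float] = {}
    for meta, dist in zip(doc_hits["metadatas"][0], doc_hits["distances"][0]):
        scores[meta["document_id"]] = scores.get(meta["document_id"], 0.0) + 0.4 * (1 - dist)
    for meta, dist in zip(sec_hits["metadatas"][0], sec_hits["distances"][0]):
        scores[meta["document_id"]] = scores.get(meta["document_id"], 0.0) + 0.6 * (1 - dist)

    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)
```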
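Fifth sketch, for step 12: the extraction tasks are configured once as a list of query/question pairs and then run automatically per document. The model name, task list, and prompts are assumptions for illustration, not our exact configuration:

```python
# Sketch of step 12: automatic data extraction from retrieved context.
from openai import OpenAI

llm = OpenAI()

EXTRACTION_TASKS = [  # hypothetical task list; configure once per application
    {"query": "work experience", "question": "List employers and job titles as JSON."},
    {"query": "education", "question": "List degrees and institutions as JSON."},
]

def extract_data(document_id: str) -> list[str]:
    results = []
    for task in EXTRACTION_TASKS:
        # Retrieve context from this document's sections only.
        hits = secs.query(
            query_texts=[task["query"]],
            n_results=3,
            where={"document_id": document_id},
        )
        context = "\n".join(hits["documents"][0])
        resp = llm.chat.completions.create(
            model="gpt-4o-mini",  # assumed model; any capable model works
            messages=[
                {"role": "system", "content": "Extract data from the context in the requested format."},
                {"role": "user", "content": f"Context:\n{context}\n\nTask: {task['question']}"},
            ],
        )
        results.append(resp.choices[0].message.content)
    return results  # store in a separate database and use it to classify documents
```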

And then you have the whole application to build so that you can display that data to the user.


Thank you sir :pray: that's awesome. I really appreciate it.
Best!

You’re very welcome. Let me know how it goes

Hello, I want to list all the resumes from my vector database that match a job description. Can you suggest how I can achieve that? Should I apply pagination, or do I need a new type of database?