I’m using the ChatCompletion API to extract information from one (or more) OCR’d documents. It works pretty well, but I want to make it more robust by constraining the possible values of certain fields with custom knowledge bases.
## The data
For instance, GPT is asked to find the customer who issued the document. Since the document might contain multiple named entities, the extraction would be greatly facilitated if GPT could access our address book. It contains both the full and abbreviated names of our partner companies, along with other information (which is not really required for our purposes):
| Name | Abbreviated Name | Address | City | … |
| --- | --- | --- | --- | --- |
| Fake Company | FCMP | Fake address | World | … |
| Another Cool Company | ACC | … | … | … |
| Yet Another Example | YEAEX | … | … | … |
Another field that could benefit from access to a knowledge base is the vessel name. I would ideally want GPT to know the names of all the existing vessels when trying to retrieve this information, to limit errors and improve robustness.
| Name |
| --- |
| First Vessel |
| Another Vessel |
| Vessel III |
| … |
## The solution
I’m fairly new to the API and I’ve been looking at ways to achieve this, though I’m even doubting its feasibility. Passing these lists in the query is not an option due to their size. Embeddings, from my understanding of what I’ve read online, are also not ideal because of i) the very nature of the task at hand and ii) the type of data, which mainly consists of named entities, abbreviations, etc. Finally, I don’t see how function calling could be made to work here.
Does anybody know if passing this sort of knowledge is possible? If so, how? Any input, tips, or food for thought are more than welcome.
I am not yet sure I fully comprehend what you are trying to achieve. Are you trying to match information about the entity and/or vessel you retrieve from the OCR’d document with an entry in the address book?
How is your address book stored currently?
If you could provide more context, I might be able to share a solution for a problem of a similar nature.
Hi!
This sounds like a typical use case for function calling. Since you say that you are new, I will link the documentation. It should answer your question quite well.
> I am not yet sure I fully comprehend what you are trying to achieve. Are you trying to match information about the entity and/or vessel you retrieve from the OCR’d document with an entry in the address book?
Yes. Consider the following example. I am passing the text extracted from a document to the ChatCompletion API and asking GPT-4 to identify, among other fields, the name of the company that issued the aforementioned document (which we’ll refer to as the customer). The challenge here lies in the fact that the document contains more than one named entity that could potentially fill the customer field. Consequently, GPT often identifies the wrong company as the customer.
However, only one of these names is listed in our address book. If GPT were aware of our current customers, it could return the correct customer name much more robustly.
> How is your address book stored currently?
We managed to export an .xlsx file from our data management system, so now I have an Excel sheet that looks like the table I provided in my original post. I am of course more than willing to do further processing if needed.
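For example, I could turn it into a simple lookup structure with pandas (a quick sketch; the column names follow the table above, and the filename is just a placeholder):

```python
import pandas as pd

# Load the exported address book (pandas needs openpyxl for .xlsx files)
book = pd.read_excel("address_book.xlsx")

# Map every known full or abbreviated name to the canonical company name
known_names = {}
for _, row in book.iterrows():
    known_names[str(row["Name"]).lower()] = row["Name"]
    known_names[str(row["Abbreviated Name"]).lower()] = row["Name"]
```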
All of the above also applies to the retrieval of the vessel name. Let me know if this clarified the problem!
Thanks for the links! I spent the last few days digging into the documentation, and I also got the feeling that the best route is to opt for function calling. However, I don’t fully see how to connect function calling to my specific problem.
Say I’m trying to retrieve the vessel name. I thought I could make a function that accepts a list of potential matches, compares each name to a list of existing vessels, and returns the one name that was found in the list. Then I would have GPT call this function, passing as input a list (of arbitrary length) of potential matches.
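Something along these lines (a rough sketch of what I have in mind; the function name, the fuzzy-matching cutoff and the vessel list are placeholders):

```python
from difflib import get_close_matches

# Placeholder list; in practice this would come from our vessel database
KNOWN_VESSELS = ["First Vessel", "Another Vessel", "Vessel III"]

def match_vessel_name(candidates: list[str]) -> str | None:
    """Return the first candidate that (fuzzily) matches a known vessel."""
    for name in candidates:
        hits = get_close_matches(name, KNOWN_VESSELS, n=1, cutoff=0.8)
        if hits:
            return hits[0]
    return None

# Tool schema advertised to the model so it can call the function
tools = [{
    "type": "function",
    "function": {
        "name": "match_vessel_name",
        "description": "Check candidate vessel names against the list of known vessels.",
        "parameters": {
            "type": "object",
            "properties": {
                "candidates": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Potential vessel names found in the document",
                },
            },
            "required": ["candidates"],
        },
    },
}]
```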
However, this feels like a workaround rather than a direct solution to my problem. I suspect there’s a way to do this much more efficiently and directly.
Assuming you have already confirmed that the hit rate goes up when you pass a long, unsorted list of vessel names into the extraction process: I suppose this list is too long to pass into the prompt (or too expensive), so the process now needs some type of bridge to pass in a short list of potential matches. And that cannot work without first returning a result from the extraction step in order to get said matches for the short list, and then running the extraction again?
Or, if you get a somewhat exact vessel name from the extraction, you could probably instruct or fine-tune a model to respond in JSON format and skip the function call altogether, using the returned value as input for the database search?
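As a rough illustration of that second idea (just a sketch; the model name is an assumption, and the document text and vessel list are stand-ins for yours):

```python
import json
from difflib import get_close_matches

from openai import OpenAI

client = OpenAI()
document_text = "..."  # the OCR'd document text
KNOWN_VESSELS = ["First Vessel", "Another Vessel", "Vessel III"]  # your list

# Ask for structured output directly, with no function call involved
response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # assumes a model version that supports JSON mode
    response_format={"type": "json_object"},
    messages=[
        {"role": "system",
         "content": 'Extract the vessel name and reply as JSON: {"vessel_name": "..."}'},
        {"role": "user", "content": document_text},
    ],
)
raw_name = json.loads(response.choices[0].message.content)["vessel_name"]

# Resolve the (possibly imperfect) name against the database outside the model
match = get_close_matches(raw_name, KNOWN_VESSELS, n=1, cutoff=0.8)
print(match[0] if match else "no match")
```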
Spelling out the thinking process like this reveals maybe a few new perspectives on the issue.
Edit: you could also take a look at image embeddings, in case you can extract the part of the image that contains the vessel name.
This link may be helpful.
Hi again matdog - thanks for clarifying. With that additional background, I would also agree that some form of function calling seems like the best approach, along the lines of what you laid out: use the names identified from the doc as input to search for a match in your address list against the full and/or abbreviated name, and return the result as input for answering the question. Especially if your address book is dynamic, with new customers and other entries being added, this should lead to more reliable results.
Of course, this approach is somewhat contingent on the OCR outputs being accurate in the first place.
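To make the flow concrete, here is roughly how that loop could look (a sketch only; the tool schema, the helper function and the address book entries are stand-ins for your real ones):

```python
import json

from openai import OpenAI

client = OpenAI()

# Stand-in address book: every known full or abbreviated name -> canonical name
ADDRESS_BOOK = {
    "fake company": "Fake Company", "fcmp": "Fake Company",
    "another cool company": "Another Cool Company", "acc": "Another Cool Company",
}

def match_customer(candidates: list[str]) -> str | None:
    """Return the canonical name of the first candidate found in the address book."""
    for name in candidates:
        if name.lower() in ADDRESS_BOOK:
            return ADDRESS_BOOK[name.lower()]
    return None

tools = [{"type": "function", "function": {
    "name": "match_customer",
    "description": "Look up candidate company names in the address book.",
    "parameters": {
        "type": "object",
        "properties": {"candidates": {"type": "array", "items": {"type": "string"}}},
        "required": ["candidates"],
    },
}}]

messages = [
    {"role": "system", "content": "Identify the customer who issued the document. "
                                  "Verify candidate names with match_customer before answering."},
    {"role": "user", "content": "..."},  # the OCR'd document text
]

# First call: the model should ask to run the lookup with its candidate names
response = client.chat.completions.create(model="gpt-4", messages=messages, tools=tools)
msg = response.choices[0].message

if msg.tool_calls:
    call = msg.tool_calls[0]
    candidates = json.loads(call.function.arguments)["candidates"]
    result = match_customer(candidates) or "no match found"
    # Feed the lookup result back so the model can produce its final answer
    messages.extend([msg, {"role": "tool", "tool_call_id": call.id, "content": result}])
    final = client.chat.completions.create(model="gpt-4", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```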
As an afterthought: if you have a static list of company names, you could technically also create a fine-tuned model for this purpose, whereby you instruct it as part of the system message to identify the correct company name based on a pre-defined list of company names. As training input, the “user input” would be the list of names extracted from the document, and the model’s or “assistant’s” output would then be the correct entity.
In the training itself, you would ideally want to supply the list of company names as part of the prompt. If you use enough training examples in the fine-tuning, you then technically no longer need to supply the list of names once you use the fine-tuned model in a production environment.
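A single training example could look something like this (hypothetical names only, using the chat fine-tuning format; each example would be one JSON line in the training file):

```python
# One hypothetical fine-tuning example; the real training file would contain
# many of these, serialized as one JSON object per line (JSONL)
example = {
    "messages": [
        {"role": "system",
         "content": "Identify the correct customer from this list: "
                    "Fake Company (FCMP), Another Cool Company (ACC), "
                    "Yet Another Example (YEAEX)."},
        {"role": "user",
         "content": "Names found in the document: FCMP, Global Shipping Ltd"},
        {"role": "assistant", "content": "Fake Company"},
    ]
}
```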
I still think, though, that function calling would be the way to go.