Using GPT-4-Turbo to fill out complex PDF forms

Without going into too much detail: my company is an intermediary between other companies (clients) and end users (customers). We receive many different PDF forms from these clients, which must be filled in with each customer’s data.

These forms can change or be replaced quite regularly and can have many pages’ worth of fields to be filled in, so hardcoding the logic to fill them in requires a lot of rework each time one of the forms changes.

What I am trying to do is to use OpenAI’s API to have GPT-4-Turbo fill out the PDF forms for me. In summary, I provide the model with the PDF form as well as all of the customer’s data we have in JSON, and then ask the model to put the correct data point in each field.

In reality, it is not as simple as this, mainly because although the model can view and read the PDFs, it cannot fill them in. So, my current approach is the following:

Tagging the PDF:
Luckily, each PDF has embedded fields/widgets (i.e. input text boxes) which can be read and edited programmatically. I iterate over all of these widgets and write a red integer ID in each, so that it looks like this:
[screenshot: a form page with a red integer ID stamped on each field]
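
A minimal sketch of this step, assuming PyMuPDF (fitz) as the PDF library (the helper name, offset, and font size are just illustrative choices):

import fitz  # PyMuPDF

def tag_pdf(in_path: str, out_path: str) -> dict[int, str]:
    # Stamp a red integer ID on every form widget and return the
    # ID -> field-name mapping needed later for the fill step.
    doc = fitz.open(in_path)
    id_to_field = {}
    next_id = 1
    for page in doc:
        for widget in page.widgets():
            page.insert_text(
                widget.rect.tl + (2, 10),  # nudge the ID inside the box
                str(next_id),
                color=(1, 0, 0),  # red
                fontsize=8,
            )
            id_to_field[next_id] = widget.field_name
            next_id += 1
    doc.save(out_path)
    return id_to_field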

Mapping the user data to each field:
Each page can have 50+ fields, and each PDF can have 10+ pages. I found that passing all of this in a single request overwhelmed and confused the model, so instead I send the pages one by one, and the results are much better.

I also found that using the Assistants API with the PDF uploaded as a file for retrieval led to very poor results: the model could not seem to pair the red IDs with the field names, so it was filling in the form with the correct data but assigning each value to the wrong integer ID. I assume retrieval extracts the PDF as raw text for embedding, so the spatial pairing of field names with IDs, and the colour of the IDs, is lost.

Instead, converting each page of the PDF into a high-quality PNG image and adding it as image content in the message solved this issue.
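
Again as a sketch, assuming PyMuPDF, the page-to-PNG conversion can look like:

import base64
import fitz  # PyMuPDF

def render_pages_as_png(pdf_path: str, dpi: int = 300) -> list[str]:
    # Render each page to a high-resolution PNG and return the pages
    # as base64 strings, ready to embed as image content in a request.
    doc = fitz.open(pdf_path)
    return [
        base64.b64encode(page.get_pixmap(dpi=dpi).tobytes("png")).decode("ascii")
        for page in doc
    ]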

So, I now make a per-page request to the Chat Completions API with GPT-4-Turbo (a sketch of the full request follows this list) and provide:

  • a PNG image of a page of the PDF, tagged with a red integer ID on each field
  • a JSON object of all of the customer’s data e.g.:
{
    "passport_number": "X123456789",
    ...
}
  • Instructions on what the task is, including an instruction to respond in JSON (with response_format={"type": "json_object"}) in the format:
{
    "<FIELD_ID>": "<VALUE>",
    ...
}

Filling the PDF:
I then parse the JSON response and programmatically fill in each widget on the PDF corresponding to its <FIELD_ID>.
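
A sketch of that fill step, again assuming PyMuPDF and the ID-to-field-name mapping saved during tagging:

import fitz  # PyMuPDF

def fill_pdf(pdf_path: str, out_path: str,
             id_to_field: dict[int, str], answers: dict[str, str]) -> None:
    # Invert the model's answer: field name -> value assigned to its red ID.
    name_to_value = {
        id_to_field[int(fid)]: value
        for fid, value in answers.items()
        if int(fid) in id_to_field
    }
    doc = fitz.open(pdf_path)
    for page in doc:
        for widget in page.widgets():
            if widget.field_name in name_to_value:
                widget.field_value = name_to_value[widget.field_name]
                widget.update()  # commit the new value to the widget
    doc.save(out_path)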

Results are very good for a single page!

Problem:
The main problem I am having now is that the model has no context about what it was doing on the previous page, so if a section spills over onto the next page without repeating the section heading or any other context, the results are incorrect.

An example of this is a form which requires the personal details of a husband and wife; I provide the model with both the husband’s and the wife’s data. Page 1 has a section titled “Husband’s Personal Details”, and then, three-quarters of the way down the page, the next section, titled “Wife’s Personal Details”, begins and continues onto page 2. However, the top of page 2 has no title or other indication that it is a continuation of the wife’s details, so the model starts filling in the husband’s data again.

I am not sure how to fix this. I am already sending each page as a new message in the same conversation, so each request includes the messages with the previous pages’ images too.
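
For reference, that history handling looks roughly like this sketch (reusing the client, render_pages_as_png, and other placeholders from the earlier sketches):

import json
from openai import OpenAI

client = OpenAI()
TASK_INSTRUCTIONS = "..."  # the full task prompt, abbreviated here
page_images = render_pages_as_png("tagged_form.pdf")  # from the render sketch above

# One growing message list: each page is appended as a new user message
# and each model reply is appended back as an assistant message.
messages = [{"role": "system", "content": TASK_INSTRUCTIONS}]
results = []
for page_b64 in page_images:
    messages.append({
        "role": "user",
        "content": [
            {"type": "text", "text": "Map the red field IDs on this page."},
            {"type": "image_url",
             "image_url": {"url": "data:image/png;base64," + page_b64}},
        ],
    })
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        response_format={"type": "json_object"},
        messages=messages,
    )
    reply = response.choices[0].message.content
    results.append(json.loads(reply))
    # Feed the model's own answer back so the next page has it as context.
    messages.append({"role": "assistant", "content": reply})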

Questions:

  1. Does my overall approach seem decent? Having to tag the PDF with these red IDs and then convert it to an image doesn’t seem like the best approach.
  2. Is using GPT-4-Turbo (i.e. Vision) the correct choice over the Assistants API?
  3. Any suggestions on how to solve this main issue of sections being split across pages?