Hi,
I am working on a project where I want to look up information from a PDF file. For a certain use-case, I need to do a relatively simple look-up in a table. There are 2 tables after each other in the source document. I already converted the PDF to markdown to see if that makes it easier for GPT to select the correct one. But to no success so far.
Context (snippet from the MD file):
Categorie 1 | Categorie 2 | Categorie 3 | Categorie 4 | Categorie 4 | |
---|---|---|---|---|---|
0j | 14,1124 | 14,1124 | 14,1941 | 2.439,9800 | 14,8177 |
1j | 14,7451 | 14,7451 | 14,8815 | 2.552,5300 | 15,5012 |
2j | 15,0368 | 15,0368 | 15,1760 | 2.612,3500 | 15,8645 |
3j | 15,3114 | 15,3114 | 15,5028 | 2.659,6600 | 16,1518 |
4j | 15,5082 | 15,5082 | 15,7579 | 2.684,0300 | 16,2998 |
5j | 15,5082 | 15,5082 | 15,8646 | 2.719,2900 | 16,5139 |
6j | 15,5082 | 15,5082 | 15,9742 | 2.719,2900 | 16,5139 |
7j | 15,5082 | 15,5082 | 15,9742 | 2.754,5100 | 16,7278 |
8j | 15,5082 | 15,5082 | 15,9742 | 2.754,5100 | 16,7278 |
9j | 15,6479 | 15,6479 | 16,1145 | 2.778,8500 | 16,8756 |
10j | 15,6479 | 15,6479 | 16,1145 | 2.779,5900 | 16,8801 |
11j | 15,6479 | 15,6479 | 16,1145 | 2.780,4300 | 16,8852 |
12j | 15,6479 | 15,6479 | 16,1145 | 2.781,1000 | 16,8893 |
13j | 15,6479 | 15,6479 | 16,1145 | 2.781,8400 | 16,8938 |
14j | 15,7877 | 15,7877 | 16,2548 | 2.805,5400 | 17,0377 |
15j | 15,7877 | 15,7877 | 16,2548 | 2.806,3500 | 17,0426 |
16j | 15,7877 | 15,7877 | 16,2548 | 2.806,3500 | 17,0426 |
17j | 15,7877 | 15,7877 | 16,2548 | 2.806,3500 | 17,0426 |
18j | 15,7877 | 15,7877 | 16,2548 | 2.806,3500 | 17,0426 |
19j | 15,9276 | 15,9276 | 16,3953 | 2.829,3000 | 17,1820 |
20j | 15,9276 | 15,9276 | 16,3953 | 2.829,3000 | 17,1820 |
21j | 15,9276 | 15,9276 | 16,3953 | 2.830,4200 | 17,1888 |
22j | 15,9276 | 15,9276 | 16,3953 | 2.830,4200 | 17,1888 |
Categorie 5 | Categorie 5 | Categorie 6 | Categorie 6 | Categorie 7 | |
---|---|---|---|---|---|
0j | 2.579,8000 | 15,6668 | 2.648,0500 | 16,0813 | 3.010,8500 |
1j | 2.672,6000 | 16,2304 | 2.751,6800 | 16,7106 | 3.048,2600 |
2j | 2.727,9200 | 16,5663 | 2.817,5300 | 17,1105 | 3.083,8300 |
3j | 2.771,0100 | 16,8280 | 2.870,0200 | 17,4293 | 3.119,6400 |
4j | 2.801,4900 | 17,0131 | 2.910,6000 | 17,6757 | 3.154,7800 |
5j | 2.819,2400 | 17,1209 | 2.936,9400 | 17,8357 | 3.190,8800 |
6j | 2.837,0100 | 17,2288 | 2.964,2300 | 18,0014 | 3.226,2800 |
7j | 2.854,6900 | 17,3362 | 2.991,3700 | 18,1662 | 3.262,7400 |
8j | 2.872,4300 | 17,4439 | 3.017,8600 | 18,3271 | 3.299,3300 |
9j | 2.897,5200 | 17,5963 | 3.043,8600 | 18,4850 | 3.328,7800 |
10j | 2.898,2000 | 17,6004 | 3.045,0500 | 18,4922 | 3.330,4000 |
11j | 2.898,9700 | 17,6051 | 3.046,1300 | 18,4988 | 3.332,0300 |
12j | 2.899,6800 | 17,6094 | 3.047,2400 | 18,5055 | 3.333,5900 |
13j | 2.900,4200 | 17,6139 | 3.048,3100 | 18,5120 | 3.334,3000 |
14j | 2.925,5000 | 17,7662 | 3.074,3900 | 18,6704 | 3.363,9700 |
15j | 2.926,1700 | 17,7703 | 3.075,5600 | 18,6775 | 3.365,7900 |
16j | 2.926,1700 | 17,7703 | 3.075,5600 | 18,6775 | 3.366,8400 |
17j | 2.926,1700 | 17,7703 | 3.075,5600 | 18,6775 | 3.367,8900 |
18j | 2.926,1700 | 17,7703 | 3.075,5600 | 18,6775 | 3.367,8900 |
19j | 2.950,5300 | 17,9182 | 3.100,5100 | 18,8290 | 3.395,8900 |
20j | 2.950,5300 | 17,9182 | 3.100,5100 | 18,8290 | 3.395,8900 |
21j | 2.951,8000 | 17,9259 | 3.101,7900 | 18,8368 | 3.396,7100 |
22j | 2.951,8000 | 17,9259 | 3.101,7900 | 18,8368 | 3.396,7100 |
Prompt:
query = ‘Wat is het loon van iemand in categorie 6 met 20j ervaring?’
prompt = f"“”
Antwoord op de vraag met behulp van de bijgevoegde context.
Het is mogelijk dat je het antwoord moet vinden door het doorlopen van verschillende stappen.
We gaan een voorbeeld geven van een redenering die je kan gebruiken om het antwoord te vinden.
Wat is het loon van een chef 20 jaar ervarig?
Stap 1: Vind de functiecategorie van een chef.
Stap 2: Vind het loon van een chef met 20 jaar ervaring.
Geef de volgende zaken terug:
Antwoord: Het antwoord op de vraag, gebruik makende van de context.
Redenering: Op welke manier ben je het antwoord bekomen.
Geef de output in json formaat terug.
Context: {txt}
Vraag: {query}
“”"
Code:
create a chat completion
chat_completion = openai.ChatCompletion.create(model=“gpt-3.5-turbo-16k”,
messages=[{“role”: “system”, “content”: “You are a helpfull assistant that helps recruiters find back information. The language spoken from here on out is dutch”},
{“role”: “user”, “content”: prompt}])
The response I get is as follows:
Antwoord: Het loon van iemand in categorie 6 met 20 jaar ervaring is €2.829,30 per maand.
Redenering:
- Zoek de juiste categorie op in de tabel: categorie 6.
- Zoek het juiste aantal functiejaren op, in dit geval is dat 20 jaar.
- Lees het bijbehorende loon af in de tabel, wat in dit geval €2.829,30 is.
Which translated means:
Answer: The salary of someone in category 6 with 20 years of experience is €2,829.30 per month. Reasoning:
- Locate the correct category in the table: category 6.
- Find the appropriate number of years of experience, in this case, it is 20 years.
- Read the corresponding salary from the table, which in this case is €2,829.30.
Has anyone else had some issues with this? I cannot just edit the source document because the final deployment will look at multiple pdf files.
Many thanks in advance! This has been breaking my head.