API generating wrong responses and skipping data I want

I have a bank statement in PDF format. I extract its text and feed it to the chat completions API. I want to discuss two examples.

Example 1)
// `openai` is an initialized OpenAI client; `bankPdfData.text` is the text extracted from the PDF.
const bankStatementData = await openai.chat.completions.create({
  messages: [
    {
      role: 'system',
      content: 'You assist with converting unstructured data to structured key value json format from bank statements. Strictly return response in json without any extra text in beginning or end',
    },
    {
      role: 'user',
      content: `List all the transactions from the following bank statement that involves any gambling website. Your whole response should be in structured key value format with the keys 'date', 'payment_method', 'description', 'paid_out', and 'paid_in': \n ${bankPdfData.text}`,
    },
  ],
  model: 'gpt-3.5-turbo',
  // temperature: 0.5,
});

As you can see, I asked it to list the transactions that involve a gambling website. It responds with Dominos, Burger King and IMPERIAL PARK, which are not gambling websites. NATIONAL LOTTERY I WATFORD and BATH RACE COURSE are the gambling entries that are present in the bank statement.

Example 2)
For the second example the code was the same; I only changed the user message content to:

List all the transactions from the following bank statement. Your whole response should be in structured key value format with the keys 'date', 'payment_method', 'description', 'paid_out', and 'paid_in': \n ${bankPdfData.text}

It returned data in the proper format, but it skips transactions in the JSON. For example, there were 14 transactions on 4 Dec 2023, but it returned only 2 objects for that date. The same problem occurs on other dates. The transactions run from 1 Dec 2023 to 30 Dec 2023.


What kind of data are you actually working with? Are these bank statements really large? Do they contain much unnecessary/irrelevant data?

gpt-3.5-turbo is not really a “smart” model compared to the others. Do you prime it at all to help it understand what a gambling website is? It doesn’t really look like it. You would do that in a system message, which in this case is going to really help you out (having some kind of primer in the system message, I mean).

I wouldn’t know that BATH RACE COURSE is a gambling site just by looking at it, for example.

And another issue is, National Lottery I Watford is not some “known entity” like Dominos or BK. The model isn’t going to know the “website” for these. So overall I would say your user message needs work, and you could make use of a primer system message.

As part of the user message, you could say something like, “Try to understand the purpose of the business by its name.” That would at least help narrow down the possibilities in the response.

Try lowering the temperature to 0.3 or 0.4 and see what happens.
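To make the suggestions so far concrete, here is a minimal sketch of a request with a primer system message and a lower temperature. The wording of the primer and the names `SYSTEM_PRIMER` and `buildGamblingRequest` are my own illustrations, not anything from your code, and the list of gambling categories is an assumption you should adjust to your data.

```javascript
// Hypothetical primer: the categories below are examples, not a complete definition.
const SYSTEM_PRIMER = [
  'You convert unstructured bank statement text into structured JSON.',
  'Return only a JSON array, with no text before or after it.',
  'Try to understand the purpose of each business by its name.',
  'Treat lotteries, bookmakers, casinos, bingo halls and race courses as gambling.',
  'Restaurants, takeaways and shops are not gambling.',
  'If no transaction involves gambling, return an empty array.',
].join(' ');

// Builds the request body only, so it can be inspected before anything is sent.
function buildGamblingRequest(statementText) {
  return {
    model: 'gpt-3.5-turbo',
    temperature: 0.3, // lower than the default to reduce variation between runs
    messages: [
      { role: 'system', content: SYSTEM_PRIMER },
      {
        role: 'user',
        content:
          'List all the transactions from the following bank statement that involve gambling. ' +
          "Use the keys 'date', 'payment_method', 'description', 'paid_out', and 'paid_in':\n" +
          statementText,
      },
    ],
  };
}

// Usage, assuming `openai` is an initialized client from the `openai` package:
// const completion = await openai.chat.completions.create(buildGamblingRequest(bankPdfData.text));
// const transactions = JSON.parse(completion.choices[0].message.content);
```

Whether 0.3 is the right temperature is something you would have to find by experiment; the point is only that the primer tells the model what you mean by gambling instead of leaving it to guess.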

Have you investigated function calling at all? If the bank statements are structured (I assume they are), you could flip your strategy around: preprocess those bank statements into JSON in a helper function, and then use function calling to give the model much better context through that JSON. See https://platform.openai.com/docs/guides/function-calling for details. You could then take the replies and process those into JSON however you want.

This will also allow you to better pick and choose the data you send to the model, which will likely lead to more accurate responses. The less relevant the user/system content is, the less relevant the model output will be.
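A rough sketch of that flipped strategy is below. It rests on a big assumption: that each transaction in your extracted text sits on one line in a `date | payment_method | description | paid_out | paid_in` layout. That layout is hypothetical, so `parseStatement` would need to be rewritten for your real statements. The tool name `flag_gambling_transactions` and the helper names are also made up for illustration.

```javascript
// Assumption: one transaction per line, fields separated by "|".
// Lines that do not have exactly five fields are ignored.
function parseStatement(text) {
  return text
    .split('\n')
    .map((line) => line.split('|').map((cell) => cell.trim()))
    .filter((cells) => cells.length === 5)
    .map(([date, payment_method, description, paid_out, paid_in], id) => ({
      id, date, payment_method, description, paid_out, paid_in,
    }));
}

// Hypothetical tool definition: the model only has to return matching ids.
const flagGamblingTool = {
  type: 'function',
  function: {
    name: 'flag_gambling_transactions',
    description: 'Report the ids of transactions whose merchant is a gambling business.',
    parameters: {
      type: 'object',
      properties: {
        ids: { type: 'array', items: { type: 'integer' } },
      },
      required: ['ids'],
    },
  },
};

// `openai` is an initialized client from the `openai` package.
async function findGamblingTransactions(openai, statementText) {
  const transactions = parseStatement(statementText);
  // Send only what the model needs for classification: id and description.
  const slim = transactions.map(({ id, description }) => ({ id, description }));
  const completion = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    temperature: 0.3,
    messages: [
      {
        role: 'system',
        content:
          'Try to understand the purpose of each business by its name and flag the ones ' +
          'that are gambling businesses, such as lotteries, bookmakers, casinos and race courses.',
      },
      { role: 'user', content: JSON.stringify(slim) },
    ],
    tools: [flagGamblingTool],
    // Force the model to answer through the tool rather than free text.
    tool_choice: { type: 'function', function: { name: 'flag_gambling_transactions' } },
  });
  const call = completion.choices[0].message.tool_calls[0];
  const { ids } = JSON.parse(call.function.arguments);
  // The full rows come from your own parsed data, not from the model.
  return transactions.filter((t) => ids.includes(t.id));
}
```

Because the dates and amounts are looked up in your own parsed rows, the model never has to reproduce them, and you can compare `transactions.length` against the statement to confirm that nothing was dropped during parsing.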