GPT Assistant talks about its task or just posts an example instead of actually performing the task

Greetings,

I have been working on creating an Assistant that uses Code Interpreter to parse a document and pull out its table of elements in JSON format. Even though I give very clear instructions not to leave any elements or rows out, and to always send only the JSON in the final message, the assistant always leaves lines from the table out, usually saying that it is providing an example, or it returns JSON with sections absent, such as the example below.

Has anyone run into this before and, if so, what steps ensure that the assistant is accurate in its parsing and return of data?

{
    "quote_table": [
        {
            "quote_row": 1,
            "column_values": [
                {"order": 1, "text_value": "SL. NO.", "num_value": null, "comp_designation": 0},
                {"order": 2, "text_value": "ITEM DESCRIPTION", "num_value": null, "comp_designation": 0},
                {"order": 3, "text_value": "UNITS", "num_value": null, "comp_designation": 0},
                {"order": 4, "text_value": "QTY", "num_value": null, "comp_designation": 1},
                {"order": 5, "text_value": "UNIT PRICE IN AED", "num_value": null, "comp_designation": 2},
                {"order": 6, "text_value": "TOTAL VALUE IN AED", "num_value": null, "comp_designation": 3},
                {"order": 7, "text_value": "MAKE", "num_value": null, "comp_designation": 0}
            ]
        },
        {
            "quote_row": 2,
            "column_values": [
                {"order": 1, "text_value": "1", "num_value": null, "comp_designation": 0},
                {"order": 2, "text_value": "1/2\" ERW MS Black Pipes Acc. ASTM A53 GR.B SCH 40 in 6mtr length", "num_value": null, "comp_designation": 0},
                {"order": 3, "text_value": "MTR", "num_value": null, "comp_designation": 0},
                {"order": 4, "text_value": "", "num_value": 12, "comp_designation": 1},
                {"order": 5, "text_value": "", "num_value": 7.5, "comp_designation": 2},
                {"order": 6, "text_value": "", "num_value": 90.00, "comp_designation": 3},
                {"order": 7, "text_value": "UAE / GCC", "num_value": null, "comp_designation": 0}
            ]
        },
        ... (additional quote_row objects representing each row in the table)
        {
            "quote_row": 16,
            "column_values": [
                {"order": 1, "text_value": "15", "num_value": null, "comp_designation": 0},
                {"order": 2, "text_value": "14\" ERW MS Black Pipes Acc. ASTM A53 GR.B SCH 40 in 6mtr length", "num_value": null, "comp_designation": 0},
                {"order": 3, "text_value": "MTR", "num_value": null, "comp_designation": 0},
                {"order": 4, "text_value": "", "num_value": 6, "comp_designation": 1},
                {"order": 5, "text_value": "", "num_value": 600.00, "comp_designation": 2},
                {"order": 6, "text_value": "", "num_value": 3600.00, "comp_designation": 3},
                {"order": 7, "text_value": "UAE / GCC", "num_value": null, "comp_designation": 0}
            ]
        }
    ]
}

First, pick one: GPTs in ChatGPT Plus, or "Assistants" on the API.

Beyond a certain point, around 500 tokens, the AI models have been tuned to start limiting how much they will produce as language. This shows up as code with "you write more here" placeholders, or as unsatisfactory copywriting tasks.

That also means that taking a function return from the Python sandbox and having the AI repeat it back will be subject to the same reinterpretation and trimming.

You might instead look at annotations: have the output written to a file, still within the sandbox, and returned to you for download, so the AI doesn't have to talk about the contents. (This has been a bit flaky, along with most Assistants features, like the AI giving you a link to /mnt/mydata.file that you can't access yourself.)

https://platform.openai.com/docs/assistants/tools/reading-images-and-files-generated-by-code-interpreter
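
Roughly like this on the retrieval side - a minimal sketch, assuming the Python SDK and that the run was told to write the complete JSON to a file in the sandbox (the thread ID and output filename are placeholders):

# Fetch a file the assistant saved in the code interpreter sandbox,
# so the JSON never has to pass through the model's message output.
from openai import OpenAI

client = OpenAI()

messages = client.beta.threads.messages.list(thread_id="thread_abc123")
for message in messages.data:
    for part in message.content:
        if part.type != "text":
            continue
        # Sandbox file links arrive as file_path annotations on the text
        for annotation in part.text.annotations:
            if annotation.type == "file_path":
                contents = client.files.content(annotation.file_path.file_id)
                with open("quote.json", "wb") as f:
                    f.write(contents.read())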

Thanks for the clarification. I'm only using the Assistants API, through custom Flask applications that I've created to help parse the JSON. I was getting decent responses previously, until I asked for more structure in the response. I'll attempt to save the contents, but will probably look to more traditional solutions if that is a no-go.

Curious, however, whether in your opinion Vision could be used if the documents were broken up into single pages and the results stitched back together, as sketched below. Thanks for your response.
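
Roughly what I'm picturing - just a sketch, assuming PyMuPDF for the page rendering; the model name, prompt, and filename are placeholders:

# Render each PDF page to an image, send it to the vision model,
# and collect the per-page results for stitching afterwards.
import base64
import fitz  # PyMuPDF
from openai import OpenAI

client = OpenAI()
results = []

doc = fitz.open("quote.pdf")
for page in doc:
    png_bytes = page.get_pixmap(dpi=150).tobytes("png")
    image_b64 = base64.b64encode(png_bytes).decode()
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract every table row on this page as JSON. Do not omit rows."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    results.append(response.choices[0].message.content)

# Stitching the per-page JSON back into one quote_table would happen here.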

gpt-4-vision could be used as OCR - if you want to pay 1.5 cents a page for things already scanned… and with a 50-requests-per-day cap (and it is also not available in Assistants).

Then there is the fact that images of a page are downsized to 768 pixels wide by OpenAI's GPT Vision. That's very few pixels per letter for something like a scientific paper.

Assistants has retrieval, which likely uses Python tools to OCR pages, but it is opaque in operation and you can't retrieve its output yourself.

“Documents” that actually contain the text don’t need image tools.
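
For a text-based PDF, a couple of lines with a PDF library will do - a sketch, assuming pypdf (the filename is a placeholder):

# Pull the embedded text straight out of the PDF; no OCR or vision needed.
from pypdf import PdfReader

reader = PdfReader("quote.pdf")
text = "\n".join(page.extract_text() for page in reader.pages)
print(text)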


However, how about a preview of something new? I take a screenshot from my web browser plugin, then some AI OCR, then some AI "auto-redact" (with the private info, of course, being sent to Microsoft Azure to perform that). All in a default future Windows 11 app.

(Was just looking - glad my Fujitsu ScanSnap scanners came with software that isn't a "1-year free trial to Adobe PDF cloud".)