Function call doesn't retrieve the information from the given file; it makes up the result

Hi, I’m creating a function that extracts profile data from a given file. But when the function is called, it doesn’t extract the information correctly. I tried GPT-3.5 and GPT-4, and also switched retrieval on and off. It doesn’t work.

If I only use retrieval with a prompt asking for a JSON object (without a function call), it gives me the correct information (but with a lot of unnecessary words). Probably something is wrong with my configuration. Need help. Thanks!

My assistant settings:


My function settings:

Real name:

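Since the screenshot isn’t visible here: a function for this kind of extraction is typically declared with a JSON schema along these lines (every name and field below is hypothetical, as the original post’s actual definition isn’t shown):

```python
# Hypothetical function definition for profile extraction; the actual names
# and fields used in the original post are not visible in this thread.
extract_profile = {
    "name": "extract_profile",
    "description": (
        "Record profile fields found in the uploaded file. "
        "Only use values that appear verbatim in the document."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "real_name": {
                "type": "string",
                "description": "The person's real name exactly as written in the file.",
            },
        },
        "required": ["real_name"],
    },
}

print(extract_profile["name"])
```

A description that explicitly tells the model to use only values found verbatim in the document can reduce, though not eliminate, made-up answers.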
Turn on retrieval. Try again. :woozy_face:

I tried your suggestion. It doesn’t work; the result still makes up other names. It seems it doesn’t read the file content.

This may have to do with the PDF file itself and the document extraction that can be performed on it.

Many PDFs do not have searchable text; instead, they are just pictures of documents that have been scanned or rendered as outlines. The PDF file can even be password-locked.

If you have full Adobe Acrobat, you can “enhance PDF” to add plain text by OCR, add bookmarks and hierarchy, etc. See if you can highlight and cut-and-paste individual text sections from the PDF in a reader.
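A quick programmatic version of that check: run a text extractor over the pages (e.g. `page.extract_text()` from pypdf, which is an assumption about your toolchain) and treat the file as image-only if no page yields meaningful text. A sketch of the decision logic, given the per-page strings:

```python
def looks_image_only(page_texts, min_chars=20):
    """True if no page yields a meaningful amount of extracted text.

    page_texts: one string per page, e.g. produced by
    [page.extract_text() or "" for page in pypdf.PdfReader(path).pages]
    """
    return all(len(text.strip()) < min_chars for text in page_texts)

# A scanned PDF typically extracts to empty or near-empty strings:
print(looks_image_only(["", "   "]))  # True
print(looks_image_only(["Real name: Jane Doe\nEmail: jane@example.com"]))  # False
```

If this reports the file as image-only, no retrieval setting will help until the PDF has been OCRed.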

The best format to provide the AI is plain text that has been separated into sections with headings. Then you don’t have to rely on OpenAI’s document extraction technology, which can be hit and miss, and you don’t need to wait as long for a document to be processed before it can be used (which you also aren’t warned about: you have to add your own delay before using an assistant after it is created).

The PDF file is OK. If I don’t use the function and only use retrieval, it extracts the correct information. But with the function, it starts to make up answers. I assume there is some setting I didn’t configure correctly.

But I’d prefer to use the function approach to get the JSON object.

Invoking tools shouldn’t require a user command like the one you show, except when debugging. Their utility in fulfilling user input should be obvious to the AI from the description, and invocation should be automatic.

Tools cannot serve as an output format: they need a return value to continue a thread, and the AI will then be confused when it is still told to output into a tool. You’d have to abandon the thread, and Assistants is the wrong tool for zero-shot jobs.
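Concretely, when a run stops in `requires_action`, every pending tool call must be answered before the thread can continue. A minimal sketch of building those answers (the SDK call shown in the comment assumes the openai v1 Python client; the IDs and payloads are illustrative):

```python
import json

def build_tool_outputs(tool_calls):
    """Answer every required tool call so the run can continue.

    Each entry pairs a tool_call_id with a return value. Here we merely
    acknowledge receipt, which is exactly the awkward no-op you end up with
    when a function is abused as an output format.
    """
    return [
        {"tool_call_id": call["id"], "output": json.dumps({"status": "received"})}
        for call in tool_calls
    ]

# With the real SDK you would then submit these, roughly:
#   client.beta.threads.runs.submit_tool_outputs(
#       thread_id=thread.id, run_id=run.id,
#       tool_outputs=build_tool_outputs(calls))

print(build_tool_outputs([{"id": "call_123"}]))
```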

I would stop using the “playground” and move your calls over to the real API with your own code. Then you can write messages that attempt to extract exact text from the document, or have the assistant replay in chunks what was placed into its context by your commands, to figure out if, when, and how document retrieval works on the PDF. Then you’ll know whether retrieval while employing a function is simply unworkable.
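One practical detail when you do this: assistant runs are asynchronous, so your code has to poll for a terminal status before reading results. A small helper like the following makes the wait explicit (the `fetch_status` callable would wrap `client.beta.threads.runs.retrieve(...)` in the real SDK, which is an assumption about your setup):

```python
import time

TERMINAL = {"completed", "failed", "cancelled", "expired", "requires_action"}

def wait_for_run(fetch_status, interval=1.0, timeout=120.0):
    """Poll fetch_status() until the run reaches a terminal state.

    fetch_status: zero-argument callable returning the run's status string,
    e.g. lambda: client.beta.threads.runs.retrieve(
        thread_id=thread.id, run_id=run.id).status
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in TERMINAL:
            return status
        time.sleep(interval)
    raise TimeoutError("run did not reach a terminal state in time")

# Simulated run that finishes on the third poll:
states = iter(["queued", "in_progress", "completed"])
print(wait_for_run(lambda: next(states), interval=0.0))  # completed
```

The same helper lets you catch `requires_action` explicitly, which is where a function-plus-retrieval run will stall if you never submit tool outputs.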

The playground for assistants just adds yet another layer of someone else’s unknown, untrustworthy code to a product that is already not worth the effort of trying to make work reliably. PDF will always be problematic, even though every middle manager says “I want to chat with my PDFs”.

If you survey the replies on this forum in depth, you’ll see either new users trying this, or experienced users saying: “nope, it was unreliable, costly, poorly imagined, and restrictive. I switched back to chat completions, now a portable solution with my own embeddings vector knowledge database.”

@nodttt it’s worth checking whether the PDF, or some part of it, is scanned, as suggested earlier by @_j.