I have custom data in JSON format where some key names are duplicated. In that case, it does not give correct results. For example:
From the above example, when I ask a question like “Provide the information of ‘Rate’.”, it gives the value correctly, but not the confidence score.
Have you tried a one-shot example prompt like the one below? Please also see if you get a better response using gpt-4-turbo.
I will provide you with a JSON object containing information about Rate and Interest Rate. You need to answer based on the provided JSON.
Here is an example that you can follow:
Provide the information of ‘Rate’.
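A one-shot prompt like the above can be sketched as a messages array for the Chat Completions API. The sample JSON, the system instruction, and the worked example answer here are all hypothetical placeholders, not the original poster's data:

```python
import json

# Hypothetical sample data: similar key names ("Rate" vs "Interest Rate")
# that the model tends to confuse, each carrying a confidence score.
sample = {"Rate": {"value": "7.5%", "confidence": 0.92},
          "Interest Rate": {"value": "6.1%", "confidence": 0.88}}

# One-shot structure: a worked example (user question plus the ideal
# assistant answer) precedes the real question, demonstrating that the
# answer must include both the value and the confidence score.
messages = [
    {"role": "system",
     "content": "Answer strictly from the provided JSON. "
                "Always report both the value and the confidence score."},
    {"role": "user",
     "content": "JSON: " + json.dumps(sample)
                + "\nProvide the information of 'Interest Rate'."},
    {"role": "assistant",
     "content": "Interest Rate: value 6.1%, confidence score 0.88"},
    {"role": "user",
     "content": "JSON: " + json.dumps(sample)
                + "\nProvide the information of 'Rate'."},
]

print(len(messages))  # system + example pair + real question = 4
```

The example answer in the `assistant` turn anchors the output format, so the model is far more likely to echo the confidence score alongside the value.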
I tried this, but it is only giving me the value.
The solution is to use the correct language to describe the objects the AI receives.
I have provided a PDF file, which contains a JSON object with many dictionaries in it. If I ask a question from that PDF, it does not return the correct dictionary; instead, it gives an answer that is only about 70% correct.
When you say “provided a PDF file”, does that mean that you are using the Assistants API and uploading the file itself?
That relies on OpenAI’s tools and methods for extracting information from a PDF, and PDF files can be impenetrable to varying degrees.
If this is data that your API application is built on and relies on, I would instead process the PDF into searchable text yourself with a high-quality tool such as Adobe Acrobat, and then copy out the text or extract it with a Python library.
By doing so, you can see what is lacking in the AI’s understanding of the text within, and fix up that documentation so it is plain text that is easily understood by the AI. It might even be split into named sections as files, so that retrieval can target specific parts that can be semantically searched.
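Splitting the cleaned-up text into named section files can be sketched with the standard library alone. The document text and heading convention below are invented for illustration; a real document would need a splitting rule matched to its own structure:

```python
import re
import tempfile
from pathlib import Path

# Hypothetical extracted text: plain headings, each followed by body text
# and separated by blank lines.
doc_text = """RATE DETAILS
Rate: 7.5%, confidence 0.92

INTEREST RATE DETAILS
Interest Rate: 6.1%, confidence 0.88
"""

def split_into_sections(text):
    """Split on blank lines; treat the first line of each chunk as its name."""
    sections = {}
    for chunk in re.split(r"\n\s*\n", text.strip()):
        lines = chunk.splitlines()
        sections[lines[0].strip()] = "\n".join(lines[1:]).strip()
    return sections

out_dir = Path(tempfile.mkdtemp())
for name, body in split_into_sections(doc_text).items():
    # One small, plainly named file per section aids semantic retrieval.
    fname = name.lower().replace(" ", "_") + ".txt"
    (out_dir / fname).write_text(body, encoding="utf-8")

print(sorted(p.name for p in out_dir.iterdir()))
# ['interest_rate_details.txt', 'rate_details.txt']
```

Each resulting file is a self-contained, plainly named chunk, which is exactly the kind of input a retrieval step can match against a user's question.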
With a single type of data, we have a format we can reference, but many varied types of data may need varied types of requests. When working in JSON format (as you can see in my chat share for your first example), I specified the exact object type and keys to retrieve within the JSON. You can similarly provide a schema before the JSON itself, or even reprocess the key names so they are distinct and easily understood.
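Reprocessing duplicated key names so they become distinct can be done while parsing, using `json.loads` with an `object_pairs_hook`. The raw JSON below is a made-up illustration of the duplicate-key problem:

```python
import json

def dedupe_keys(pairs):
    """object_pairs_hook that renames repeated keys: Rate, Rate_2, Rate_3, ..."""
    seen, out = {}, {}
    for key, value in pairs:
        seen[key] = seen.get(key, 0) + 1
        out[key if seen[key] == 1 else f"{key}_{seen[key]}"] = value
    return out

# Hypothetical raw JSON with the same key twice; a plain json.loads would
# silently keep only the last "Rate" value.
raw = '{"Rate": "7.5%", "Rate": "6.1%", "Term": "30y"}'
data = json.loads(raw, object_pairs_hook=dedupe_keys)
print(data)  # {'Rate': '7.5%', 'Rate_2': '6.1%', 'Term': '30y'}
```

Because the hook is applied to every object in the document, nested dictionaries get the same treatment, and the re-serialized JSON then has unambiguous keys for the AI to reference.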
The AI is not magic, but you can improve your chance of success by making the language it receives easily understood.
I was providing the PDF file to ChatGPT with GPT-4.
That’s interesting, because ChatGPT Plus with GPT-4, the web-based chatbot, doesn’t have that type of PDF reading natively in its default mode. What the AI has to do is have the document uploaded to a Python code sandbox for advanced data analysis, then write its own code to extract the text using the Python libraries available there for parsing PDFs.
The amount of data then returned to the AI also cannot be large or searchable; the extracted document text must be constrained in size or cut off, on top of being lower quality than text specifically prepared for AI consumption.
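The size constraint can be sketched as a simple character budget applied to the extracted text before it is handed back to the model. The budget value and the page strings below are illustrative assumptions, not values ChatGPT actually uses:

```python
def truncate_for_model(text, budget=8000):
    """Cap extracted text at a character budget, cutting at the last
    paragraph boundary so the model sees only complete paragraphs."""
    if len(text) <= budget:
        return text
    cut = text.rfind("\n\n", 0, budget)
    return text[:cut] if cut > 0 else text[:budget]

# Stand-in for text pulled out of a multi-page PDF.
pages = ["First page paragraph.", "Second page paragraph.", "Third."]
extracted = "\n\n".join(pages)

print(truncate_for_model(extracted, budget=30))
# First page paragraph.
```

Anything past the budget is simply lost to the model, which is why answers from large PDFs handled this way can be incomplete.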
The approach you would likely want to follow is to build a custom GPT: an agent that has more built-in methods for parsing and searching within documents, attaching them to a data storage and retrieval method. You can also give the AI instructions stressing the importance of identifying the correct JSON object and the correct keys within it that the user may be referring to, and asking for clarification if necessary.
Whether you’d want to go through this learning and work depends on how casual or how permanent and repetitive the particular task is.
I see that ChatGPT Plus with GPT-4 is not accepting the scanned PDF; instead, it is asking for a text-based PDF.
Also, when I made a screenshot from that scanned PDF, it gave me the correct responses.