Seeking assistance: extracting specific information from a prompt without the OpenAI model generating new content

Hey everyone,

I’m currently working on a project that involves extracting questions and answers from an exam PDF and storing each question (along with its options, answer, and explanation) in JSON format. To accomplish this, I’m using regular expressions to identify the questions in the text, then using the OpenAI GPT-3.5 Turbo model to generate structured JSON output for each question.
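Since the regex isn't shown in the post, here's a minimal sketch of what the splitting step might look like, assuming each question begins with a numeric marker like `1.` at the start of a line (the `splitQuestions` name and the pattern are illustrative, not from the original project):

```typescript
// Minimal sketch: split exam text into per-question chunks, assuming
// each question starts with "<number>." at the beginning of a line.
// Adjust the pattern to match your PDF's actual layout.
function splitQuestions(examText: string): string[] {
  return examText
    .split(/(?=^\d+\.\s)/m) // lookahead keeps the number with its chunk
    .map((chunk) => chunk.trim())
    .filter((chunk) => /^\d+\.\s/.test(chunk));
}
```

Each resulting chunk would then become the `{inputText}` for one model call.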

However, I’m encountering a specific issue with the model. Even though I’ve provided clear instructions in the prompt to extract options only if they are explicitly present in the text, the model still generates options from the explanation section. I want to ignore questions that do not have options in a proper format (some options are images, for example).
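One way to sidestep the model's tendency to invent options is to check for well-formed options in code before calling the API at all, and mark option-less questions as failed yourself. A sketch (the function name and the two-option threshold are my own assumptions):

```typescript
// Sketch: detect whether a question chunk contains explicitly lettered
// options ("A. ...", "B. ...") on their own lines. Questions that fail
// this check can be marked failed without ever reaching the model.
function hasExplicitOptions(questionText: string): boolean {
  const optionLines = questionText.match(/^[A-E]\.\s+\S+/gm) ?? [];
  return optionLines.length >= 2; // assume at least two real options
}
```

This keeps the "refuse to invent options" decision deterministic instead of delegating it to GPT-3.5.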

Here’s the prompt I’m using: “Extract information from text. {format_instructions} The response should be presented in a markdown JSON codeblock. Question description: {inputText}.
Please remember that if options are not explicitly present in the prompt text in the form of ‘A. option_a_text’, ‘B. option_b_text’ and so on, do not extract ‘options’ from answer/explanation and set the ‘options’ field as an empty object and provide ‘result’ field as ‘failed’. If options are present, provide the correct ‘ans’ field with valid options (a, b, c, d, e) and provide ‘result’ field as ‘success’. Do not make up options or answers or explanation.”
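For reference, the output shape this prompt implies would be something like the following (field names taken from the prompt and the post above; the exact schema is whatever your zod definition specifies):

```json
{
  "question": "…",
  "options": { "a": "…", "b": "…", "c": "…", "d": "…" },
  "ans": "b",
  "explanation": "…",
  "result": "success"
}
```

and, for a question without parseable options, `"options": {}` with `"result": "failed"`.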

I would greatly appreciate your assistance in understanding why the model is generating options in violation of the given instructions and if there are any potential solutions or alternative approaches that can improve the accuracy of option extraction.

PS: I am using zod for schema validation and StructuredOutputParser from the langchain/output_parsers module to parse the output generated by the GPT-3.5 model.

My gut feeling is this might be a step too far for 3.5; it sounds very much within the realm of GPT-4’s capabilities, though.

An alternative is to break the request into steps: one call might remove the problematic questions, and another might process the remaining ones.
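A sketch of that two-call flow, with a stand-in `callModel` function injected in place of the actual OpenAI request (all names and prompt wording here are illustrative):

```typescript
// Sketch of the suggested multi-call approach: one model call cleans the
// text, a second extracts JSON from the cleaned text only. `callModel`
// stands in for your real OpenAI request so the flow is testable offline.
type ModelCall = (prompt: string) => Promise<string>;

async function extractInTwoSteps(
  rawText: string,
  callModel: ModelCall
): Promise<string> {
  // Step 1: drop questions whose options aren't explicitly lettered.
  const cleaned = await callModel(
    `Remove any question whose options are not listed as "A. ...", "B. ...":\n${rawText}`
  );
  // Step 2: extract structured JSON from the cleaned text only.
  return callModel(
    `Extract each question as JSON with options, ans and explanation:\n${cleaned}`
  );
}
```

Because the second call only ever sees the cleaned text, the model has less opportunity to pull options out of explanation sections.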

Giving 3.5 more “time to think” by using multiple calls is usually advantageous.

Thank you for your suggestion! I’ll give it a try and see if it helps in resolving the issue. Appreciate your input!