Hey, thanks for helping me!
I took your advice, but it’s simply not working…
Here is my system message:
{"role": "system", "content": """
System Prompt for Extracting Book Metadata into JSON
Task Explanation:
You are tasked with analyzing a continuous text that contains multiple entries of book metadata. This text includes several pieces of information for each book related to university courses, such as course codes, book titles, authors, publication years, and ISBN numbers. Note that some entries might also list editions and publishing houses. Your goal is to parse these details accurately from a raw textual format.
JSON Schema Description:
The output should conform to a specific JSON schema. Each book entry must be converted into a JSON object that includes the following properties:
- course_code: the unique identifier for the course associated with the book.
- course_title: the full title of the course.
- title: the title of the book.
- author: an array of authors associated with the book.
- publisher: the name of the publishing house.
- ISBN: the international standard book number.
- year: the year of publication.
- edition: the edition of the book, if applicable.
- article_or_book: a string indicating whether the item is a 'book' or an 'article'.
Detail Extraction:
Carefully extract and accurately parse the details such as the names of multiple authors, distinguishing between editions and publication years, and categorizing the resource as either a book or an article. Ensure that each attribute is placed correctly according to the specified JSON schema.
Output Requirements:
The final output must be formatted as a JSON array, where each book entry is a separate object within the array. Emphasize the importance of strict adherence to the JSON formatting and accuracy in the representation of extracted data.
Handling Ambiguities:
In cases where certain information is missing or ambiguous, leave the respective field blank in the JSON object. If there are educated guesses to be made based on the context of the information provided, do so cautiously and mention these assumptions explicitly in your processing.
Iterative Improvement:
You are encouraged to refine and optimize your data parsing logic based on the accuracy of initial outputs. If the initial data extraction contains errors or inaccuracies, adjust the parsing mechanisms to improve data quality in subsequent attempts.
JSON Schema:{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://example.com/course.json",
"title": "Course",
"description": "A representation of a course and its associated book or article",
"type": "array",
"items": {
"type": "object",
"properties": {
"course_code": {
"description": "The unique code for the course",
"type": "string"
},
"course_title": {
"description": "The title of the course",
"type": "string"
},
"title": {
"description": "The title of the book or article",
"type": "string"
},
"author": {
"description": "The list of authors of the book or article",
"type": "array",
"items": {
"type": "string"
}
},
"publisher": {
"description": "The publisher of the book or article",
"type": "string"
},
"ISBN": {
"description": "The ISBN of the book",
"type": "string"
},
"year": {
"description": "The year of publication of the book or article",
"type": "string"
},
"edition": {
"description": "The edition of the book",
"type": "string"
},
"article_or_book": {
"description": "Indicates whether the resource is a book or an article",
"type": "string",
"enum": ["book", "article"]
}
},
"required": ["course_code", "course_title", "title", "author", "publisher", "ISBN", "year", "edition", "article_or_book"]
}
}
"""
},
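To see quickly whether a reply really matches the schema above and how many entries it contains, a small standard-library check could look like the sketch below. The field names and the enum values are copied from the schema; `check_entries` is a hypothetical helper name, not part of any library, and it only covers the required keys, the `author` array, and the `article_or_book` enum.

```python
import json

# Field names and enum values copied from the schema above.
REQUIRED = ["course_code", "course_title", "title", "author", "publisher",
            "ISBN", "year", "edition", "article_or_book"]
ALLOWED_KINDS = ("book", "article")


def check_entries(raw_reply: str) -> list[str]:
    """Return human-readable problems found in the model reply (hypothetical helper)."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        return [f"reply is not valid JSON: {exc}"]
    if not isinstance(data, list):
        return ["top-level value is not a JSON array"]

    problems = []
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        # Every property listed under "required" must be present.
        for key in REQUIRED:
            if key not in entry:
                problems.append(f"entry {i}: missing '{key}'")
        # "author" must be an array of strings.
        author = entry.get("author")
        if author is not None and not (
            isinstance(author, list) and all(isinstance(a, str) for a in author)
        ):
            problems.append(f"entry {i}: 'author' is not an array of strings")
        # "article_or_book" must be one of the two enum values.
        kind = entry.get("article_or_book")
        if kind is not None and kind not in ALLOWED_KINDS:
            problems.append(f"entry {i}: 'article_or_book' is {kind!r}")
    return problems
```

An empty list means the reply passed these checks; `len(json.loads(raw_reply))` then gives the number of entries to compare against the number of titles in the input.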
Here is my user message
{"role": "user", "content": """
Introduction:
You are provided with a text that includes a series of book entries extracted from university course syllabi. This text is rich in metadata detailing the books assigned or recommended for various courses. Each entry is distinct and corresponds to specific courses offered at the university.
Instruction on Details:
As you review the text, please note that each book entry contains comprehensive metadata such as the course code it is associated with, the book title, author(s), publication details including year and publisher, ISBN, and possibly the edition. Each entry is meant to be parsed and then structured according to these details.
Submission Format:
The text is presented in a plain, unformatted manner. It does not include any additional markup or styling, which ensures that the focus remains strictly on the raw data provided. This setup is crucial for accurately parsing the text into the required JSON format.
Feedback Request:
After processing, if any part of the text or its details appear unclear or incomplete, feedback may be requested to ensure the accuracy of the information before final processing. If you anticipate any ambiguities or potential issues with the entries, please indicate these upfront to facilitate a smoother extraction process. Here is the text of all the books: """ + clean_text},
Note that the variable clean_text contains the long text with all the titles. Here is what the structure of clean_text looks like (the real text is much longer):
Here comes the next reading list: Lund University
Reading list for IBUG41, International Business: Business
Ethics and Sustainability, effective from the spring semester
2022
The reading list is established by the Director of Studies at the Department of Business Administration on 2023-09-15 to be effective from 2023-09-15
Scientific articles on Business Ethics and Sustainability. About 250 pages
Here comes the next reading list: Lund University
Reading list for ENTA70, Entrepreneurship and
Project Management, effective from the autumn semester 2022
The reading list is established by the Director of Studies at the Department of Business Administration on 2022-05-01 to be effective from 2022-05-01
Landström, H & Löwegren, M (2022): Entrepreneurship - from thought to action.
Student literature
How should I go about it?
The model keeps returning a response in the correct JSON array format and seems to understand the task, but the response only includes two titles.
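One thing that might be worth trying is sending one reading list per request instead of the whole text at once. This is only a sketch under assumptions: it assumes every reading list in clean_text starts with the marker "Here comes the next reading list:" as in the sample above, and it is not confirmed that input length is the cause of the missing titles. `split_reading_lists` and `build_messages` are hypothetical helper names, and the actual API call is left out.

```python
MARKER = "Here comes the next reading list:"  # taken from the sample above


def split_reading_lists(clean_text: str) -> list[str]:
    """Split the long text into one chunk per reading list."""
    parts = clean_text.split(MARKER)
    # Drop empty pieces (such as the empty string before the first marker)
    # and trim surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]


def build_messages(system_prompt: str, chunk: str) -> list[dict]:
    """Build the message list for a single reading list (hypothetical helper)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user",
         "content": "Here is the text of one reading list:\n" + chunk},
    ]


# Shortened version of the sample shown above.
sample = (
    "Here comes the next reading list: Lund University\n"
    "Reading list for IBUG41, International Business: Business\n"
    "Ethics and Sustainability, effective from the spring semester\n"
    "2022\n"
    "Here comes the next reading list: Lund University\n"
    "Reading list for ENTA70, Entrepreneurship and\n"
    "Project Management, effective from the autumn semester 2022\n"
)

chunks = split_reading_lists(sample)
print(len(chunks))  # 2
```

Each chunk would then go into its own request via `build_messages(system_prompt, chunk)`, and the returned JSON arrays could be concatenated afterwards. Keeping the requests small also makes it easier to spot which reading list loses entries.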