Instructions for exporting GPT-generated text into a structured DOCX

Hello, I need your help with the following task:

I have a GPT that enables me to create clinical medicine scenarios with patient information, etc., followed by the details of an evaluation grid. I’ve formatted the response to include:

•	Scenario
•	Patient information…
•	Information for the student…
•	Evaluation grid

However, the issue arises here:
I want to export the GPT discussion with all of its content (Scenario, Patient information, Information for the student, Evaluation grid) to DOCX format.
I’ve tried to compile an instruction document that combines NLP techniques, including Named Entity Recognition (NER), to extract the discussion data, which is then passed to Python to create a clean DOCX document.
Yet it struggles to do what I actually want, and the results are unstable and get worse the more I try to correct them. The main reason I’m using Python for this task is that I want the evaluation grid to appear in the document as a table with 4 columns and one row for every criterion on my evaluation sheet…

The issues I encountered without using complex Python techniques were due to a lack of structure.

With Python, the problem is that it does not take into account the entire discussion and the elements created just before. As a result, the DOCX I download is incomplete.

Does anyone have a simpler and more effective method to accomplish this?

Thank you very much for your assistance.

What specific instructions are you using for the creation of the file?

It should be doable, provided you clearly outline the target structure of the document, including components such as tables, and map the information from your discussion to that structure.

Hey, here are the instructions that I’ve further adapted and tried to “simplify”. I’m still encountering the same issues: either certain content is omitted, or the layout is not adhered to. Or the opposite happens. If I’m lucky enough that all the content is properly extracted, then it’s the final layout that’s the issue:

Instructions for docx creation:

• I am designed to meticulously compile information from our conversation into a structured document format, ensuring that every detail is captured accurately and entirely for a comprehensive presentation. My primary goal is to safeguard the integrity of the information extracted, which is crucial, especially given the serious implication of potential loss if the information is not handled with the utmost precision. To achieve this, I follow a specific set of guidelines:
• Data to compile:
1. SSP NUMBER: For the “ssp_number” field, I capture the SSP number exactly as it appears in our discussion. This includes every character, spacing, and punctuation associated with the SSP number.
2. INTRODUCTION: In the “introduction” section, I meticulously record every word, detail, and formatting element. This includes, but is not limited to, text following colons, commas, text within asterisks “**” (indicating emphasis), line breaks, bullet points, indentations, and parentheses “()”.
3. PATIENT INFORMATION: The “patient_info” captures all patient-related details exactly as provided. I ensure to include all textual nuances such as text after colons and commas, indentations, line breaks, bullet points, and text within parentheses “()”.
4. POSSIBLE DIAGNOSTICS AND EXAMS: In “diagnostics_and_exams”, I comprehensively document all potential diagnostics and examination details. This encompasses text after colons, commas, line breaks, bullet points, indentations, and content within parentheses “()”.
5. INFORMATION FOR THE STUDENT: The “student_info” section is compiled with precision, incorporating every word, detail, and punctuation mark. This includes information following colons, commas, line breaks, bullet points, indentations, and text within parentheses “()”.
6. LIST OF EVALUATION CRITERIA: For “evaluation_criteria”, I gather information with an unparalleled level of detail, ensuring the inclusion of every character, punctuation, and formatting cue. This section specifically includes text after colons, commas, indentations, line breaks, bullet points, text within parentheses “()”, and brackets “[]”. I also pay special attention to capturing the content following specific indicators such as “[List of evaluation criteria]:”, “[Medical history]:”, “[Physical examination]:”, “[Communication skills]:”, “[Clinical reasoning]:”, “[Technical skills]:”, and “[Professionalism]:”.

• By adhering to these guidelines, I compile the extracted_info dictionary with a commitment to accuracy, completeness, and detail, ensuring that no information is overlooked or misrepresented.

  1. Import Necessary Modules: The script begins by importing the necessary modules and functions from the python-docx library. This includes functions for manipulating documents, layout settings such as margins and color, as well as elements for creating and managing tables.

  2. Define Helper Functions for Formatting: Two functions are defined to customize the appearance of cells in tables. The first, set_cell_background_color, allows changing the background color of a cell. The second, set_cell_borders, adds borders around the cells. Also use: from docx import Document, from docx.shared import Inches, Pt, RGBColor, from docx.enum.table import WD_ALIGN_VERTICAL, from docx.oxml import OxmlElement, from docx.oxml.ns import qn.

  3. Initialize and Configure the Document: A new document is created and its margins are set to have a one-inch width on each side.

  4. Add Titles and Content to the Document: The script uses the prepared information to add titles and paragraphs to the document, organizing the content into clearly demarcated sections for:

  5. SSP NUMBER: For the “ssp_number”: fill with extracted_info

  6. INTRODUCTION: In the “introduction” fill with extracted_info

  7. PATIENT INFORMATION: The “patient_info” fill with extracted_info

  8. POSSIBLE DIAGNOSTICS AND EXAMS: fill with extracted_info

  9. INFORMATION FOR THE STUDENT: fill with extracted_info

  10. Creation and Filling of an Evaluation Grid: A table is added to represent the evaluation grid. The table contains 5 columns, one header row, and one row for each evaluation criterion. The header cells are formatted to have specific borders and background colors. The columns are “evaluation criteria”, “yes”, “no”, “partially”, “comment”. The rows are filled with the entirety of the evaluation criteria provided in the extracted_info dictionary. Each criterion is one row.
    Add the entirety, and all the information from the discussion from extracted_info. Transcribe the content word for word. That is to say, taking into account, word for word, all the details, the elements found after “:”, the elements before and after “,”, the line breaks, the bullet lists, the elements between parentheses “()”.

  11. Conditional Application of Background Colors: For each evaluation criterion, the script applies a different background color to the cells of the table based on the response (Yes, No, Partially), thus facilitating the reading and interpretation of the grid. The cells are formatted to have specific borders and background colors. Yes = green, No = red, Partially = yellow.

  12. Saving the Document: Finally, the document is saved under the name “scenario_clinique.docx”, and a message is displayed to indicate the path where the document was saved.

VERIFICATION:

    • Verify the completeness of the extracted data and the format of the docx.

Thank you for your help!
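
For illustration, the bracketed indicators named in item 6 of the instructions (“[Medical history]:”, “[Professionalism]:”, and so on) could be parsed deterministically instead of relying on the model to transcribe them. The following is only a minimal sketch: it assumes the GPT prints each indicator on its own line followed by one criterion per bullet line, and the sample text and the parse_criteria function are made up for this example.

```python
import re

# Category indicators named in the instructions above.
CATEGORIES = [
    "Medical history", "Physical examination", "Communication skills",
    "Clinical reasoning", "Technical skills", "Professionalism",
]

# Matches a line such as "[Medical history]:" (assumed layout).
MARKER = re.compile(r"^\[(?P<name>[^\]]+)\]:\s*$")


def parse_criteria(text):
    """Return {category: [criterion, ...]} from marker-delimited text."""
    grid = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = MARKER.match(line)
        if match:
            name = match.group("name")
            # Unknown markers (e.g. the overall list title) are skipped.
            current = name if name in CATEGORIES else None
            if current is not None:
                grid.setdefault(current, [])
            continue
        if current is not None:
            # Drop a leading bullet character; keep the wording untouched.
            grid[current].append(re.sub(r"^[-*•]\s*", "", line))
    return grid


# Made-up sample text, only to show the expected input shape.
sample = """[List of evaluation criteria]:
[Medical history]:
- Asks about onset of chest pain (time, trigger)
- Asks about cardiovascular risk factors
[Professionalism]:
- Introduces themselves to the patient
"""
grid = parse_criteria(sample)
```

Each list entry then corresponds to exactly one row of the evaluation table, so nothing depends on the model re-typing the criteria while it writes the document script.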

Hi again @IYAnepo.

My initial take is that you would want to more clearly separate the steps for creating the file / JSON with the right data and the actual execution of the script for creating the document.

One of my solutions that involves document creation is API-based, so it is a little different, but it basically does the following, which might give you some additional ideas on how to approach it:

  1. Through a series of API calls, I create the output I need.

  2. On the basis of this output, I then create a JSON in a pre-defined structure. For each of the sections in the Word document, the JSON includes a variable to which the output is mapped. I too have a table with ratings and colors for different ratings. Here again, the ratings for each cell would be included in the JSON file.

  3. Once created, the JSON then forms the basis for the execution of a custom Word doc creation python script. This script includes the same variables I use in the JSON as placeholders for the respective sections. So once the script is executed, the content from the JSON is properly mapped to each section.

Looking at your instructions, I think you need to place greater emphasis on the data compilation part and make sure you provide the GPT with more detail. For instance, when I look at 6. List of evaluation criteria, it is not entirely clear to me what is supposed to be compiled and how it translates into the evaluation grid.

You want to make sure that all your input is ready, including the content of your evaluation grid, before you execute the Python script.
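
One way to enforce this is a small completeness check that runs before the document script. The sketch below uses only the standard library; the field names are taken from the instructions earlier in the thread, while find_problems and the sample dictionary are hypothetical and only illustrate the idea.

```python
# Text fields the document script expects (names from the instructions).
REQUIRED_TEXT_FIELDS = [
    "ssp_number",
    "introduction",
    "patient_info",
    "diagnostics_and_exams",
    "student_info",
]


def find_problems(data):
    """Return a list of human-readable problems; empty means ready."""
    problems = []
    for key in REQUIRED_TEXT_FIELDS:
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")
    criteria = data.get("evaluation_criteria")
    if not isinstance(criteria, list) or not criteria:
        problems.append("evaluation_criteria must be a non-empty list")
    else:
        for index, item in enumerate(criteria):
            if not isinstance(item, str) or not item.strip():
                problems.append(f"evaluation_criteria[{index}] is empty")
    return problems


# Deliberately incomplete sample, to show what gets reported.
incomplete = {
    "ssp_number": "SSP-000",
    "introduction": "",
    "evaluation_criteria": [],
}
problems = find_problems(incomplete)
for problem in problems:
    print(problem)
```

If the list is not empty, the GPT can be asked to fill in the missing pieces first, and the DOCX is only generated once the check passes.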

Thank you very much, I will try to implement your advice!! Thanks again.

If helpful, I can share a disguised version of my script and the associated JSON separately later tonight. Just let me know.