I would like to use AI to scan a PDF document (the ESRS/CSRD regulation) for passages related to certain keywords/topics ("value chain" and "materiality assessment").

  1. The result should be presented as a table, with enough information to see exactly where in the PDF each passage was extracted from (page number).

  2. Additionally, I would like ChatGPT to use the exact wording from the PDF (if copyright concerns allow) when it extracts the text into the result table.

  3. At this point, ChatGPT should have created two separate tables: one showing all text (paragraphs and additional information) related to "value chain", and another with all text related to "materiality assessment".

  4. Once that is done, the AI should scan a second PDF document: the implementation guidance for the respective topic ("value chain" or "materiality assessment"). The most relevant information from the implementation guidance should then be assigned to the individual standards/paragraphs/information on that topic already listed in the table.

  5. Once this has been done for the "value chain" table, the same process should be repeated for the "materiality assessment" table.
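Steps 1 to 3 do not strictly need an LLM: keyword matching with page tracking can be done deterministically, which also guarantees the wording is verbatim. Below is a minimal sketch, assuming the page texts have already been extracted from the PDF (for example with pypdf, `[p.extract_text() for p in PdfReader("esrs.pdf").pages]`, where the file name is hypothetical). The `TOPICS` patterns and the blank-line paragraph split are illustrative assumptions; PDF extraction often loses paragraph breaks, so real ESRS text may need a different split (e.g. on paragraph numbers).

```python
import csv
import io
import re

# Hypothetical keyword patterns per topic; extend with synonyms as needed.
TOPICS = {
    "value chain": [r"value[\s-]+chain"],
    "materiality assessment": [r"materiality\s+assessment"],
}


def split_paragraphs(page_text):
    """Split one page's text into paragraphs on blank lines (a simplification)."""
    return [p.strip() for p in re.split(r"\n\s*\n", page_text) if p.strip()]


def build_tables(pages):
    """Return {topic: rows}; each row keeps the verbatim text and its 1-based page."""
    tables = {topic: [] for topic in TOPICS}
    for page_no, text in enumerate(pages, start=1):
        for para_no, para in enumerate(split_paragraphs(text), start=1):
            for topic, patterns in TOPICS.items():
                if any(re.search(p, para, re.IGNORECASE) for p in patterns):
                    tables[topic].append(
                        {"page": page_no, "para_on_page": para_no, "text": para}
                    )
    return tables


def to_csv(rows):
    """Serialize one topic's rows as CSV, which can be opened in Excel."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["page", "para_on_page", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Toy page texts standing in for the extracted regulation.
pages = [
    "Intro text.\n\nThe undertaking shall describe its value chain.",
    "The materiality assessment process is described here.\n\nUnrelated paragraph.",
]
tables = build_tables(pages)
print(to_csv(tables["value chain"]))
```

Each topic ends up as its own table, and because the text is copied from the extraction rather than generated, nothing is paraphrased.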

Can someone please help me understand how to do this? I tried using ChatGPT for it yesterday, but the results were inconsistent. Would setting the temperature/creativity to 0 help?
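Lowering the temperature (adjustable via the API or Playground, not in the regular ChatGPT interface as far as I know) makes output more focused, but it does not guarantee identical results across runs or truly verbatim quotes. One hedge is to check the model's output programmatically. A minimal sketch, assuming each model-produced row is a dict with hypothetical `page` and `text` keys and that `pages` holds the extracted page texts:

```python
import re


def normalize(s):
    """Collapse whitespace so PDF line breaks don't break exact matching."""
    return re.sub(r"\s+", " ", s).strip()


def find_quote(quote, pages):
    """Return the 1-based pages on which `quote` appears verbatim (whitespace-insensitive).

    Note: a quote that spans a page break will not be found by this simple check.
    """
    q = normalize(quote)
    return [i for i, text in enumerate(pages, start=1) if q in normalize(text)]


def check_rows(rows, pages):
    """Flag model-produced rows whose quote is not verbatim or whose page is wrong."""
    report = []
    for row in rows:
        found = find_quote(row["text"], pages)
        report.append({**row, "verbatim": bool(found), "page_ok": row["page"] in found})
    return report


pages = ["The undertaking shall\ndescribe its value chain.", "Other text."]
model_rows = [
    {"page": 1, "text": "The undertaking shall describe its value chain."},  # correct
    {"page": 2, "text": "The undertaking shall describe its value chain."},  # wrong page
    {"page": 1, "text": "The company describes its value chain."},  # paraphrased
]
for r in check_rows(model_rows, pages):
    print(r["page"], r["verbatim"], r["page_ok"])
```

Rows flagged as not verbatim or on the wrong page can then be re-requested or corrected by hand.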

What other approaches could I use to solve this task?
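For step 4, one alternative to asking the model to do the assignment in one pass is to rank the implementation-guidance passages by similarity to each table row first, and only then (optionally) let an LLM pick or summarize from the top candidates. The sketch below uses plain bag-of-words cosine similarity as a stand-in; text embeddings would likely capture meaning better. The `assign_guidance` helper, the `(page, text)` tuple format for guidance passages, and `top_k` are hypothetical choices.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (lowercased words of 3+ letters)."""
    return Counter(re.findall(r"[a-z]{3,}", text.lower()))


def cosine(a, b):
    """Cosine similarity of two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def assign_guidance(rows, guidance, top_k=2):
    """Attach the top_k most similar guidance passages to each regulation row.

    rows: dicts with a "text" key; guidance: list of (page, text) tuples.
    """
    g_vecs = [(page, text, vectorize(text)) for page, text in guidance]
    result = []
    for row in rows:
        r_vec = vectorize(row["text"])
        scored = sorted(g_vecs, key=lambda g: cosine(r_vec, g[2]), reverse=True)
        result.append({**row, "guidance": [(page, text) for page, text, _ in scored[:top_k]]})
    return result


# Toy data standing in for one table row and three guidance passages.
rows = [{"page": 3, "text": "Upstream and downstream value chain information"}]
guidance = [
    (10, "Materiality assessment steps and thresholds"),
    (12, "Value chain information upstream downstream estimates"),
    (15, "General reporting"),
]
for row in assign_guidance(rows, guidance):
    print(row["page"], [page for page, _ in row["guidance"]])
```

Because the same function is applied to whichever table and guidance document you pass in, step 5 is just a second call with the "materiality assessment" rows and guidance.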

Many thanks in advance.