I would like to use AI to scan a PDF document (ESRS/CSRD regulation) for text passages related to certain keywords/topics (“value chain” and “materiality assessment”)

RGL · July 24, 2024, 11:21am

The result should be presented in table format, with enough information to see exactly where in the PDF each piece of text was extracted from (page number).
Additionally, I would like ChatGPT to quote the exact wording from the PDF (as far as copyright concerns allow) when it copies text into the result table.
At this point ChatGPT should have created 2 separate tables: one showing all text (paragraphs and additional info) related to “value chain” and another showing all text in the PDF related to “materiality assessment”.
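
A minimal sketch of this first step without any language model, so the output is the same on every run and the wording is copied verbatim. It assumes the page texts have already been pulled out of the PDF with a PDF library (pypdf's `PdfReader(...).pages[i].extract_text()` would be one option) and that paragraphs are separated by blank lines, which real PDF extraction does not guarantee. The function name `build_tables` and the row layout are made up for illustration.

```python
import re

KEYWORDS = ["value chain", "materiality assessment"]

def build_tables(pages, keywords=KEYWORDS):
    """Return one table (list of rows) per keyword.

    pages: list of page texts, where pages[0] is page 1 of the PDF.
    Each row records the page number and the paragraph text exactly
    as it was extracted.
    """
    tables = {kw: [] for kw in keywords}
    for page_no, text in enumerate(pages, start=1):
        # Assumption: paragraphs are separated by blank lines.
        for para in re.split(r"\n\s*\n", text):
            para = para.strip()
            if not para:
                continue
            # Lowercase and collapse whitespace so a keyword that is
            # split across a line break is still found.
            normalized = " ".join(para.lower().split())
            for kw in keywords:
                if kw in normalized:
                    tables[kw].append({"page": page_no, "text": para})
    return tables
```

Plain keyword matching like this only finds literal mentions. Passages that discuss the topic in other words would still need a semantic step (for example an LLM or embeddings) on top.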
Once that has been done, the AI should scan a second PDF document: the implementation guidance for the respective topic (“value chain” or “materiality assessment”). The most relevant information from the implementation guidance should then be assigned to the individual standards/paragraphs already shown in the corresponding table.
This should be done for the “value chain” table first and then repeated for the “materiality assessment” table.
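
The assignment step could be sketched roughly as follows. This uses plain word overlap (Jaccard similarity) as a stand-in for "most relevant", which is a simplifying assumption: it is not a semantic match, and an embedding-based or LLM-based comparison would likely do better. The helper names, the row layout (`page`, `text`) and the `min_score` threshold are all hypothetical.

```python
import re

def tokens(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Word-overlap similarity between two texts, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def assign_guidance(rows, guidance_paragraphs, min_score=0.1):
    """Attach the best-overlapping guidance paragraph to each table row.

    rows: list of dicts with at least a "text" key (one table).
    guidance_paragraphs: list of paragraph strings from the guidance PDF.
    Rows with no match above min_score get guidance=None.
    """
    result = []
    for row in rows:
        best = max(
            guidance_paragraphs,
            key=lambda g: jaccard(row["text"], g),
            default=None,
        )
        score = jaccard(row["text"], best) if best is not None else 0.0
        result.append({
            **row,
            "guidance": best if score >= min_score else None,
            "score": score,
        })
    return result
```

The same function would be called once per table, first with the “value chain” rows and its guidance document, then with the “materiality assessment” rows and theirs.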

Can someone please help me understand how to do that? I tried using ChatGPT for this yesterday but received inconsistent results. Would setting the temperature (creativity) to 0 help?

What other ways are there to approach this task?

Many thanks in advance.
