Hello Community,
I have encountered a dilemma and would greatly appreciate your insights. I am currently developing a document analyzer and have made good progress, but I am facing a challenge with a specific feature. For the past two weeks, I have been struggling to define the boundaries of individual documents within a combined PDF, which includes contracts, emails, reports, and more. My goal is to create a script that utilizes the OpenAI API to split these documents into sub-documents. However, I am finding it difficult to accurately define each document boundary using the GPT-4o API, especially since I need to pass the pages as images to the API due to various reasons.
My current approach involves passing the pages one by one to the API until it outputs a JSON body with a “result: true,” indicating that I should proceed with splitting the document. At this point, I create a new document containing all the previous pages. While this approach has taken me halfway, I am still encountering inconsistent results. I would greatly appreciate any advice or suggestions you may have on this matter.
P.S. of course I tried with the tempreture=0 and top_p=100 and other various solutions that exist on the internet but didn’t work for me
Thank you in advance for your help!