Good ideas for extracting headings and paragraphs from an API reply?

We are doing quite extensive translations of documents (PDF) and would like to get better at recognizing document formats. Any good experiences?

We get very nice transcripts, but as they are of several different types, things get a bit more cumbersome.

Hi there and welcome to the Forum. In principle, you can ask the model to identify the headings and paragraphs of a document and return them in JSON format. Depending on the length of your document, I would chunk it first to make the task more manageable and then concatenate the results afterwards.
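A rough sketch of that chunk-then-concatenate idea, in case it helps. Everything named here is an assumption for illustration: `call_model` is a hypothetical function you would write yourself to send a prompt to whichever API you use and return the reply text, and the prompt wording, the `type`/`text` field names and the 4000-character chunk size are placeholders to tune.

```python
import json
from typing import Callable, Dict, List

# Assumed prompt wording; adjust it to your documents and model.
PROMPT = (
    "Split the following text into headings and paragraphs. "
    "Reply with only a JSON array of objects like "
    '{"type": "heading" or "paragraph", "text": "..."}.\n\n'
)


def chunk_text(text: str, max_chars: int = 4000) -> List[str]:
    """Group blank-line-separated blocks into chunks of at most max_chars.

    A single block longer than max_chars is kept whole rather than cut.
    """
    chunks: List[str] = []
    current = ""
    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        candidate = f"{current}\n\n{block}" if current else block
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def parse_blocks(reply: str) -> List[Dict[str, str]]:
    """Parse one reply, keeping only well-formed heading/paragraph items."""
    items = json.loads(reply)  # raises ValueError if the reply is not valid JSON
    return [
        {"type": item["type"], "text": item["text"]}
        for item in items
        if isinstance(item, dict)
        and item.get("type") in ("heading", "paragraph")
        and isinstance(item.get("text"), str)
    ]


def extract_structure(
    text: str,
    call_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
    max_chars: int = 4000,
) -> List[Dict[str, str]]:
    """Process the document chunk by chunk and concatenate the results in order."""
    result: List[Dict[str, str]] = []
    for chunk in chunk_text(text, max_chars):
        result.extend(parse_blocks(call_model(PROMPT + chunk)))
    return result
```

Splitting on blank lines keeps each paragraph inside a single chunk, so the model never sees half a paragraph. Models do not always return clean JSON, so in practice you would probably wrap `parse_blocks` in a try/except and retry or log the chunk when parsing fails.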

Thanks for the help, it works well. At first I was thinking of not using JSON, but it fits this case well. Cool.

