In a lot of cases order doesn’t matter, but if you’re asking the model “what are the steps to do XYZ?” then order can matter a lot.
One of the bigger issues is that even when things like steps are numbered, you may not get all of the steps back. The approach I’m taking is to generally ignore the text predicted by the chunks and instead use the chunks to build a heatmap of sorts that identifies the most relevant parts of the document. The goal is then to retrieve full spans of text centered around those spots.
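
To make the heatmap idea a bit more concrete, here is a rough sketch. It assumes each matching chunk comes back as a `(start, end, score)` triple of character offsets into the original document. That shape, the names `build_heatmap` and `hot_spans`, and the `threshold` and `padding` parameters are all made up for illustration and not taken from any particular library.

```python
from typing import List, Tuple

# Hypothetical hit shape: (start_offset, end_offset, relevance_score)
Hit = Tuple[int, int, float]


def build_heatmap(doc_len: int, hits: List[Hit]) -> List[float]:
    """Add each chunk's score to every character position it covers."""
    heat = [0.0] * doc_len
    for start, end, score in hits:
        # Clamp to the document so stray offsets cannot raise IndexError.
        for i in range(max(0, start), min(doc_len, end)):
            heat[i] += score
    return heat


def hot_spans(heat: List[float], threshold: float, padding: int) -> List[Tuple[int, int]]:
    """Return (start, end) spans around runs where heat >= threshold.

    Each run is widened by `padding` characters on both sides, and
    spans that touch or overlap after widening are merged.
    """
    spans: List[Tuple[int, int]] = []
    n = len(heat)
    i = 0
    while i < n:
        if heat[i] < threshold:
            i += 1
            continue
        # Walk to the end of this hot run.
        j = i
        while j < n and heat[j] >= threshold:
            j += 1
        start = max(0, i - padding)
        end = min(n, j + padding)
        if spans and start <= spans[-1][1]:
            spans[-1] = (spans[-1][0], end)  # merge with the previous span
        else:
            spans.append((start, end))
        i = j
    return spans


def retrieve(document: str, hits: List[Hit], threshold: float, padding: int) -> List[str]:
    """Slice the full text for each hot span, in document order."""
    heat = build_heatmap(len(document), hits)
    return [document[s:e] for s, e in hot_spans(heat, threshold, padding)]
```

Because the spans are cut from the original document in offset order, a numbered list that several chunks only partially covered comes back whole and in sequence, as long as `padding` is wide enough to reach the steps the chunks missed. In practice you would probably snap the span boundaries to sentence or paragraph breaks rather than raw character offsets.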