A canonical way to partition text using GPT-3.5/4

Is there a canonical way to partition text using GPT-3.5/4?

In most cases a simple character splitter is enough to partition text. In cases where the quality of the partition matters, however, a more expensive approach is warranted.

GPT-3.5/4 can infer from context where partitions should fall; however, they are not designed to output this information in a generally usable way.

Example text: “If dogs are mammals, then dogs breathe air. Birds can fly through air.”

An initial solution could be to have the language model output the partitioned text as a JSON object. Something like:
{ "partitions": [ "If dogs are mammals, then dogs breathe air.", "Birds can fly through air." ] }

The problem with this approach is that there is no guarantee the output is a true partition of the text. Moreover, with long inputs the model is highly likely to summarize rather than copy the text verbatim.
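One cheap safeguard is to verify the model's output before trusting it. A minimal sketch (assuming the JSON response has already been parsed into a Python list): a partition is "true" exactly when concatenating the pieces reproduces the original string.

```python
def is_true_partition(original: str, partitions: list[str]) -> bool:
    """Return True iff the pieces, concatenated in order, reproduce
    the original text exactly (no dropped characters, no rewording)."""
    return "".join(partitions) == original

text = "If dogs are mammals, then dogs breathe air. Birds can fly through air."

# The model dropped the space between the two sentences, so this check fails,
# flagging that the output is not an exact partition:
print(is_true_partition(text, ["If dogs are mammals, then dogs breathe air.",
                               "Birds can fly through air."]))
```

In practice you may want a whitespace-tolerant variant, since models routinely normalize spacing even when they otherwise copy faithfully.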

Another, more expensive, solution is an iterative approach: the model outputs the first segment of the text, that segment is removed from the front, and the process repeats until the model determines there is nothing left to segment. Something like:
First message: “If dogs are mammals, then dogs breathe air.”
Second message: “Birds can fly through air.”
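The loop can be sketched as follows. `get_first_segment` is a hypothetical stand-in for the actual chat-completion call; the prefix check is what gives this approach its guarantee, since any hallucinated segment is rejected rather than silently corrupting the partition.

```python
def iterative_partition(text, get_first_segment, max_iters=1000):
    """Repeatedly ask the model for the first segment of the remaining
    text, verify it is a true prefix, strip it, and continue.
    `get_first_segment` stands in for a chat API call (hypothetical)."""
    segments = []
    remaining = text
    for _ in range(max_iters):
        if not remaining:
            break
        seg = get_first_segment(remaining)
        # Guard against hallucination: only accept an exact prefix.
        if not seg or not remaining.startswith(seg):
            segments.append(remaining)  # give up; keep the rest whole
            break
        segments.append(seg)
        remaining = remaining[len(seg):].lstrip()
    return segments

# Toy stand-in "model" that returns up to the first sentence boundary.
def first_sentence(t):
    i = t.find(". ")
    return t[: i + 1] if i != -1 else t

text = "If dogs are mammals, then dogs breathe air. Birds can fly through air."
print(iterative_partition(text, first_sentence))
# → ['If dogs are mammals, then dogs breathe air.', 'Birds can fly through air.']
```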

With this approach the partition is guaranteed to be true; however, it is essentially O(n²) in the number of tokens it takes to compute, since the entire remaining text is re-sent on every iteration.

The final solution I have thought through, and the one I use, is to have the model output unique delimiters that can then be used to split the text. Something like:
{ "delimiters": [ "breathe air.", "through air." ] }
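The splitting step might look like the sketch below (function name and signature are my own): each segment runs up to and including the first occurrence of its delimiter, so concatenating the segments reconstructs the original text exactly, and a delimiter the model hallucinated (one not present in the text) is detected rather than silently mangling the split.

```python
def split_by_delimiters(text, delimiters):
    """Split `text` into segments, each ending at the first occurrence
    of the corresponding delimiter. Returns None if any delimiter is
    not found, so the caller can fall back to another strategy."""
    segments = []
    pos = 0
    for delim in delimiters:
        end = text.find(delim, pos)
        if end == -1:
            return None  # model emitted a delimiter absent from the text
        end += len(delim)
        segments.append(text[pos:end])
        pos = end
    if pos < len(text):
        segments.append(text[pos:])  # trailing remainder, if any
    return segments

text = "If dogs are mammals, then dogs breathe air. Birds can fly through air."
print(split_by_delimiters(text, ["breathe air.", "through air."]))
```

Note the second segment keeps its leading space; that is deliberate, so that `"".join(segments)` equals the original text and the partition can be verified mechanically.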

This guarantees that the partition is true while remaining cheap. Where the first approach takes roughly 2n tokens to partition (n tokens in, n tokens out), this takes roughly n + k tokens, where k is the small number of tokens spent on the delimiters, proportional to the number of partitions.

The problem I’m finding with this method is that it performs worse at the actual partitioning. My theory is that, by not regurgitating the text, the model’s contextual understanding is reduced.

Is there a canonical way to partition text using GPTs?