Using gpt-4 API to Semantically Chunk Documents

I agree, but @jr.2509 and I discovered the hard way that the models can be pretty unreliable at accurately reproducing the first and last sentences of a block, not to mention the regex issues caused by errant spaces, linefeeds, tabs, etc. Using line numbers has proven to be the most accurate and consistent way to identify blocks of text, at least so far.
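
For anyone who wants to try the line-number approach, here is a minimal sketch of how it can work. It assumes you prefix each line with its number before sending the text to the model, and that the model replies with (start, end) line ranges for each chunk. The helper names `number_lines` and `extract_chunks` and the sample `model_ranges` are hypothetical, made up for illustration; the actual API call and prompt are left out.

```python
def number_lines(text: str) -> tuple[list[str], str]:
    """Split text into lines and build a numbered copy to send to the model."""
    lines = text.splitlines()
    numbered = "\n".join(f"{i}: {line}" for i, line in enumerate(lines, start=1))
    return lines, numbered


def extract_chunks(lines: list[str], ranges: list[tuple[int, int]]) -> list[str]:
    """Slice the original lines by 1-based, inclusive (start, end) ranges."""
    chunks = []
    for start, end in ranges:
        # Reject ranges the model may have hallucinated outside the document.
        if not (1 <= start <= end <= len(lines)):
            raise ValueError(f"invalid line range: {start}-{end}")
        chunks.append("\n".join(lines[start - 1:end]))
    return chunks


document = (
    "Intro sentence.\n"
    "  Indented detail.\t\n"
    "\n"
    "Second topic starts here.\n"
    "It continues."
)

lines, numbered = number_lines(document)

# Hypothetical model reply: chunk boundaries given as line ranges,
# not as quoted first/last sentences.
model_ranges = [(1, 2), (4, 5)]

chunks = extract_chunks(lines, model_ranges)
```

Because the chunks are sliced from the original lines, stray spaces and tabs are preserved exactly and no regex matching against model-generated text is needed.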
