Cool stuff.
Question:
Is there a reason you did it like this, as opposed to including prompt tokens in (or before) the first chunk and completion tokens with each chunk?
The way you did it can’t be used if a generation is canceled by the client (which is quite common when intercepting or preempting model errors).
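
To make the suggestion concrete, here's a minimal sketch of the alternative scheme: prompt-token usage arrives with the first chunk, and a cumulative completion-token count rides along with every chunk. All names here (`stream_completion`, `Chunk`, `Usage`) are hypothetical, not your implementation or any real API.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int  # cumulative count so far, not just this chunk

@dataclass
class Chunk:
    text: str
    usage: Usage  # present on every chunk, not only the final one

def stream_completion(prompt_tokens: int, pieces: list[str]) -> Iterator[Chunk]:
    """Yield chunks that each carry up-to-date usage figures."""
    completed = 0
    for piece in pieces:
        completed += 1  # assume one token per piece, for illustration
        yield Chunk(text=piece, usage=Usage(prompt_tokens, completed))

# Because usage is cumulative on every chunk, a client that cancels
# mid-stream still has accurate counts from the last chunk it saw:
last_usage: Optional[Usage] = None
for chunk in stream_completion(prompt_tokens=12, pieces=["Hel", "lo", "!"]):
    last_usage = chunk.usage
    if chunk.text == "lo":  # client-side cancellation, e.g. on a model error
        break
print(last_usage)  # Usage(prompt_tokens=12, completion_tokens=2)
```

With usage only in a final chunk, the `break` above would leave the client with no token counts at all.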