I’m sorry, I’m a little slow here.
I thought we’d finally be getting partial assistant output in prompts, but this is not that.
In fact, it looks like we’re billed normally for all output tokens, predicted or not. Is that right?
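For context, here’s roughly how I’m reading it, sketched against the chat completions API. The `prediction` parameter is from the docs; the exact `completion_tokens_details` field names are my assumption about where the accepted/rejected counts show up, and `app.py` is just a stand-in file:

```python
from openai import OpenAI

client = OpenAI()

original_code = open("app.py").read()  # hypothetical file being edited

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Rename the function `foo` to `bar` in this file:\n\n"
            + original_code,
        },
    ],
    # The mostly-unchanged file doubles as the predicted output.
    prediction={"type": "content", "content": original_code},
)

details = resp.usage.completion_tokens_details
# As far as I can tell, both of these are billed at the normal
# output-token rate, so a bad prediction can cost *more* than none.
print("accepted:", details.accepted_prediction_tokens)
print("rejected:", details.rejected_prediction_tokens)
```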
I mean, it’s cool tech, and I wonder how you guys are doing it (parallel generation? multi-token prediction plus skip-ahead? quantization?).
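My guess would be something like speculative decoding with the user-supplied prediction as the draft: score the prefix plus the draft in one forward pass, keep the longest run of draft tokens the model agrees with, and emit one corrected token at the first disagreement. This is purely my speculation, not anything OpenAI has confirmed; the toy below stubs out the model with random logits just to show the verify-and-accept shape:

```python
import numpy as np

def model_logits(tokens: list[int]) -> np.ndarray:
    # Stand-in for a real LM forward pass: one pass returns logits
    # for every position. Deterministic toy, not a real model.
    rng = np.random.default_rng(sum(tokens))
    return rng.standard_normal((len(tokens), 50_000))

def speculative_step(prefix: list[int], draft: list[int]):
    """Verify a user-supplied draft in a single forward pass.

    Assumes a non-empty prefix. Returns (accepted draft tokens,
    next token): either the model's correction at the first
    disagreement, or a free "bonus" token if the whole draft holds.
    """
    logits = model_logits(prefix + draft)
    accepted: list[int] = []
    for i, tok in enumerate(draft):
        # The model's prediction for position len(prefix)+i lives in
        # the logits of the position just before it.
        predicted = int(np.argmax(logits[len(prefix) + i - 1]))
        if predicted != tok:
            return accepted, predicted  # first disagreement
        accepted.append(tok)
    # Entire draft accepted; the last position predicts one more token.
    return accepted, int(np.argmax(logits[-1]))

# e.g. accepted, next_tok = speculative_step([1, 2, 3], predicted_tokens)
```

If that’s roughly the mechanism, it would explain the billing: every draft token still has to be scored by the big model, whether or not it gets accepted.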
But for this particular use case, aren’t diffs or fuzzy diffs much faster and cheaper?
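The back-of-envelope for a one-line rename looks compelling. Using `difflib` just to illustrate the size gap (the real win would be having the model emit the diff instead of the whole file; `app.py` is again hypothetical):

```python
import difflib

old = open("app.py").read()
new = old.replace("def foo(", "def bar(")  # the one-line edit

diff = "".join(difflib.unified_diff(
    old.splitlines(keepends=True),
    new.splitlines(keepends=True),
    fromfile="app.py",
    tofile="app.py",
))

# The diff is a handful of lines; a full rewrite is the whole file,
# so for a large file the output-token count drops by orders of magnitude.
print(len(diff), "chars of diff vs", len(new), "chars of full rewrite")
```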