Is there a way in the API to cache representations of (part of) a prompt, so that we don't need to run GPT-3 over the entire set of examples for every new input? If not, could someone look into implementing this? I'd be happy to implement it myself: it would just need an option to return the network's internal representation of a string, and an option to pass a representation cache back in when generating.
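For concreteness, here's roughly what I mean, sketched with the open-source GPT-2 via Hugging Face transformers (just a stand-in, since the API doesn't expose anything like this today). The `past_key_values` that the model returns is exactly the kind of reusable representation cache I'm asking for; the model and prompt here are placeholders:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# The shared few-shot prefix: processed ONCE, then reused.
prefix = "Translate English to French:\nsea otter => loutre de mer\ncheese =>"
prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids

with torch.no_grad():
    # use_cache=True returns past_key_values: the attention keys/values
    # for every prefix token, i.e. the "representation cache".
    cache = model(prefix_ids, use_cache=True).past_key_values

# Each new query only runs the model over its own (few) tokens,
# attending to the cached keys/values for the prefix.
query_ids = tokenizer(" plush giraffe =>", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(query_ids, past_key_values=cache, use_cache=True)

next_id = out.logits[0, -1].argmax()
print(tokenizer.decode([next_id.item()]))
```

In API terms, the first (expensive) call would happen once per prompt template, and only the second, cheap call would run per input.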
This is a super common use case given the few-shot learning paradigm. Not re-processing the "training" tokens on every inference step could cut costs by 10-20x, and it would also massively reduce the environmental cost of unnecessary computation. This would be INCREDIBLY helpful for such a small change!