[Paper] RWKV: Reinventing RNNs for the Transformer Era

For me, the most interesting parts are Table 1 and Figure 7, which show hugely reduced demands for inference.

There are still open questions, of course, but if this develops into something genuinely equivalent to Transformers, well… that will be, as they say, huge if true.

The net result should be models with far lower VRAM requirements and much faster inference: RWKV generates as an RNN, carrying a fixed-size state from token to token instead of a KV cache that grows with context length.
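To make the memory argument concrete, here is a minimal sketch of the paper's WKV recurrence in plain NumPy. It is not the official implementation: the numerical-stability trick the paper uses (shifting exponents by a running maximum) is omitted, and all names and sizes here are illustrative. The point is that the per-layer state is two `d`-dimensional vectors no matter how many tokens have been processed.

```python
# Minimal sketch of RWKV's WKV recurrence (stability tricks omitted).
# The recurrent state is two d-vectors, so memory is O(d) per layer,
# independent of how many tokens have been generated.
import numpy as np

def wkv_step(k_t, v_t, state, w, u):
    """One token of the WKV recurrence.

    state = (a, b): decayed sums of past e^{k_i} v_i and e^{k_i}.
    """
    a, b = state
    # Mix the accumulated past with the current token, which
    # receives the per-channel bonus u.
    wkv = (a + np.exp(u + k_t) * v_t) / (b + np.exp(u + k_t))
    # Decay the past by e^{-w} and fold in the current token.
    a = np.exp(-w) * a + np.exp(k_t) * v_t
    b = np.exp(-w) * b + np.exp(k_t)
    return wkv, (a, b)

d = 8                      # channel dimension (toy size)
w = np.full(d, 0.5)        # learned per-channel decay (illustrative value)
u = np.zeros(d)            # learned per-channel bonus for the current token
state = (np.zeros(d), np.zeros(d))

rng = np.random.default_rng(0)
for t in range(1000):
    k_t, v_t = rng.normal(size=d), rng.normal(size=d)
    out, state = wkv_step(k_t, v_t, state, w, u)
# After 1000 tokens the state is still just two d-vectors, whereas a
# Transformer's KV cache would hold 1000 * 2 * d values per layer.
```

The paper's actual CUDA kernels compute the same quantity with an exponent-shifting trick to avoid overflow, but the memory picture is the same.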

That would have two immediate consequences:

  1. It would become much easier to self-host larger, more powerful models.
  2. Even larger, more powerful models would cost much less to run at scale, pushing API prices down.

Imagine GPT-4 API calls at 1/4 the cost of GPT-3.5-Turbo…
