RouteLLM from LM-Sys - A framework for serving and evaluating LLM routers

There have been occasional discussions here and elsewhere about routing messages to different models based on cost. LM-Sys has released an open-source project for just that purpose.

I figured some developers might find this useful.


My two cents…

While this particular work focused on selecting between a weak model and a strong model, the natural progression would be to extend this to selecting the best model for any particular job.

For instance, say you have access to several cheap fine-tuned models as well as multiple more expensive models. In theory, you should be able to call a single endpoint that invisibly accesses whichever of those it needs in order to produce results that exceed any single strong model, at a substantially lower cost.
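To make that concrete, here is a minimal sketch of what such a multi-model router could look like: estimate the difficulty of each query, then send it to the cheapest model whose capability covers it. The model names, prices, capability scores, and the keyword-based difficulty heuristic are all illustrative assumptions on my part, not anything from RouteLLM (which trains routers on preference data rather than using hand-written rules).

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    capability: float        # rough quality score in [0, 1] (assumed)
    cost_per_1k_tokens: float

# Hypothetical model pool: a few cheap fine-tunes plus stronger generalists.
MODELS = [
    Model("cheap-finetune-a", 0.55, 0.0002),
    Model("cheap-finetune-b", 0.60, 0.0003),
    Model("mid-tier",         0.75, 0.0020),
    Model("strong-1",         0.90, 0.0100),
    Model("strong-2",         0.95, 0.0300),
]

def estimate_difficulty(prompt: str) -> float:
    """Toy difficulty estimator; a real router would use a trained classifier."""
    score = 0.3
    if len(prompt) > 200:  # long prompts are assumed harder
        score += 0.2
    if any(k in prompt.lower() for k in ("prove", "derive", "refactor", "optimize")):
        score += 0.3       # reasoning-heavy keywords bump difficulty
    return min(score, 1.0)

def route(prompt: str) -> Model:
    """Cheapest model whose capability meets the estimated difficulty."""
    difficulty = estimate_difficulty(prompt)
    eligible = [m for m in MODELS if m.capability >= difficulty]
    if not eligible:  # nothing qualifies: fall back to the strongest model
        return max(MODELS, key=lambda m: m.capability)
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)
```

With this setup, a short chatty prompt routes to the cheapest fine-tune, while a long proof-style prompt escalates to a strong model; the interesting (and hard) part in practice is replacing the toy `estimate_difficulty` with a learned router.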

It’s also worth noting that they:

use the latest GPT-4 Turbo as the strong model and either Llama 2 70B or Mixtral 8x7B as the weak model

I would be very interested to see the results they could achieve with, say, Microsoft’s new Phi-3, Llama 3 8B, Gemini Flash, Claude 3.5 Sonnet, and GPT-4o (that is, if they could scale this to five models).