RouteLLM from LM-Sys - A framework for serving and evaluating LLM routers

anon22939549 · July 2, 2024, 7:23pm

There have been occasional discussions here and elsewhere about routing messages to different models based on cost. LM-Sys has released an open-source project for just that purpose.

I figured some developers might find this useful.

anon22939549 · July 3, 2024, 12:10am

My two cents…

While this particular work focused on selecting between a weak model and a strong model, the natural progression would be to extend this to selecting the best model for any particular job.

For instance, say you have access to several cheap fine-tuned models as well as multiple more expensive models. In theory, you should be able to call a single endpoint that invisibly routes each request to whichever of those models it needs, producing results that exceed any one strong model at substantially lower cost.
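To make that idea concrete, here is a minimal sketch of a single entry point that picks the cheapest model expected to be good enough for a given prompt. Everything in it is an assumption for illustration: the model names, the prices, the `toy_predict_quality` heuristic, and the `min_quality` parameter are all made up, and none of this is RouteLLM's API. A real router would replace the heuristic with a learned quality predictor.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Model:
    name: str                  # hypothetical model label
    cost_per_1k_tokens: float  # made-up price, for illustration only


def pick_model(
    prompt: str,
    models: List[Model],
    predict_quality: Callable[[str, Model], float],
    min_quality: float,
) -> Model:
    """Return the cheapest model whose predicted quality meets min_quality.

    If no model clears the bar, fall back to the one with the highest
    predicted quality.
    """
    scored = [(m, predict_quality(prompt, m)) for m in models]
    good_enough = [m for m, q in scored if q >= min_quality]
    if good_enough:
        return min(good_enough, key=lambda m: m.cost_per_1k_tokens)
    return max(scored, key=lambda pair: pair[1])[0]


def toy_predict_quality(prompt: str, model: Model) -> float:
    """Stand-in for a learned predictor: a crude keyword heuristic."""
    is_code = "code" in prompt.lower() or "def " in prompt
    if model.name == "cheap-code-ft":
        return 0.9 if is_code else 0.4
    if model.name == "cheap-general":
        return 0.5 if is_code else 0.7
    return 0.95  # the expensive general model is assumed strong everywhere


MODELS = [
    Model("cheap-code-ft", 0.2),
    Model("cheap-general", 0.3),
    Model("strong-general", 10.0),
]

if __name__ == "__main__":
    for prompt in ["Write code to reverse a list", "Explain the causes of WWI"]:
        chosen = pick_model(prompt, MODELS, toy_predict_quality, min_quality=0.8)
        print(f"{prompt!r} -> {chosen.name}")
```

The `min_quality` knob is where the cost/quality trade-off lives: lowering it sends more traffic to the cheap models, raising it sends more to the strong one.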

It’s also worth noting that they:

use the latest GPT-4 Turbo as the strong model and either Llama 2 70B or Mixtral 8x7B as the weak model
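For the two-model case quoted above, one common way to frame a router is as a score plus a threshold. The sketch below assumes that framing; it is not a description of RouteLLM's actual implementation. The `strong_win_prob` callable and the `toy_strong_win_prob` heuristic are hypothetical placeholders, and the model names are used only as labels.

```python
from typing import Callable

STRONG = "GPT-4 Turbo"   # label only; no API call is made
WEAK = "Mixtral 8x7B"    # label only; no API call is made


def route(
    prompt: str,
    strong_win_prob: Callable[[str], float],
    threshold: float,
) -> str:
    """Send the prompt to the strong model only when the estimated chance
    that it beats the weak model is at least `threshold`."""
    return STRONG if strong_win_prob(prompt) >= threshold else WEAK


def toy_strong_win_prob(prompt: str) -> float:
    """Placeholder scorer: treats longer prompts as harder.

    A real router would use a trained model here.
    """
    return min(len(prompt.split()) / 50.0, 1.0)


if __name__ == "__main__":
    short_prompt = "hi there"
    long_prompt = " ".join(["word"] * 40)
    print(route(short_prompt, toy_strong_win_prob, threshold=0.5))
    print(route(long_prompt, toy_strong_win_prob, threshold=0.5))
```

A threshold of 0 routes everything to the strong model, and a threshold above 1 routes everything to the weak one, so the threshold effectively sets the budget.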

I would be very interested to see the results they could achieve with, say, Microsoft’s new Phi-3, Llama 3 8B, Gemini Flash, Claude 3.5 Sonnet, and GPT-4o (that is, if they could scale this to five models).
