Sampling parameters have advanced a ton in the past few months. OpenAI’s combination of temperature, top_k, and top_p is beyond antiquated. OpenAI should offer the min_p sampler and ideally other modern samplers too (e.g. quadratic sampling, mirostat). I know it’s more options, but I’d bet that most people advanced enough to use the API can handle a few more sliders, and it’d probably improve generation quality a bunch.
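For anyone unfamiliar, the classic temperature/top_k/top_p chain works roughly like this. This is just a sketch of the standard technique, not OpenAI’s actual implementation; the function name and parameter defaults are illustrative:

```python
import numpy as np

def sample_classic(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sketch of the classic temperature -> top_k -> top_p sampling chain."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()
    # top_k: keep only the k most likely tokens (0 disables the filter)
    if top_k > 0:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()
    # top_p: keep the smallest set of tokens whose cumulative prob >= top_p
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        csum = np.cumsum(probs[order])
        k = np.searchsorted(csum, top_p) + 1
        mask = np.zeros(len(probs), dtype=bool)
        mask[order[:k]] = True
        probs = np.where(mask, probs, 0.0)
        probs /= probs.sum()
    return rng.choice(len(probs), p=probs)
```

The complaint is that this chain truncates the distribution with fixed cutoffs that don’t adapt to how confident the model actually is at each step.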
This is actively hurting my projects at this point.
yeah but the sampler actually needs to be implemented.
If it really is that good, it will probably eventually come with newer models; maybe they’ll treat temperature as a mirostat bias or something.
How is it hurting your projects?
edit: that said, purely hypothetically, the MoE thing could maybe make mirostat obsolete anyways
it’s hurting my projects because basically every API, even the open source ones, uses OpenAI’s library for LLM API calls. OAI has managed to dominate the LLM API space, and their library’s lack of sampler options is reducing the output quality I can get for my clients. So its relative antiquation is hurting my work.
Also, I mentioned mirostat since it’s well known, but since it’s so hard to use I’m really more interested in the things Kalomaze has made, i.e. min_p and quadratic sampling. We wouldn’t have to wait for them to “come with” new models; samplers are how you pick the token the model generates, they aren’t actually tied to any specific model.
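To show how simple min_p actually is: it keeps any token whose probability is at least min_p times the top token’s probability, so the cutoff scales with the model’s confidence. A sketch (my own illustrative code, not any particular library’s implementation):

```python
import numpy as np

def min_p_sample(logits, min_p=0.05, temperature=1.0, rng=None):
    """Sketch of min_p sampling: the cutoff is relative to the top
    token's probability, so it adapts to the model's confidence."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()
    threshold = min_p * probs.max()        # scaled, not absolute, cutoff
    filtered = np.where(probs >= threshold, probs, 0.0)
    filtered /= filtered.sum()
    return rng.choice(len(probs), p=filtered)
```

When the model is confident, the threshold is high and almost everything gets pruned; when the distribution is flat, lots of candidates survive. That’s the whole trick, and it’s model-agnostic.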
min_p works with MoE (I’ve used Mixtral with min_p before and the outputs are good) so I don’t quite understand your edit. Clarify?
I meant that using dynamic samplers could introduce some weirdness when used on a model with an expert router, because they’re two completely separate cooks working on sorta the same soup
“it’s hurting my projects because basically every API, even the open source ones, uses OpenAI’s library for LLM API calls. OAI has managed to dominate the LLM API space”
i wholeheartedly agree with this
However, i kinda disagree with this - while yes, you theoretically should be able to play with all of this, APIs are products, and the sampler is part of that product. They have their temp, top_p, logit bias, and presence penalty, although some of those are kinda wonky with some models if I recall correctly. If they added all these new samplers, they’d have to support all the combinations. Maybe not the best idea if they’re already struggling with what they have.
Ideally they’d do their own evals and release models with the optimal default parameters. But considering 0314 is still the best model all around, they’re also struggling with that.
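For context on the knobs they do support: per OpenAI’s documented semantics, logit_bias adds a per-token offset to the logits and presence_penalty subtracts a flat amount from any token that has already appeared. A sketch of how an OpenAI-style API might apply both before sampling (illustrative, not their actual code):

```python
def apply_penalties(logits, generated_ids, logit_bias=None, presence_penalty=0.0):
    """Sketch: apply logit_bias and presence_penalty to raw logits."""
    out = list(logits)
    # logit_bias: user-supplied per-token offsets, e.g. {token_id: bias}
    for tok, bias in (logit_bias or {}).items():
        out[tok] += bias
    # presence_penalty: one flat subtraction per distinct token already generated
    for tok in set(generated_ids):
        out[tok] -= presence_penalty
    return out
```

Every extra sampler multiplies the combinations of these that they’d have to test and support, which is the product-complexity point above.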