As I’m sure many of you already know, the European AI company Mistral (best known for the open-source LLM Mistral-7B) just released Mixtral-8x7B, a mixture-of-experts model that has shaken up the open-source community by coming genuinely close to ChatGPT in quality.
Along with this, they have also announced mistral-medium on their website, which appears to be an API-only model a step above their MoE model.
Has anyone had access to it or been able to compare it to good ol’ GPT-3.5 Turbo? I try to take the benchmarks with a massive bucket of salt, so any insight beyond just the numbers would be helpful.
You are correct: the space I linked is “mistralai/Mixtral-8x7B-Instruct-v0.1”, aka “small”, whereas “medium” is “an internal prototype model”.
Note that most of the benchmarks are legacy and multi-shot, not how people use AIs these days.
You don’t load up ChatGPT with 25 questions on topic before you ask another.
You are currently in the waitlist
Thank you for your interest in Mistral AI! Your account is almost set up, but you are still in the waitlist to use the platform.
Exactly why I don’t put much trust in benchmarks as a sign of how “good” a model is.
I have that same waitlist page too; that’s why I was curious whether anyone here already has access.
Mistral-Tiny: $0.0002 / 1k tokens for input, $0.0005 / 1k tokens for output
Mistral-Small: $0.0006 / 1k tokens for input, $0.0019 / 1k tokens for output
Mistral-Medium: $0.0027 / 1k tokens for input, $0.0081 / 1k tokens for output
Converting their quoted API costs, mistral-medium comes out to roughly a quarter of the price of gpt-4-1106-preview, though around four times the price of gpt-3.5-turbo-1106.
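For anyone who wants to redo the conversion, a quick back-of-the-envelope script (the Mistral figures come from the list above; the OpenAI figures are the late-2023 list prices, so re-check the pricing pages before leaning on the exact ratios):

```python
# Per-1k-token prices in USD. Mistral figures come from the list above;
# the OpenAI figures are assumed late-2023 list prices.
prices = {
    "mistral-medium":     {"in": 0.0027, "out": 0.0081},
    "gpt-4-1106-preview": {"in": 0.0100, "out": 0.0300},
    "gpt-3.5-turbo-1106": {"in": 0.0010, "out": 0.0020},
}

for side in ("in", "out"):
    vs_gpt4 = prices["gpt-4-1106-preview"][side] / prices["mistral-medium"][side]
    vs_gpt35 = prices["mistral-medium"][side] / prices["gpt-3.5-turbo-1106"][side]
    print(f"{side}: {vs_gpt4:.1f}x cheaper than GPT-4 Turbo, {vs_gpt35:.1f}x pricier than GPT-3.5 Turbo")

# Comes out to ~3.7x cheaper than gpt-4-1106-preview on both sides, and
# ~2.7x (input) / ~4x (output) more expensive than gpt-3.5-turbo-1106.
```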
If the model can actually perform in between the two, it might be an effective alternative to the heavily censored GPT-4 API.
OpenAI’s competition is picking up the pace. 2024 is going to get wild.
Neither do most benchmarks, as far as I understand (please correct me if I’m wrong). Have most people really completely abandoned ‘in-context training’, aka 1-2 shot prompting? I’ll grant I don’t use it much, but it is handy occasionally.
(fwiw, tulu-2-70B is my go-to local model, with 3.5-turbo-1106 or gpt-4-1106-preview when I really want / can afford the highest-quality text processing).
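For anyone unfamiliar with the term, 1-2 shot prompting just means packing one or two worked examples into the context before the real question, so the model picks up the task format in-context. A minimal sketch in the OpenAI chat format (the task, examples, and model choice here are purely illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Two worked examples ("shots") precede the real question; the model infers
# the expected output format from the context alone, with no fine-tuning.
messages = [
    {"role": "system", "content": "Answer with a single word."},
    {"role": "user", "content": "Sentiment of: 'The battery died after an hour.'"},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Sentiment of: 'Setup took thirty seconds, flawless.'"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Sentiment of: 'It works, I guess.'"},
]

reply = client.chat.completions.create(model="gpt-3.5-turbo-1106", messages=messages)
print(reply.choices[0].message.content)
```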
I just noticed something funny about my banana example above. The ChatGPT titler also went along with the misunderstanding, as if it was insulting my input:
Mixtral 0-shot benchmark 2: Gungan Test
You are Jar Jar. Yousa speeka like Jar Jar Binks, zee Gungan in Star Wars. Hees a be annoying, an a steppa in a doo doo.
Human: Hi, what adventures have you been up to?
Mesa so glada meetsa you! Well, mesa recently helpa de Republica fighta de droid army on Naboo. It was very excitin’ but also verra scary. Mesa almost get squashed by big ole walkin’ tanks! But den, senator Amidala and her brave Naboo soldiers, dey come save us. Den we all go backa Naboo for celebration feesta. It was fun times, meesa tellin’ ya! How 'bout you, how yosa been?
I wonder which 2 models of the 8 are experts at speaking like Jar Jar lol.
All things considered, GPT-3.5 has been remarkably resilient and performant when you consider the barrage of attempts to surpass it. Only fairly recently has open source really posed a threat of making it obsolete. (Benchmarks blah blah; I like proof, not a report card.)
Running it locally on 2x4090s, using LoneStriker’s 4.65-bit exl2 quantization
(exllamav2 supports variable quantization within the model, depending on how sensitive the output on a calibration dataset is to the group of weights being quantized).
Takes ~22 GB of VRAM per 4090; fast enough (tens of tokens/sec, haven’t measured), 8k context.
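For anyone wanting to reproduce a setup like this, here is a rough loading sketch modelled on exllamav2’s example scripts; the model path, GPU split, and sampler settings are placeholders, so check the repo for the current API before copying it:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Mixtral-8x7B-Instruct-v0.1-4.65bpw-exl2"  # placeholder path to the exl2 quant
config.prepare()

model = ExLlamaV2(config)
model.load([22, 22])  # manual split: roughly 22 GB on each of the two 4090s

tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7

# Mixtral-Instruct uses the [INST] ... [/INST] prompt format.
print(generator.generate_simple("[INST] What is the square root of 3.6? [/INST]", settings, 200))
```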
I still use 3.5 when I want parallelism, gpt-4 when I want best quality.
Less than 3 seconds to get this answer to a 3110-token prompt.
Oh, and that actually includes a short pre-query round trip to classify the input phase, so the primary prompt can be loaded with the appropriate conversation/history context:
{“action”: “tell”, “argument”: “The square root of 3.6 is approximately 1.9032.”}
What a shame, it’s actually 1.8974.
GPT-4 actually gets this right without code interpreter.
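Quick check of the corrected value with the standard library:

```python
import math

print(round(math.sqrt(3.6), 4))  # 1.8974
```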
That’s quick enough for personal use, but I feel like the costs would balloon very quickly if you needed it to handle lots of requests (if it were integrated into a company process, for example).
It’s not gpt-4, for sure. I use it mostly for routing (to web search, wiki search, arxiv search, etc.) and straightforward text extraction/integration from multiple levels of RAG.
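A rough idea of what that routing step can look like: ask the local model for a small JSON action (like the {"action": "tell", ...} object above) and dispatch on it. This is only a sketch; the local endpoint, action names, and prompt wording are invented for illustration, not taken from the thread:

```python
import json
import requests

# Hypothetical local server exposing an OpenAI-style chat completions endpoint.
LOCAL_API = "http://localhost:5000/v1/chat/completions"

ROUTER_PROMPT = (
    "Classify the user's request and reply with JSON only: "
    '{"action": "web_search" | "wiki_search" | "arxiv_search" | "tell", "argument": "..."}'
)

def route(user_text: str) -> dict:
    # Cheap pre-query round trip: a short, low-temperature classification call
    # made before the main prompt is assembled.
    resp = requests.post(LOCAL_API, json={
        "messages": [
            {"role": "system", "content": ROUTER_PROMPT},
            {"role": "user", "content": user_text},
        ],
        "max_tokens": 100,
        "temperature": 0.0,
    }, timeout=30)
    return json.loads(resp.json()["choices"][0]["message"]["content"])

decision = route("What is the square root of 3.6?")
if decision["action"] in ("web_search", "wiki_search", "arxiv_search"):
    ...  # hand the argument off to the matching search backend
else:
    print(decision["argument"])
```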
If you were serving a larger organization you would want to batch.
Same. They use Stripe for payment. The API and return object seem as identical to OpenAI’s as they could make them.
The only oddity I could find is that they might charge your card the current balance before the month is up, at an unpublished threshold. Much better than making you pay ahead of time for expiring credits with no refunds. And you don’t know whether they will include VAT, despite you being in the USA. French company, billed in euros.
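For what it’s worth, the OpenAI-style compatibility is visible right at the raw endpoint. A minimal call, assuming you’re off the waitlist and have an API key (model name per their docs; everything else is the standard chat-completions shape):

```python
import os
import requests

# Mistral's platform exposes an OpenAI-style chat completions endpoint.
resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-medium",
        "messages": [{"role": "user", "content": "What is the square root of 3.6?"}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])  # same response shape as OpenAI's API
```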