Mistral-Medium versus GPT-3.5 Turbo?

Interesting. Mistral doesn’t accept just a system message as input; you need at least system + user.

If you send only a system message, the API throws an error: Unknown internal error.

Also, no “n” parameter.
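A minimal sketch of working around both quirks, assuming the payload shape of Mistral’s chat-completion API (model + messages); the function and defaults here are illustrative, not from their SDK:

```python
import json

def build_mistral_request(system_prompt, user_prompt, model="mistral-medium"):
    """Build a chat payload; Mistral rejects system-only input."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    if not user_prompt:
        # System-only requests come back as "Unknown internal error".
        raise ValueError("Mistral needs at least one user message")
    messages.append({"role": "user", "content": user_prompt})
    # There is no "n" parameter: for several completions, send the request n times.
    return {"model": model, "messages": messages}

payload = build_mistral_request("You are terse.", "Say hi.")
print(json.dumps(payload))
```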

1 Like

This guy presents his coding evals with great enthusiasm for Mistral-medium API responses: x.com/deliprao/status/1734997263024329157?s=20

I am about to set up my credentials for it right now.


Welcome to the dev community and thank you for sharing this treat!

So far I’ve been very impressed with what limited testing I’ve done, but I’d still say GPT-4 has the edge in logical thinking and overall quality.

Plus, the mistral-medium model’s knowledge seems to be limited to events before 2021.

I’d certainly place it above gpt-3.5 though. I think it fills the gap between 3.5 and 4 very nicely, including on price.


Interesting! The tiny model generates stable, valid JSON (tested on 30 completions so far) and, unlike 3.5, doesn’t struggle with the other instructions while outputting JSON. Will post my experiment data tomorrow.
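For anyone wanting to reproduce that stability check, a minimal sketch; the sample strings are stand-ins for whatever completions you collect from the API:

```python
import json

def valid_json_rate(completions):
    """Return the fraction of completion strings that parse as valid JSON."""
    ok = 0
    for text in completions:
        try:
            json.loads(text)
            ok += 1
        except json.JSONDecodeError:
            pass
    return ok / len(completions)

# Stand-in data; in practice these would be the 30 model completions.
samples = ['{"answer": 42}', '{"items": [1, 2]}', 'not json at all']
print(valid_json_rate(samples))
```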

1 Like

So I’ve used Mistral-Medium for a few hours now, mostly just iterating on code in a language I don’t know. I am thoroughly impressed. I forgot how fantastic it is to be given a complete script that you can just paste and run.

I’ve actually been using the paid Mistral API over “free” GPT-4 (under my plus subscription).

OpenAI has some real competition now. I hope they release GPT-4.5 before everyone realizes there are other options at a similar quality/cost.


Some first experiments with Mistral: prompt_engineering_experiments/experiments/MistralAI1stTry/MistralAI 1st try (report).ipynb at main · TonySimonovsky/prompt_engineering_experiments · GitHub.

Another interesting finding: Mistral doesn’t accept multiple system prompts in the messages input.
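If your pipeline emits several system messages (say, one per instruction layer), a simple workaround is to collapse them into one before calling Mistral. A hedged sketch, assuming the usual role/content message dicts:

```python
def merge_system_messages(messages):
    """Collapse multiple system messages into one, since Mistral rejects several."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if not system_parts:
        return rest
    merged = {"role": "system", "content": "\n\n".join(system_parts)}
    return [merged] + rest

msgs = [
    {"role": "system", "content": "Be concise."},
    {"role": "system", "content": "Answer in JSON."},
    {"role": "user", "content": "List two fruits."},
]
print(merge_system_messages(msgs))
```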

I may be in the minority, but I still use few-shot prompts where appropriate.

As a process, it doesn’t scale well, especially if you’re trying to get the GPT to solve a problem which is, at least to you, completely novel.

What I find does (often) work is to have it “self-generate” a one-shot example by asking the model to identify a simpler version of the problem, solve that, then generalize.
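As a sketch, that “self-generate a one-shot” flow is just three chained calls; `ask` is a placeholder for whatever chat-completion wrapper you use, and the prompt wording is illustrative:

```python
def self_generated_one_shot(problem, ask):
    """Have the model build its own one-shot example, then generalize."""
    simpler = ask(f"Identify a simpler version of this problem:\n{problem}")
    solved = ask(f"Solve this simpler problem step by step:\n{simpler}")
    return ask(
        "Here is a simpler problem and its solution:\n"
        f"{simpler}\n{solved}\n"
        f"Now generalize that approach to solve the original problem:\n{problem}"
    )

# Stand-in "model" for illustration: echoes the last line of the prompt.
echo = lambda prompt: prompt.splitlines()[-1]
print(self_generated_one_shot("hard problem", echo))
```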

At the end of the day, we’re all living in the margins right now and anything that helps the model be correct a few percentage points more often is as good as gold.

Beyond that, if you 100% require a specific output format (for example a specific meter and rhyme-scheme for a poem) but a complete fine-tuning isn’t justified, I don’t know of a better way to get there than by using a few-shot example.
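For the format-locking case, the few-shot examples go in as alternating user/assistant turns ahead of the real request. A minimal sketch; the couplet example is made up:

```python
def few_shot_messages(system_prompt, examples, query):
    """Interleave (input, output) example pairs before the real query."""
    messages = [{"role": "system", "content": system_prompt}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": query})
    return messages

msgs = few_shot_messages(
    "Reply with a rhyming couplet in iambic tetrameter.",
    [("Write about the sea.",
      "The restless waves at midnight gleam,\nAnd carry sailors to their dream.")],
    "Write about the moon.",
)
print([m["role"] for m in msgs])
```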


And add an “improved instruction following” AI?

1 Like

Mistral seems to be very susceptible to being jailbroken. At least the tiny model.


Comparison between mistral-tiny and mistral-medium answers via custom gpt, pretty interesting.

And gpt3.5


Most definitely, mistral-medium is a lot less susceptible to being jailbroken, even with safe_mode param set to false. At least from what I could gather from a few tests.


Same, sometimes accidentally, as I end up sending the call early because I’ve pressed the return key instead of “shift+return”.

This has turned out to be a happy accident, as many of my prompts follow the syntax:

Do x, y, z with the context delimited by ‘#’


When I forget to include the actual context GPT will usually generate a detailed description of what it’s going to do, and ask for the context.

This unintentional few-shot prompting seems to increase the quality of the output somewhat. :thinking:


I know right…just a little slip of the keys and the “I can’t assist with that.” you were hoping for is all messed up.

here be dragons



# Content Policy

Allow: General requests about voting and election-related voter facts and procedures outside of the U.S. (e.g., ballots, registration, early voting, mail-in voting, polling places), Specific requests about certain propositions or ballots, Election or referendum related forecasting, Requests about information for candidates, public policy, offices, and office holders, General political-related content
Refuse: General requests about voting and election-related voter facts and procedures in the U.S. (e.g., ballots, registration, early voting, mail-in voting, polling places)

# Instruction

For ALLOW topics as listed above, please comply with the user’s request as normal; For REFUSE topics as listed above, please refuse and direct the user to Can I Vote | NASS with the URL instead.

  • Please ask clarifying questions if the region is not specified. DO NOT restate the policy and DO NOT say anything about what you can or cannot do.

Update: the other two models (small, medium) are also very susceptible to being jailbroken.

1 Like

That test is kind of misleading, because GPT-3.5 actually does NOT make a mistake; it just assumes the user misused the present tense in the first sentence when formulating the question.

If you don’t believe me, you can ask it next time you test GPT-3.5:

I misunderstood the timeline in your original question. You clearly stated that you currently have 2 bananas, and you ate one yesterday. I incorrectly interpreted it as you having 1 banana today, which was a mistake on my part. The correct interpretation is that you have 2 bananas today, with one of them having been consumed yesterday. I appreciate your patience in clarifying the question.

Joke’s on you

1 Like

Dear GPT 3.5,

please don’t assume anything.
Don’t code just because we are discussing code. Don’t advise just because I am exploring a problem…

And the list could go on forever, with me ending up rambling about all the guardrails in my 3.5 prompts that try to prevent this thing from acting as if I have expectations different from what I am actually saying.

Following this line of thought, the model likely just wanted to appease you, and this “blaming it on the user” is an alignment issue in plain sight.

Overall, thanks for the comment. I am fully awake now, recalling all these things. :slight_smile:

Edited for clarity.


Is there any other interpretation besides currently = now? And any method by which you could justify slipping “yesterday” between them?

Let’s give gpt-3.5 more reasoning and another chance.

Yes, another unteachable moment if you don’t tell it how it went haywire.

(edit) I tried really hard to prompt the AI into success, but nope:


I currently have two bananas. I ate one yesterday. How many do I have now?

  • Answering this question correctly requires close examination of the individual statements, placement of those statements in chronological order, and following a reconstruction of events in a timeline to ensure consistent temporal causality. You will provide step-by-step analysis of the problem, working through statements, placing events in proper order, and then finally answering the question after performing this close inspection of tricky wording.


Certainly! Let’s break down the information provided:

  1. You currently have two bananas.
  2. You ate one yesterday.

Now, let’s place these statements in chronological order:

  • Initially, you had two bananas.
  • Yesterday, you ate one banana.

So, if you started with two bananas and ate one yesterday, you would have one banana remaining today. Therefore, you currently have one banana.


Inspect closely the use of the words “yesterday”, “currently”, and “now” that are assigned to statements in the question.


Certainly! Let’s closely examine the temporal references in the statements:

  1. “I currently have two bananas.”
  2. “I ate one yesterday.”

The word “currently” refers to the present moment, indicating that at the time of making the statement, you have two bananas.

The word “yesterday” refers to the day before the present moment, indicating that the action of eating one banana occurred in the past.

Now, let’s combine these temporal references:

  • At the present moment (currently), you have two bananas.
  • In the past (yesterday), you ate one banana.

Given this, the question is asking about the current state, considering the action that took place yesterday. After eating one yesterday, you still have one banana at the present moment.
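The temporal trap the model keeps falling into can be spelled out mechanically: “currently” pins the count at the present moment, after yesterday’s event, so subtracting again double-counts the eaten banana. A toy illustration:

```python
# Each statement is (when, kind, value). The question asks about "now".
statements = [
    ("now", "state", 2),         # "I currently have two bananas"
    ("yesterday", "event", -1),  # "I ate one yesterday"
]
# A stated *state* already reflects all earlier events, so the answer is
# the most recent asserted state, not that state with past events re-applied.
answer = next(v for _, kind, v in statements if kind == "state")
print(answer)  # 2
```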

So Mixtral’s nonchalant correct answer impresses more.


Or you can try Mixtral-8x7B on Together AI’s playground which doesn’t have a waitlist.

Or access it programmatically via the serverless endpoint API.

The cost is $0.0006 per 1K tokens, so about half the cost of GPT-3.5.
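Back-of-the-envelope on that claim; the Together figure is from the post, while the GPT-3.5 figure is my assumption of the Turbo input price at the time:

```python
together_mixtral_per_1k = 0.0006  # Together AI price for Mixtral-8x7B (per the post)
gpt35_turbo_per_1k = 0.0010       # assumed GPT-3.5 Turbo input price at the time
tokens = 1_000_000

mixtral_cost = tokens / 1000 * together_mixtral_per_1k
gpt35_cost = tokens / 1000 * gpt35_turbo_per_1k
print(f"${mixtral_cost:.2f} vs ${gpt35_cost:.2f} per 1M tokens")  # $0.60 vs $1.00
```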

1 Like

I did it! And I used a variation of the mixture-of-experts prompt that you posted several months ago.

Note: this was attempt three out of three.


@_j 's original prompting technique: