Open Source is making rapid progress

Mark Zuckerberg said yesterday:

“Our long term vision is to build general intelligence, open source it responsibly, and make it widely available so everyone can benefit. We’re bringing our two major AI research efforts (FAIR and GenAI) closer together to support this. We’re currently training our next-gen model Llama 3, and we’re building massive compute infrastructure to support our future roadmap, including 350k H100s by the end of this year – and overall almost 600k H100s equivalents of compute if you include other GPUs.” Source

At the same time, we recently saw the launch of the Mistral model, which is comparable to GPT-3.5.

Other interesting models and architectures are arriving rapidly, and that is worth thinking about.


Models be rolling out for sure. But I always hate seeing the unidentifiable wall of metrics every paper has these days.

For overall quality, I like looking at blind testing on real-world data and at leaderboards (ref), and actually being part of the crowdsourced testing so I can weigh in directly.

The crowdsourced opinions currently show OpenAI GPT-4 variants leading overall, with Mixtral 8x7B in 7th place.

So for overall quality, GPT-4 is king right now.

But the best GPT-3.5-Turbo variant is 10th. So if you are looking for a GPT-3.5-Turbo competitor, then yeah, there are lots of OS/OW (open-source/open-weight) options. :+1:


The best “useless metric” I have seen so far is “how often does one model beat another model across multiple tests,” which is then used as the basis for an arbitrary scoring system of battle wins/losses that ranks the models. This usually ends up putting some 7B model just 10 points behind GPT-4, with the nature of the metric hidden away in some acronym.


One day we will make you see the light of Elo :laughing:
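For readers unfamiliar with Elo: each pairwise “battle” nudges both ratings by how surprising the result was, so beating a stronger model moves you more than beating a weaker one. A minimal sketch, assuming the classic Elo formula with a K-factor of 32 and a starting rating of 1000 (conventional choices, not necessarily what any particular leaderboard uses; some fit ratings statistically over all votes instead). Model names are placeholders.

```python
from typing import Dict, Iterable, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> Tuple[float, float]:
    """Return new (rating_a, rating_b) after one battle.

    score_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


def rate_battles(battles: Iterable[Tuple[str, str, float]],
                 initial: float = 1000.0, k: float = 32.0) -> Dict[str, float]:
    """Replay (model_a, model_b, score_a) battles in order and return ratings."""
    ratings: Dict[str, float] = {}
    for a, b, score_a in battles:
        ra = ratings.get(a, initial)
        rb = ratings.get(b, initial)
        ratings[a], ratings[b] = update_elo(ra, rb, score_a, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical battle log; names are placeholders, not real results.
    log = [("model-x", "model-y", 1.0),
           ("model-y", "model-z", 0.5),
           ("model-x", "model-z", 1.0)]
    for name, rating in sorted(rate_battles(log).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

A win against an equally rated opponent moves each rating by K/2 = 16 points, and because the update is zero-sum the total rating pool stays constant. The key difference from a raw win count is that the opponent's strength is baked into every update.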


Also, I’m less convinced that different models make much of a difference; instead, I believe “the best training data” and “the best training hardware / best training schedule” are the determining factors. Note that “best” doesn’t necessarily mean “most.”

This applies to LLM text-completion tasks. If you’re going for AGI, I think we’ll need a totally different architecture, so in that case a new model clearly will matter, in addition to the training data. But I also think that application will need additional kinds of training data, not just text completion.


@curt.kennedy - But I always hate seeing the unidentifiable wall of metrics every paper has these days.

I agree. Although I think they are helpful for comparing open-source models with one another, they are nowhere near accurate in my experience when it comes to GPT-4 vs. others.

I use Bard Pro daily, and despite all the claims and benchmarks saying it’s as good, it still doesn’t come anywhere close.

However, when you look at the Hugging Face ratings between open-source models, I find those very accurate, and I am astonished by the performance and quality of Mistral/Mixtral 7B for simple tasks.

For example, imagine you have something that requires thousands of quick calls; Mistral is ideal for this. Or something that has to run on a Raspberry Pi.
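A minimal sketch of fanning out thousands of short prompts to a local model, assuming a hypothetical `call_model(prompt) -> str` wrapper around whatever local server you run. Threads suit this because each call mostly waits on I/O; whether parallel requests actually finish faster depends on how the local server schedules them on your hardware.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Callable, Iterable, List, Optional


def _safe_call(call_model: Callable[[str], str], prompt: str) -> Optional[str]:
    """Run one call; return None instead of raising so one failure doesn't sink the batch."""
    try:
        return call_model(prompt)
    except Exception:
        return None


def run_many(prompts: Iterable[str], call_model: Callable[[str], str],
             max_workers: int = 8) -> List[Optional[str]]:
    """Send many short prompts concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `prompts` in its output.
        return list(pool.map(partial(_safe_call, call_model), prompts))
```

For example, `run_many(["a", "b"], str.upper)` returns `["A", "B"]`; swap `str.upper` for your real local-model call. Failed calls come back as `None` so you can retry just those afterwards.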

It’s nowhere close to GPT-4; however, for the resources it consumes, it’s truly hard to believe what it can do with so little.

Cherry-picked metrics need to be called out too.

That’s why I like the blind A/B crowdsourced metrics of overall quality.

I’m using the models in an overall sense, not judging how well they can generate Fibonacci numbers or whatever :rofl:


And how much is it going to cost me to lease the required hardware in the cloud to run Mixtral 8x7B without quantisation?

Let’s compare that to the representative cost of an agreed volume of GPT-3.5 Turbo API calls … (or presumably their own endpoints?)
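The comparison is mostly arithmetic: leased GPUs cost a fixed amount per hour whether busy or idle, while API calls scale with tokens. A rough break-even sketch; every price below is a placeholder assumption, not a real quote, and it ignores input/output price differences, idle time, and engineering/ops effort. `num_gpus` depends on how much memory the unquantized weights need.

```python
HOURS_PER_MONTH = 730.0  # 24 * 365 / 12


def monthly_lease_cost(hourly_rate_per_gpu: float, num_gpus: int,
                       hours: float = HOURS_PER_MONTH) -> float:
    """Cost of keeping leased GPUs running all month, busy or not."""
    return hourly_rate_per_gpu * num_gpus * hours


def monthly_api_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Pay-per-use cost for the same month's token volume."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def break_even_tokens(lease_cost: float, price_per_million_tokens: float) -> float:
    """Monthly token volume at which the lease and the API cost the same."""
    return lease_cost / price_per_million_tokens * 1_000_000


if __name__ == "__main__":
    # Placeholder numbers only: 2 GPUs at $2/hour, API at $1 per million tokens.
    lease = monthly_lease_cost(hourly_rate_per_gpu=2.0, num_gpus=2)
    print(f"lease: ${lease:,.0f}/month")
    print(f"break-even: {break_even_tokens(lease, 1.0):,.0f} tokens/month")
```

With those placeholder numbers the lease costs $2,920 a month and only wins above roughly 2.9 billion tokens a month; plug in real quotes before drawing conclusions.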


And how much is it going to cost me to lease the required hardware in the cloud to run Mixtral 8x7B

I don’t understand why someone would lease hardware for a lightweight 7B model, or see it as a binary choice.

Mixtral runs on my local machine and is a complement to GPT, intended for entirely different purposes.

I rely on it heavily for chunked processing of data I pass to GPT, for example.
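One way to read “chunked processing”: split a large input into overlapping pieces, have the local model condense each piece, and send only the condensed text to GPT. A minimal sketch, assuming character-based chunks (real pipelines usually count tokens) and a hypothetical `local_model(chunk) -> str` function.

```python
from typing import Callable, List


def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> List[str]:
    """Split text into windows of at most max_chars, each sharing `overlap` chars with the previous."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    step = max_chars - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reached the end of the text
    return chunks


def condense_for_gpt(text: str, local_model: Callable[[str], str], **chunk_kwargs) -> str:
    """Run each chunk through a (hypothetical) local model and join the results."""
    return "\n".join(local_model(chunk) for chunk in chunk_text(text, **chunk_kwargs))
```

The overlap keeps a sentence that straddles a boundary visible to both neighbouring chunks, at the cost of some duplicated work.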

I also rely on it for simple code-replacement tasks that need to run quicker and without a remote API call; for example, a 100k+ record dataset that I need to map to some sort of pattern, often based on sentiment. These are things that would take far too much time to code or would require fuzzy logic, yet aren’t a fit for GPT.
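For a bulk labeling job like this, the practical snag is that a small model answers in free text. A minimal sketch, assuming a hypothetical `classify(prompt) -> str` wrapper around the local model and an illustrative three-way sentiment label set, that maps each raw answer onto a fixed label and falls back to `unknown`.

```python
from typing import Callable, Iterable, List, Sequence

# Illustrative label set; swap in whatever categories your dataset needs.
LABELS: Sequence[str] = ("positive", "negative", "neutral")
PROMPT = ("Classify the sentiment of the following text as positive, "
          "negative or neutral. Answer with one word.\n\n{text}")


def normalize_label(raw: str, labels: Sequence[str] = LABELS,
                    default: str = "unknown") -> str:
    """Map a free-text answer to the label that appears earliest in it, else `default`."""
    text = raw.lower()
    hits = [(text.find(label), label) for label in labels if label in text]
    return min(hits)[1] if hits else default


def label_records(records: Iterable[str], classify: Callable[[str], str]) -> List[str]:
    """Label each record with a (hypothetical) local classify(prompt) -> str call."""
    return [normalize_label(classify(PROMPT.format(text=r))) for r in records]
```

Passing the result to `collections.Counter` gives a quick label distribution, and the `unknown` bucket shows how often the model wandered off-format.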

Don’t overlook open-source models as a complement or a separate tool; they can be a great addition to your toolset!


The debate of open source vs. “proprietary” models will always be ongoing :nerd_face: … I recently went through a vendor training, and that was part of the curriculum: compare and contrast, and list the pros and cons of open-source models versus vendor-specific (proprietary) ones. As an example, the pros listed for “proprietary” models, such as the ones provided by OpenAI, were as follows:

  1. Speed of development
  2. Quality
  3. Continuous updates and support
  4. Integration and Ease of Use

Because in production, running on your local machine is not an option?

Totally get that it’s an additional and welcome tool though.

Open Source will make more sense for Production as hardware costs come down and capability goes up.

For now, my production tasks like auto-tagging require a GPT-4-level model to perform them well enough.

At some point it may be possible to run a 2T model on a leased cloud instance, but I suspect that will be a while away … :sweat_smile:

in Production running on your local machine is not an option?

Can you help me understand why not?

Open Source will make more sense for Production as hardware costs come down

I also have an instance running on a Raspberry Pi that works fine for the purpose.

Based on your comments, I feel as if we may be talking about different models.

And to reiterate: I’m not for one moment discounting either option, nor do I believe there is any debate about which is better. I’m strictly referring to using the best tool for the job, and that’s not always the most powerful model.

To offer yet another example (since I mentioned my Pi usage): Home Assistant integration is one such use case, where it’s far better to use a lightweight local model.

I’m as big of a fan of OpenAI as they come; however, I have many instances of Mistral models running for entirely different purposes. All of them on commodity hardware.

Because it’s risky, uncontrolled, difficult to scale, not designed for delivery to the internet and not professional.

Interesting. I respect your opinion; however, none of that is accurate from my perspective or the explanation I’ve attempted to offer.

No further discussion on the topic is needed; I can certainly agree to disagree.

I see what you are saying. If you run a 24x7 business operation and need high uptime, you go to the cloud.

But if you are in the cloud, why not just run some fancy proprietary model through an API? Right?

However, local may not work 24x7, unless you are a bigger company and can afford your own server farm with specialized HVAC, redundancy, etc.

But local can work for offline tasks that aren’t driven by external events, like writing code or doing one-off things.

But then there’s the frustration factor with local OS models. For example, I downloaded Mixtral 8x7B to run locally on my Mac Studio with 128 GB of RAM using the new Apple MLX framework. I finished downloading the weights (about 90 GB), and then the whole thing failed because my local git repo didn’t have Git LFS (Large File Storage) initialized. :face_with_symbols_over_mouth: So it can be frustrating.

But say I needed to create a high quality training file for another model. The local model could have assisted me, and saved me some money, and boosted my ego :rofl:


There is a misconception that you always need the most powerful model; that is what OpenAI/GPT is for. Typically, if you have a use case for a lower-power local LLM, you shouldn’t notice a major difference between Mistral and Mixtral at 7B.

You’re better off with Mistral 7B at a few GB. Better yet, run something like LM Studio, which makes downloading and spinning up different models point-and-click easy.
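“A few GB” follows from simple arithmetic: weight memory is roughly parameter count times bits per weight. A rough sketch; it counts weights only (no KV cache, activations or runtime overhead), and real quantized formats store extra per-block scale data, so actual files run somewhat larger than the nominal figure. The ~46.7B total parameter count for Mixtral 8x7B is used for comparison.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    models = [("7B model", 7e9), ("Mixtral 8x7B (~46.7B params)", 46.7e9)]
    for label, params in models:
        for bits in (16, 8, 4):
            print(f"{label} @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

A 7B model at 4 bits comes out around 3.5 GB, while Mixtral 8x7B unquantized at 16 bits is roughly 93 GB, which lines up with the ~90 GB download described above.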

I was trying to stress test my machine with the new MLX framework.

I’ve run the smaller quantized Llama variants locally. They barely hit 3 GB of RAM.

This was more of a stress test, and an ego boost / bragging rights :rofl:

:rofl: Nice. I find Cinebench is great for this purpose (although you’re right, no bragging rights there) – it does save you a good deal of bandwidth.

…Unrelated to the bragging rights, which I respect –

I just want to be sure that other devs aren’t discouraged. It’s extremely easy to install, and local models can be a great help and complement to OpenAI API access.


I’ve had a great time using Ollama on my local computer! Can’t wait for the time when having a GPU with 128 GB is the norm :).
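For anyone who wants to script against Ollama rather than use its CLI, it serves a local HTTP API. A minimal sketch using only the Python standard library, assuming Ollama is running on its default port 11434 and a model named `mistral` has already been pulled; check the Ollama API docs for your version before relying on the exact fields.

```python
import json
import urllib.request

# Assumed default local endpoint for Ollama's non-chat generation API.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str,
                           url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request asking for a single, non-streamed completion."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(url, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (needs a running server)."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Usage would look like `generate("mistral", "Summarize: ...")`. Setting `stream` to false trades incremental output for a single JSON reply, which keeps the client code short.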
