“Our long term vision is to build general intelligence, open source it responsibly, and make it widely available so everyone can benefit. We’re bringing our two major AI research efforts (FAIR and GenAI) closer together to support this. We’re currently training our next-gen model Llama 3, and we’re building massive compute infrastructure to support our future roadmap, including 350k H100s by the end of this year – and overall almost 600k H100s equivalents of compute if you include other GPUs.” Source
At the same time, we recently saw the launch of the Mistral model, which is comparable to GPT-3.5.
The best “useless metric” I have seen so far is “how often does one model beat another model across multiple tests,” which then becomes the basis for an arbitrary scoring system of battle wins/losses used to rank models. This usually ends up putting some 7B model just 10 points behind GPT-4, and the nature of the metric is then hidden away in some acronym.
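To make the complaint concrete, here is a toy sketch of that kind of "battle wins" scoring. All model names, results, and the points-per-win rule are hypothetical; the point is that a net-win count compresses everything into a few arbitrary points and says nothing about how far apart the models actually are.

```python
from collections import Counter

def rank_by_battles(battles, points_per_net_win=10):
    """Rank models by (wins - losses) * points, an arbitrary scoring rule."""
    wins, losses = Counter(), Counter()
    for winner, loser in battles:
        wins[winner] += 1
        losses[loser] += 1
    models = set(wins) | set(losses)
    scores = {m: (wins[m] - losses[m]) * points_per_net_win for m in models}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# A handful of made-up head-to-head results, as (winner, loser) pairs:
battles = [
    ("gpt-4", "some-7b"), ("gpt-4", "some-7b"), ("some-7b", "gpt-4"),
    ("gpt-4", "other-13b"), ("some-7b", "other-13b"),
]
print(rank_by_battles(battles))
```

Under this rule the made-up 7B model lands only 20 "points" behind GPT-4, even though nothing in the score reflects how lopsided each individual battle was.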
Also, I’m less convinced that different models make a lot of difference, and instead, I believe “the best training data” and “the best training hardware / best training schedule” are the determining factors. Note that “best” doesn’t necessarily mean “most.”
This is for LLM text-completion tasks. If you’re going for AGI, I think we’ll need some totally different architecture, so for that case, clearly, some new model will matter, in addition to the training data. But I also think we’ll need additional training data, not just text completion, for that application.
And how much is it going to cost me to lease the required hardware in the cloud to run Mixtral 8x7B
I don’t understand why someone would lease hardware for a lightweight 7B model, or believe it to be a binary proposition.
Mixtral runs on the local machine and is a complement to GPT, intended for entirely different purposes.
I rely on it heavily for chunked processing of data I pass to GPT, for example.
I also rely on it for simple code-replacement tasks that need to run quickly and without a remote API call: for example, a 100k+ record dataset that I need to regress to some sort of pattern, often based on sentiment. These are things that would take far too much time to code or would require fuzzy logic, but aren't a fit for GPT.
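The chunked-processing pattern described above can be sketched like this. The `classify()` stub stands in for a call to a local model (for instance via an OpenAI-compatible endpoint exposed by a tool like LM Studio or Ollama); the record data and the trivial keyword rule are placeholders, and only the batching logic is meant literally.

```python
def chunked(records, size):
    """Yield successive fixed-size batches from a list of records."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

def classify(batch):
    """Placeholder for a local-LLM sentiment call; here a trivial keyword rule."""
    return ["positive" if "good" in r else "negative" for r in batch]

records = ["good service", "bad delay", "good price", "bad support"]
labels = []
for batch in chunked(records, 2):   # small batches keep each prompt short
    labels.extend(classify(batch))
print(labels)  # ['positive', 'negative', 'positive', 'negative']
```

In practice the batch size is tuned to the local model's context window, and the labeled output can then be passed on to GPT for the heavier reasoning steps.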
Don’t overlook the use of open source models as a complement or separate tool; they can be a great addition to your toolset!
The debate of open source vs. “proprietary” models will always be ongoing … I just recently went through a vendor training, and that was part of the curriculum: compare and contrast, and list pros and cons between open source models and vendor-specific (proprietary) ones. As an example, the pros for “proprietary” models, such as those provided by OpenAI, were as follows:
In production, running on your local machine is not an option?
Can you help me understand why not?
Open source will make more sense for production as hardware costs come down.
I also have an instance running on a Raspberry Pi that works fine for the purpose.
Based on your comments, I feel as if we may be talking about different models.
And to reiterate: I’m not for one moment discounting it, nor do I believe there is any debate about which is more capable. I’m strictly referring to using the best tool for the job, and that’s not always the most powerful model.
To offer yet another example (since I mentioned my Pi usage) – Home Assistant integration is such a use case. Far better to use a lightweight local model.
I’m as big of a fan of OpenAI as they come; however, I have many instances of Mistral models running for entirely different purposes, all of them on commodity hardware.
Because it’s risky, uncontrolled, difficult to scale, not designed for delivery to the internet, and not professional.
I see what you are saying. If you run a 24x7 business operation and need high uptime, you go to the cloud.
But if you are in the cloud, why not just run some fancy proprietary model through an API? Right?
However, local may not work 24x7 unless you are a bigger company that can afford your own server farm with specialized HVAC, redundancy, etc.
But local can work for offline tasks that aren’t driven by external events, like writing code or doing one-off things.
But then there’s the frustration factor with local OS models. For example, I downloaded Mixtral 8x7B to run locally on my Mac Studio with 128 GB of RAM using the new Apple MLX framework. Got done with downloading the weights (like 90 GB) and then the whole thing failed because my local git repo didn’t have Git LFS (Large File Storage) initialized. So it can be frustrating.
But say I needed to create a high-quality training file for another model. The local model could have assisted me, saved me some money, and boosted my ego.
Got done with downloading the weights (like 90 GB) and then the whole thing failed
There is a misconception that you need the most powerful model, yet that is what OpenAI/GPT is for. Typically, if you have a use case for a lower-power local LLM, you shouldn’t notice a major difference between Mistral 7B and Mixtral.
You’re better off with Mistral 7B at a few gigabytes. Better yet, run something like LM Studio to make downloading and spinning up different models point-and-click easy.
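Tools like LM Studio and Ollama can expose a locally running model behind an OpenAI-compatible HTTP endpoint, so the same client code works against GPT or a local Mistral. A hedged sketch of building such a request follows; the URL (LM Studio's default port) and the model name are assumptions, so check what your local server actually reports.

```python
import json
import urllib.request

# Assumed endpoint: LM Studio's local server default. Adjust for your setup.
LOCAL_URL = "http://localhost:1234/v1/chat/completions"

def build_request(prompt, model="mistral-7b-instruct", temperature=0.2):
    """Build an OpenAI-style chat-completion request aimed at a local server."""
    body = json.dumps({
        "model": model,  # hypothetical name; use whatever your server lists
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        LOCAL_URL, data=body, headers={"Content-Type": "application/json"}
    )

req = build_request("Classify the sentiment of: 'great value'")
print(json.loads(req.data)["model"])
# Sending it is one call, e.g. urllib.request.urlopen(req), once a server
# is actually listening on that port.
```

Because the request shape matches the OpenAI chat API, swapping between the local model and a hosted one is mostly a matter of changing the URL and model name.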