Question About Specialized AI Models

Hello everyone,

I’m curious if OpenAI has any plans to develop small, specialized models for specific tasks in the coming years.

Currently, generalist models like GPT-4 can handle a wide range of queries, from historical facts like Leonardo da Vinci’s birthdate to scientific details about the solar system. However, many applications don’t require such broad knowledge. Instead, they could benefit from models focused on particular domains or tasks.

For example, in a Multi-Agent System, we could have various assistants each tailored to a specific sector, such as customer support, medical advice, or financial consulting. These specialized models would be more efficient and optimized for their respective tasks, reducing unnecessary processing and potentially improving performance.

Additionally, I’m wondering if OpenAI plans to continue focusing solely on generalized models, or if there are any initiatives to create these specialized models for more targeted applications.

Furthermore, I think there’s potential for OpenAI to offer these specialized models with additional levels of security. This could be particularly beneficial for big tech companies looking to integrate them into their specific domains, ensuring both efficiency and security.

Has anyone heard about any developments or plans from OpenAI in this direction? I believe this approach could significantly enhance the efficiency and effectiveness of AI assistants.

Looking forward to your insights!

Best regards,
Razvan Savin

OpenAI has a small model - babbage-002

It is a base completion model, so it would have to be fine-tuned before it could be used in a specialized manner, and it would need an application where only something basic is required, because it is apparent that it is indeed small and of high perplexity.

The cost of inference after fine-tuning is $1.60 per 1M tokens, compared to $0.50 (input) / $1.50 (output) per 1M tokens for standard gpt-3.5-turbo-0125, a larger chat-trained model. So you pay more for having done all the work yourself.
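To put those numbers in perspective, here’s a quick back-of-envelope comparison; the per-request token counts are invented for illustration, and the prices are the ones quoted above:

```python
# Back-of-envelope cost comparison for a hypothetical workload of
# 10,000 requests at 800 input + 200 output tokens each.
requests = 10_000
in_tok, out_tok = 800, 200

# fine-tuned babbage-002: $1.60 per 1M tokens (input and output alike)
babbage = (in_tok + out_tok) * requests * 1.60 / 1_000_000

# gpt-3.5-turbo-0125: $0.50 per 1M input tokens, $1.50 per 1M output tokens
gpt35 = (in_tok * 0.50 + out_tok * 1.50) * requests / 1_000_000

print(f"fine-tuned babbage-002: ${babbage:.2f}")  # $16.00
print(f"gpt-3.5-turbo-0125:     ${gpt35:.2f}")    # $7.00
```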


Thank you for your answer.

I see, the babbage-002 model will be excellent for certain cases in a network of agents.

I am considering splitting the Big Generalist Brain into multiple small, specialized base models. These models would be activated on request (like neurons in a brain) and fine-tuned with specific data to make each model sharper and better at following instructions for a particular purpose. This approach would result in faster models with lower power consumption. Additionally, I want to minimize the chance of hallucinations when these models explore new areas, effectively giving each model its own identity. By making the models highly specialized, they can adapt to different types of projects and be more accurate.
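To make the idea concrete, here is a minimal sketch of the routing layer I have in mind. The specialist model IDs are placeholders for hypothetical fine-tunes, not real deployments:

```python
# A sketch of request routing: a small general model picks which
# specialized (hypothetically fine-tuned) model should be "activated".
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SPECIALISTS = {  # placeholder IDs for hypothetical fine-tuned models
    "support": "ft:babbage-002:my-org::support-v1",
    "medical": "ft:babbage-002:my-org::medical-v1",
    "finance": "ft:babbage-002:my-org::finance-v1",
}

def route(query: str) -> str:
    """Ask a small general model which specialist should handle the query."""
    choice = client.chat.completions.create(
        model="gpt-3.5-turbo-0125",
        messages=[
            {"role": "system",
             "content": "Answer with exactly one word: support, medical, or finance."},
            {"role": "user", "content": query},
        ],
        max_tokens=2,
        temperature=0,
    ).choices[0].message.content.strip().lower()
    return SPECIALISTS.get(choice, SPECIALISTS["support"])

print(route("My invoice was charged twice this month."))  # e.g. the finance specialist
```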

A generalist model is useful as a communicator, for people to have fun with, and so on. However, I believe big tech companies will want specialized models for their specific fields. They could take a base model and fine-tune it with their research data, using it locally or in the cloud with updated data for their daily tasks and to enhance their research.

I am working on a personal project for a Multi-Agent System. Here is the diagram. It’s not fancy, but I hope it conveys the idea:

Fine-tuning would take careful thought about what you want to do. You can evaluate the base model’s capability with context and multi-shot examples and see how far you have to go.

Because tokenization may use several tokens per word, this is what it took to even have babbage-002 “complete” a predicted word (although you could regex-split a word out with less setup).

[image: babbage-002 completion output, multi-shot prompt with predicted words each ending in a “#” stop sequence]

(you don’t see the last # because it is the stop sequence)
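Roughly, the setup looks like this with the Python SDK; this is a minimal sketch, and the prompt here is illustrative rather than the exact one in the image:

```python
# Multi-shot word completion with babbage-002, ended by a "#" stop sequence.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Finish each word, then print #.\n"
    "unbeliev -> unbelievable#\n"
    "extraordin -> extraordinary#\n"
    "tokeniz -> "
)

response = client.completions.create(
    model="babbage-002",
    prompt=prompt,
    max_tokens=8,    # a single word may span several tokens
    temperature=0,
    stop=["#"],      # the trailing "#" is swallowed as the stop sequence
)
print(response.choices[0].text)  # e.g. "tokenization"
```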


I love how you show your image with predicted words. I’ve developed an attention system to predict the next masked word.

Here’s how I envision using it:
Imagine 1,000,000 assistants working together, like bees in a hive, neurons in our brain, or sculpting tools and brushes in Blender.

For example, when shaping a model in Blender, you need many tools. To sculpt a face, one general tool won’t suffice; it would result in a poorly defined model. Instead, by using multiple tools or brushes, with different procedures, sizes, colors, and materials, you can choose the right tool for each situation. This lets you define specific features like lips, eyes, hair, bone structure, nose, smiles, beards, and glasses, and control how these additions affect the overall appearance, such as modifying eyes or creating shadows.

Similarly, in a Multi-Agent System, having specialized AI models tailored to specific tasks can significantly enhance performance. Each assistant, fine-tuned for its unique domain, would operate more efficiently, just like using the right tool for a specific part of a sculpture. This approach minimizes unnecessary processing and reduces the risk of hallucinations, as each model would focus on its specialized area.

That’s why I’m trying to use granular models to make accurate predictions and to use them in collaboration. By breaking down tasks into smaller, more manageable parts and leveraging the strengths of specialized models, we can achieve higher accuracy and better performance.

Fine-tuning these models is like sculpting with multiple lasers simultaneously, allowing for precise and refined adjustments.

Imagine 100,000 workers, each assigned a token from the dictionary; they all read the entire context and compare it to their training to find out how likely their token is…

That’s how the language AI already works.
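A toy illustration of that picture, with a five-token “vocabulary” and made-up scores:

```python
# Each vocabulary token gets one score (logit); softmax turns the scores
# into a probability for every token at once, so all "workers" report in parallel.
import numpy as np

vocab = ["the", "cat", "sat", "mat", "dog"]   # tiny stand-in vocabulary
logits = np.array([2.0, 0.5, 3.1, 1.2, 0.3])  # made-up next-token scores

probs = np.exp(logits - logits.max())         # subtract max for numerical stability
probs /= probs.sum()

for token, p in sorted(zip(vocab, probs), key=lambda pair: -pair[1]):
    print(f"{token:>4}: {p:.2%}")             # "sat" comes out most likely here
```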

Hello sir,

I think that’s the purpose of GPTs, to allow for the specialization of AI.

If I understand correctly, one can fine-tune them as well, meaning OpenAI is letting users find the possible uses.

In general, AI is amazing, but a large part of the population is still skeptical. What I’m trying to say is that OpenAI still faces criticism, and in my opinion it will not develop more specific models, because users can do it themselves…

You know what I mean? Was that the sense of your question?

Hi @_j and @ayarportugal,

Thank you for your insights.

I was aiming to create a system where multiple specialized models work together autonomously and update themselves, enhancing efficiency and accuracy by focusing on specific domains. This approach could reduce errors and improve performance in various fields.

So you’re developing a GPT (?) that coordinates with other GPTs that work autonomously yet create something by working together.

The concept is very interesting. As a user, I can see this being useful if the GPTs are specialized, so the general one interacts with the user but is helped by the others.

Best of luck!

Yes, GPTs, and they already work. I have two layers, exactly as you described.

Now I am improving the project and want to add more layers to scale up. After that, I want to train models, and I plan to gather all the information about them when I have time, so I know what to do next. My dream is to have a GPT-4o model as the communicator.
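As a rough sketch, the two layers look something like this; the model choices and worker prompts are illustrative, not my exact setup:

```python
# Layer 1: a "communicator" model faces the user.
# Layer 2: specialized worker agents handle narrow sub-tasks.
from openai import OpenAI

client = OpenAI()

def ask(model: str, system: str, user: str) -> str:
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    ).choices[0].message.content

def run(task: str) -> str:
    # layer 2: a specialized worker with its own narrow instructions
    notes = ask("gpt-4o-mini",
                "You are a research agent. Reply with terse bullet points.",
                task)
    # layer 1: the communicator composes the final user-facing answer
    return ask("gpt-4o",
               "You are the communicator. Rewrite the notes as a friendly answer.",
               f"Task: {task}\nNotes:\n{notes}")

print(run("Summarize why drought-resistant crops matter."))
```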

Thanks for the good wishes!

Hi @razvan.i.savin

I guess you are talking about developing something like a Mixture of Experts (MoE) system?

Hi @Dus-DB
I call it a Multi-Agent System (MAS). Here is a demonstration video I have on YouTube; it’s not a fancy video with a nice chat, just a CLI.

Returning to your original question about OAI developing specialized models: I had the same thoughts more than a year ago, and of course it is the better (natural) option to have, or to try to achieve, given the benefits you described; but I’m not sure it would be such a good idea from the model creator’s cost/benefit perspective ($$$).

Anyhow, today I guess it’s much better that someone like OpenAI offers a very powerful general model along with tools for the users, companies, groups, etc. who have the whole knowledge of their specific domain, so they can adapt/customize/tune the model for their needs.

Features like the system prompt, especially when combined with RAG, plus the flexibility of managing everything via the SDK, can help you get a powerful specialized bot/assistant/agent.
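As a minimal sketch of that combination; the toy keyword “retriever” stands in for a real vector store, and the documents are invented:

```python
# System prompt + naive RAG: retrieved domain snippets are injected into
# the system message so a general model answers like a domain specialist.
from openai import OpenAI

client = OpenAI()

DOCS = [  # stand-in for a company's domain knowledge base
    "Policy 12: refunds are processed within 5 business days.",
    "Policy 7: enterprise plans include 24/7 phone support.",
]

def retrieve(query: str) -> str:
    # naive keyword overlap; a real system would use embeddings
    words = query.lower().split()
    return "\n".join(d for d in DOCS if any(w in d.lower() for w in words))

def answer(query: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You are a support assistant. Answer only from the context below.\n"
                        f"Context:\n{retrieve(query)}"},
            {"role": "user", "content": query},
        ],
    ).choices[0].message.content

print(answer("How long do refunds take?"))
```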

(I may be wrong on this: I read somewhere that GPT-4 could have a MoE architecture and that the system prompt helps in the “activation of the adequate experts”. Sorry if this is wrong.)

I believe the costs will be quite high, but acceptable for large companies that want to conduct research, take a base model for their sector, and tune it with their own findings.

I say this because I personally would like to accelerate research to improve the resilience of plants to extreme heat, brutal weather, drought, etc. We need to use every possible means to make corrections as quickly as possible and restore natural balance. AI can help us significantly in preventing the continuous deterioration of the climate. Otherwise, we might end up with AGI and an uninhabitable planet—ironic, isn’t it?

Yes, I agree with you. OpenAI has already provided the world with an extremely good model and many creative possibilities come with it. Using the latest model as a communicator for your Agents/Assistants, I believe, will be fantastic and will offer a lot of power to the users.

I’m not sure if GPT-4 has a Mixture of Experts (MoE) architecture either. It’s possible, but without definitive information, we can only speculate. One thing I know for sure: they have extraordinary minds, and I appreciate them for that.