Google launched Gemini. Is it better than GPT-4?

Google claims Gemini outperforms GPT-4 V in several benchmarks: Gemini - Google DeepMind

I hope we can develop a constructive discussion about the capabilities of these models as we use them. :face_with_monocle:

1 Like

Summary of what I’ve found so far:

Model sizes:

  • Ultra: The strongest model. It’s optimized for scalability on TPU accelerators.
    Release: Early next year to select customers, developers and partners for early experimentation and feedback.

  • Pro: Optimized for performance, cost, and latency.
    Release: It is already available in the English version of Bard. Starting on December 13, developers and enterprise customers can access Gemini Pro via the Gemini API in Google AI Studio or Google Cloud Vertex AI.

  • Nano: Designed for on-device applications. It comes in two versions (1.8B and 3.25B parameters) and is optimized for deployment with best-in-class performance in tasks like summarization and reading comprehension​​.
    Release:

Training and Architecture:

Multimodal Training: The models are trained jointly across text, image, audio, and video data. They can handle a variety of inputs like natural images, charts, screenshots, PDFs, videos, and produce text and image outputs​​.

Architecture: Built on Transformer decoders with enhancements for stable large-scale training and optimized inference on Tensor Processing Units (TPUs)​​.

Context Length: Supports a 32k token context length​​.

Multimodal and Multilingual Dataset: Utilizes data from web documents, books, code, and includes image, audio, and video data. Handles non-Latin scripts and is trained with a focus on data quality and safety filtering​​.

Performance and Capabilities:

Highlights:

I’ve already tested the Pro version on Bard and it’s working. But the version that promises to be better than GPT-4 will take some time before it is actually released.

Another point to note is that the model is named Gemini 1.0, suggesting that future versions are expected to be released.

2 Likes

I did some genuine test, the new bard powered by Gemini destroyed GPT4.
GPT4 simply stopped recognizing Kanji characters after the launch.

3 Likes

Because of the video-based training Gemini received, I suspect his visual recognition is more advanced.

Weird I tried some image recognition and also asked for some code, it did wonderful on image recognition but the code was bad, and I mean really bad, even after feedback multiple times it would fail to give me desire output, maybe it had to do with my phrasing and also because i work with C++, but yeah in terms of coding chatgpt blows bard out of the park, i don’t use AI for much except for recipes and code and maybe sometime creative stuff so yeah still don’t see much appeal for bard, i really hope the ultra model can surpass gpt-4 so they both get better faster

Bard is currently using Gemini Pro. Benchmarks show that this model is similar to GPT-3.5.

1 Like

The primary innovation of Gemini seems to be its multimodal capabilities. While OpenAI employs various tools to manage text, images, and audio separately, Gemini consolidates these functions into a single model. Remarkably, it even includes video processing, which is an impressive achievement.

Partially … I think the text capabilities are Gemini Pro the multi model stuff is still on Palm.

Ultra is meant to be where the biggest step forward was and it is not out until next month (though maybe they will give API access earlier)

3 Likes

I use ChatGPT 4 for code snippets generation.
Bard with Gemini backend is faster, but code just doesn’t work, it’s on the level of early ChatGPT-3.5 version. So I am staying with ChatGPT 4. Maybe Ultra version is better, but who knows, what are models sizes, and how it will be deployed.

Perhaps. But it wouldn’t be a very smart strategy to announce that Bard is using Gemini Pro and still depend on Palm.

From Bard:

Multimodal processing is a complex area and currently, both Gemini Pro and Palm contribute in different ways. Here’s a breakdown:

Gemini Pro:

  • Visual Embeddings: It can process and understand the content of images to a certain extent. This allows Bard to handle tasks like image captioning and visual question answering.
  • Multimodal Fusion: Gemini Pro can fuse information from different modalities (e.g., text and images) to improve its understanding and generate more comprehensive responses.
  • Limited Modality Support: Currently, Gemini Pro primarily focuses on text and image modalities. Support for other modalities like audio and video is still under development.

Palm:

  • Wider Modality Support: Palm can handle a broader range of modalities, including audio, video, and sensor data. This makes it more suitable for tasks like speech recognition, video analysis, and robotics.
  • Multimodal Learning: Palm can be trained on datasets containing various modalities, allowing it to learn complex relationships between different types of data.
  • Limited Language Support: While Palm can process multiple modalities, its language capabilities are primarily focused on English.

Combined Effort:

  • For tasks requiring multimodal processing and advanced language understanding, both Gemini Pro and Palm can be used together.
  • Gemini Pro handles the language-related aspects, while Palm focuses on processing other modalities.
  • This combined approach allows Bard to tackle complex multimodal tasks that require expertise in both language and other sensory data.

This may all be hallucinations of Gemini Pro, but I doubt it.

2 Likes

This seems like a hallucination. I’ve asked this question several times in different ways, and the answers are always different.


Sometimes I think I’m the only person out there using Large Language Models for actual language processing. I just want to use gemini-pro in a RAG scenario where it accepts a question and analyzes the accompanying document texts to answer that question.

I find gemini-pro, so far, to be lacking behind even gpt-3.5-turbo-16k in it’s ability to read and comprehend anything other than simple texts. Again, these are just my tests using texts from a range of disciplines: technical, legal, religious. Fails consistently to answer questions where the answer (or an answer) is clearly in the documents submitted to it.

1 Like

We tried something similar in another thread. All replies were random. At least a few days ago.

2 Likes

Ok, today I finally got access to Gemini Ultra. I tested 6 problems that GPT-4 never managed to get right: 3 math questions from an ITA exam (university admission process in Brazil) and 3 easy logic questions like this:

Gemini Ultra also failed to answer any of them correctly, even after multiple attempts.

I’ll continue testing it with other tasks.

3 Likes