Gemini Announcement is Manipulative Marketing

Google wowed the world with a hands-on video, but it turns out the video was edited to make it look like their model could pull that off in real time.

Then Google claimed that Gemini beats GPT-4 on the MMLU benchmark, except they gave themselves 32-shot chain-of-thought prompting versus GPT-4's 5-shot. Why not apples to apples?

Moreover, the model they launched for Bard is text-only and on par with GPT-3.5 Turbo, meaning nobody can confirm their claims and benchmarks. They don't plan to release the API for Gemini Ultra until next year.

GPT-4 finished training around October 2022, and the fact that Google needs to resort to such marketing practices to prove to the world that they can beat a year-old model really makes you pause. They also aren't releasing code, weights, or an API for Ultra (the benchmarked model) until next year, for whatever reason, which means we can't verify their other claims.

This was a whole lot of self-congratulation about catching up to last year's OpenAI model, combined with manipulative marketing to make the world think their model is better than it is. No matter how impressive their achievements are, this casts a shadow on their work and will breed distrust among developers.

What do you think is the most appropriate adjective for this behavior?

  • Misleading: If Google presented things in a way that could give people the wrong impression about the capabilities of their model compared to competitors.

  • Selective: If they chose to compare on MMLU in a setup optimized for their model, rather than doing a fair side-by-side comparison.

  • Strategic marketing: Companies will often showcase their products in the most favorable light possible. This could be seen as a strategic choice by Google’s marketing team rather than an attempt to misrepresent.

  • Cherry-picking: Choosing specific scenarios or metrics that show their model in the best light, rather than providing balanced comparisons.

  • Manipulative: If there is evidence they intentionally presented things in a way to make their model seem more capable than it is compared to alternatives.

3 Likes

Thanks for pointing this out. It would be great if you could share the equivalent for OpenAI with us… I'm more interested in the second one, to be honest. Thanks again.

1 Like

I also felt it was overhyped, but it still seems like a great model (though we'll see after the 13th of December).

1 Like

LOL

Reminds me of their presentation last year, where their model answered a question incorrectly. Afterwards, they used the "opportunity" to explain how dangerous hallucinations are.

It blows my mind that all these companies are playing catch-up, comparing themselves to the publicly available GPT model. Even worse: their competing model isn't even available.

I don't even know what to call it. I've witnessed a lot of great conversations revolving around this topic. You know, the whole "Transformers came from Google" thing.

I'm in the camp that thinks Google truly doesn't know how to monetize this (without basically cannibalizing itself) and really isn't putting that many resources toward it.

So I think I’d call it “staying publicly relevant, let them find the gold and dig it up”.

2 Likes

I would say what Google showed is pathetic. It was like saying: "We're only one year behind OpenAI in a technology we invented."
But there might be one good thing - I hope competition* will improve the quality of OpenAI’s services…

*Bard can now run code, like Code Interpreter does!!!

1 Like

Nothing brought me greater joy today than looking at my phone and seeing all the major accredited tech journals running the headline "Google faked their best demo."

It’s funny, I knew something was odd with that demo. I was like “Okay, but seriously though, how are they streaming that data in real time? There’s no way it could be processing that data and responding to it frame by frame.” When I saw the cup trick I immediately had suspicions.

Also, I loved how much prompt engineering they had to do to even elicit a decent answer. "Here are 3 very distinct, obvious, recognizable hand gestures on a blank background. Hint: it's a game. What could this possibly represent?"

Let's be honest for a second: has anyone here ever had to explain to GPT-4 Vision what an object was before it would identify that object? Does that sound like identification to you? Does it even count as identification?

If I prompt GPT-4 with an image of a hot dog and say, "Look at this food item. It's long, goes inside a bun, and you typically put condiments on it. What is it?", what is that response actually demonstrating at that point?

I rest my case: Bard is still worse.

2 Likes

So, I've been trying to deploy gemini-pro in my RAG implementation, which already includes gpt-4, gpt-4-turbo, gpt-3.5-turbo-16k, and claude-v2.

The results have been very underwhelming. Very curious what other OpenAI developers are experiencing.
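In case it helps anyone run the same side-by-side check, here's a minimal sketch of how one might send an identical retrieval-augmented prompt to an OpenAI model and to gemini-pro. This is not my actual pipeline; the model names, prompt template, helper name, and environment variables are illustrative assumptions, and it presumes the openai (>= 1.0) and google-generativeai Python packages are installed.

```python
# Minimal sketch only -- not a production RAG pipeline.
# Assumes OPENAI_API_KEY and GOOGLE_API_KEY are set in the environment.
import os

from openai import OpenAI
import google.generativeai as genai

openai_client = OpenAI()  # picks up OPENAI_API_KEY automatically
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])


def compare_answers(question: str, context: str) -> dict:
    """Send the same retrieval-augmented prompt to GPT-4 and Gemini Pro."""
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

    gpt_response = openai_client.chat.completions.create(
        model="gpt-4",  # swap in gpt-4-turbo, gpt-3.5-turbo-16k, etc.
        messages=[{"role": "user", "content": prompt}],
    )
    gemini_response = genai.GenerativeModel("gemini-pro").generate_content(prompt)

    return {
        "gpt-4": gpt_response.choices[0].message.content,
        "gemini-pro": gemini_response.text,
    }
```

The only design choice that really matters here is feeding both models exactly the same retrieved context, so the comparison stays apples to apples.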

Gemini Pro, from what I've heard so far, is actually confirmed by Google to be more on par with GPT-3.5. Gemini Ultra, or whatever the hell it's called, is supposed to be the GPT-4 competitor, and the Ultra version is not yet released. So your results actually line up with what's been reported already.

We’re OpenAI developers for a reason. The reason is simple - it’s better.

I don't think many people are going to switch over to a worse model made by a worse company to get worse answers. Of course I try every model on the market to be fair and open-minded, but Google's LLMs are consistently bad and consistently fall behind OpenAI's, from their release announcements to now.

I don't know how else to respond to this, in all honesty. Pick a feature and you can see for yourself that Gemini does it worse. It doesn't take long to figure that out anymore.

But hey, it’s not as bad as Grok, so there’s that.

1 Like

OK, this is not an API issue but is specific to the prompt screen in Google AI Studio. I have a Drupal PHP form that I use to perform CRUD operations on a table. It's a quick-and-dirty pattern that I use for most of my CRUD operations on custom database tables, and it's just under 400 lines long. I just don't want to have to rewrite the same code over and over. I mean, that's what AI is for, right?

These are my exact instructions:

I am working in Drupal 10, php 8.2. Attached is a form class, SolrAIQueryForm.php which performs CRUD operations for the solrai_query table.
Your task is to rewrite this code to perform the same operations on the solrai_log table. The database schema for solrai_log follows. The view url for “List” is “/solrai-log”

And I provide both SolrAIQueryForm.php and the database schema for the solrai_log table.

Given the above, both gpt-4-turbo and claude-2 refused to generate the full code. They both returned code outlines, despite repeated instructions to write out the complete code.

I gave the exact same instructions to Gemini Pro, and it wrote the entire code on the first try. I still have to test it to make sure it works, but my point is: WOW. If I can get the API to work like this, it might just become my new gold standard. Big thumbs up to Google on this one. And PHP code, at that!