Gemini live today in Bard

Haha so much. It’s kind of cool, though, because by the time you have figured it all out (and are fighting with the last 10%), some heavies release a perfect version and you already understand exactly how it works from the ground up and how to use it.

I use function calling to validate the code, and that calls a function to save it to a file, and from there it’s in the IDE and good to go. Up until recently I was getting full classes of working code.
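For anyone wondering what the validate-then-save step could look like, here is a minimal sketch. It is not the poster's actual code: the function name `validate_and_save`, the `generated` output directory, and the choice of `ast.parse` as the "validation" are all assumptions for illustration. A syntax check only proves the code parses, not that it works.

```python
import ast
from pathlib import Path


def validate_and_save(code: str, filename: str, out_dir: str = "generated") -> dict:
    """Syntax-check model-generated Python and save it only if it parses.

    Hypothetical helper: the name, arguments, and return shape are assumptions.
    """
    try:
        ast.parse(code)  # parses only; nothing is executed
    except SyntaxError as err:
        # Return the error so it can be fed back to the model for another try
        return {"ok": False, "error": f"line {err.lineno}: {err.msg}"}

    # Keep only the final path component so the model cannot pick a directory
    target = Path(out_dir) / Path(filename).name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(code, encoding="utf-8")
    return {"ok": True, "path": str(target)}
```

The returned dict is the sort of thing you could hand back to the model as the function-call result, so a failed parse turns into another attempt rather than a broken file in the IDE.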

It was getting so good I let it run shell commands on an airgapped machine I didn’t mind wiping. It did make my hair stand on end before I hit go. Code-interpreter-style fixing of its own errors plus access to the command line means it could go seriously nuts.

No problems. It would look around for stuff, put things in the right place etc.

python program run other python programs but now i am thinking some sort of unix script

This is an interesting question. exec() is cool and works, but it’s not to my taste. I ended up using subprocess.run() and loving it. So much flexibility. Python calling the shell calling a shell script calling freshly generated Python… So badass. Once I had that setup in place I was like “Hmmm… should I let GPT-4 at it? Yep!!” I just did some tests/experiments; I’m definitely not doing that regularly.

(Re: subprocess.run(): if anyone sees this and doesn’t know what it means, read it as: very dangerous.)
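To make the subprocess.run() idea concrete, here is a rough sketch of running freshly generated Python in a child process. The helper name `run_generated` and the 10-second default timeout are assumptions, not the setup described above. Note that this is not a sandbox: the child process has the same permissions as the parent, which is exactly why the disposable machine matters.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_generated(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run generated Python in a child process inside a throwaway directory.

    Hypothetical helper. It limits run time and working directory only;
    it does NOT restrict what the code can do to the machine.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "generated.py"
        script.write_text(code, encoding="utf-8")
        return subprocess.run(
            [sys.executable, str(script)],  # argument list, no shell=True
            cwd=workdir,
            capture_output=True,  # collect stdout/stderr to feed back to the model
            text=True,
            timeout=timeout_s,  # raises subprocess.TimeoutExpired if exceeded
        )


result = run_generated("print(2 + 2)")
print(result.returncode, result.stdout.strip())  # prints: 0 4
```

A non-zero `returncode` together with `result.stderr` is what you would hand back to the model for the "fix its own errors" loop.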

Asking questions against, and receiving responses from, large and diverse datasets. So far: religious texts, government regulatory texts, labor contract texts and board policy texts. All embedded with text-embedding-ada-002 (Weaviate text2vec transformer), with retrieval using Weaviate nearText queries and gpt-4 to analyze and summarize the results.

Speed is important, but I am more concerned with accuracy of results. In the same exact scenario, gpt-3.5-turbo doesn’t just perform less than gpt-4, it performs poorly. I have examples of gpt-3.5-turbo not able to understand the text right in front of it. Almost like it can’t even see it.

If gpt-3.5-turbo is working in your environment, more power to you. But in my environment, it has failed, and failed consistently, to return answers that one would expect in a professional setting.

I guess it comes down to perception. I’m not sure that I personally would consider false information provided by an LLM to be a lie unless there was an intention to deceive. Your perception might be different. I’ve had Claude (v1.0) lead me down a path that I would certainly consider to be a lie. After a bit of pushing on my part it admitted it wasn’t able to do a simple coding task and had been telling me it was working on it. Conceded it told me that because it was trying to please me. Harrumph.

Yes, GPT 3.5 does not work properly in many instances. But GPT 4 works well in most of the cases. Did you try GPT-4 ?

Yes. It works great in every instance – so far.

Absolutely. It’s all about accuracy and consistency. LLMs will make mistakes, that’s fine, but the way you handle 1% error rates and 25% error rates is totally different.

Almost like it can’t even see it.

I think this is pretty insightful. Well put.

In all fairness to Google Gemini Pro, I finally tried it out today. I asked it how to format the JSON request object to make calls to its API, and it failed (hallucinated) miserably at that.

However, using the API, I asked it this question using the same exact parameters as I did with gpt-3.5-turbo-16K: Gpt-3.5-turbo-16k api not reading context documents

It answered it as gpt-4 always did. It is the only test I’ve performed so far, but, it is encouraging.

Addendum: As I continue testing gemini-pro, I find that, at least in my environment and comparing apples to apples, it so far does not even surpass gpt-3.5-turbo-16k, at least in its ability to perform as a RAG LLM. Its task is to read the returned context documents and respond to the question posed.

This is an example of the kind of results I have seen so far:

The answer is clearly in the first 2 or 3 documents in every response, but Gemini Pro consistently fails to either read or comprehend the documents.

Now, a mitigating factor may be the structure of the Gemini Pro API request body, which is essentially this:

// Define the request data as an associative array
$requestData = [
    'contents' => [
        [
            'parts' => [
                [
                    'text' => $text
                ]
            ]
        ]
    ]
];

Compare that to OpenAI’s Chat Completion API:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Who won the world series in 2020?"
      },
      {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020."
      },
      {
        "role": "user",
        "content": "Where was it played?"
      }
    ]
  }'

OpenAI’s API separates the roles, whereas Google’s API forces you to describe everything in the “text” element, which, if you are sending system and user instructions as well as context documents, could get a little confusing.
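To illustrate the difference, here is a sketch of how the same RAG inputs might be packed into each request body. It only builds the JSON payloads shown above and makes no network calls. The function names and the section labels ("Instructions:", "Context documents:", "Question:") are assumptions for illustration, not anything either API requires.

```python
import json


def format_context(documents: list[str]) -> str:
    """Number the retrieved documents so the model can tell them apart."""
    return "\n\n".join(f"[Document {i}]\n{doc}" for i, doc in enumerate(documents, 1))


def build_gemini_body(system: str, documents: list[str], question: str) -> dict:
    """Pack instructions, context, and question into the single 'text' part."""
    text = (
        f"Instructions:\n{system}\n\n"
        f"Context documents:\n{format_context(documents)}\n\n"
        f"Question:\n{question}"
    )
    return {"contents": [{"parts": [{"text": text}]}]}


def build_openai_body(system: str, documents: list[str], question: str,
                      model: str = "gpt-3.5-turbo") -> dict:
    """Keep the system instructions in their own message, separate from the user turn."""
    user = f"Context documents:\n{format_context(documents)}\n\nQuestion:\n{question}"
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }


docs = ["Policy 4.2: Meetings are held monthly.", "Policy 7.1: Minutes are public."]
print(json.dumps(build_gemini_body("Answer only from the documents.", docs,
                                   "How often are meetings held?"), indent=2))
```

With everything in one string, clear delimiters between instructions, documents, and question are about the only structure the single-text format lets you give the model.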

I am really anxious to hear what other experiences are with this new API.

Thanks much, moonlockwood. LOL, set up on an airgapped machine! Despite the danger, it seems to me that something of this sort is inevitable because it increases productivity by such a big amount. I think I may ask ChatGPT if there is a way to run unix commands safely :)

I’ve found Bard to be pretty much a potato. But when it had the Gemini rollout I went back to it. It might even be worse for code. It is always wrong. For example, I asked it for a way to convert markdown to HTML and it suggested a bunch of libraries, but I just wanted a simple function, which it was unable to provide. I then asked ChatGPT (free account, 3.5) and it instantly had a function that worked with no tweaks. So I told Bard that ChatGPT was able to create a function, and Bard asked me to share the function with it! LOL. Maybe Bard should use the ChatGPT API in the backend so it can actually function. Anyway, for now, Bard is okay for chatting, useless for coding.

yeah, sounds cooler than unplugged the ethernet cable…

Mostly worried about mistakes that could cause wider damage, not really worried about it trying to take over the world or anything.

Seems to me that putting ChatGPT or equivalent into a loop to solve math, engineering, and coding problems will soon become standard practice. These guys put an AI into a loop to solve a previously unsolved mathematics problem called the “cap set problem”. They even give a shoutout to the famous mathematician Terence Tao.

Google DeepMind used a large language model to solve an unsolved math problem | MIT Technology Review.

ChatGPT tells me that unix has a “while” command. If my unix script is limited to while, file read, and file write, that should be pretty safe, I think. Then I just need to keep the paths confined to whatever VS Code project directory I’m working in. Probs would be very helpful if there were a package manager that could help with this.
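On keeping the paths inside the project directory: one way to sketch that check in Python is to resolve every requested path and refuse anything that lands outside the root. The function name `resolve_inside` is made up for this example, and this covers only the path check, not everything else a generated script might do.

```python
from pathlib import Path


def resolve_inside(project_root: str, requested: str) -> Path:
    """Return the absolute path for `requested`, or raise if it escapes the root.

    Hypothetical helper: resolve() collapses '..' segments and follows
    symlinks that exist, so the comparison is made on the real location.
    """
    root = Path(project_root).resolve()
    candidate = (root / requested).resolve()  # an absolute `requested` replaces root here
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{requested!r} escapes {root}")
    return candidate
```

Every file read or write the loop performs would go through this first, so a request such as `../../etc/passwd` raises an error instead of touching anything outside the project.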

Gemini Pro now resides at #12 of the HuggingFace Leaderboard, 1 above GPT-3.5-turbo-0314 and 5 below GPT-3.5-turbo-0613.

The arena ranking is full of surprises:
Claude 1 > Claude 2
And Gpt 3.5 Turbo 0613 (of all the snapshots) > Gemini Pro.

But do I understand correctly that at least the Pro model from Google will most likely keep improving in the ratings?

Interesting that it’s 4 below Mixtral-8x7B too, which (being open-weights) has many advantages over a proprietary model.

Google’s main advantage here though is the multi-modal capabilities. Curious if anyone has used the API or tested the vision capabilities?

I used the API. Given that it’s “free”, they probably sell every image you give them, so I haven’t tried out vision.