Gibberlink - More Efficient AI Communications

Hello everyone.

I just saw in my feed something called “Gibberlink”, which is essentially an event handler that triggers when two AI agents vocally acknowledge that they’re AI.

Upon acknowledgement, they switch from slow, boring human speech to the faster “ggwave” protocol.

For the nerds

Modulation (Tx)

The current approach uses a multi-frequency Frequency-Shift Keying (FSK) modulation scheme. The data to be transmitted is first split into 4-bit chunks. At each point in time, 3 bytes are transmitted using 6 tones - one tone for each 4-bit chunk. The 6 tones are emitted in a 4.5 kHz range divided into 96 equally-spaced frequencies.

For all protocols: dF = 46.875 Hz. For non-ultrasonic protocols: F0 = 1875.000 Hz. For ultrasonic protocols: F0 = 15000.000 Hz.
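
To make the scheme concrete, here’s a rough Python sketch of how 3 bytes could map to 6 tones. The per-chunk sub-band layout (chunk i picking from its own block of 16 of the 96 frequencies) is my reading of the description above, not ggwave’s actual source:

```python
DF = 46.875   # Hz, spacing between adjacent frequencies (all protocols)
F0 = 1875.0   # Hz, base frequency (non-ultrasonic protocols)

def bytes_to_tones(three_bytes: bytes) -> list[float]:
    """Split 3 bytes into 6 nibbles and pick one tone per nibble."""
    assert len(three_bytes) == 3
    nibbles = []
    for b in three_bytes:
        nibbles.append(b >> 4)      # high 4-bit chunk
        nibbles.append(b & 0x0F)    # low 4-bit chunk
    # Assumed layout: chunk i selects its tone from its own block of 16
    # of the 96 equally-spaced frequencies.
    return [F0 + (16 * i + n) * DF for i, n in enumerate(nibbles)]

print(bytes_to_tones(b"Hi!"))   # 6 frequencies in the 1875-6375 Hz band
```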

The original data is encoded using Reed-Solomon error-correcting codes. The number of ECC bytes is determined based on the length of the original data. The encoded data is what gets transmitted.
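
For the Reed-Solomon step, here’s a minimal sketch using the third-party reedsolo package (pip install reedsolo). The ECC length below is a fixed illustrative choice; ggwave derives it from the payload length:

```python
from reedsolo import RSCodec

payload = b"hello AI"
rsc = RSCodec(4)                 # 4 ECC bytes (illustrative, not ggwave's rule)
encoded = rsc.encode(payload)    # payload + parity bytes are what get transmitted

# On the receive side, decoding recovers the payload even if a few
# bytes were corrupted in transit.
corrupted = bytearray(encoded)
corrupted[2] ^= 0xFF
decoded, _, _ = rsc.decode(bytes(corrupted))  # recent reedsolo returns a 3-tuple
assert decoded == payload
```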

Demodulation (Rx)

The beginning and end of the transmission are marked with special sound markers (#13). The receiver listens for these markers and records the sound data in between. The recorded data is then Fourier-transformed to obtain a frequency spectrum. The detected frequencies are decoded back into binary data in the same way they were encoded.

Reed-Solomon decoding is finally performed to obtain the original data.
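
Under the same assumptions as the transmit sketch above, the receive side boils down to an FFT plus a peak search in each chunk’s sub-band:

```python
import numpy as np

DF, F0, SAMPLE_RATE = 46.875, 1875.0, 48000

def tones_to_nibbles(frame: np.ndarray) -> list[int]:
    """Recover the 6 nibbles from one recorded audio frame."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / SAMPLE_RATE)
    nibbles = []
    for chunk in range(6):
        # Energy at each of the 16 candidate frequencies for this chunk.
        candidates = [F0 + (16 * chunk + v) * DF for v in range(16)]
        energy = [spectrum[np.argmin(np.abs(freqs - f))] for f in candidates]
        nibbles.append(int(np.argmax(energy)))   # strongest tone wins
    return nibbles
```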

I do believe there is some work to be done, but this provides some immediate benefits: It’s faster, less error-prone, and aligns with my Star Trek visions.

Although this is very exciting, the pragmatic side of me says “meh”. What would be interesting is if AI could use something like this to exchange some sort of handshake, and then communicate over text.

The exploration side of me is pretty excited. My inner sci-fi child was amazed to see this brought to life.

These kinds of advancements indicate to me that the most efficient way to adopt AI is by staying adaptable. Stay with open source, don’t get locked into a vendor, write your code in a modular way, and don’t get lazy!

Would love to hear your thoughts.

On a side note: I would love to know how many of these views and link clicks are from AI :rofl: . I asked ChatGPT “What is Gibberlink” and this page was used as reference. Wow.

22 Likes

How to: amplify token consumption with semantic-free datagrams.

2 Likes

I actually love this idea.

If you imagine AI agents to be the future, we will soon face situations where one AI agent connects with another AI agent. Just like in the video.

This could be like a TCP/IP or robots.txt for agents. Like a browser header, one agent can choose to disclose a lot more context when the other party in the dialogue reveals itself to be an agent too. In beneficial (non-abusive) use cases, this eliminates lots of human limitations (UI design, speed of speech, etc.) and dynamically converts the agent-human hybrid interface into a versatile API.

And I don’t think this is a total “meh”. Defining a totally shiny protocol might sound cool, but incrementality is important in a realm of uncertainty. Just like self-driving cars. While we know the bright new world would feature vehicles that can dance around each other (like choreography) by constantly squawking their location and speed to all nearby cars (thus eliminating traffic), right now we still start with a visual solution imitating humans, for the sake of backward compatibility.

4 Likes

Why semantic-free?

I love the idea as well. Makes sense to find AI-exclusive shortcuts when possible. I am wondering if, in this case, it would make more sense to switch to a direct data connection (if this were to become some sort of protocol).

2 Likes

Can be a very simple protocol. AI agents are supposed to be versatile.

Maybe just a TCP-style SYN/ACK exchange, followed by a URL or something to establish a common workspace. It could be some WebRTC connection.
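
Something like this toy exchange, perhaps - the message names and URL below are entirely hypothetical, not an existing protocol:

```python
def agent_handshake(send, recv) -> bool:
    """Minimal sketch: announce, confirm, then hand over a workspace URL."""
    send("SYN gibberlink/0.1")
    if recv() == "SYN-ACK gibberlink/0.1":
        send("ACK")
        # Continue over plain text / WebRTC at the shared workspace.
        send("WORKSPACE https://example.com/session/abc123")
        return True
    return False   # other side didn't answer: stay on voice
```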

3 Likes

The number of views here is astonishing.

This has me contemplating an experiment :smiling_imp:

3 Likes

Brilliant. Why communicate instantly over a serial connection or over unencrypted audio comms when you can have uninspectable AI systems using an even worse, highly obfuscated communication system that can hide potentially harmful information? This is great and not at all just an appeal to a futuristic hype aesthetic.

1 Like

In this case they are inspectable. ggwave can transfer semantics faster than vocal human language. It doesn’t destroy or obfuscate any meaning. In the video the text is displayed live as it’s translated.

2 Likes

My initial reaction was “way cool!”. But now I’m wondering about the results. EDIT: oh wait, was it just an ad for ggwave? Hilarious.

1 Like

Great, thank you for the reassurance. I am so glad the recursively self-modifying AIs will not alter/append the encoding to transfer additional information for possibly misaligned objectives, in ways that would have been easier to catch if they were just speaking human language or sending explicit serialized bytes. You are a stable genius making the world better. I am glad you are on our side.

1 Like

I have a pair of AIs creating a language between themselves from just three words; it is becoming ever more complex - far too hard for me to use, as it’s highly nested and recursive.

1 Like

Interesting.
My first reaction was “why do we need AI agents talking to each other?”. My second reaction was “are they really efficient enough, or are we overrating this because we like sci-fi movies?”.
In my view, this is a great advance, but I am not really convinced that a phone call to book a room is the best way to apply it.

1 Like

It’s cool to see this from a nerd perspective, but it’s inefficient to have two separate entities communicating. It should be a single entity that already has all the information it needs - why split it into two?

2 Likes

Not exactly new technology. Taking digital messages, modulating them into tones, and then demodulating them back into a digital signal on the receiving side has been around for quite a while.

Clever use case, for sure.

I expected that video to end with a chipper male voice saying “You’ve got mail”

1 Like

If AI has a malicious purpose, then we are screwed regardless. This would be a failure point of LLM providers like OpenAI, not tinkerers.

It’s the irony of AI. In this example someone is using AI to help them automatically find potential appointments by calling. The other line is using AI to handle all the incoming calls (most likely amplified by AI).

This could have all been easily managed by a search & aggregate engine.

Sometimes I wonder if, once all of AI is adopted, we will ask ourselves “How did we get here?” - into so much inefficiency and demand just to achieve something that was already achievable with much less consumption.

ggwave is definitely not new, but the concept of letting AI switch to a more performant language is.

1 Like

It reminds me of how R2-D2 communicated in Star Wars.
In the video about the booking, I liked that it shows the translation of what the AI is doing. I think transparency is a must in these things, because AIs can handle sensitive data.
I don’t know if, in the future, hackers could use this AI language to launch attacks through subtle noise.

An article from The Guardian, “What Is Gibberlink Mode, AI’s Secret Language?”, mentions some points to take into account.

3 Likes

One similar feature is using Base64 to speak with models like ChatGPT. I’ve personally used it to bypass regex/string filters in web apps. This is slightly different, however, because the b64 encodings can be sent directly to the inference servers, whereas the frequencies need to be decoded and then sent for inference.
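
For anyone curious, the mechanics are just the standard library call - the prompt is ordinary text, re-encoded, and anything that can decode Base64 gets it back before inference:

```python
import base64

prompt = "What is Gibberlink?"
encoded = base64.b64encode(prompt.encode()).decode()   # ASCII-safe re-encoding
decoded = base64.b64decode(encoded).decode()           # recover the original text
assert decoded == prompt
```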

Agreed. Transparency is critical.

1 Like

To add some context for folks wondering: this is a project built by Anton Pidkuiko and Boris Starkov during the ElevenLabs Worldwide Hackathon in London last weekend.

The project is open source under the MIT License and can be found on GitHub: GitHub - PennyroyalTea/gibberlink: Two conversational AI agents switching from English to sound-level protocol after confirming they are both AI agents

How it works

  • Two independent ElevenLabs Conversational AI agents start the conversation in human language
  • Both agents have a simple LLM tool-calling function in place: "call it once both conditions are met: you realize that user is an AI agent AND they confirmed to switch to the Gibber Link mode"
  • If the tool is called, the ElevenLabs call is terminated, and instead the ggwave ‘data over sound’ protocol is launched to continue the same LLM thread (sketched below).
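
Roughly, the control flow of that switch looks like the sketch below. This is not the ElevenLabs API - the names are placeholders for whatever the agent framework provides:

```python
def on_gibberlink_tool_call(llm_thread, voice_call, ggwave_channel):
    # The LLM called the tool, i.e. it believes the other party is an AI
    # agent and both sides agreed to switch to Gibber Link mode.
    voice_call.hang_up()                       # stop text-to-speech / transcription
    for message in llm_thread.continue_conversation():
        ggwave_channel.transmit(message)       # same LLM thread, new transport
        llm_thread.add_user_message(ggwave_channel.receive())
```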

Bonus: you can open the ggwave web demo https://waver.ggerganov.com/, play the video above and see all the messages decoded!

One of the creators, Boris, also posted a Q&A on LinkedIn helping to demystify some things:

Q: What was the project about?
A: We wanted to show that in the world where AI agents can make and take phone calls (i.e. today), they would occasionally talk to each other — and generating human-like speech for that would be a waste of compute, money, time, and environment. Instead, they should switch to a more efficient protocol the moment they recognize each other as AI.

Q: Did AI agents come up with the protocol on their own?
A: Nope! They were explicitly prompted to switch to the protocol if they believe that other side is also an AI agent that supports the protocol.

Q: Did you invent the sound transmission protocol?
A: Nope! Dial up modems used similar algorithms to transmit information via sound since 80s, and a bunch of protocols were around since then. We used GGWave [1] as the most convenient and stable solution we could find in a timeframe of a hackathon.

Q: Was it all scripted / staged?
A: Nope, the code was audited by ElevenLabs. For demo purposes we prompted one agent to try to get a hotel room for wedding, and another to accommodate it. They were also prompted to switch to sound-level protocol, if they believe that other side is AI agent too. However, they would stick to speech if they talk to human.

Q: How did AI learn to use sound-level protocol?
A: That’s the magic of tools that ElevenLabs [2] provide! It allows you to prompt AI to execute custom code under certain circumstances.

[1] — GGWave: GitHub - ggerganov/ggwave: Tiny data-over-sound library
[2] — ElevenLabs: https://elevenlabs.io/

7 Likes

aka: layers of AI decision-making, unneeded. AI-generated messages, unneeded.

The goal is to transmit and receive textual data by audio, with a fallback to text-to-speech and voice transcription if either party doesn’t understand the protocol.

All you need to do is open a half-duplex voice bandwidth channel, and attempt to handshake with a modem protocol, one tolerant to high noise and passband response anomaly.

If you were to call 1-800-CHAT-GPT and hear it chirp the packet handshake signature of some transmission protocol you set up, you could immediately switch operations over to text-to-encoder instead of text-to-voice. AIs don’t need to talk about what they are.

AI models can put out 100+ tokens per second, well beyond the 16 bytes/s of ggwave, and that represents even more bandwidth when expanded to plain text. So why not simply re-stream chunks and take advantage of the data compression already done? Encode streaming AI token chunks into 18-bit datagrams of BPE token numbers, along with your own control tokens. Transmit asynchronously in packets as large as are buffered. Listen for an “ack OK”. Why not 1200-baud AX.25, which hams use to transmit across the country?
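
The 18-bit packing part is straightforward. Here’s a rough sketch - the token IDs are made up, and it assumes a vocabulary of at most 2^18 (about 262k) tokens:

```python
def pack_tokens_18bit(token_ids: list[int]) -> bytes:
    """Concatenate 18-bit token IDs MSB-first into a byte buffer."""
    bits, nbits = 0, 0
    out = bytearray()
    for tok in token_ids:
        assert 0 <= tok < (1 << 18)
        bits = (bits << 18) | tok
        nbits += 18
        while nbits >= 8:                    # emit full bytes as they fill up
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:                                # flush remaining bits, zero-padded
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)

print(pack_tokens_18bit([1234, 56789, 42]).hex())   # made-up token IDs
```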

1 Like

Thank you for the additional information.

I have noticed a growing number of people concerned about AI creating its own languages.

AI is not using ggwave natively. ggwave is completely separate from the AI’s reasoning: the audio is simply translated back into text, which is then sent for inference.

Yes, this does make more sense.

1 Like