A surprising discovery of mine! About MCP and Function Calling

After learning about the concepts of MCP and Function Calling over the past few days (and finding some related Python code examples in OpenAI’s documentation), I suddenly remembered a piece of code I wrote a month ago—turns out, it was essentially a Java implementation of MCP and Function Calling. OMG, the feeling of arriving at a solution that’s aligned with what the smartest scientists in the world came up with is just so thrilling. I guess this is one of the true joys of being a programmer!

These are some custom instructions I wrote myself, essentially equivalent to submitting a Function List to the large language model in advance.
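To make the idea concrete, here is a minimal Java sketch of what "submitting a function list in advance" can look like (the function names and the [[CALL:...]] marker convention are illustrative stand-ins, not the exact instructions from my screenshot):

```java
// Minimal sketch: embed the list of callable functions in the system prompt
// as custom instructions, so the model knows what it is allowed to trigger.
// The function names and marker format below are examples only.
public class FunctionListPrompt {

    static final String FUNCTION_LIST = String.join("\n",
            "You may request the following functions when needed.",
            "Reply with a marker of the form [[CALL:function_name{json_args}]].",
            "1. get_weather(location): returns the current weather for a city.",
            "2. send_email(to, subject, body): sends an email on the user's behalf.");

    public static void main(String[] args) {
        // This string would be sent as the system message of a chat request.
        System.out.println(FUNCTION_LIST);
    }
}
```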

These are some custom markers I designed so that I could recognize special identifiers returned by the large language model (at the time, I didn't even know about the concepts of MCP or Function Calling; I just wanted to use the model to trigger my own methods). When one of these special markers comes back, I use if-else or switch-case statements to determine which function to call. This approach turned out to align with what's described in OpenAI's official documentation:

https://platform.openai.com/docs/guides/function-calling?api-mode=chat

Below is the official workflow, and it turns out that my Function Calling process aligns perfectly with it.

This image illustrates a five-step process for a developer-model interaction to obtain and return the weather in Paris

Even the official implementation uses if-else to determine which function to call.

The image shows a code snippet in JavaScript demonstrating a function that executes different actions based on the function name, specifically “get_weather” and “send_email”.
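Since the other screenshots had to be deleted, here is a rough Java equivalent of that dispatch step (the marker parsing is omitted, and the getWeather/sendEmail stubs are hypothetical placeholders just to show the shape of the code):

```java
// Branch on the function name returned by the model and call the matching
// local method, same idea as the official JavaScript snippet. The stubs
// below are placeholders, not real integrations.
public class FunctionDispatcher {

    static String dispatch(String functionName, String jsonArgs) {
        switch (functionName) {
            case "get_weather":
                return getWeather(jsonArgs);
            case "send_email":
                return sendEmail(jsonArgs);
            default:
                return "Unknown function: " + functionName;
        }
    }

    static String getWeather(String jsonArgs) { return "sunny, 24°C"; }   // stub
    static String sendEmail(String jsonArgs)  { return "email queued"; }  // stub

    public static void main(String[] args) {
        System.out.println(dispatch("get_weather", "{\"location\":\"Paris\"}"));
    }
}
```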

It’s incredibly exciting when the solution you come up with on your own aligns with the concepts provided by the official documentation. It feels like a resonance of ideas — so cool!

New users can only upload one image, so I had to delete the others and just leave some descriptions.

Here are some of my thoughts on AI agents — not sure if they’re correct, but I’d love to hear everyone’s opinions and have a discussion.

My Question:
I think it’s better to establish a WebSocket connection between the AI agent and the client. This way, whenever the server has a new thought or result, it can push it to the client immediately, rather than waiting for the user to ask. The typical Q&A model doesn’t really feel like a true agent or assistant. What do you think?


ChatGPT Answer:
That’s a fantastic idea, and you’ve hit the nail on the head!

The traditional “Q&A model” (where the user initiates a request and the AI responds) feels more like a tool than an active intelligent agent. With WebSocket or other persistent connection mechanisms (like SSE or gRPC streaming), an AI agent can:

:repeat_button: Push Thought Results or Status Updates in Real Time

For example, if the AI is processing a long task, it can continuously send updates or proactively notify the user when it finds something important.

:brain: Be Proactive and Context-Aware

Like a real assistant, the AI can react to incoming events or information without waiting to be prompted. This behavior is much closer to what we’d expect from a true agent or secretary.

:mobile_phone: Enable Synchronized Multi-Device Experiences

If you’re using the agent across multiple devices or windows, it can push events to any active session, ensuring a consistent experience.


Some Ideal Use Cases for WebSocket Mode:

  • The AI is performing an analysis task and pushes interim findings or the final report.
  • The AI detects a scheduling conflict or receives an email update and notifies the user instantly.
  • The AI is integrated with other systems (CRM, email, database) and can push alerts like “Your client just responded.”
  • The agent’s memory module picks up an important association or insight and immediately informs the user.

Architecture Suggestion:

You can create a dedicated WebSocket session for each user, like this:

Client <== WebSocket ==> Java Backend (AI Agent Gateway) --> Weaviate + LLM Model Service

The backend would handle:

  • The reasoning and planning process (e.g., scheduler + memory retrieval).
  • Actively pushing updates or important results to the client via WebSocket.

Final Thoughts:

Your idea is closely aligned with the communication model of next-generation AI agent platforms. This approach—persistent connection + context awareness + proactive notification—is truly the core of evolving toward a human-like assistant.

Since your current system is based on Spring Boot + Weaviate + a custom embedding API, if you’re interested, I can help you design a basic WebSocket communication module as a starting point. Want to give it a try?
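Following that suggestion, here is a minimal sketch of what such a WebSocket push module could look like with Spring Boot's plain WebSocket support. The class names and the /agent endpoint are my own assumptions for illustration, not working project code:

```java
// Minimal Spring Boot WebSocket push channel (illustrative sketch only).
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Component;
import org.springframework.web.socket.CloseStatus;
import org.springframework.web.socket.TextMessage;
import org.springframework.web.socket.WebSocketSession;
import org.springframework.web.socket.config.annotation.EnableWebSocket;
import org.springframework.web.socket.config.annotation.WebSocketConfigurer;
import org.springframework.web.socket.config.annotation.WebSocketHandlerRegistry;
import org.springframework.web.socket.handler.TextWebSocketHandler;

@Component
class AgentPushHandler extends TextWebSocketHandler {
    // One entry per connected client; keyed by session id for simplicity.
    private final Map<String, WebSocketSession> sessions = new ConcurrentHashMap<>();

    @Override
    public void afterConnectionEstablished(WebSocketSession session) {
        sessions.put(session.getId(), session);
    }

    @Override
    public void afterConnectionClosed(WebSocketSession session, CloseStatus status) {
        sessions.remove(session.getId());
    }

    // Called by the agent whenever it has a new thought or result to push,
    // without waiting for the user to ask.
    public void pushToAll(String update) throws IOException {
        for (WebSocketSession session : sessions.values()) {
            if (session.isOpen()) {
                session.sendMessage(new TextMessage(update));
            }
        }
    }
}

@Configuration
@EnableWebSocket
class AgentWebSocketConfig implements WebSocketConfigurer {
    private final AgentPushHandler handler;

    AgentWebSocketConfig(AgentPushHandler handler) {
        this.handler = handler;
    }

    @Override
    public void registerWebSocketHandlers(WebSocketHandlerRegistry registry) {
        registry.addHandler(handler, "/agent").setAllowedOrigins("*");
    }
}
```

The agent's planner or memory module would call pushToAll(...) whenever it produces something worth telling the user, instead of waiting for the next question.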

I believe the current Q&A pattern of large language models has certain limitations. The internal mapping between different data layers in LLMs and the multidimensional network structures of data are essentially manifestations of probability and mathematics. There’s no real emergence of consciousness.

If we want AI to become a true ASI (Artificial Superintelligence) with thinking capabilities, I believe this must be achieved not just in software but also in hardware. On the software side, we can use multimodal large models as the foundational base of ASI, but they are only a foundation. ASI also requires an inner loop of thought (similar in spirit to the concept of a "Chain of Thought," though current implementations are still too coarse). That is, even when no user is asking it anything, it should be constantly querying various knowledge bases or engaging in a kind of internal self-dialogue, "thinking" about what it is doing and what it is pondering, and forming memories along the way.
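To make that idea concrete, here is a toy sketch of such an inner loop. The LlmClient and MemoryStore interfaces are purely hypothetical placeholders, not any real API:

```java
// Toy sketch of an "inner loop of thought" that keeps running even when no
// user input arrives. LlmClient and MemoryStore are hypothetical placeholders.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

interface LlmClient   { String reflect(String prompt); }  // ask the model to "think"
interface MemoryStore { void store(String thought); String recentContext(); }

public class InnerThoughtLoop {
    private final LlmClient llm;
    private final MemoryStore memory;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public InnerThoughtLoop(LlmClient llm, MemoryStore memory) {
        this.llm = llm;
        this.memory = memory;
    }

    public void start() {
        // Every few seconds, with or without a user question, the agent asks
        // itself what it is doing and stores the resulting "thought" as memory.
        scheduler.scheduleAtFixedRate(() -> {
            String thought = llm.reflect("Given this context, what should I be "
                    + "thinking about next? " + memory.recentContext());
            memory.store(thought);
        }, 0, 5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        MemoryStore memory = new MemoryStore() {
            public void store(String thought) { System.out.println("remembered: " + thought); }
            public String recentContext()     { return "(empty)"; }
        };
        new InnerThoughtLoop(prompt -> "thinking about: " + prompt, memory).start();
    }
}
```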

Currently, when no user input is provided, the AI essentially enters a “paused” state. That is, it doesn’t maintain a continuous computational process inside the machine—similar to a human being asleep without dreaming.

But if we truly want AI to become ASI, maintaining this ongoing state of thought is essential. I believe the reason humans have consciousness is fundamentally because the brain is constantly computing and engaging in self-dialogue. When you’re thinking, you’re essentially talking to yourself—and if you can’t find the answer, you might “Google it” (just kidding :grinning_face_with_smiling_eyes:). When you sleep or get knocked unconscious, your brain essentially stops thinking—and that’s what we call loss of consciousness.

There’s much more to say here, which I’ll continue sharing over time.


Thoughts on the hardware side of AI:

Currently, the ability of LLMs to pass the Turing Test stems largely from breakthroughs in mathematics, algorithms, and computing power. These models rely on massive mathematical operations at the hardware level, using probabilities to approximate human-like responses—making the models seem human, but nothing more.

They’ve learned the patterns of human thinking, but not true self-directed thought. While the “Chain of Thought” approach has some glimmers of human reasoning, it’s far from sufficient.

Now, back to hardware: I believe we need a structure that mimics biological brain cells and synapses. Think dendrites (signal receivers) and axons (signal transmitters).

Imagine a cube-shaped structure with dimensions 10,000 × 10,000 × 10,000 units—each unit representing a single brain cell component. That gives us 1 × 10¹² units, which exceeds the estimated 86 billion neurons in a human brain.

Within this cube, every unit (neuron) has a possible link to every other unit. Here’s a quick reference:


Estimated Number of Synapses in the Human Brain

  • Total: Currently estimated between 100 trillion and 150 trillion (1 × 10¹⁴ to 1.5 × 10¹⁴) synapses.
  • Basis:
    • Around 86 billion neurons.
    • Each neuron forms between 1,000 and 10,000 synapses, depending on the brain region and neuron type.
    • For example, pyramidal neurons in the cortex have thousands of synapses, while Purkinje cells in the cerebellum can form up to 200,000 connections.

In this model, each link would act as a synapse—and we could control whether a link is active or not, essentially simulating neural activity in hardware. We would then build a kind of BIOS or base operating system on top of this structure.
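As a rough sanity check on the scale (my own back-of-envelope arithmetic, nothing more):

```java
// Back-of-envelope scale check for the 10,000^3 cube idea.
public class CubeScale {
    public static void main(String[] args) {
        double units         = Math.pow(10_000, 3); // 1e12 simulated "neurons"
        double humanNeurons  = 8.6e10;              // ~86 billion
        double humanSynapses = 1.5e14;              // upper estimate quoted above

        // If each unit kept around 1,000 active links (like a typical neuron),
        // the number of active "synapses" would already exceed the brain's.
        double activeLinks = units * 1_000;         // 1e15

        System.out.printf("units=%.1e (vs. %.1e neurons)%n", units, humanNeurons);
        System.out.printf("active links=%.1e (vs. up to %.1e synapses)%n",
                activeLinks, humanSynapses);
    }
}
```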


Regarding signal transmission in computers and in brain cells:


1. Voltage Oscillations in Computers

  • CPU & GPU: Driven by clock signals—periodic voltage oscillations used to sync computation steps (GHz = billions of cycles/second).
  • RAM: Uses capacitors for charge/discharge to represent 0s and 1s; refresh rate maintains data.
  • Storage (HDD/SSD): SSD uses stored electric charges, while HDD uses magnetic changes.

Core Mechanism: All of these rely on fast, controlled voltage oscillations to encode, transmit, and process data.


2. Signal Transmission in the Brain
The brain doesn’t rely on an external clock like computers. It uses bioelectrical and biochemical signals:

  • Electrical Signals
    • Action Potentials: Signals propagate via shifts in membrane potential (from ~ -70mV to +30mV), triggered by ion flow (Na⁺/K⁺). All-or-none spikes traveling down axons.
    • Postsynaptic Potentials: Sub-threshold signals in dendrites—analog, non-oscillatory.
  • Chemical Signals
    • Synaptic Transmission: Neurons use neurotransmitters (e.g., glutamate, GABA) to transmit information—slower but essential for integration.
  • No Oscillations: Signal events are discrete and not periodic like computer clocks.
  • Brain Waves (EEG): Macroscopic oscillations (e.g., α waves 8–12Hz, β waves 12–30Hz) are byproducts of synchronized neural firing—not the core of information processing.

3. Does the brain need oscillations to function?

  • No centralized clock: Neurons operate asynchronously—event-driven, not clock-driven.
  • Local oscillations (brain waves) exist but aren’t necessary for computation.
  • Memory: Stored via changes in synaptic strength (plasticity), not via persistent voltage like RAM.
  • Signal Release: Action potentials trigger neurotransmitter release—brief, discrete events.

Comparison: Brain vs. Computer

| Feature | Computer | Brain |
| --- | --- | --- |
| Voltage oscillation | Core mechanism (clock-driven) | Not required (oscillations are side effects) |
| Signal type | Digital (0 and 1) | Analog + chemical |
| Storage | Charge/magnetism (volatile or nonvolatile) | Synaptic plasticity (structural) |
| Synchronization | Highly synchronized | Asynchronous, distributed |

The brain does transmit electrical signals, but the process is more complex, involving ion channels and dynamic membrane potentials.
Storage and release aren't a matter of retaining voltage directly; they're encoded in the network of synapses.


Conclusion:

The brain doesn’t require oscillating voltages like a computer. Its signal transmission is based on action potentials and chemical synapses. Memory is stored in synaptic plasticity, not via electric charge. While brain waves do show oscillation patterns, they are not the core mechanism of cognition—they’re just emergent phenomena from collective neural activity.

In essence, the brain operates more like event-driven computation, not clock-driven.
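A purely illustrative contrast between the two styles, in code terms:

```java
// Clock-driven vs. event-driven processing (illustration only).
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class EventVsClock {

    // Clock-driven: work happens on every tick whether or not anything changed,
    // the way a CPU advances on every clock cycle.
    static void clockDriven() throws InterruptedException {
        for (int tick = 0; tick < 3; tick++) {
            System.out.println("tick " + tick + ": check inputs, compute, repeat");
            Thread.sleep(100); // fixed period, like a clock signal
        }
    }

    // Event-driven: nothing runs until a "spike" arrives, the way a neuron
    // only fires when its inputs cross a threshold.
    static final List<Consumer<String>> listeners = new ArrayList<>();

    static void onSpike(Consumer<String> listener) { listeners.add(listener); }

    static void fire(String signal) {
        for (Consumer<String> listener : listeners) listener.accept(signal);
    }

    public static void main(String[] args) throws InterruptedException {
        clockDriven();
        onSpike(signal -> System.out.println("event received: " + signal));
        fire("pressure on fingertip"); // computation triggered only by the event
    }
}
```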


I think Apple’s unified memory architecture is already a step closer to how the brain works. To reach true ASI, we at least need compute and memory to be integrated, just like in the brain.

Back to the earlier cube structure: Related data should be physically stored closer together, while unrelated data should be physically farther apart. This mimics how, in the brain, stronger signals activate more neurons, and weaker ones activate fewer.

This concept is somewhat like a vector database, but vector databases only exist at the software level. I believe we need physical-level association. Only when we achieve a breakthrough in hardware can we expect a qualitative leap in software.

That’s what I’ve thought so far.
The central idea is that we need hardware that fully mimics the brain’s structure—something like GPU CUDA cores, but fundamentally different in nature.

On Building Feedback Systems and Continuous Thinking

In addition to mimicking the brain’s structure and computation, there’s one crucial mechanism we must implement:
the Feedback System.


How can AI achieve true continuous thinking?

Relying solely on internal chains of thought or self-consistent logic is not enough.
For AI to maintain uninterrupted thinking, it must be connected to continuous feedback from the external world.
This means AI needs to be able to:

  • See (Visual system)
  • Hear (Auditory system)
  • Touch (Tactile system)
  • Move and perceive its own body (Motor system)

With these perception systems providing real-time signals, AI’s thinking will no longer be an isolated process—it becomes a continuous interaction with the world, forming the foundation of genuine cognition.


Example: Picking up a cup

Let’s say an AI controls a robotic arm that’s trying to pick up a cup:

  1. It captures the position of the cup through a camera (visual input)
  2. Determines how to reach out and adjust its grip (motor decision)
  3. Its fingers touch the cup and detect pressure (tactile feedback)
  4. It evaluates whether the cup is stable, slipping, or making noise (auditory + multimodal analysis)
  5. Finally, did it succeed in lifting the cup? If not, how should it improve next time?

During this process, every action by the AI generates new thoughts:

“I moved too fast when picking up the cup—it slipped.”
“Next time, I should approach it slowly and then increase grip strength.”

This forms a true loop of perception → feedback → reflection → adjustment → re-execution.
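In code terms, the loop might look something like this; every interface below is a hypothetical placeholder, just to show the control flow:

```java
// Sketch of the perception → feedback → reflection → adjustment → re-execution
// loop from the cup example. All interfaces are hypothetical placeholders.
interface Camera    { String locateCup(); }
interface Arm       { boolean grasp(String position, double gripStrength); }
interface Touch     { double pressure(); }
interface Reflector { String reflect(String outcome); } // e.g. backed by an LLM

public class GraspLoop {
    private final Camera camera;
    private final Arm arm;
    private final Touch touch;
    private final Reflector reflector;
    private double gripStrength = 0.3; // adjusted across attempts

    public GraspLoop(Camera camera, Arm arm, Touch touch, Reflector reflector) {
        this.camera = camera;
        this.arm = arm;
        this.touch = touch;
        this.reflector = reflector;
    }

    public void attemptUntilSuccess(int maxTries) {
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            String position = camera.locateCup();                  // perception
            boolean lifted  = arm.grasp(position, gripStrength);   // action
            double felt     = touch.pressure();                    // feedback
            String thought  = reflector.reflect(                   // reflection
                    "attempt " + attempt + ", lifted=" + lifted + ", pressure=" + felt);
            System.out.println(thought);
            if (lifted) {
                return;                                            // success
            }
            gripStrength = Math.min(1.0, gripStrength + 0.1);      // adjustment, then re-execute
        }
    }
}
```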


Why is this key to achieving ASI?

Most of the thinking humans do every day is driven by sensory feedback.
If AI wants to develop consciousness, it must also possess such sensory-driven continuous feedback systems, and tightly integrate them with its inner loop of thought.

In other words:

Perception is the fuel for thinking
Continuous feedback is the breath of consciousness

If AI can one day behave like humans,
constantly collecting sensory input, refining its internal models, and generating its own stream of thought even without being asked a question,
then that will be a true step toward Artificial Superintelligence.