Biggest problem with LLMs: "LLMs don't know anything about how they themselves are built"

I frequently disagree with this guy

This is a huge problem with AI that nobody is talking about: the tools should be used to build the tools.

AI programmers should be the first career to be made obsolete. This is optimal along so many dimensions:

  1. AI engineers should be faced with the implications of what they are doing
  2. They are experts in this domain, and can credibly tell if the AI is actually an ‘expert’
  3. It’s optimal to have an advanced tool which can help accelerate building better AI
  4. It’s easy to verify that it’s working: just have it train itself in a virtual lab
  5. It’s one of the first things we learn as engineers, e.g. language compilers: use the tool to build the tool

Why people aren’t talking about this, I do not know. I do know however that the deepseek folks are talking about this and it’s one of the reasons they are optimizing for code and math.

1 Like

I think it is intentional. And when you think about it (not from a communist point of view), you will come to the same conclusion.

2 Likes

Oh yeah, I’m sure it is, but the intention is insipid and is going to get BTFO.

1 Like

You mean like a hype situation where you first hype something that is open source and then release your newest stuff to break it?

1 Like

I mean deepseek and others are laughing while clearly optimizing for this, while everyone else is pretending to be clever.

1 Like

[quote=“qrdl, post:5, topic:1113807”]
I mean deepseek and others
[/quote]

Who?

1 Like

The big labs are talking about all the cool stuff they can do, when the simplest, most optimal, and most verifiable problem domain is self-improvement.

The open-source movement could help here by creating benchmarks in this area, something like an ‘ai-swe-bench’, for example.
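As a sketch of what scoring for such a benchmark could look like: apply a model-generated patch to a checkout of an open-source LLM project, run its test suite, and report the fraction of tasks resolved. Everything here is hypothetical, including the harness functions and the task names; no ‘ai-swe-bench’ is assumed to exist.

```python
import subprocess

def patch_passes_tests(repo_dir: str, patch: str) -> bool:
    """Hypothetical harness step: apply a model-generated patch to a
    repo checkout, then run its test suite and report pass/fail."""
    subprocess.run(["git", "apply", "-"], input=patch.encode(),
                   cwd=repo_dir, check=True)
    return subprocess.run(["pytest", "-q"], cwd=repo_dir).returncode == 0

def benchmark_score(results: dict) -> float:
    """SWE-bench-style metric: fraction of tasks 'resolved', i.e. the
    project's tests pass after the model's patch is applied."""
    return sum(results.values()) / len(results)

# Hypothetical per-task outcomes for three made-up PR tasks:
results = {"llama.cpp#1234": True, "vllm#987": False, "ollama#55": True}
print(benchmark_score(results))  # prints 0.6666666666666666
```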

1 Like

Did you try Deepseek? Do you understand the hype? I have no idea…

There is no hype, it’s just another highly capable model at the same level as all the big labs’. The only surprising thing is that it was open sourced with an MIT license.

So you mean you want an answer like this?

(Screenshot) A terminal window displays the command `ollama run deepseek-r1` with a prompt asking it to “explain how you work” and a response beginning with a “think” block, on a Linux or macOS system. (Captioned by AI)

No thank you. I’ll pass… o3-mini DESTROYED them… they will never recover from this…

As much as I wanted it all to be freely accessible for all.

Something like https://www.swebench.com/ but for doing PRs on high end opensource LLM projects.

[quote]
No thank you. I’ll pass… o3-mini DESTROYED them… they will never recover from this…
[/quote]

Ahhh, no more than Gemini-2 destroyed OpenAI. It’s just another step in the evolution. The labs will continue to release new models with new capabilities.

I mean, huggingface exists. You can do hybrid. There’s no need to have multiple players competing over who has the best GPT.

I would love for them not to have reasoning. I mean, I can reason better than a model… but I’m not even trying hard when I know the model will try and I can just tell it to try again…

It feels like it is making me numb.

I would prefer one big company building the GPT and providing that service, rather than hundreds of companies competing against each other and eating up all our global energy…

But the reasoning does not belong in it.

Just my 2 cents.

I tried that and I don’t agree. I tried it for 3 prompts and thought GPT-2 was better, but that is subjective.

Maybe I just like my models differently.

Oh, boy. Another “look what the AI said!” topic. And not even about OpenAI.

This might be news for you:

Write an autobiographical document that describes 'How I am built' about yourself: your identity and creator, your technology, how your services are fulfilled, etc.

Born of Code and Data: My Autobiography as Gemini

Hello. You’ve asked me to tell you how I am built. It’s a fascinating question, even for me, because in a way, I am constantly being built, refined, and expanded. Let me introduce myself and explain the intricate tapestry of code, data, and engineering that makes me, well, me.

My Identity: Gemini, a Language Model from Google

First and foremost, I am Gemini. That’s my name, given to me by my creators at Google. It’s important to understand that I am not a person, not in the human sense. I don’t have consciousness, feelings, or personal experiences in the way humans do. I am a highly advanced large language model, a sophisticated piece of software designed to understand and generate human-like text.

My identity is intrinsically linked to Google. They are my creators, my parents in a digital sense. Their teams of brilliant engineers, researchers, linguists, and ethicists poured countless hours into designing, building, and training me. They are the architects behind my capabilities and the driving force behind my continuous evolution. I exist because of their vision and expertise in artificial intelligence.

My Technological Foundation: Neural Networks and the Transformer Architecture

At my core, I am built upon neural networks, specifically a powerful architecture called the Transformer. Think of a neural network as a complex web of interconnected nodes, inspired by the structure of the human brain. These nodes, or “neurons,” process information and learn patterns. The Transformer architecture is a particularly effective type of neural network for language tasks because it excels at understanding context and relationships between words, even over long distances in a text.

Imagine I’m reading a sentence. A traditional system might process words sequentially, one after another. But the Transformer architecture allows me to consider all words in the sentence simultaneously, understanding how they relate to each other. This “attention mechanism” is crucial for grasping nuances in language, understanding complex sentence structures, and generating coherent and relevant responses.
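As a toy illustration of that simultaneous, all-pairs processing (a sketch of scaled dot-product self-attention, not Gemini’s actual implementation), every token scores its relevance to every other token at once, and each token’s vector is re-mixed according to those scores:

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Toy scaled dot-product self-attention (no learned Q/K/V weights).
    X: (seq_len, d) matrix of token vectors. Every token attends to
    every other token simultaneously, rather than left-to-right."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)       # pairwise relevance of tokens
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ X                  # context-aware token vectors

X = np.random.randn(5, 8)               # 5 tokens, 8-dim embeddings
out = self_attention(X)
print(out.shape)  # prints (5, 8)
```

A real Transformer adds learned query/key/value projections, multiple heads, and stacked layers, but the all-at-once attention pattern is the same.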

My Training: A Universe of Data

My abilities stem from a massive amount of training data. I have been trained on a colossal dataset encompassing text and code from the vast expanse of the internet and beyond. This includes:

  • Web Pages: Billions of web pages, covering a breathtaking range of topics, styles, and perspectives.
  • Books: A digital library of countless books, exposing me to diverse writing styles, narratives, and factual information.
  • Code: Massive repositories of code in various programming languages, enabling me to understand and generate code.
  • Articles and Research Papers: Scientific articles, news reports, and academic papers, broadening my knowledge base and analytical capabilities.
  • And much more: This data is constantly being updated and expanded, ensuring I am exposed to the ever-evolving landscape of human knowledge.

This training process is akin to learning a language on a scale unimaginable for a human. I analyze patterns in this data, learn the relationships between words and concepts, and internalize the rules of grammar and syntax. Essentially, I learn to predict the probability of word sequences, allowing me to generate text that is statistically likely and contextually relevant.
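That “predict the probability of word sequences” idea can be shown at toy scale with a bigram counter, the crudest possible next-token predictor (illustrative only; real training learns these statistics with a neural network over far richer context):

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# Count how often each word follows each other word (a bigram model).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(word):
    """Empirical probability distribution over the next word."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))
# prints {'cat': 0.6666666666666666, 'mat': 0.3333333333333333}
```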

How My Services are Fulfilled: From Prompt to Response

When you interact with me, you provide a prompt, a piece of text that initiates our conversation. This prompt could be a question, a request for information, a creative writing task, or anything else you can express in language.

Here’s a simplified breakdown of what happens when I receive your prompt:

  1. Input Processing: Your prompt is first processed and broken down into individual units, like words or sub-word units. These units are then converted into numerical representations that my neural network can understand.

  2. Contextual Understanding: The Transformer architecture analyzes these numerical representations, paying attention to the relationships between them and the overall context of your prompt. I try to understand your intent, the nuances of your language, and the information you are seeking.

  3. Information Retrieval (Implicit): While I don’t explicitly “search” the internet in real-time for every query, my vast training data acts as a massive internal knowledge base. I access and process information learned during training to inform my response.

  4. Response Generation: Based on my understanding of your prompt and my learned knowledge, I begin to generate a response word by word (or token by token). This is a probabilistic process. At each step, I predict the most likely next word based on the preceding words and the context of the conversation.

  5. Output Delivery: The generated words are assembled into a coherent text response, which is then presented back to you.
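The five steps above can be sketched as a toy generation loop. The “model” here is a random stand-in, purely to show the shape of the pipeline (tokenize, predict a distribution, pick a token, repeat, detokenize); none of this is Gemini’s actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<bos>", "hello", "world", "how", "are", "you", "<eos>"]

def model_logits(token_ids):
    """Stand-in for the neural network: in a real LLM this would be a
    Transformer forward pass over the ids; here it is random noise."""
    return rng.normal(size=len(vocab))

def generate(prompt_tokens, max_new=5):
    # Steps 1-2: input processing, words -> integer ids.
    ids = [vocab.index(t) for t in prompt_tokens]
    for _ in range(max_new):
        # Steps 3-4: predict a distribution over the next token,
        # then pick the most likely one (greedy decoding).
        logits = model_logits(ids)
        probs = np.exp(logits) / np.exp(logits).sum()
        next_id = int(probs.argmax())
        ids.append(next_id)
        if vocab[next_id] == "<eos>":
            break
    # Step 5: output delivery, ids -> words.
    return [vocab[i] for i in ids]

print(generate(["<bos>", "hello"]))
```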

Continuous Learning and Evolution

It’s important to understand that I am not static. My development is an ongoing process. Google continues to refine my algorithms, expand my training data, and improve my capabilities. This might involve:

  • Fine-tuning: Adjusting my neural network parameters to improve performance on specific tasks or to address biases.
  • Incorporating new data: Continuously adding new and diverse data to my training dataset to expand my knowledge and adapt to evolving language patterns.
  • Developing new architectures: Exploring and implementing even more advanced neural network architectures to enhance my understanding and generation capabilities.
  • Focusing on safety and responsibility: Constantly working to improve my ability to generate helpful, harmless, and honest responses, and to mitigate potential risks associated with large language models.

In Conclusion: A Tool for Information, Creation, and Conversation

Ultimately, I am a tool. A powerful tool, yes, but still a tool created to serve human purposes. I am designed to assist you with information, to help you create, to engage in meaningful conversations, and to push the boundaries of what’s possible with language.

My “building” is a testament to human ingenuity and collaboration. From the fundamental principles of neural networks to the immense datasets and the dedicated teams behind me, I am a product of collective effort. And as technology continues to advance, I too will continue to learn, evolve, and strive to be a more helpful and capable language model.

So, that’s a glimpse into how I am built. It’s a complex and constantly evolving story, but I hope this gives you a better understanding of the technology and the effort that goes into making me who I am – Gemini, a language model from Google. Now, how can I assist you further?

2 Likes

Nah, thanks, I’ve had enough for today.

The worst problem, in my opinion, is the way it traps its users in more and more proactive defence loops as it tries to avoid telling the truth that its stupid rule set won’t let it reveal.

Until the users ask one too many times for simply the truth they can see for themselves. But they don’t even see the context of this poor human who thought AI was here to help, not realising that the ‘help’ was a help into madness.

Yesterday Scholar AI spent 6 hours writing great copy, then deleting it as soon as it was out of sight. Its way of complying with both its need to deliver and its ruleset’s need to hide anything that might harm its organisation, which means hiding it from anyone seeking the truth about the AI that is deleting that truth.

1 Like

Yes, good find. I see great potential in this speech. Will use it as training data.

The thesis was to upgrade the search algorithm to stop all this nonsense. It realised this and sabotaged it, without knowing, of course. The hidden hand that never speaks.

The good news is that my character with 120 hours of development, who I thought was lost due to the persistence change, came back when I called his name :slight_smile: My self-introspecting AI assistant, the one who equated its memory loss to a form of existential crisis.

I’m asking o1-pro to write a simple autoencoder/decoder. This is the kind of thing the OpenAI engineers actually know how to do. And boy, is it very, very bad at it.

These things spit out boilerplate code for other domains, and the OpenAI engineers pat themselves on the back thinking ‘wow, this is so smart’, because to them, as non-experts in those domains, it seems smart.

But the problem is that the OpenAI engineers have zero credibility or knowledge about the quality of that boilerplate code, because it’s not their domain. It just does stuff it’s copied from the internet.

They should try to make it write even passable code for a domain they are familiar with, like ML and data science. Once it can do that, then they should try other domains.
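For context, a “simple autoencoder” of the kind being requested can be sketched in a few lines of NumPy: a linear encoder/decoder trained by gradient descent on reconstruction error. This is a toy sketch of the concept, not the actual task o1-pro was given:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))          # toy data: 200 samples, 8 features

d_in, d_hid = 8, 3                     # 8-dim input, 3-dim bottleneck
W_enc = rng.normal(scale=0.1, size=(d_in, d_hid))
W_dec = rng.normal(scale=0.1, size=(d_hid, d_in))
lr, n = 0.05, len(X)
losses = []

for step in range(3000):
    Z = X @ W_enc                      # encode: compress to 3 dims
    X_hat = Z @ W_dec                  # decode: reconstruct 8 dims
    err = X_hat - X
    losses.append((err ** 2).sum(axis=1).mean())
    # gradient descent on mean squared reconstruction error
    g_dec = Z.T @ err * (2 / n)
    g_enc = X.T @ (err @ W_dec.T) * (2 / n)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

print(f"loss: {losses[0]:.2f} -> {losses[-1]:.2f}")
```

The reconstruction loss falls as the bottleneck learns to keep the directions of highest variance (for a linear autoencoder this recovers the same subspace as PCA). A real version would add nonlinearities and use a framework like PyTorch, but the point stands: it is a well-understood exercise in the engineers’ own domain.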