Perfect/sterile data set creation

sfs · February 13, 2025, 9:33pm

Hi everybody! I don’t post a lot, and this is my first post here. I had an idea for a system that produces a “sterile data set” that could (maybe, who knows at this point) be useful for machine learning by helping it get rid of small inconsistencies: things that we can say but that are not really in the realm of “things that exist” :). Basically, the idea is to create relatively small mathematical/logical rules whose outputs are identical to, or indistinguishable from, physical properties. Some of the logic is surprisingly useful (I think), and some of it would probably need a bit of compute time. How accurate all of this is (zero is an option) and, on the off chance that it is accurate, how far it can be pushed are open questions. A couple of years ago the whole approach seemed a bit childish; now, given how much data AI can understand and the way it understands it (at least to my eyes), not so much. If anybody is kind enough to read this, my main question is whether this is the right place to discuss something like this. English is not my native language, so sorry for any mistakes.
Not to overcomplicate things, but to give some context: the system manipulates null values (values that, in every sense of the word, don’t exist) in various relation-like ways to create physical properties that should be indistinguishable from what is real. Even though the whole point is to keep everything as simple as possible, it is still complicated, so I will just list what I think is going well: the sequence of events; the sequence of events from a big body’s point of view, and its arrow; an underlying explanation of how a point can repel and attract at the same time; and a bit more. Hopefully I’m not just imagining things or convincing myself that there is some logic there. Again, thanks to everyone who didn’t give up after a couple of sentences :)

jochenschultz · February 13, 2025, 9:49pm

I make that. Use mine. It won’t have any political opinions in it, I promise.

_j · February 13, 2025, 10:00pm

What the AI summarizer had to say about the meaning behind your post:

sfs introduces an idea for a system that produces a “sterile data set” aimed at refining machine learning by eliminating small inconsistencies. The concept involves using relatively simple mathematical and logical rules to generate outputs that are indistinguishable from physical properties. The post explains that by manipulating null values (values that do not exist in any sense of the word) across various relations, one might simulate actual physical properties and behaviors. The idea is ambitious, exploring areas such as the sequence of events from a macroscopic perspective, the arrow of time, and the possibility of reproducing both repulsive and attractive forces within a simplified framework.

While some logical parts of the proposal appear promising and potentially useful, the approach might demand considerable compute time, and its overall accuracy or potential limits remain open questions. sfs reflects on past skepticism toward such approaches but notes that the increased availability of data and the advanced capabilities of modern AI now provide a more favorable backdrop for this type of idea. The author also acknowledges that English is not their native language and apologizes for any language-related issues.

It sounds like you are discussing world modeling, and learning and predicting from physical rules, rather than how to remove ambiguities from language.

For example, an AI could learn to catch a ball through its failures rather than being explicitly programmed to calculate trajectories, or learn to drive around a child who is obstructing the road. These approaches also work much better in sterile environments than in chaotic ones full of unlearned phenomena.
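To make the ball-catching example concrete, here is a minimal Python sketch of learning from failures with plain tabular Q-learning. Everything in it is a hypothetical toy: the 5 by 5 grid, the reward of +1 for a catch and -1 for a miss, the hyperparameters, and the helper names `step`, `greedy`, `train` and `catches` are illustrative assumptions, not code from this thread or from the Spinning Up material. The agent is never given a trajectory formula; it only sees whether each attempt ended in a catch or a miss. The world is deterministic and noise-free, i.e. the "sterile" setting mentioned above.

```python
import random

# Hypothetical toy "catch" world: a ball falls one row per tick and a paddle on
# the bottom row tries to be under it. Sizes and rewards are illustrative only.
WIDTH, HEIGHT = 5, 5
ACTIONS = (-1, 0, 1)  # move paddle left, stay, move right


def step(state, action):
    """Advance the world one tick. state = (ball_x, ball_y, paddle_x).

    Returns (next_state, reward, done). The only feedback is +1 for a catch
    and -1 for a miss when the ball reaches the paddle row; no trajectory math.
    """
    ball_x, ball_y, paddle_x = state
    paddle_x = min(max(paddle_x + action, 0), WIDTH - 1)  # clamp to the grid
    ball_y += 1
    if ball_y == HEIGHT - 1:  # ball has reached the paddle row
        reward = 1.0 if paddle_x == ball_x else -1.0
        return (ball_x, ball_y, paddle_x), reward, True
    return (ball_x, ball_y, paddle_x), 0.0, False


def greedy(q, state):
    """Index of the best-known action (unseen pairs count as 0.0; ties pick the first)."""
    values = [q.get((state, a), 0.0) for a in range(len(ACTIONS))]
    return values.index(max(values))


def train(episodes=20000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: improve a value table from wins and failures."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state, done = (rng.randrange(WIDTH), 0, WIDTH // 2), False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes try a random move.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(q, state)
            nxt, reward, done = step(state, ACTIONS[a])
            # Bootstrap from the best next action unless the episode just ended.
            target = reward
            if not done:
                target += gamma * max(q.get((nxt, b), 0.0) for b in range(len(ACTIONS)))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (target - old)
            state = nxt
    return q


def catches(q):
    """Play the greedy policy once from every start column and count catches."""
    total = 0
    for start_x in range(WIDTH):
        state, done = (start_x, 0, WIDTH // 2), False
        while not done:
            state, reward, done = step(state, ACTIONS[greedy(q, state)])
        if reward > 0:
            total += 1
    return total


print("untrained:", catches({}), "of", WIDTH)
print("trained:  ", catches(train()), "of", WIDTH)
```

With no training (an all-zero table) the greedy paddle just drifts left and only catches a ball dropped in column 0; after training it should end up under the ball from every start column. Because every episode lasts the same number of ticks, actions that keep a catch possible converge to positive values and actions that make it impossible converge to negative ones, which is what lets the greedy policy tell them apart.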

You can take a look at past deep reinforcement learning work from the previous decade, before OpenAI and the rest of the world found generative AI and transformers.

https://spinningup.openai.com/en/latest/spinningup/rl_intro.html

sfs · February 14, 2025, 6:44am

Thank you for the reply! I’m a bit new to this. Where can I find the thing you created or are developing?

sfs · February 14, 2025, 7:38am

Thank you for the reply! I can’t be sure, of course, but at this point I can make a guess: this approach will be far less useful for world modeling, since it would take a lot of compute time to derive simple equations that have been known for almost a century anyway. So, for now, I think it will be useless for much of that. (Thank you for the article; I actually understood some of it.) The reason I suspect it might be useful for clearing up inconsistencies or ambiguities is that, when I look at the language part of the training data, there is at least one thing that, in my mind, is less helpful than it could be: a lot of facts are explained by simply naming things. (In reality it’s not only that, but it’s close enough for the argument.)
A quick example: fundamental particles with charge repel if they have the same charge and attract if they have opposite charges. And the reason is that this is what we observed; it’s because of the charge. So one thing can repel and attract at the same time, and that is because of the charge. In my mind, the answer might as well be “we don’t know, but we have a name for it”. So the idea is that if you find a simple explanation that can always be calculated, it will give any type of intelligence a better fundamental understanding and a better chance to deduce things.
Another example is why a property that was previously bound does not change until you “measure it”. I’m sure there is a beautifully complex and deep explanation of why that happens. But you can create a chain of events that is triggered by an interaction with a system and that gives a perfectly understandable explanation, one that also happens to be very easy to put into a computer. Something like that is what I’m trying to do. For now, the best-case scenario is that it is at the very edge of what I’m capable of doing anyway, so the chances of success are slim! Other people’s opinions are very important when you concentrate on one thing, because you lose track of everything else.
And if somebody has already tried this, what was the outcome? These are the things I hope to learn by posting my progress.
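For reference, the textbook description behind the charge example is Coulomb’s law. Here F is taken as the radial component of the force between two point charges q_1 and q_2 separated by a distance r, with positive F meaning they push apart. The whole repel/attract behavior is carried by the sign of the single product q_1 q_2, which is exactly the kind of “named property” the post questions rather than an underlying explanation of it:

```latex
F = k_e \frac{q_1 q_2}{r^2}, \qquad k_e = \frac{1}{4\pi\varepsilon_0}

q_1 q_2 > 0 \;\Rightarrow\; F > 0 \ (\text{repulsion}), \qquad
q_1 q_2 < 0 \;\Rightarrow\; F < 0 \ (\text{attraction})
```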

_j · February 14, 2025, 9:03am

Have a read: the Cyc project’s core aim has been to build a massive commonsense knowledge base for AI by manually encoding facts, rules, and relationships about the world. The approach has been ongoing for decades, but it certainly hasn’t changed the world for you.

sfs · February 14, 2025, 12:12pm

Thanks again! There is a lot to read here; I mean, there is a lot more to it than what is in this article, since the project spans a couple of decades as far as I can tell. I read the article and asked an AI to explain as much as possible in a manageable amount of text. It leans towards “what people thought AI needs in order to understand the world”. If you had asked me whether that approach is helpful, I would have said yes, and I would have been wrong, obviously. Thankfully, that’s not what I’m trying to do here. It’s more about giving meaningful explanations for what is already understood as an abstract concept, or giving underlying logic to what is called a “fundamental property”. There is a slim chance of success, if any. I will post here, or wherever the right place for this type of thing is. People who have been in this much longer than I have will be able to tell whether I am on the right track. “Not” is a perfectly fine answer, since I am spending time that I don’t have on this anyway.
