Semantic Chunking of Documents for RAG - API Tool Launch

Hi everyone,

I wanted to share the API we will be releasing soon that semantically chunks long texts to prepare them for RAG engine embeddings.

You might already have read some of my posts on the subject (there have been a bunch of them since 2022), so today I’m “officially” ready to announce the pre-launch of the API that already does this job for us. It is an internal tool for now, but the plan is to put it out there and let you use it to skip the semantic chunking bootstrap in your projects.

Here is the link: https://www.simantiks.com

Some examples of what it does: Examples | SIMANTIKS

Our team is currently fully occupied with another project, LAWXER (legal doc analysis), so we have to seek investment and/or pre-sales to hire extra developers to finalize the web platform. If you see how this could be useful to your projects, please support the launch and share the message with your networks to help us speed it up.

Pre-Sale Offers: Pre-Sales | SIMANTIKS

Of course, I’m around to answer any questions (including helping you with your issues :wink: )

Thanks for your support.


Hiya Everyone,

I’ve gotten to see the simantiks.com “atomizer” in action.

It’s pretty sweet, and I think it can play a critical role in later steps involving advanced reasoning.

Background

I’m coming to AI from a Creative perspective. My strengths include writing and illustrating. I’ve won an essay contest or two.

When you submit a literary essay, it is devoid of any personally identifying marks or comments—they even prefer it in a certain font. So, when you’re just looking at the words I put on paper, I’m pretty good at semantically structuring philosophical arguments in natural language.

I first came across Simantiks’ ideas in the “Using GPT 4 to Semantically Structure Arguments” thread. There, he discusses several core ideas surrounding the identification of “Atomic Ideas.”

After some initial misunderstanding on my part about what constitutes an Atomic Idea (I was confusing it with a related concept), and having now seen the product in action, I can confirm it does something essential to the formation of complex ideas and argumentation.

Experimenting with Emerson

To demonstrate the product, Serge used “Compensation” by Ralph Waldo Emerson. I am very familiar with this essay.

The product produced two JSON files that a) chunked the full essay and b) identified atomic ideas, all in something like 2 minutes. You can see it here on GitHub.

The highlight of the json is this, the Outline field, which is the summation of the process:


[
    {
        "class": "Element",
        "properties": {
            "title": "",
            "name": "",
            "content": "",
            "outline": "Section sans titre:\n   Introduction to the Doctrine of Compensation\n     Temporal Dichotomy: The Contrast of Day and Night,\n     Natural Balance Between Mountains and Oceans.\n     Feud of Want and Have in Relation to Natural Phenomena,\n     Measurement of Spatial Variations by Electric Star and Pencil.\n     Symbolic Representation of Earth's Solitude and Insignificance in the Universe,\n     Metaphor of Man and Wealth as Elm and Vine Indicating Strength and Deception ;\n     Reassurance of Divine Protection for the Vulnerable,\n     The Rewards of Effort and Nature's Gifts,\n     Desire to Write a Discourse on Compensation and Its Perceived Superiority Over Theology.\n     Confirmation of Doctrine of Compensation through Sermon on the Last Judgment.\n     Questioning the Interpretation of the Doctrine of Compensation in the Afterlife.\n     The Misinterpretation of Deferred Justice and Retribution\n     Critique of Preacher's Concession and Misguided Notion of Success.\n     Critique of Contemporary Religious Works and Popular Theology.\n     Introduction to the Law of Compensation Chapter.\n     Description of Polarity and Dualism in Natural Phenomena.\n     Principle of Dualism and Magnetic Polarity.\n     Principle of Dualism and Compensation in Natural and Biological Systems.\n     Illustration of Mechanical Forces Theory: Power-Time Trade-off.\n     Astronomical Errors and Their Compensatory Nature.\n     Influences of Climate and Soil on Political History.\n     The Dualism of Human Nature and Balance of Excesses and Defects.\n     Natural Equalization of Social Conditions and Circumstances.\n     Nature's Role in Softening the Morose Citizen Through Familial Bonds.\n     The Cost of Presidential Power and Public Perception.\n     The Sacrifices and Responsibilities of Genius and Eminence.\n     Immutable Nature of Universal Law and its Influence on Governance.\n     Consequences of Government Actions on Social and Legal 
Dynamics.\n     Human Fulfillment and Satisfaction Independent of External Conditions.\n     The Universality of Character Influence Across Different Governments and Historical Periods.\n     Universe Representation in Every Particle and Unified Nature in Metamorphosis.\n     Repetition of Universal Characteristics in Each New Form.\n     Holistic Representation of Human Life in Occupations and Trades.\n     Analogy of Perfection and Completeness in Small Entities.\n     Doctrine of Omnipresence and Universal Balance.\n     The Living Universe and the Moral Law\n     The Immediacy and Universality of Justice.\n     The Inherent Balance and Reciprocity of the Universe.\n     The Inescapable Nature of Retribution and Justice.\n     Doctrine of Causal Retribution and Inseparability of Cause and Effect.\n     Conflict Between Holistic Worldview and Human Partial Actions.\n     Human Ingenuity in Separating Sensual from Moral Qualities.\n     Contrast Between Spiritual and Physical Desires.\n     The Soul's Pursuit of Comprehensive Existence and Enrichment.\n     Individual Ambitions and Misconceptions of Greatness.\n     \"Failure of Attempts to Divide and Detach Elements from the Whole\".\n     Inherent Interdependence of Pleasure, Profit, and Power.\n     Indivisibility of Sensual Goods and Natural Conditions\n     Inevitable Conditions of Life and the Futility of Denial.\n     Consequences of Evading Life's Inevitabilities and Their Resulting Retribution.\n     The Inescapable Failure of Separating Good from Tax and the Consequences of Intellectual Corruption\n     Role of the Human Soul in Reflecting Truth Through Various Forms of Expression.\n     Greek Mythology: Rationalizing the Deeds of Jupiter.\n     Prometheus and Minerva's Control Over Jove's Secrets and Thunders.\n     Confession of Universal Influence and Moral Purpose.\n     Ethical Consistency in Indian Mythology and Moral Imperative in Fables.\n     The Myth of Aurora and Tithonus' Immortal Aging.\n    
 Achilles' Vulnerability Due to Unwashed Heel.\n     Siegfried's Vulnerability Due to Leaf Covering Mortal Spot.\n     Inevitability of Imperfection in Creation.\n     Doctrine of Nemesis: Unavoidable Retribution and Inescapable Laws in Nature.\n     Doctrine of Nemesis: Universal Vigilance and Inescapable Retribution.\n     The Furies as Attendants on Justice and Cosmic Retribution.\n     Mythological Allegories of Sympathetic Retribution in Objects.\n     Account of Rival's Death Due to the Fallen Statue of Theagenes.\n     The Divine Inspiration and Collective Spirit in Writing.\n     Critique of Historical Figures' Influence on Artistic and Cultural Evolution.\n     The Universality and Authority of Proverbs in Expressing Absolute Truths.\n     Principle of Reciprocity and Consequences in Proverbs.\n     \"The Dominance of Natural Law Over Human Intentions\".\n     The Self-Judgment and Consequences of Speech.\n     Consequences of Pride and Exclusion in Social and Religious Life.\n     Consequences of Departing from Simplicity and Equity in Social Relations.\n     The Role of Fear in Revealing and Avenging Social Injustices.\n     Expectation of Change Following Suspension of Voluntary Activity.\n     Value of Paying Obligations Promptly and Consequences of Excessive Frugality.\n     Borrower's Self-Incurred Debt.\n   Question on the Net Benefit of Receiving Favors Without Reciprocation ?\n   Questioning Gains from Borrowing Neighbor's Property Through Indolence or Cunning ?\n   Acknowledgment of Benefit and Debt in Transactions Altering Mutual Relations.\n   Realization of the True Cost of Dependency and Asking for Favors\n   Prudence in Fulfilling Obligations and Avoiding Deferred Debts.\n   Reciprocal Obligation and the Natural Law of Benefits.\n   Prudent Advice on the True Cost of Labor.\n   Application of Practical Wisdom in Everyday Purchases.\n   Importance of Employing Skilled Professionals for Efficient Management of Various Tasks.\n   Inescapable 
Consequences of Dishonesty in Labor and Life.\n   Intrinsic Value of Labor: Knowledge and Virtue as Inimitable and Unstealable.\n   Necessity of Genuine Mental Effort and Pure Motives for Achieving Labor's True Goals.\n   Inability of Dishonest Individuals to Acquire Knowledge Through Deception.\n   Principle of Natural Law: Power Through Action.\n   Doctrine of Universal Compensation and the Ethical Significance of Human Labor.\n   The Inescapable Consequences of Vice and Crime in Natural Law.\n   The Certainty of Law in Rewarding Virtuous Actions and The Transformative Power of Goodness\n   Poetic Illustration of Strength and Power's Reliance on Virtue,\n   Virtue Benefits from Weakness and Defect.\n   The Utility of Defects and the Harm of Pride.\n   Value of Personal Faults and Adversity in Understanding Truth and Talent.\n   Consequences of Social Unfitness and Development of Self-Help Skills.\n   Strength Derived from Adversity and Hidden Indignation.\n   The Transformative Power of Adversity for Personal Growth.\n   The Wise Man's Strategy of Self-Examination Through Adversity.\n   Healing and Resilience Against Adversaries' Triumph.\n   Perceived Safety in Criticism Over Praise.\n   General Principle: Resilience as a Source of Strength.\n   Analogy of Strength Gained from Resisting Temptation.\n   Guards Protecting Against Disaster and Moral Corruption.\n   Critique of Physical Security Measures and Trade Shrewdness as Indicators of Wisdom.\n   The Illusion of External Deception and Self-Inflicted Loss.\n   Existence of an Implicit Third Party Ensuring Contract Fulfillment.\n   Service to an Ungrateful Master and Divine Compensation.\n   Historical Ineffectiveness of Persecution Regardless of Perpetrators.\n   Definition and Characteristics of a Mob.\n   Persecution of Principles and Rights by Violent Actions.\n   Comparison to Futile Efforts of Boys with Fire-Engines.\n   Indomitable Spirit and the Honoring of Martyrs.\n   Recognition of Truth and 
Justification of Martyrs in Moments of Clarity.\n   Doctrine of Indifferency and the Duality of Circumstances.\n   Distinction Between Doctrine of Compensation and Indifferency.\n   The Inherent Nature and Affirmative Essence of the Soul.\n   Perceived Lack of Retribution for Evil Acts and Its Ultimate Consequences.\n   No Penalty for Virtue and Wisdom; They Are Additions to Being.\n   The Transformative Impact of Virtuous Actions.\n   Limitlessness of Love, Knowledge, and Beauty in Their Purest Forms.\n   Affirmation of Optimism and Rejection of Limits by the Soul.\n   Life as Continuous Progress and Instinct of Trust.\n   Instinctual Valuation of Human Virtue and Bravery.\n   \"No Taxation on the Good of Virtue: Divine and Absolute Existence\".\n   Taxation and Ephemeral Nature of Undeserved Material Goods.\n   Entitlement to Natural Wealth Through Legitimate Effort.\n   Desire to Avoid Unearned Good Due to Associated Burdens.\n   Renunciation of External Goods Due to Associated Burdens.\n   No Taxation on Knowledge and Inner Peace.\n   Limitation of Personal Harm and Wisdom of St. 
Bernard\n   Compensation for Inequalities in the Nature of the Soul.\n   Distinction of More and Less as a Source of Pain and Indignation ?\n   Perception of Inequities and the Role of Love in Reducing Social Disparities.\n   Unity and Brotherhood: The Dissolution of Bitterness.\n   The Transformative Power of Love and Recognition of Brotherhood.\n   The Soul's Power of Appropriation and Incorporation.\n   Calamity as a Catalyst for Growth and Evolution.\n   Intrinsic Necessity of Personal Growth and Transformation\n   Frequency of Personal Revolutions in Relation to Individual Vigor.\n   Transformation and Personal Growth Over Time.\n   The Continuous Renewal and Evolution of Man's Outer Life.\n   Resistance to Divine Growth and Its Consequences.\n   Resistance to Letting Go of the Past and Embracing the Present.\n   Recognition of Calamity's Compensations Over Time.\n   Immediate Perception of Loss as Irretrievable and Uncompensated.\n   Long-Term Healing Potential of Adverse Events.\n   Transformative Impact of Loss on Personal Development and Life Changes.\n   Impact of Loss on Personal Development and Social Growth.\n   Selected Criticism on \"Compensation\"\n     Citation: Pommer, Henry F. \"The Contents and Basis of Emerson's Belief in Compensation.\".\n     Citation: Panek, Le Roy Lad. \"Imagery and Emerson's 'Compensation.'\" 18 (4 Quarter 1962): 218-221.\n     Citation of Lee, Roland F., \"Emerson's 'Compensation' as Argument and Art\" in New England Quarterly 37 (Sept 1964): 291-305.\n     Reference Citation for Dale Riepe's \"Emerson and Indian Philosophy\" Article.",
            "path": "000",
            "parentPath": "",
            "parentName": "",
            "document": "12345678-1234-1234-1234-123456789012",
            "order": 0,
            "type": "container"
        }
    },
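For anyone who wants to work with that Outline field programmatically, here is a minimal Python sketch. It rests on a few assumptions: the sample record is a trimmed, hypothetical copy of the excerpt above (same keys, only a handful of titles), `parse_outline` is a helper name made up for illustration and not part of any Simantiks API, and reading the leading spaces as nesting depth is my guess about the format.

```python
import json

# Hypothetical, trimmed record in the same shape as the excerpt above.
# The raw string keeps "\n" as a JSON escape, as in the original file.
sample = r'''
[
  {
    "class": "Element",
    "properties": {
      "outline": "Section sans titre:\n   Introduction to the Doctrine of Compensation\n     Temporal Dichotomy: The Contrast of Day and Night,\n     Natural Balance Between Mountains and Oceans.\n   Reciprocal Obligation and the Natural Law of Benefits.",
      "path": "000",
      "type": "container"
    }
  }
]
'''


def parse_outline(outline: str) -> list[tuple[int, str]]:
    """Split an outline string into (indent, title) pairs, one per line.

    Assumption: the number of leading spaces indicates nesting depth.
    Trailing spaces, periods, commas and semicolons are dropped because
    the outline punctuates its titles inconsistently.
    """
    items = []
    for line in outline.split("\n"):
        title = line.strip().rstrip(" .,;")
        if not title:
            continue  # skip blank lines
        indent = len(line) - len(line.lstrip(" "))
        items.append((indent, title))
    return items


records = json.loads(sample)
items = parse_outline(records[0]["properties"]["outline"])
for indent, title in items:
    print(f"{indent:2d}  {title}")
```

Each pair can then be fed to whatever comes next (embedding, tagging, graph building) without touching the original essay text.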

First, Emerson’s essay “Compensation” was published in 1841, more than 180 years ago. This is old-school English with some archaic concepts mixed in with commentary on the happenings of the day and some eternal principles here and there.

I know from my knowledge of the essay that this system has done a great job of taking the information from the essay and outlining it in a way devoid of human feeling. THAT is pretty exciting: the model doesn’t add anything of its own. This is just what Emerson was saying, broken down into each mildly important idea at the level of the sentence.

Additional Analysis

You’ll recall that I was initially confusing the Central Idea for the Simantiks Atomic Idea.

We cleared that up when I took the knowledge.json into the ChatGPT UI to perform analysis.

We reloaded the Essay, then asked for a comparison of ideas. From this comparison I asked the model to identify themes present in all of the Atomic Ideas identified by the Model. ChatGPT 4o identified the following eight themes, along with the Central Idea without needing to be asked:

Based on the detailed content and structure provided in the knowledge JSON, the “Atomic Idea” or central theme of Emerson’s essay “Compensation” revolves around the concept of universal balance and justice. Here are the key elements supporting this central idea:

  1. Cosmic Significance of Compensation: Emerson introduces compensation as a fundamental principle embedded in the fabric of the universe, reflecting a cosmic order where every action has an equivalent reaction or consequence.
  2. Natural Dualism and Polarity: The essay emphasizes the dual nature of reality, where every positive aspect is counterbalanced by a negative one, and vice versa. This dualism extends to all facets of life and nature.
  3. Human Experience and Moral Law: Emerson argues that compensation is not just a natural phenomenon but a moral and ethical principle that governs human actions and their outcomes. Virtue and vice, success and failure, are all subject to this law of balance.
  4. Inherent Virtue and Lack of Penalty: The idea that true virtue and wisdom are their own rewards and do not incur penalties, highlighting the intrinsic value of moral actions.
  5. Calamity and Growth: Emerson discusses how adversity and calamity serve as catalysts for personal and social growth, suggesting that challenges are integral to the process of compensation and development.
  6. Instinctual Trust and Progress: The essay posits that life is a continuous progression guided by an instinctual trust in the inherent balance and justice of the universe.
  7. Solidarity and Mutual Benefit: The interconnectedness of all individuals and the mutual benefits derived from compassion, love, and unity further underscore the theme of universal compensation.
  8. Peace and Understanding in Knowledge of Compensation: Emerson concludes by emphasizing the peace and understanding that comes from recognizing and accepting the principle of compensation.

To better visualize the connections between these eight ideas and the Atomic Ideas, I asked the model to draw a knowledge graph of the information.

THIS is an incredible piece of abstract philosophical analysis with an added bonus of being devoid of human bias. Clearly, the Central Idea is strongly implied by the sum of its Atomic Parts.
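If you want to try something similar outside the ChatGPT UI, here is a rough Python sketch of a theme-to-idea graph. To be clear about the assumptions: this is not how ChatGPT built its graph, the keyword lists are hypothetical ones picked by hand, and `link_themes_to_ideas` is an illustrative name. The theme names and idea titles are copied from the lists above.

```python
def link_themes_to_ideas(themes, ideas):
    """Build a simple adjacency mapping: theme -> list of related idea titles.

    Assumption: an idea belongs to a theme when its title contains at least
    one of the theme's keywords (case-insensitive substring match).
    """
    graph = {}
    for theme, keywords in themes.items():
        graph[theme] = [
            idea for idea in ideas
            if any(word in idea.lower() for word in keywords)
        ]
    return graph


# Two of the eight themes, with hand-picked (hypothetical) keywords.
themes = {
    "Natural Dualism and Polarity": ["dualism", "polarity"],
    "Calamity and Growth": ["calamity", "growth"],
}

# A few Atomic Idea titles taken from the outline field.
ideas = [
    "Principle of Dualism and Magnetic Polarity",
    "Description of Polarity and Dualism in Natural Phenomena",
    "Calamity as a Catalyst for Growth and Evolution",
    "Borrower's Self-Incurred Debt",
]

graph = link_themes_to_ideas(themes, ideas)
for theme, linked in graph.items():
    print(theme, "->", linked)
```

Keyword matching is crude next to what a language model can do, but it shows the shape of the data: themes as hubs, Atomic Ideas as the nodes hanging off them.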

The Central Idea

“The Central Idea,” the “immovable mover,” the “archetypal idea,” is the idea that holds all of the other ideas of a document together, composed of many smaller Atomic Ideas that are all interrelated. It is the center of the piece’s gravity. (My mistake was in thinking Simantiks’ Atomic Idea == the Central Idea.)

Understanding the Central Idea is critical to understanding any document, person, place or thing.

To find the Central Idea, you must be able to identify and interrelate the Atomic Ideas. “If you recognize the children you can find the mother.” — Lao Tzu

This system successfully identifies Atomic Ideas for further analysis, and thus will play a critical role in advanced reasoning as demonstrated here. With the knowledge.json as it is, imagine all of the things you could do with that unclouded information.

A knowledge graph is just the tip of the iceberg.
