How to Find out "Ambiguous Content" across articles

When working with a massive collection of articles, contradictions and ambiguities are inevitable. One common challenge is identifying conflicting information across different sources.

Example of Contradiction:

Suppose Article A1 states that the price of Watch X is $100, while Article A56 mentions the price as $120. This is a clear contradiction that needs to be detected and resolved.

Challenges in Detecting Contradictions:

  1. Conflicting Data Points – Identifying inconsistencies between articles, such as different pricing for the same product.
  2. Context and Meta Information – Articles may discuss different aspects, features, or use cases of a product. If Article A1 talks about Watch X, and Article A56 refers to Watch Y, then these are different watches, and the pricing difference is valid rather than contradictory.

Key Considerations for Resolution:

  • Entity Matching: Ensure that both articles refer to the same product, model, or version.
  • Time Factor: Prices may change over time, so timestamps should be considered.

What are your thoughts? How would you approach detecting and handling such contradictions?

1 Like

If this is happening you are storing your data incorrectly or relying on resources that perhaps shouldn’t be recalled by the LLM.

I would have a price look-up function that was clearly defined and make sure than no other documents published a price.

Can you be more specific with your questions?

What environment are you operating in?
What is the system you are using to re-call these multiple-sources and compare them?
Are you talking about feeding in multiple data points to the LLM and asking it do this?
Or are you talking about raw processing this data internally within kind of system?

What kind of “handling” of the contradictions are you expecting to do?

Note, the example you gave is NOT necessarily a contradiction. Items are sold for different prices at the same time by different distributors.

You need to be more clear about what are you doing, what your goals are, what system are you using, are you programming an inhouse system or using a web app for chat gpt, etc…

Personally, I would identify the scope of work and area of interaction first, then would come up with a data model for the entity that acts as “fact holder” depending on the domain, then use llms to populate fields in the new entities and simply use db queries to detect the potential conflicts in the table…

1 Like

HI @merefield ,

the watch price was just an example, there could any type of content written, like:
A1 mentions the company was founded in 2004 , while A50 mentions it was in 2005, so here its out of control of how articles are written by different writers,

We need to extract key information from articles and use an effective approach to find out ambiguous content across these, helping writers improve their quality and trust factor

@sergeliatko ,

for my usecase, I can’t have pre-defined checks for Pricing alone, articles can be talking about different topics / multiple topics inside them,

eg: In our community.openai.com Site, People write many articles on various topics, so there is high change ambiguous content could be provided,

Persone 1 (p1) says openai 4o is free and openai was founded in 2004 while p2 says its 20$ (this is contradictong)

2nd case:
p34 tells openai was founded in 2005 (seems ambiguous, P1 tells its 2004), so here we don’t need to validate if information is right or wrong, we just to need to find such conflictions/ ambiguous information across Articles an show that,

what approach can we take for this, LLM(openai gpt4o) or NLP Libraries Or any other !!

@20euai044 as I said, I would start by bringing all those “statements” to a common denominator to avoid apple/spacecraft comparisons (hallucinations door wide open), a sort of an entity description of what we are actually comparing:

{
  "name": "statement_decomposition_response",
  "strict": true,
  "schema": {
    "type": "object",
    "required": [
      "statements"
    ],
    "properties": {
      "statements": {
        "type": "array",
        "items": {
          "type": "object",
          "required": [
            "raw_text",
            "core_claim",
            "subject",
            "predicate",
            "conditions_qualifiers",
            "quantifiers_modifiers",
            "implied_assumptions",
            "comparison_base_benchmark",
            "causal_inference",
            "uncertainty_speculation_indicators",
            "negation_refutation",
            "temporal_context",
            "geographic_context",
            "audience_target_group",
            "degree_of_specificity_vagueness",
            "source_origin",
            "timestamp_context_reference"
          ],
          "properties": {
            "subject": {
              "type": "string",
              "description": "Primary entity or topic of the claim (noun phrase)."
            },
            "raw_text": {
              "type": "string",
              "description": "The original, unaltered statement text as extracted from the source."
            },
            "predicate": {
              "type": "string",
              "description": "Action or attribute asserted about the subject (verb or descriptive phrase)."
            },
            "core_claim": {
              "type": "string",
              "description": "Extracted core factual assertion from the raw text, representing the primary claim to be analyzed."
            },
            "source_origin": {
              "type": "string",
              "description": "Origin or source of the statement, such as a speaker, publication, or URL."
            },
            "causal_inference": {
              "type": "string",
              "description": "Text indicating any implied cause-and-effect relationship (e.g., 'because of the policy change')."
            },
            "temporal_context": {
              "type": "array",
              "items": {
                "type": "string"
              },
              "description": "Array of temporal references for the claim in ISO 8601 DateTime format or as free-text (e.g., 'in 2023')."
            },
            "geographic_context": {
              "type": "string",
              "description": "Geographical location or region relevant to the claim (e.g., 'in France')."
            },
            "implied_assumptions": {
              "type": "array",
              "items": {
                "type": "string"
              },
              "description": "Array of unstated premises or background beliefs underlying the claim."
            },
            "negation_refutation": {
              "type": "array",
              "items": {
                "type": "string"
              },
              "description": "Array of textual indications of negation or refutation (e.g., 'not', 'never')."
            },
            "audience_target_group": {
              "type": "string",
              "description": "Intended audience or target group referenced by the claim (e.g., 'students', 'adults')."
            },
            "conditions_qualifiers": {
              "type": "array",
              "items": {
                "type": "string"
              },
              "description": "Array of clause(s) that specify conditions or qualifiers limiting the claim's scope (e.g., 'if trends continue')."
            },
            "quantifiers_modifiers": {
              "type": "array",
              "items": {
                "type": "string"
              },
              "description": "Array of modifier words or phrases that adjust the claim's scope or intensity (e.g., 'most', 'nearly all')."
            },
            "comparison_base_benchmark": {
              "type": "string",
              "description": "Reference point or benchmark for any comparative elements within the claim (e.g., 'compared to last year')."
            },
            "timestamp_context_reference": {
              "type": "array",
              "items": {
                "type": "string"
              },
              "description": "Array of temporal context references or timestamps indicating when/where the statement was made (in ISO 8601 format or as descriptive text)."
            },
            "degree_of_specificity_vagueness": {
              "type": "string",
              "description": "Assessment of the claim's specificity or vagueness (e.g., 'specific', 'vague')."
            },
            "uncertainty_speculation_indicators": {
              "type": "array",
              "items": {
                "type": "string"
              },
              "description": "Array of keywords or phrases that indicate uncertainty, speculation, or predictive language (e.g., 'might', 'could')."
            }
          },
          "additionalProperties": false
        }
      }
    },
    "additionalProperties": false
  }
}

Something like above.

Then I would build an object that would go into my database, (weaviate for me personally) and make sure I properly configure what fields are actually embedded and indexed to be able to query my “statements” by both vector similarity (on all or a subset of fields) and simple db queries (on indexed fields in weaviate) to start working with it in a more approachable manner to figure out the “fact-checking” workflow suitable for my application.

1 Like

example of a products description passed through the LLM claim extraction (asked to focus on features):

{
  "statements": [
    {
      "subject": "AI Comment Moderator",
      "raw_text": "The AI Comment Moderator is a state-of-the-art WordPress plugin designed to streamline the management of user comments through advanced AI technology.",
      "predicate": "is designed to streamline the management of user comments",
      "core_claim": "AI Comment Moderator is a state-of-the-art WordPress plugin that automates comment moderation.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [
        "state-of-the-art"
      ],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator plugin",
      "raw_text": "The AI Comment Moderator plugin automates the common comment moderation tasks within your WordPress website that would otherwise be done by humans.",
      "predicate": "automates common comment moderation tasks",
      "core_claim": "AI Comment Moderator automates comment moderation tasks.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [
        "moderation tasks are typically done by humans"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [
        "common"
      ],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator plugin",
      "raw_text": "Considering the base salary of a human moderator in the United States costs a business around 25 USD per hour, the plugin pays for itself if your moderation task takes you more than 1 minute a day.",
      "predicate": "pays for itself if moderation tasks take more than 1 minute a day",
      "core_claim": "The plugin is cost-effective if moderation takes more than 1 minute a day.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "because the cost of a human moderator is $25 per hour",
      "temporal_context": [],
      "geographic_context": "United States",
      "implied_assumptions": [
        "moderation tasks take time",
        "human moderators are costly"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [
        "if moderation task takes more than 1 minute a day"
      ],
      "quantifiers_modifiers": [
        "more than 1 minute"
      ],
      "comparison_base_benchmark": "base salary of a human moderator",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator plugin",
      "raw_text": "When a new comment is submitted to your website, the AI Comment Moderator plugin initially stands by, letting WordPress and other plugins detect spam or approve legitimate comments.",
      "predicate": "initially stands by, letting other plugins detect spam or approve legitimate comments",
      "core_claim": "The plugin operates in conjunction with WordPress and other plugins to manage comments.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [
        "other plugins are effective at detecting spam"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [
        "new"
      ],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator plugin",
      "raw_text": "If a comment slips through and lands in the moderation queue, that’s when our hero takes center stage.",
      "predicate": "takes center stage if a comment lands in the moderation queue",
      "core_claim": "The plugin actively engages when comments need moderation.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [
        "comments can slip through initial moderation"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [
        "if a comment slips through"
      ],
      "quantifiers_modifiers": [],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator plugin",
      "raw_text": "It checks if the comment includes links to domains that have been flagged as spam in the past.",
      "predicate": "checks if the comment includes links to flagged spam domains",
      "core_claim": "The plugin checks for links to flagged spam domains.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [
        "some domains are flagged as spam"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [
        "flagged"
      ],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator plugin",
      "raw_text": "The plugin scores the comment based on the value it adds to the discussion.",
      "predicate": "scores the comment based on its value",
      "core_claim": "The plugin evaluates comments based on their value to the discussion.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [
        "comments can add value to discussions"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [
        "based on the value"
      ],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator plugin",
      "raw_text": "Depending on the score, the plugin applies the rules you’ve set up to determine what to do next—whether to discard, trash, mark the comment as spam, or approve it.",
      "predicate": "applies rules based on the score to determine the next action",
      "core_claim": "The plugin takes action on comments based on their scores.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [
        "rules can be set up for moderation"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [
        "depending on the score"
      ],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator plugin",
      "raw_text": "Comments that get a medium score, which might need more scrutiny, are returned to the moderation queue for a human to review.",
      "predicate": "returns medium-scored comments to moderation queue for human review",
      "core_claim": "Medium-scored comments are reviewed by humans.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [
        "some comments require human review"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [
        "that get a medium score"
      ],
      "quantifiers_modifiers": [],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": [
        "might"
      ]
    },
    {
      "subject": "AI Comment Moderator plugin",
      "raw_text": "This plugin makes your moderation queue much smaller and easier to manage, especially with the ‘sort by score’ feature.",
      "predicate": "makes moderation queue smaller and easier to manage",
      "core_claim": "The plugin simplifies the moderation process.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [
        "moderation queues can be large and difficult to manage"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [
        "much smaller",
        "easier"
      ],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator plugin",
      "raw_text": "This economical solution reduces dependency on costly API calls, especially effective during high spam periods with its ‘fortress’ mode.",
      "predicate": "reduces dependency on costly API calls",
      "core_claim": "The plugin is cost-effective and efficient during spam periods.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "especially during high spam periods",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [
        "API calls are costly"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [
        "especially effective during high spam periods"
      ],
      "quantifiers_modifiers": [
        "economical"
      ],
      "comparison_base_benchmark": "costly API calls",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator plugin",
      "raw_text": "Moderate 1,000 comments for just about $1 with the AI Comment Moderator.",
      "predicate": "moderate 1,000 comments for about $1",
      "core_claim": "The plugin allows moderation of 1,000 comments for $1.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [
        "moderation can be done at low cost"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [
        "just about"
      ],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator plugin plans",
      "raw_text": "Starter plan: $9/month; Smart plan: $79/year; Professional plan: $299/year; Enterprise plan: $599/year.",
      "predicate": "details pricing for various plans",
      "core_claim": "The plugin offers multiple pricing plans.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [
        "different pricing plans cater to different needs"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator plugin",
      "raw_text": "All plans come with full access to the totality of AI Comment Moderator plugin capabilities.",
      "predicate": "come with full access to capabilities",
      "core_claim": "All plans provide complete access to plugin features.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [
        "users need access to all capabilities"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [
        "full"
      ],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "Professional and Enterprise plans",
      "raw_text": "Professional and Enterprise plans include premium support, providing you with direct assistance and priority response to maximize the value of your investment.",
      "predicate": "include premium support",
      "core_claim": "Professional and Enterprise plans offer premium support.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [
        "users may require assistance"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [
        "premium"
      ],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator plugin",
      "raw_text": "With the AI Comment Moderator plugin, streamline your moderation process, improve interaction quality, and protect your site from spam.",
      "predicate": "streamline moderation process, improve interaction quality, and protect from spam",
      "core_claim": "The plugin enhances moderation efficiency and quality.",
      "source_origin": "TechSpokes Inc.",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [
        "moderation processes can be improved"
      ],
      "negation_refutation": [],
      "audience_target_group": "WordPress website owners",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    }
  ]
}

When asked to focus on product plans and pricing:

{
  "statements": [
    {
      "subject": "AI Comment Moderator",
      "raw_text": "Starter $ 9/month ($ 98.10/year/site*)",
      "predicate": "is the pricing plan for individual site owners.",
      "core_claim": "The Starter plan costs $9 per month or $98.10 per year for one site.",
      "source_origin": "AI Comment Moderator Pricing Page",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [],
      "negation_refutation": [],
      "audience_target_group": "individual site owners",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator",
      "raw_text": "Smart $ 79/year ($ 6.58/month/site*)",
      "predicate": "is the pricing plan for committed bloggers or small businesses.",
      "core_claim": "The Smart plan costs $79 per year or $6.58 per month for one site.",
      "source_origin": "AI Comment Moderator Pricing Page",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [],
      "negation_refutation": [],
      "audience_target_group": "committed bloggers or small businesses",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator",
      "raw_text": "Professional $ 299/year ($ 4.98/month/site*)",
      "predicate": "is the pricing plan for small businesses managing multiple sites.",
      "core_claim": "The Professional plan costs $299 per year or $4.98 per month for up to 5 sites.",
      "source_origin": "AI Comment Moderator Pricing Page",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [],
      "negation_refutation": [],
      "audience_target_group": "small businesses managing multiple sites",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator",
      "raw_text": "Enterprise $ 599/year ($ 2.50/month/site*)",
      "predicate": "is the pricing plan for large organizations or agencies.",
      "core_claim": "The Enterprise plan costs $599 per year or $2.50 per month for up to 20 sites.",
      "source_origin": "AI Comment Moderator Pricing Page",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [],
      "negation_refutation": [],
      "audience_target_group": "large organizations or agencies",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator",
      "raw_text": "Every plan is subject to a 10% discount on subscription renewals to recompense your loyalty.",
      "predicate": "indicates the discount policy for renewals.",
      "core_claim": "All plans offer a 10% discount on subscription renewals.",
      "source_origin": "AI Comment Moderator Pricing Page",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [],
      "negation_refutation": [],
      "audience_target_group": "existing subscribers",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    },
    {
      "subject": "AI Comment Moderator",
      "raw_text": "Professional and Enterprise plans include premium support, providing you with direct assistance and priority response.",
      "predicate": "describes the additional support features of certain plans.",
      "core_claim": "The Professional and Enterprise plans include premium support.",
      "source_origin": "AI Comment Moderator Pricing Page",
      "causal_inference": "",
      "temporal_context": [],
      "geographic_context": "",
      "implied_assumptions": [],
      "negation_refutation": [],
      "audience_target_group": "subscribers to Professional and Enterprise plans",
      "conditions_qualifiers": [],
      "quantifiers_modifiers": [],
      "comparison_base_benchmark": "",
      "timestamp_context_reference": [],
      "degree_of_specificity_vagueness": "specific",
      "uncertainty_speculation_indicators": []
    }
  ]
}

Some adjustments may be needed, but not bad to start for 4o-mini

source text: https://www.techspokes.store/products/ai-comment-moderator/

1 Like

Thanks @sergeliatko, I’m currently working on this approach.

What are your thoughts on search? How can we identify the most violating content across articles after extracting key information?

My initial thought is that chunking might not be ideal, as it could be too granular and make detection challenging.

Would love to hear your insights on an effective search solution after extracting key information from each article.

I would use a schema above as a starter, adapt to your application, add source fields, then convert to an object schema in weaviate. Instead of comparing the chunks from text, I would compare the “statements” with what was “proved truth” and run results through an LLM filter to score the “weirdness” of new statement. Then I would trigger notices based on what thresholds are way too much…

Thanks @sergeliatko , that’s cool stuff, Will check on this !!