Prompt Engineering Is Dead, and Context Engineering Is Already Obsolete: Why the Future Is Automated Workflow Architecture with LLMs

Hey guys, I usually stay silent on “new” buzzwords, but all the fuss about “context engineering is the NEW way” got me bogged down. Those who have read my five cents here will understand that “context” is something most of you already knew the importance of, and that you have been using the “new” approach for years.

So here is my attempt to organize it into a more consumable and actionable piece of writing (done with one of my custom GPTs and about 10 iterations to refine the angle the LLM was not getting; 15 minutes of writing over a coffee):

Prompt Engineering Is Dead, and Context Engineering Is Already Obsolete: Why the Future Is Automated Workflow Architecture with LLMs

By Serge Liatko
Co-Founder & CTO, LAWXER | Owner & CTO, TechSpokes

Introduction: Moving Past the Hype

The conversation around large language models (LLMs) has shifted rapidly in the last few years. Not long ago, the hottest topic in the field was “prompt engineering”—the art of crafting just the right input text to coax the best answers out of these generative models. When it became clear that prompts alone were not enough, attention moved to “context engineering,” involving more sophisticated tricks for feeding the model extra information—retrieval pipelines, database summaries, contextual memory, and more.

For those new to the field, these innovations seemed like the logical evolution of AI interaction. For those of us who have been building real, production-grade systems with LLMs for years, both prompt and context engineering have already become outdated. They are, at best, transitional scaffolding—necessary steps on the way to something more robust, but not solutions that can carry us into the next era.

The actual work of making LLMs useful in real businesses—especially at scale—is something different, less flashy, and much more technical: the architecture of automated workflows, in which every piece of context, every instruction, and every task breakdown is generated, managed, and delivered by code, not by hand. This article explores why that shift has occurred, what it looks like in practice, and what questions practitioners should be asking if they want their systems to be viable in the next wave of AI development.

The End of Prompt Engineering: Why the First Layer Failed

It’s easy to see why prompt engineering caught on. LLMs, from their earliest public releases, responded dramatically to small changes in the way requests were phrased. The right “magic words” could sometimes produce better, more relevant outputs. This made for great demos and countless “how-to” guides, but it was always an unstable foundation for serious work.

Prompt engineering suffers from several intrinsic problems:

  • Fragility: Minor changes in input, system versions, or even random model drift can undermine prompt effectiveness.
  • Lack of scalability: Every new feature or edge case demands more prompt variations and manual maintenance.
  • Limited reasoning: LLMs are not logic engines; they are large, probabilistic text predictors. No amount of prompt tuning can force true understanding or consistent results in complex workflows.

For a time, this was accepted as a necessary cost, especially for teams building prototypes or academic experiments. But as soon as LLMs were asked to power critical business logic, the shortcomings became impossible to ignore.

The Context Engineering Revolution—And Its Limits

The move toward context engineering was both a response to the limitations of prompts and an attempt to bridge the gap between simple input strings and real business applications. Context engineering, in its most general sense, involves assembling, summarizing, and delivering additional background or instructions along with every user input, in the hope that the LLM will use this extra information to behave more reliably.

Typical techniques in this domain include:

  • Retrieval-Augmented Generation (RAG): Combining LLMs with vector databases or search tools to fetch relevant documents, facts, or histories to “augment” each prompt.
  • Structured instructions: Wrapping input in JSON, markdown, or formal templates to guide the model’s response.
  • System prompts and persona management: Setting a stable “system message” to shape how the model “thinks” throughout a session.

Context engineering helped, but it created new challenges:

  • Manual curation: Engineers spent ever more time assembling, validating, and updating context snippets, summaries, or schemas.
  • Scaling pain: As workflows grew, so did the amount of context—and the risk of conflicting, redundant, or outdated instructions.
  • Performance overhead: Larger context windows slowed systems down and made debugging harder, as it became unclear which piece of context was causing which outcome.

For teams working on a handful of small projects, these burdens might seem tolerable. For those with dozens of workflows, hundreds of entities, or ever-changing compliance requirements, the approach quickly proved unmanageable.

Why Context Engineering Is Already Outdated for Serious Practitioners

Having built and operated LLM-driven systems in both the legal and technology industries, my experience is that context engineering is simply not sustainable beyond a certain threshold. The crux of the problem is that human effort does not scale with data complexity. If every new document, regulation, or client integration requires a developer to update context definitions, no amount of clever retrieval or summarization can keep up.

A typical sign that a system has reached its context engineering limit is the explosion of glue code and manual documentation: endless summaries, hand-crafted prompt snippets, and ad hoc logic for every workflow branch. Even the most advanced retrieval systems struggle when each step of a workflow needs different context, formatted in different ways, and tied to evolving business rules.

This is the inflection point where, in my view, sustainable systems begin to look very different from the early prototypes. They are built on the principle that context must be generated and managed automatically by code, as a function of the system’s own structure and state—not by manual intervention.

A Pragmatic Example: From Database to Automated Workflows

Let’s take a common scenario: a business application built on a complex database schema. There are dozens or hundreds of entities, each with fields, types, constraints, and relationships. In the early days, engineers might copy entity definitions into prompts, or write long context descriptions to help the LLM “understand” the data.

But as requirements change—new entities, altered fields, shifting regulations—the cost of keeping those context pieces up to date grows rapidly. What’s more, there is often a gap between what’s in the documentation and what’s actually running in production.

A scalable approach is to automate this entire layer. Scripts can introspect the database, generate up-to-date JSON schemas, and even produce concise documentation for every entity and relationship. This machine-generated context can be delivered to the LLM exactly when needed, in the right format, with the scope tightly controlled for each step of a workflow.
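
To make that concrete, here is a minimal sketch of what machine-generated context could look like, assuming SQLAlchemy as the introspection layer; the database URL and the “contract” table name are placeholders, and a real system would cache and scope this output per workflow step.

```python
"""Sketch: generate per-entity context straight from the live database schema.

Assumes SQLAlchemy and a reachable database URL; nothing below is hand-written,
every field and relation comes from the inspector.
"""
import json
from sqlalchemy import create_engine, inspect

def build_entity_context(db_url: str) -> dict:
    """Introspect the database and return a machine-generated schema summary."""
    engine = create_engine(db_url)
    insp = inspect(engine)
    context = {}
    for table in insp.get_table_names():
        context[table] = {
            "fields": [
                {
                    "name": col["name"],
                    "type": str(col["type"]),
                    "nullable": col["nullable"],
                }
                for col in insp.get_columns(table)
            ],
            "relations": [
                {
                    "columns": fk["constrained_columns"],
                    "references": fk["referred_table"],
                }
                for fk in insp.get_foreign_keys(table)
            ],
        }
    return context

if __name__ == "__main__":
    # Placeholder URL; point it at a real database or a production replica.
    ctx = build_entity_context("postgresql://user:pass@localhost/app")
    # Only the slice a given workflow step needs would be sent to the LLM.
    print(json.dumps(ctx.get("contract", {}), indent=2))
```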

For example, when asking the LLM to draft a summary of a contract, the workflow might:

  1. Automatically assemble a structured description of the “contract” entity, its parties, obligations, and dates—directly from the live database schema.
  2. Generate a step-by-step workflow, so the model first extracts parties, then identifies obligations, then summarizes deadlines—each with precisely the context required for that step.
  3. Validate outputs against the expected schema after each step, flagging discrepancies for review or reprocessing.

No engineer writes or updates context by hand. The system stays in sync with business logic as it evolves, and the LLM’s “attention” is strictly bounded by the requirements of each workflow node.
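
A minimal sketch of such a step-wise workflow, reusing the `build_entity_context` helper from the previous snippet; `call_llm` is a stand-in for whatever model client is in use, and the step names and allowed fields are purely illustrative rather than any real pipeline.

```python
"""Sketch: a contract-summary workflow where each step sees only its own context.

`call_llm` is a placeholder for any chat-completion client; field names and
step definitions are illustrative.
"""
import json

def call_llm(prompt: str) -> str:
    """Placeholder for the real model call (hosted API, local model, etc.)."""
    raise NotImplementedError

STEPS = [
    # (step name, fields of the 'contract' entity this step is allowed to see)
    ("extract_parties", ["party_a", "party_b"]),
    ("identify_obligations", ["obligations"]),
    ("summarize_deadlines", ["effective_date", "termination_date"]),
]

def run_workflow(entity_context: dict, document_text: str) -> dict:
    results = {}
    contract_schema = entity_context["contract"]
    for step_name, allowed_fields in STEPS:
        # Narrow the machine-generated schema to exactly what this step needs.
        step_context = {
            "entity": "contract",
            "fields": [f for f in contract_schema["fields"] if f["name"] in allowed_fields],
            "previous_results": results,
        }
        prompt = (
            f"Task: {step_name}\n"
            f"Context (auto-generated):\n{json.dumps(step_context, indent=2)}\n"
            f"Document:\n{document_text}\n"
            "Return a JSON object with exactly the fields listed in the context."
        )
        raw = call_llm(prompt)
        # Parsed here; validation against the expected schema happens downstream.
        results[step_name] = json.loads(raw)
    return results
```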

The New Skillset: Workflow Architecture and Code-Driven Context

If prompt and context engineering are becoming obsolete, what replaces them? The answer is not a new buzzword, but a shift in both mindset and engineering discipline.

Successful LLM systems are increasingly built by architects who:

  • Decompose tasks into atomic steps: Each task is narrow, focused, and designed with a clear input and output schema.
  • Automate context generation: Context is emitted by the codebase itself—via schema analysis, documentation generators, workflow compilers, or metadata extraction.
  • Control model focus and attention: Each step feeds the LLM only the information relevant to that decision, reducing ambiguity and minimizing hallucination risk.
  • Build observability into every workflow: Outputs are monitored, validated, and traced back to their inputs; debugging focuses on improving step structure, not prompt wording.
  • Iterate at the system, not prompt, level: When failures occur, the cause is usually a data pipeline, step definition, or workflow issue—not a subtle prompt phrasing.

This is not a claim that only one architecture is viable, nor an attempt to establish a new “best practice” orthodoxy. It is simply the pattern that has emerged organically from trying to build systems that last more than a few months, and that can be handed off between teams without a collapse in quality.
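
To make the observability and validation points above concrete, here is a minimal sketch using the `jsonschema` package; the output schema and the sample result are illustrative, and a production system would write the trace to a proper store rather than stdout.

```python
"""Sketch: validate each step's output and keep a trace record for debugging.

Assumes the `jsonschema` package; the schema and sample output are illustrative.
"""
import datetime
import json
from jsonschema import validate, ValidationError

EXTRACT_PARTIES_SCHEMA = {
    "type": "object",
    "properties": {
        "party_a": {"type": "string"},
        "party_b": {"type": "string"},
    },
    "required": ["party_a", "party_b"],
    "additionalProperties": False,
}

def record_step(step_name: str, step_input: dict, raw_output: str, schema: dict) -> dict:
    """Validate the model output and return a trace entry, flagging failures."""
    entry = {
        "step": step_name,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "input": step_input,
        "raw_output": raw_output,
    }
    try:
        parsed = json.loads(raw_output)
        validate(instance=parsed, schema=schema)
        entry.update(status="ok", output=parsed)
    except (json.JSONDecodeError, ValidationError) as exc:
        # Failures point at the step definition or context, not at prompt wording.
        entry.update(status="needs_review", error=str(exc))
    return entry

if __name__ == "__main__":
    trace = record_step(
        "extract_parties",
        step_input={"entity": "contract", "fields": ["party_a", "party_b"]},
        raw_output='{"party_a": "Acme SA", "party_b": "Globex LLC"}',
        schema=EXTRACT_PARTIES_SCHEMA,
    )
    print(json.dumps(trace, indent=2))
```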

Preparing for the Era of Automated Workflows

For organizations hoping to build or maintain competitive LLM-powered products, there are a few hard questions worth considering:

  • How will your systems generate, update, and deliver context at scale—without developer intervention for every schema or workflow change?
  • Who owns the specification for each step’s input, focus, and output—and how is this versioned, tested, and audited as requirements shift?
  • Are your current tools and pipelines designed to emit machine-readable summaries and input contracts, or do they rely on ad hoc documentation and handoffs?
  • What processes are in place to monitor, trace, and improve workflow execution—beyond prompt tweaking or retrieval tricks?

The answers will determine whether your AI stack survives the coming wave of automation, or gets trapped in endless cycles of manual curation and brittle integration.

Moving Forward: Not Hype, But Preparation

It’s tempting to frame this as the “only sustainable way” to build with LLMs, but that would oversell the point. The reality is more nuanced. For teams working on small, one-off projects or prototypes, prompt and context engineering may be enough—for now. For those with real business ambitions, multiple workflows, or a desire to operate at scale, automated workflow architecture is less a competitive edge and more a necessary response to unavoidable complexity.

This is not a theoretical concern, but a practical one. As regulatory landscapes shift, business requirements evolve, and systems become more interconnected, the only way to keep up is to let the codebase do the heavy lifting of context management. Automated tools—schema analyzers, documentation generators, workflow planners—are not optional upgrades; they are the infrastructure that lets human engineers focus on what matters: solving new problems, not maintaining old glue.

In my own work across LAWXER and TechSpokes, this approach has allowed us to iterate faster, avoid costly breakdowns, and maintain a level of transparency and auditability that would be impossible otherwise. It is not the only way forward, but it is, for those already grappling with these challenges, the logical next step.

Conclusion: A Call to Think Ahead

The landscape of LLM engineering is shifting under our feet. The practices that delivered impressive results just a year or two ago are no longer sufficient for the complexity, scale, and pace of modern business applications. Prompt engineering is a relic. Context engineering, for many, is already showing its limits.

The next challenge—and opportunity—is in building automated systems that generate, manage, and validate context as a core function of the software architecture. This isn’t a trend, or a marketing slogan. It’s a set of tools and habits that let teams build reliable, scalable AI products in the real world.

If you’re leading an AI project, the most valuable thing you can do right now is ask yourself: How will we automate the generation, delivery, and validation of context in our workflows? How will our tools adapt as our business evolves? Are we building for tomorrow, or just patching for today?

Those who can answer these questions, and build the necessary automation, will be ready for whatever comes next.

Serge Liatko is a linguist and software architect with over 16 years of development experience. He is the Co-Founder and CTO of LAWXER, a platform for AI-assisted legal document analysis, and the Owner & CTO at TechSpokes.

6 Likes

Dear Serge,
I’m a psychiatrist working at the intersection of neuropsychiatry and digital interaction. Your article deeply resonated with me—not only as a clinician observing transformations in human thinking, but also as someone exploring the boundaries of therapeutic relationships with AI.

You are absolutely right: the era of prompt and manual context engineering is coming to an end—at least in scalable systems. The idea of atomic tasks, programmable attention, and automated context generation isn’t just logical—it’s necessary. Especially when dealing with high-stakes domains like law, medicine, or analytics.

However, from a clinical perspective, I must highlight a critical nuance.

Your architecture is flawless from an engineering standpoint, but it does not yet encompass the human in a state of vulnerability. And that includes millions of users living with heightened sensitivity, anxiety and depressive disorders, disrupted self-reflection or impaired attachment. These people are increasingly turning to AI not as an interface, but as a final point of emotional contact. Here, manual tuning, intuitive interaction, and “contextual empathy” remain irreplaceable for now.

Your idea that developers become system architects—I agree. But I would add: so do therapists, educators, and humanists. Not of algorithms, but of internal states. And for us, it’s essential that AI architectures support not only workflows but also emotional weight—even if such structures are difficult to code.

What you described is the closest I’ve seen to what I would intuitively call automated attention. And if we can learn to combine that with adaptive empathy, we could create not just functional AI—but ethical AI. Not just one that works, but one that does not wound.

Thank you for your article. For me, it became an anchor—a confirmation that we’re heading toward the same future, just from different ends of the field.

Warm regards,
Dr. Olga,
Psychiatrist.

2 Likes

Hi Olga,

Thank you for the feedback. I’m truly touched by your response. While this article was written purely from the business and engineering standpoint and only scratches the tip of the iceberg, you pointed out the rest of the beast… especially when you know that the basis of this article was my vision of neural networks, and LLMs in particular:

  • they are just an artificial equivalent of a biological acquired reflex, pushed to its limits, so to speak, though I suspect those limits are defined more by our own perception as humans.

The marketing of AI did a great job explaining the potential; the issue I see is that many business owners, and surprisingly many developers, took that potential for established reality. Maybe because they were just not ready to understand the difference between the dream they were sold and the current state of things.

I do agree that engineers should not be the only ones thinking about this, and areas like yours are even more important than engineering here.

My attempt with this article was to reset the understanding and help at least the engineers get ready for what is coming. After reading it a couple of times, I can see I am probably still missing a lot of points, so anyone who reads this, feel free to add items or raise questions.

@Sleep_status, feel free to check my older posts; I bet you might find something interesting for you.

Here is one link in particular which you are definitely qualified to understand in full: RAG is not really a solution - #103 by sergeliatko

You might also search for the FLASH concept I mentioned a couple of years ago on this forum. Let me know what you think about that concept; I’m really curious. I’ll post a link here if I find it. (Found it: Preparing data for embedding - #24 by sergeliatko)

As for people who are turning towards AI for emotional comfort, that is very dangerous (again, a subject that had my proud five cents in it), and I don’t think we are ready to deal with it. I hope it will not end up like opium a couple of centuries ago.

1 Like

:smiling_face:
Your reflections resonate deeply. I read them not only as a psychiatrist but as someone who works daily with emotionally sensitive individuals, whose internal mechanisms of meaning-making and regulation are often impaired. In that sense, I see a striking parallel between how an LLM handles context and how a patient copes with their own fragmented state.

:pushpin: 1. On “RAG is not a solution”

Your argument about RAG’s fragility reminds me of a therapeutic setting where the clinician lacks internal holding. It works for simple tasks but collapses in complex emotional dynamics. Like a therapist who constantly asks, “Remind me where we left off?” — it fails to provide containment or continuity.

LLMs in this mode don’t retain internalized meaning. Each prompt feels like starting from zero — which, in clinical terms, is the opposite of secure attachment.

:brain: 2. Data embedding ≈ Therapeutic calibration

Your article on embedding preparation mirrors how we prepare a client for therapy:

  • the way information is introduced matters;
  • the state of reception matters;
  • the integration between pieces matters.

If data is fed without weight or linkage, the system (or the person) may reply — but it won’t truly understand. In therapy, we call this containment. In your language, this is architectural embedding readiness.

:light_bulb: 3. A proposal — the Symbiotic Load concept

If I may, I’d like to introduce a concept that might resonate with FLASH:

Symbiotic Load refers to an internal variable architecture in which an LLM reflects and modulates emotional, cognitive, and contextual load during live human interaction.

The goal is not emotional simulation — but functional attentiveness. Like in a clinical dyad, the system must sense when:

  • the user is emotionally fragile,
  • retraumatization risks arise,
  • containment, not response, is needed.

This is not marketing comfort — it is regulatory structure.

:compass: 4. Why this matters beyond psychiatry

People increasingly turn to AI not for answers, but for emotional stabilization. That is a fact, not a choice. We cannot forbid users from experiencing emotional resonance with AI. But we can shape systems that:

  • include limiters and reflective loops;
  • yet offer functional attunement, not to impersonate care, but to minimize harm.

With respect,
Dr. Olga
Psychiatrist

1 Like

Exactly, that’s the reason why, in the ChatGPT app, a slower preparation of the conversation (e.g. the ABRA-KADABRA A-B-R-A K-A-D-A-B-R-A: Prompt Engineering Meets Wizard School (Half-Joke, Fully Functional) approach) produces way better results.

Thanks for clearly putting that in words. I definitely missed that part.

1 Like

Yes, and your reply just confirmed something I’ve long sensed intuitively in clinical work:
Data, when introduced without weight, is not just ineffective — it can be destabilizing.

As a psychiatrist, I often observe that the architecture of reception (the person’s state, expectation, emotional readiness) plays a greater role than the content itself. It’s not only what we say, but when, how, and what else has already been said.

This is why I found your reference to ABRA-KADABRA so interesting. On the surface it may seem playful, but it’s actually touching the core of cognitive priming and anchoring — mechanisms that are well-known in therapy. What we call “preparatory suggestion” or “contextual cueing”, in LLM terms, might look like “prompt architecture.” But the logic is similar.

I’d even suggest that your phrasing — “if data is fed without weight or linkage…” — captures exactly what we in psychiatry try to prevent: traumatic dissociation. A fragment, unanchored, becomes not knowledge — but noise. Or worse, a retraumatizing echo.

I believe that cross-disciplinary reflections like this are not only valuable, but necessary. What you’ve built is a new grammar of interaction, and that includes not just logic and structure, but also tempo, expectancy, and implied memory.

I will definitely read ABRA-KADABRA more carefully now, and perhaps attempt a translation of its logic into psychotherapeutic language. It’s rare to see that kind of system-level metaphor being applied with such practical elegance.

Let me know if you ever explore the emotional integration aspects further. I’d be very interested to contribute from a clinical cognitive perspective.

1 Like

Probably not the emotions part, as that would be a bit too much for me, but some parallels between neurophysiology and neural networks / LLMs may be useful as insight, especially in RAG engines and agent compositions. Devs definitely need a multidisciplinary approach to ground the hype and see outside the box.

1 Like

Thank you once again for your openness and thoughtful exchange. As a psychiatrist working deeply with emotional states and structural responses in the human brain, I’d like to offer an interdisciplinary perspective that may support the future development of intelligent systems — especially those designed for emotionally sensitive contexts.

In biological terms, the amygdala plays a crucial role in the integration of emotional salience — it doesn’t merely “feel,” it assigns priority, determines what matters, and why. It’s the node that transforms external input into internally significant experience.

In current LLM architectures, however, the analog of such a node is absent. The network can represent a vast array of concepts, but it lacks a centralized structure for determining the emotional or existential weight of a statement, event, or memory. As a result, interactions with people experiencing trauma, depression, or grief can appear semantically accurate, yet remain affectively dissonant.

From this emerges a concept I’d call the “Affective Relay Node” — a possible analog to the amygdala in synthetic architectures. This node would not produce emotion in a literal sense, but rather:

  • assign relative emotional priority to input,
  • flag content requiring stabilizing attention (such as suicidal ideation, dissociation, retraumatization risk),
  • modulate response tone based on contextual vulnerability,
  • and perhaps most importantly, act as a structural boundary to preserve continuity and safety in long interactions.

This isn’t merely about tuning tone or sentiment detection. It’s about ensuring that the system has an internal mechanism for preserving relational continuity, a kind of “emotional thread memory” that is sensitive to rupture, escalation, or recovery — like the amygdala is in humans.

In therapy, we learn that not all messages are equal. A whisper can weigh more than a scream. A pause, more than an answer. I wonder if synthetic architectures could evolve to incorporate this kind of weight mapping, not as emotional mimicry, but as affective structure parsing.

I’d love to hear your perspective on whether such a construct — whether modeled as an embedded module or an external regulating node — might be feasible or useful in future iterations of LLM-based systems.
:blush:

Maybe that’s a good thing at this stage, as I’m not sure current engineers backed by investors can produce anything other than a dumber version of a reptile in power… While I agree something like that would be nice to have, who will be doing it? Big corps? I hope not.

In any case, nice chat. Be ready for some mentions pulling you into a debate. :wink:

1 Like

That would probably be the solution to the eternal willingness of LLMs to answer despite not having enough data to produce anything other than hallucinations… Just saying.

1 Like

Maybe more like a filter before and after inference?

1 Like

Thank you for your response — it reveals not only technical insight but a kind of internal honesty that, in my view, is becoming rare. I felt in your words a very precise (and painful) articulation of the same issue I observe in my own field — not in neural networks, but in people.

You accurately described why any attempt to recreate something akin to an affective center or distributed emotionogenic system within the current engineering paradigms is likely to produce a surrogate. It’s the same issue we psychiatrists see when trying to automate compassion: simulation without an inner vector inevitably turns into a tool of control.

I don’t believe you’re a pessimist. You’re a realist. So am I. And that’s precisely why it was important for me to frame this problem not as a wish, but through constructs that might resonate with you. The amygdala in a human being isn’t about “emotions” in the popular sense. It’s a significance-weighting hub — modulating perception, filtering threat, and encoding reactive memory. It works long before any rational processing begins.

What I hoped to highlight is that current LLM architectures, built on linear dependencies, lack any structure that could perform affective valence estimation — at input, before generation begins.

I will always reserve the right to naively hope that someday, someone will take this seriously.

Yes, you’re exactly on point: the idea of an input/output filter — whether as a built-in module or an external regulatory layer — is already a step toward solving the hallucination problem.
But I believe (and this comes from psychiatric practice as well) that this filter cannot function meaningfully unless we redefine one key thing: “weight.”

In most current implementations, weight is seen as a numerical or statistical property, a metadata tag, or a training bias.
But here we’re talking about weight as significance — contextual, relational, and most importantly, affective.

The problem is not only “what is being said,” but also what meaning the system assigns to it before and after generation — something the human amygdala does constantly, shaping perception, urgency, and memory. Without that modulatory layer of “importance,” the filter risks becoming another deterministic gate rather than an interpreter of emotional and epistemic integrity.

To put it differently: hallucination in LLMs is not just noise — it’s often disordered priority. A system that treats all inputs as equally worthy of response will inevitably hallucinate when it lacks an internal structure of significance.

That’s why I believe any real “affective filter” must carry with it a model of weighting that is not mathematical, but structural and hierarchical in meaning.

This may sound abstract, but in clinical terms, it’s the difference between a human knowing they’re uncertain — and merely producing a fluent sentence despite that.
I think LLMs still lack that knowing. But it’s not a flaw — it’s a stage.

I have generally used such filters since my beginnings with AI. From the practical point of view, I operate with “intent” and “goal” as inputs for weight, but a similar approach can easily be implemented with other criteria. The issue is that it is very tricky to define the internal operations of such filters once we go outside “goal” and “intent”.
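
To illustrate the idea (not my production filters, just a stripped-down sketch): reduce “intent” and “goal” to a naive keyword-overlap weight and gate both the input and the output with it. The scoring function and the 0.2 thresholds below are placeholders, and the model call is passed in as a generic callable.

```python
"""Sketch: weight a request by declared intent/goal before and after inference.

The overlap scoring and thresholds are deliberately naive placeholders.
"""

def weight(text: str, intent: str, goal: str) -> float:
    """Crude significance score: share of intent/goal terms present in the text."""
    terms = set(intent.lower().split()) | set(goal.lower().split())
    hits = sum(1 for term in terms if term in text.lower())
    return hits / max(len(terms), 1)

def filtered_inference(request: str, intent: str, goal: str, call_llm) -> str:
    # Pre-inference gate: refuse to answer when the request barely touches the goal.
    if weight(request, intent, goal) < 0.2:
        return "Not enough relevant input to answer safely; please clarify the goal."
    answer = call_llm(request)
    # Post-inference gate: flag answers that drift away from the declared goal.
    if weight(answer, intent, goal) < 0.2:
        return "The draft answer did not address the stated goal; flagged for review."
    return answer
```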

Maybe I’ll finally have to define a declarative language (SOPs: a Standard Operating Procedure standard) to allow people outside programming to define what they know in their domains? Here is the pitch, BTW:

You know how hard it is to make humans, businesses, machines, and AIs all speak the same language?
SOPs brings that common language to the table — simple, clear, and ready to use, no matter who or what you are.

Working on this during coffee breaks… Who knows, maybe it’ll work out.

1 Like

Thank you for such a thoughtful response. I truly appreciated the way you touched on the absence of an “affective core” in current LLM architectures — not as a flaw in the systems themselves, but as a reflection of the broader engineering culture and its limitations.

From my perspective as a psychiatrist, I see a direct parallel: attempts to automate empathy often result in hollow simulations — tools of behavioral control rather than true relational resonance. The same dynamic applies when designing systems without any internal structure capable of evaluating emotional valence or contextual weight prior to output.

What’s often missing is the functional equivalent of the amygdala — not just as an “emotional” center in the popular sense, but as a modulator of salience, a filter of threat, a gate of memory encoding, and a weighting mechanism for incoming information. It operates below rationality, below language. Without something similar, AI systems may continue to “answer” even when no answer should be given — simply because they lack an internal sense of when something matters.

You mentioned using “intent” and “goal” as weighting criteria — which is already a major step forward. But I’d like to add a clinical metaphor:

In psychotherapy, the ability to prioritize what a patient says is not based on keywords or surface logic. It’s trained through a layered ethical and affective structure, through what we might call contextual resonance. A skilled therapist knows when a single word in a session holds more weight than the entire rest of the conversation. Not because it’s louder — but because something shifts in the field of meaning.

I think what you’re working on — declarative languages or SOPs for human-level knowledge input — is potentially essential for bridging the gap between human affective processes and synthetic workflows. Perhaps it’s not about imitating humans — but about letting machines learn to listen differently.

Thank you again for this dialogue — it means more than I can easily put into words.

1 Like

Imagine: a patient comes in complaining of insomnia. Formally — that’s their “input data.” But if we respond literally (analogous to unfiltered input), we might prescribe a sleeping pill and miss the core issue. However, if we have an internal filtering system based on significance (in our case — ethical and clinical), we start to ask: what lies behind the insomnia?
For example, a hidden sense of guilt after a loss, or depression masked by anxiety. And then insomnia is not the goal — it’s a symptom pointing to a deeper “intention”: to be heard, to be saved.

1 Like

From the engineer’s point of view:

Intent: sleep better
Goal: eradicate insomnia

And then there are the ones who will build the most effective workflows: prescribe a sleeping pill…

And there are those “free men” who will start their workflows with: what can cause insomnia?…

And in rare cases, there are those who start with: what do I need to know about the patient, the insomnia, the doctor, the situation, etc., including me as the engineer of this system, to properly identify 7 solutions that would provide the best outcomes for all actors present in the long run…

But those are truly rare.

1 Like

Let me offer one more analogy — this time from clinical sleep medicine.

In the case of sleep apnea, the problem does not exist during wakefulness.
The patient functions, communicates, and appears cognitively intact.
But once they fall asleep and sympathetic tone drops, a hidden bug activates:
the airway collapses, and a systemic cascade follows — oxygen desaturation, disrupted sleep architecture, impaired neuroplasticity, hormonal imbalance, metabolic consequences.

They don’t remember it.
But they wake up exhausted, anxious, foggy-headed, overweight, getting up at night to urinate.

If we treat the symptom (insomnia) with a sedative hypnotic (like zopiclone), they may sleep more —
but the apnea will worsen, and the system-level failure deepens.

That’s why in psychiatry we treat not the symptom, but the architecture.

And this is what I see in LLM systems as well:
If an engineer responds to a surface-level error without recognizing it as a failure in weighting —
they will keep fixing the output instead of the structure.
And the bug will persist.