The friction collapse
How AI erodes the judgement it requires
From Michael Faraday’s The Chemical History of a Candle
A former colleague of mine was an expert in pretty much every field the company was interested in. Nobody ever really reviewed his work to find the obvious flaws that, like any human work, it sometimes contained. Worse still, if he reviewed someone else’s work, anything he did not flag was, de facto, correct. This placed the burden of verification and correctness on him, while the other engineers gradually became complacent. The situation we face today when using AI is, in a way, similar: each employee now has not just one, but a whole cohort of experts at their disposal, available 24/7. They never get tired or snappy, and they are far more knowledgeable on any given subject than most team members. The dilemma, then, is: how to check something better than us without rubber-stamping or pretending to out-expert it?
I see a mixture of excitement and shame when people use LLMs: the pleasure of speed and of operating at a higher level, combined with a somewhat defensive and romantic view of the Craft: “I’m using an LLM, but it’s just for a quick-and-dirty proof-of-concept”, or “I’m using an LLM, but I’m double-checking every line”, or even “I don’t trust the LLM: I can code better”. This feeling is not new. Generations of software engineers experienced it as the industry developed languages with ever higher levels of abstraction, and as the price of hardware relative to engineering changed which tradeoffs were acceptable. It runs from a reluctance to write anything other than raw machine code (“Why would you want more than machine language?” von Neumann allegedly said on first hearing about FORTRAN in 1954) to developers scoffing that the web could not possibly be a serious platform. So why would LLMs be any different?
The conventional answer is that they aren’t, that this is just the next round of deskilling, which should be addressed by drills or friction like the previous ones were: this was argued elegantly by Mohammad Hossein Jarrahi. But LLMs are different because, for the first time, the thing being abstracted away has no fast oracle. While type checking and compilation assert well-formedness mechanically and reproducibly, design and architecture have no such test. Previous abstractions automated deterministic tasks like memory allocation, but LLMs reach into architecture and design (e.g., which tradeoffs are acceptable, what abstractions to choose) and that judgement layer has no fast or mechanical oracle, only reality itself.
Producing software used to require enough engagement with the implementation that judgement accumulated as a side effect: the engagement was friction, and the friction was generative. Software required training and practice which, over time, led to a collapse of the boundary between the craftsman (a programmer, an artist) and their tool (a computer, a musical instrument). That flow state was earned, its by-product was judgement, and that judgement kept unnecessary complexity at bay.
Agents make it possible to create whole architectures without accruing the same experiential grounding: unlike compilers, they are not mere tools that amplify human judgement. As they take on activities that require judgement, that judgement no longer comes for free. Whilst AI lets you explore ten architectures instead of one, judgement becomes increasingly detached from first-hand experience. Because judgement has become optional for producing correct, working software, it now has to be rebuilt on purpose. This echoes Bainbridge’s irony of automation: a skill silently disappears and is not there when you reach for it. The effect is also visible outside software engineering: continuous exposure of experienced endoscopists to AI across multiple centres led to a 20% decrease in adenoma detection rate when not using AI. In software, Anthropic researchers found that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains. Once judgement is no longer exercised during production, it decays like every unused skill, and the complexity it guarded against slowly creeps back.
I often hear that LLMs should be treated like junior developers, but I disagree: the implicit assumption is that they are below you now, that they will grow, and that they will eventually replace you. But LLMs don’t care. They don’t care whether a particular outcome is “good” or “bad” because they are designed to produce plausible outcomes, not to have a stake in them: they behave like a cohort of experts with no ethos. Indeed, alignment training collapses entropy towards an averaged human preference, but never yours in particular. I therefore believe the expertise of LLMs could increase forever without them ever acquiring the one thing we humans provide: an opinion and a reason to care. Just as an editor-in-chief’s job is to choose between equally “good” articles to maintain an editorial line, our job is holding the stance, the voice, the no. Without an editor, an LLM’s unprincipled, locally reasonable choices gradually accumulate into globally incoherent systems which no longer fit in any human brain, or in any LLM’s context window. Somewhat paradoxically, we then reach for LLMs to untangle the very complexity they produce.
The degradation in our ability to judge and the increase in systems complexity happen silently because software engineers cannot reliably perceive their own degradation. In addition, unlike aviation, which has official bodies that can identify the causes of accidents and offer recommendations, software incidents are diffuse and unattributable: while catastrophic software failures do result in postmortems and involve investigative bodies (like CISA), architectural rot has no visible crash site and hence has no body that could link the resulting failure to skill atrophy. Although we can already see that engineers can no longer keep up with validating AI-generated code, which can cause increased defect rates and even outages, the real issue is that essential systems will silently become too complex to understand, maintain, and fix.
So our original question (how to check something better than us without rubber-stamping or pretending to out-expert it) is in fact the wrong question. Indeed, verification was never the problem. The real problem is that LLM-induced complexity and skill atrophy feed each other because, unlike in aviation, software engineers are both the people designing the plane and the pilots flying it. As we reach for LLMs as the path of least friction, our ability to judge decays. All the while, systems get more complex, and we use LLMs more to untangle the complexity. Eventually, the system collapses, and we cannot replicate what aviation did because architecture and design have no fast oracle. The question is not whether human judgement will still be needed in ten years: it is whether we will still be able to judge.

