
AI Productivity Paradox - Are we actually moving slower?

I’ve been looking into some recent studies and commentary on AI-assisted development. Where and how we use these tools turns out to be decisive.

There is a difference between *feeling* fast and actually being efficient. If you have ever driven an old car versus a new one, you know what I mean.

Sometimes, tasks with AI felt more engaging than they would have been otherwise. This was especially the case for repetitive ones like writing lots of similar tests or classes. Making the tasks into an interactive game, where I try to get the agent to do all the work with minimal manual intervention, was more fun than churning out very similar code over and over. But I don’t think it was faster.

[...] if I were shooting for maximum productivity on these sorts of issues, I would spend a lot of up-front time writing detailed issue descriptions, including specific implementation suggestions.

https://domenic.me/metr-ai-productivity/

Several studies say we might be falling into a "perception gap" that creates long-term technical debt and hidden costs.

Highlights (or lowlights):

19% Slowdown: A study by METR found that while devs felt 20% faster using AI, they were actually 19% slower on complex, real-world tasks. We’re essentially trading deep work for "Review Fatigue."

Mastery Gap: Anthropic research shows that AI-assisted coding can reduce logic mastery by 17%, making it much harder for us to debug the very code we "wrote" just an hour later.

"Code Churn" Liability: GitClear’s analysis of 200M+ lines of code shows a 60% decline in refactoring and a massive spike in "churn"—code that has to be deleted or fixed within two weeks.

My Takeaways:

  1. If used for complex architecture or "black box" logic, AI is a liability.
  2. It is a superpower for boilerplate, regex, and unit tests.
  3. Let’s be very intentional.
  4. Using AI to solve a problem you don't fully understand yet is offloading the cost to the next person.


That being said ...

I use AI in particular cases, but in others it causes a slowdown.

Here is a concrete example: you use a publicly available framework and have written your own wrapper, which resides in another repository and is only linked as a package.

Since you wrapped the framework and grouped certain function calls to reduce boilerplate, your AI agent is now completely lost about what is happening in your wrapper functions. It cannot anticipate this, and it cannot read the code to understand it. Worse, it has little or no capacity to infer the behaviour from examples you provide alongside the prompt.

In my experience, the training on open-source code outweighs any examples you provide.

So the agent goes ahead and tries to mock functionality using patterns it already knows, when it could spin up an actual instance and test against it. Or it works around your wrapper altogether to hide its incompetence.
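To make the wrapper problem concrete, here is a minimal sketch. Everything in it is invented for illustration: `Client` stands in for the public framework the model was trained on, and `open_session` is the kind of private wrapper that groups several calls into one.

```python
# Hypothetical illustration. `Client` is a toy stand-in for a public
# framework; `open_session` is a private wrapper the model has never seen.

class Client:
    """Toy stand-in for a publicly known framework client."""

    def __init__(self):
        self.connected = False

    def connect(self):
        self.connected = True

    def authenticate(self, token):
        return self.connected and token == "ok"

    def fetch(self, key):
        return {"key": key}


def open_session(token):
    """Private wrapper: hides connect + authenticate behind one call.

    An agent trained on the public framework tends to call
    Client.connect()/authenticate() directly, or mock them, instead of
    going through this wrapper, because the wrapper never appeared in
    its training data.
    """
    client = Client()
    client.connect()
    if not client.authenticate(token):
        raise RuntimeError("authentication failed")
    return client
```

The point is not this particular API but the shape of it: the valuable knowledge (that sessions must go through `open_session`) lives only in your repository, and the agent's prior pulls it back toward the raw `Client` calls it has seen a million times.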

So I am just not doing things like this anymore.

When I have a test function, I ask it to create more tests to ensure the behaviour is documented and well covered in all cases. But I need to provide at least one test function for it to work from.
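For illustration, this is the kind of seed test I hand the agent first. Both `slugify` and the test are hypothetical placeholders; the point is that one worked example pins down the naming scheme, assertion style, and the sort of edge cases worth covering.

```python
# Hypothetical seed: one function under test plus one example test,
# given to the agent before asking it to generate further cases.

import re


def slugify(text):
    """Lowercase, collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def test_slugify_basic():
    # From this single seed, the agent can extrapolate edge cases:
    # empty input, leading/trailing punctuation, unicode, and so on.
    assert slugify("Hello, World!") == "hello-world"
```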

When I have a custom framework it does not and cannot know (e.g. a binary package), I provide ample context and shrink the scope of work down so that it does not matter.

In a way then, I am doing a lot of thinking so the AI can do the typing ;-)

Curious where we are headed next!