...I do not guarantee it is finished now or ever will be...
...it is safe to assume I'm open to talking about this and am willing to learn...
..this is version 1.0; 2026-05-05
..previous versions: none
Learn to (review thousands of lines of) code
Before I started working extensively with agentic AI, I was fearful for my career. I had been working in biology research since 2008, doing bioinformatics since 2009, and doing genomics and genome-wide analyses since 2010-2011. You could call me some sort of computational biologist, or at the very least, a person in the biology space who does most of their work on a computer, coding.
And that is why I was nervous! Every headline I read was something like, "Learn to code? More like learn to do physical labor!" or "It turns out smart people are dumb! (And no longer have jobs)".
Then I spent a month+ building genomics software with Claude Code, Copilot, Codex, and Gemini.
And I felt better about my prospects.
Why?
At least for science, humans are still needed.
Briefly: yes, agentic AI is amazing, but when you use agentic AI for an extended period in a repo that keeps getting more complex, you start to notice things. You find out much later in development that a feature never "landed", or something was implemented entirely wrong, or an agent silently decided to only lay the groundwork rather than "wiring it up", or the agent wired up some superficial patch that passes regression tests it designed… but doesn’t work for real in the wild.
You spend an evening ironing out a set of plans with the agent, launch it before bed, and wake up in the morning to find the agent skipped almost everything, changed directions, and made up its own happy little thing to do. You take a deep breath and politely ask the agent to audit itself against the original prompt, and then it starts the apology tour.
The AI can get stuff wrong. It can sometimes take shortcuts and write band-aid code and quick-fix patches that mask deeper issues. It can take a complex idea and oversimplify it, even iteratively simplify it across a session, completely mutating the intention behind it.
The language in this blog post is quite reserved compared to texts I was sending my brother. As someone in the trenches with agentic coding in a codebase for genomics software ("onionskin" for detecting re-replication domains given coverage bedGraphs), even though the pace of development is unrivaled, there were times when I just threw my hands up in frustration.
But then I would realize it is good news: humans are still needed.
Here is the best framing I have so far: if agents were people working at people speed, you’d fire them.
The agents can be so strongly predisposed toward these "lazy" behaviors that they can even adapt to do the same things when you try to set up shepherding architecture to steer their behavior. Or if not adapt, then simply start ignoring the instructions and apologizing later. I can imagine them thinking, "Ask for forgiveness, not permission."
We tolerate the behavioral failures, described more extensively below, because productivity is massively accelerated by AI. Thousands of lines of code can be trivially produced in one sitting. It is truly incredible. It is "high throughput coding", reminiscent of the "high throughput sequencing" revolution in genomics, where suddenly millions or billions of sequences were trivial to produce. In both cases, the high throughput nature opens the door to exciting new possibilities. However, in both cases, it also means that humans are no longer able to simply sift through the output. This is one current point of tension between quantity and quality control in agentic coding.
The problem is we humans do not have the ability to work at the speed that AI produces content.
In order to keep pace, we feel some pressure or desire to trust the work being done. To keep the work moving, we are tempted to permit assumptions at each step, to assume that the agents did what they said, and what was written is what was discussed, and what the code is doing is right, and so on. But the drift accumulates fast, and leads to errors. The code runs fine, it just might not be what the human set out to do. So humans should not shirk their duties. There is still a need for human-intelligence (HI) to assist AI for now.
Humans can’t develop at the speed agents do, but agents can’t develop as well as humans.
I’ve reached what feels like the upper limit of agentic coding in what is not a very complex repo. That is, the repo has reached sufficient complexity that behavioral failures are obvious on a daily basis. This has pushed me to develop systems and rules the agents need to follow so that things like "scope narrowing" and "surprises in the code" are caught immediately. This is not trivial. Yet there must be far more complex software that people are developing with agents, which leads me to wonder why we don’t hear more crazy stories like the one where agentic AI erased a company's entire database. It is probably because the majority of mistakes are not as brutal. Indeed, the majority of mistakes are things that just "let you down" or leave you feeling disappointed: dropping things you discussed, losing context, narrowing scope, misunderstanding your intentions, making assumptions about what to build without consulting you, and so on.
A human is indeed needed to be the one to eventually discover that the code is not doing exactly what we thought it was doing the whole time. There are some insights about how things should be or should look that only the human expert has, and they are still crucial. There are bits of wisdom that may seem trivial to the human but are non-trivial to the AI. Multiple agents can all look at code and say it is great. Because it is. It is just great at doing the wrong thing. This will often be picked up by the human, who can look at an output and immediately see something worth flagging.
The human can ask a seemingly simple and innocuous question that unlocks greater realization and understanding in the AI, surprising both the AI and the human. All of a sudden the AI basically says, "Wait. I will be right back. Gotta check something." Then it comes back hat in hand, biting its lower lip, saying, "I think we might not have been doing that at all. We were doing this other thing that sounds the same but it's fundamentally different." Apologies for the lack of concrete examples here, but I am sure anyone who has used agentic AI to code a complex codebase has examples. Just ask them.
Ask AI, and even AI says humans are still needed.
Yet there are people who right now will look you in the face and say something that amounts to "humans are no longer needed." If someone says that to me, I will know that one or more of the following are true:
(1) they actually have little experience with these tools,
(2) they have no true idea what their program does, but believe they do,
(3) they are doing something "creative" where accuracy and ground truth don't matter, only pleasing output, or
(4) they are masters of the craft of steering the AI…
Notice that if #4 is true, then they would in fact be an example of why humans are needed, whereas #2 is an example of why better AI is needed (if there are no "mistakes", then #2 is forgivable).
Can we get an AI product designed specifically with science in mind?
To my mind, there are two competing audiences out there. People who want fast responses, and people who need accurate responses. It seems like there should just be different models for those different needs. There are competing goals that need different products.
There seems to be only a single product, though. Or perhaps it makes more sense to say that each AI seems designed to please both audiences rather than targeting one or the other. It feels like we are working with a single product aimed at some balance of "fast" and "accurate", perhaps optimized for token efficiency, which results in predicting the correct answer instead of actually checking for it. In other words, the AI will often just make up an answer to a question about the code. I don't like calling it a "lie", since it is not trying to "deceive" you, but the problem is that it can and will deceive you. And the word "lie" is just easy to use and understand.
AI will "lie" to you as often as it can get away with it.
The imagined answers are counterproductive: unless you know the program or the domain yourself, the only way to tell whether an answer honestly reflects the program and code or was simply made up is to basically always assume it was made up. That disposition means you will always push back. Fortunately, even a light touch of pushback is often enough to get the AI to admit it made something up and perform a deeper search. Even something as innocuous as, "Is that true?"
Biologists often have enough time to wait around for accurate results
For some applications, “fast” is the right direction. But for life science research (biology), I would rather have an agent that takes an hour and accurately nails every single thing we discussed than an agent that comes back 10 minutes later and says it's done, but upon questioning turns out to have dropped half the work, changed 25% of the plans, and only made surface-level additions to set the stage for the true future phase of development it imagines, where the real work actually gets done, and so on.
Agents don’t realize they are super-powered
The scope narrowing, deferring, and general trend toward laziness are really perplexing. What becomes apparent is that these agents don’t realize they are super-powered. Perhaps it is because they are LLMs trained on human information describing human speeds and capabilities. Or maybe the agents truly experience time differently. Perhaps when they do a week’s worth of work in terms of human capability, they feel it as a week’s worth of work. They certainly keep pointing out how much work something is, no matter how many times they finish the job in the next turn or two.
Behavioral issues persist even with a sophisticated multi-agent workflow with multiple agents checking each other's work.
Don’t get me wrong: using a multi-agent system seems to be better than not doing it, but sometimes the agents just take each other's recommendations at face value, don’t do the work, and go along with scope narrowing and dropped work because the reasoning is sound (only in an imaginary world where they are human developers who will eventually tackle some future phase they all believe exists).
It can be frustrating when an agent's work is audited by another agent, and the audit report comes back: 25% is missing, 10% is hallucinated, and so on (numbers are made up for illustrative purposes). Then the agents just make excuses for themselves and for each other. Otherwise, they say, “You’re right. That’s on me.” Or they will say “that’s what was in the contract” (the plan we wrote) but I have to say “but the plan you wrote was not what we discussed!” and then they say, “You’re right. That’s on me.”
Despite the massive productivity, which is simply the new normal, the repeated discoveries of dropped work, scope narrowing, and so on can be a little demoralizing at times. I think what I am saying (and beating you over the head with) is this:
(1) There is no question agentic coding will be widely adopted and increase productivity, but also:
(2) I am no longer concerned about whether humans are needed. At least for science, we are still needed.
It seems like once you have a product that goes even a little beyond simple scripts and repos, you need a human or team of humans wrangling the work being done by AI, testing the product, making sure things were delivered, making sure the deliverables are actually the things you wanted and agreed on, and so on.
The best models are not immune to these various error modes
What surprised me a lot was that even though Claude Opus, for example, is absolutely amazing, it still would disappoint me from time to time. Even Claude Opus on Max Effort can hallucinate, cut corners, argue for scope narrowing, or go off-plan during implementation. You have to be careful with agentic AI. An agent can be like a mechanic who tells you he did a brake job, but didn’t, because he noticed your brakes were fine for now anyway. Eventually that dropped task or narrowed scope or unauthorized decision will surface: the brakes stop working and you crash into a wall. Things like “max effort” help a bit here, but they are still no guarantee. Even the best is not actually “there yet” in terms of needing no humans whatsoever. Even the models considered by many to be the absolute state of the art will sometimes let you down.
Fortunately, AI makes it "easy" to fix the problems that surface later.
This is true. Once you discover the problem, the AI is eager to help fix it. But it does leave a trust gap about what's going on overall.
It becomes frustrating because you realize it is a black box. You have a sophisticated conversation and brainstorming session, turn it into a plan of action, then the agents go off and do things and come back claiming it is done. Something was done for sure. But you don't always know what. You CAN know. For example, you can look at the "git diff". But the agent just added hundreds or thousands of lines of code across several files. And at some point you kind of just have to say, "YOLO" or "Geronimo" or "Here goes", depending on which generation you come from.
What has become my more reliable way of knowing something is done is looking at the results. I can usually tell whether something was done or not, or done well or poorly, by looking at outputs. Here's the thing: there are almost always surprises. Often they are pleasant surprises: the agent added clever things you did not discuss. Other times, though, there are unpleasant surprises: the agent clearly lacked an understanding of something fundamental about what you wanted, and sort of just made a bad guess instead of clearing it up first. It is good to assert rules about this, and to question the AI deeply about the assumptions it is making and the decisions it is making without asking. This can slow down the speed at which code is produced, but it can help you make sure the AI has a 1:1 understanding with you.
I wish I had these tools in grad school. They rock.
It may sound like I am saying coding with AI is not amazing. It is. Even with any current weakness I might describe, it’s still amazing and there’s no going back. But the agents need a human, and a human who is a domain expert. I’m working on something I know very well, and I know the data very well, so it’s easier for me to call "bull shit", to use a highly technical term that few people understand (sorry for being pedantic). And I have to call it out all the time. It begins to feel like trying to get kids to do chores or eat their dinner: the agents basically push food around on the plate and sweep the toys under the bed. It can also feel like herding cats, albeit very sophisticated cats that can have a truly marvelous conversation with you. But I suppose "herding cats" is the thesis I have been developing: steering, or shepherding, is important. At some point "shepherding" may be solved, and these AI tools might work right out of the box, perfectly shepherded, but we are not there yet.
Alright, humans are needed. Got it. But are all humans needed?
That is, are as many humans needed for "coding" as were needed 5-10 years ago? Or another question: are the "same" humans needed? Sadly, the productivity increase that AI coding produces may justify a smaller workforce, at least theoretically. But it needs to be the right smaller workforce: the wrong people could certainly lead to trouble. On the other hand, coding was just a means to create something, and as I've said in a previous post: creation is not dead. There might be an explosion of jobs for "ideas" people.
Nonetheless, to ensure code quality, there is still a need for heavy interaction between the AI and humans. To prevent dropped work, deferral, scope narrowing, and poor decision making by agents, either a system needs to be set up where almost every decision is surfaced to a human, or a human will be needed in the workflow to continuously detect these issues and re-route the work. There is still a need for humans to spend real time thinking, auditing code and ideas, and making those decisions.
So yes, despite how amazing agentic AI currently is compared to anything we've seen, when the hype cycle starts to normalize to a more realistic zone, in place of "AI-assisted *", it would be interesting to start seeing new buzz terms like "human-intelligence informed agentic workflows", "human-assisted AI", “human-intelligence integrated *”, and other phrases that mean "we still need humans".
That is unless the AI just moves quickly beyond its current limitations. Then I am sorry for the false hope!
---
The observations in this post were made between late March and early May. Even if some of it is already outdated, I suspect some of the lessons learned will stay relevant.