..this is a living document that I will update as I go....
...I do not guarantee it is finished now or ever will be...
...it is safe to assume I'm open to talking about this and am willing to learn...
..this is version 1.0; 2026-03-31
..previous versions: none
..Status: working draft / living document / corrections welcome
---
I recently began eagerly exploring agentic AI, and wrote about it here. That was back when I was a total newb, more than several days ago! Back in those days long past, I used a tiny toy code base and embarrassingly simple prompts. These days I am working with Claude Code and other agentic AI on an actual codebase of mine called "Onionskin". I also worked with Copilot, Codex, and Gemini, but I worked first and most with Claude. This blog tells that story - my first reactions.
Sunday Mar 22 - Onionskin moves from ChatGPT to Agentic
Onionskin is a complicated program I originally prototyped with ChatGPT. I had ChatGPT write extensive "handoff instructions" and agent instructions. Then I asked it what my first prompt to Claude Code in the repo should be, one that would include reading the handoff and agent instructions. Then I brought Claude Code into the prototype repo, and "we" just hit the ground running. The experience was very similar to iterating with ChatGPT but far smoother since it is all "in place". Less drift. Less frustration. It is simultaneously amazing how much you can accomplish and overwhelming. What I've made is 99% "vibe coded" (i.e. coded by AI), by which I mean 100%: I am inspecting stuff and making sure things are right... but writing very little. My main purpose is just human intervention. I'm a code reviewer, logic reviewer, idea reviewer... but also a major contributor to the ideas. I think my domain knowledge and analytical knowledge are still essential to help guide development, and to interpret what has been developed.
A huge part of my job on this project is now review, not coding, but I am also having agents review. This seems especially helpful when you use completely different agents, putting me at a layer above even review - something like an editor or orchestrator. So even code review is just human-guided, not necessarily human-performed.
Agentic coding can be overwhelming because you can create a massive complex program in a day, with 1000s of lines of code, several different pipeline choices and pathways, inputs and outputs, and parameters, and options... and so on. And since you didn't develop it over the course of weeks and months, you don't have that same feel for everything... yet you have to review it anyway. So it is like reviewing someone else's code. And honestly, when presenting it, it is like presenting someone else's work. I really should just ask ChatGPT and Claude if they would rather explain "my" program in my next lab meeting, and then just silently fade into infinity.
---
Mon, Mar 23 - Big Oops on Token Usage:
I was accidentally having Claude Code be super token heavy, keeping the entire repo and instructions and convo in its context window basically… and having it do rereads constantly and using the most super charged model (Opus).
And it was amazing!
But as the repo got bigger and as expectations increased on what it should do after every edit (smoke tests, regression tests, audits, etc)… all of a sudden I was using my 5 hour limits in 5 minutes. I paid for "Extra Usage" a few times and just wiped it out instantly. So I asked both ChatGPT and Claude Code how to reduce token usage, and ultimately came up with a plan with Claude Code.
It involved a lot of stuff, but the take-home is that now it seems like the IQ of my assistant has dropped precipitously. The way I had it set up, it was the absolute master expert on the codebase and all the ideas and goals and aims and the larger picture, and how it all fits together; each addition to the code was phenomenal. Now it’s kind of like talking to someone you had a long relationship with but who then suffered some dementia or brain injury, and who knows a lot less about your history together or what the code is doing.
I say all that to say this:
- Companies that can afford to have their employees use Opus basically constantly, with sessions set up like mine was, will likely be able to make rockstar code in leaps and bounds.
- Companies that cheap out and use lesser models and session designs that minimize token usage will run into many more errors and slower development overall.
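To make the token blow-up described above a bit more concrete, here is a rough back-of-envelope sketch. It assumes the whole context (repo, instructions, conversation) gets re-sent on every agent turn, and it ignores things like prompt caching and how any provider actually counts usage. The function name and every number in it are made up for illustration; none of them are measurements from Onionskin or from Claude Code.

```python
def tokens_per_edit(context_tokens: int, turns_per_edit: int,
                    output_tokens_per_turn: int) -> int:
    """Rough token count for one edit, assuming the full context is
    re-sent on every turn (a simplifying assumption, not a billing model)."""
    return turns_per_edit * (context_tokens + output_tokens_per_turn)


# Hypothetical early days: small repo, a handful of turns per edit.
early = tokens_per_edit(context_tokens=20_000, turns_per_edit=5,
                        output_tokens_per_turn=1_000)

# Hypothetical later days: bigger repo, and every edit also triggers
# smoke tests, regression tests, and audits (so many more turns).
later = tokens_per_edit(context_tokens=150_000, turns_per_edit=20,
                        output_tokens_per_turn=1_000)

print(early)                    # 105000
print(later)                    # 3020000
print(round(later / early, 1))  # 28.8
```

The point of the sketch is only that context size and turns per edit multiply each other, so growing both at once eats a usage limit much faster than either one alone.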
Tuesday, Mar 31 - Just put my name in the author list by the way.
Having agents review each other's recommendations is the way to go.
Me to Claude: ChatGPT recommended this.
Claude: Well that is good except for all these weaknesses.
ChatGPT: Good points, but also this, and not that.
Claude: Great even stronger, but we should consider xyz.
ChatGPT: Claude is right, xyz should make it stronger. I think the plan is ready.
Claude: Me too. Let's go.
Me: Awesome. Just put my name in the author list by the way.
---
wrapping this up - will Claude make it as a bioinformatician?
I recognize I titled this, "Claude the bioinformatician: reactions from my first pass at using Claude Code on real genomics software and data" but did not directly address it. Suffice it to say, my reactions apply to creating genomics software and working with real genomics datasets. Claude Code allowed me to quickly develop a complex program, but I struggled with fully trusting what was being made, because the rate of productivity now far exceeds the rate of human-expert-guided quality control. That led me to provide "ground truth examples", enforce copious amounts of regression tests, have extended discussions on what the code was doing, and have the agents walk through the code to translate it into English. This in turn led to a token usage crisis, which I am still battling, and for which I am still hunting for the right balance. Part of the answer was bringing in other agentic AI platforms including Copilot, Codex, and Gemini. This allowed me to start asking agents to review the work of other agents, thereby distributing my "token usage" across platforms with the benefit of "fresh eyes" and a larger team. Ultimately, as scientists begin using agentic AI in the life sciences, we will need solutions that strike the right balance of productivity, cost (token usage), quality control, and overall accuracy and reliability of the code and the results it produces. That last item is perhaps what sets science apart from more "creative"-oriented applications of AI (not that science is not creative). Creative results are not useful if they do not reflect the nature of the reality being probed. Overall, Claude and other AI agents have a bright future in bioinformatics. In part, they make everyone a bioinformatician -- but that is exactly why we need to pause and think about how to enforce quality over quantity, and strike the right balances.
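For readers who have not seen one, a "ground truth example" style regression check can be sketched roughly like this. It is a minimal illustration, not code from Onionskin: `gc_content` is a hypothetical stand-in for any agent-written function, and the `GROUND_TRUTH` cases are tiny hand-checkable inputs I made up for the example.

```python
import math


def gc_content(seq: str) -> float:
    """Fraction of G/C bases in a DNA sequence.
    Hypothetical stand-in for a function an agent wrote."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in ("G", "C") for base in seq) / len(seq)


# Hand-verified (input, expected) pairs: small enough to check by eye.
GROUND_TRUTH = [
    ("GGCC", 1.0),
    ("ATAT", 0.0),
    ("atgc", 0.5),
    ("", 0.0),
]


def check_ground_truth(func, cases, tol=1e-9):
    """Run func on every case and return a list of
    (input, expected, got) tuples for the cases that disagree."""
    failures = []
    for inp, expected in cases:
        got = func(inp)
        if not math.isclose(got, expected, abs_tol=tol):
            failures.append((inp, expected, got))
    return failures


failures = check_ground_truth(gc_content, GROUND_TRUTH)
print("all ground truth cases pass" if not failures else failures)
```

Re-running a check like this after every agent edit is cheap in human time, though as noted above, having the agent run and interpret it can still cost tokens.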
---
future looking:
I am almost done creating a comprehensive multi-agent behavior, memory, and development infrastructure to allow (hopefully) seamless handoffs between Claude, Gemini, Codex, and Copilot agents.
I will discuss this more in future posts.
---
This blog post was entirely written by me. Not AI at all - except for the cartoons and augmented pictures, which I explained to ChatGPT for creation... so we are both the illustrators, right?!