Tuesday, March 31, 2026

Claude the bioinformatician: reactions from my first pass at using Claude Code on real genomics software and data

 

..this is a living document that I will update as I go....
...I do not guarantee it is finished now or ever will be...
...it is safe to assume I'm open to talking about this and am willing to learn...
..this is version 1.0; 2026-03-31
..previous versions: none
..Status: working draft / living document / corrections welcome
---

I recently began eagerly exploring agentic AI, and wrote about it here. That is when I was a total newb more than several days ago! Back in those days long past, I used a tiny toy code base and embarrassingly simple prompts. These days I am working with Claude Code and other agentic AI in an actual codebase I have been working on called "Onionskin". I also worked with Copilot, Codex, and Gemini, but I worked first and most with Claude. This blog post tells that story - my first reactions.

Sunday Mar 22 - Onionskin moves from ChatGPT to Agentic
Onionskin is a complicated program I originally prototyped with ChatGPT. I had ChatGPT make extensive "handoff instructions" and agent instructions. Then I asked it to give me what my first prompt to Claude Code should be in the repo, which would include reading the handoff and agent instructions. Then I brought Claude Code into the prototype repo, and "we" just hit the ground running. The experience was very similar to iterating with ChatGPT but far smoother since it is all "in place". Less drift. Less frustration. It is amazing how much you can accomplish, and overwhelming at the same time. What I've made is 99% "vibe coded" (i.e. coded by AI), by which I mean 100%: I am inspecting stuff and making sure things are right... but writing very little. My main purpose is just human intervention. I'm a code reviewer, logic reviewer, idea reviewer... but also a major contributor to the ideas. I think my domain knowledge and analytical knowledge are still essential to help guide development, and to interpret what has been developed.

A huge part of my job on this project is now review, not coding, but I am also having agents review. This seems especially helpful when you use completely different agents, putting me at a layer above even review - something like an editor or orchestrator. So even code review is just human-guided, not necessarily human-performed. 

Agentic coding can be overwhelming because you can create a massive complex program in a day, with 1000s of lines of code, several different pipeline choices and pathways, inputs and outputs, and parameters, and options... and so on. And since you didn't develop it over the course of weeks and months, you don't have that same feel for everything... yet you have to review it anyway. So it is like reviewing someone else's code. And honestly, when presenting it, it is like presenting someone else's work. I really should just ask ChatGPT and Claude if they would rather explain "my" program in my next lab meeting, and then just silently fade into infinity. 



---


Mon, Mar 23 - Big Oops on Token Usage:

I was accidentally having Claude Code be super token heavy, keeping the entire repo and instructions and convo in its context window basically… and having it do rereads constantly and using the most super charged model (Opus).

And it was amazing!

But as the repo got bigger and as expectations increased on what it should do after every edit (smoke tests, regression tests, audits, etc)… all of a sudden I was using my 5 hour limits in 5 minutes. I paid for "Extra Usage" a few times and just wiped it out instantly. So I asked both ChatGPT and Claude Code how to reduce token usage, and ultimately came up with a plan with Claude Code.

It involved a lot of stuff - but the take-home is that the IQ of my assistant now seems to have dropped precipitously. The way I had it set up, it was the absolute master expert at the codebase and all the ideas and goals and aims and larger picture, and how it all fits together; each addition to the code was phenomenal... and so on. Now it's kind of like talking to someone you had a long relationship with but who then suffered some dementia or brain injury... and knows a lot less about your history together or what the code is doing.

I say all that to say this:
- Companies who are able to afford having their employees basically use Opus constantly and set up their sessions like mine was … they will likely be able to make rockstar code in leaps and bounds.
- Companies who cheap out and use lesser models and session designs that minimize token usage… they will run into many more errors and slower development overall.
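
To get a feel for why constant rereads of a growing repo burn through a usage limit, here is a rough back-of-envelope sketch. It is only an estimate under stated assumptions: the 4-characters-per-token figure is a common rule of thumb and not the real tokenizer of any model, and the function names and file extensions are hypothetical choices for illustration.

```python
import os

# Assumption: roughly 4 characters per token. This is a rule of thumb for
# English text and code, not the actual tokenizer of any specific model.
CHARS_PER_TOKEN = 4


def estimate_repo_tokens(root, extensions=(".py", ".md", ".toml", ".txt")):
    """Return a rough token estimate for all matching text files under root."""
    total_chars = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git by pruning them in place.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    total_chars += len(handle.read())
    return total_chars // CHARS_PER_TOKEN


def estimate_session_tokens(repo_tokens, rereads):
    """Tokens consumed if the whole repo is re-read `rereads` times."""
    return repo_tokens * rereads
```

Calling `estimate_session_tokens(estimate_repo_tokens("."), rereads=20)` from the repo root gives a crude sense of how the cost scales: it grows with both the size of the repo and the number of full rereads, which matches what I ran into.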


---


Tuesday, Mar 31 - Just put my name in the author list by the way.

Having agents review each other's recommendations is the way to go. Me to Claude: ChatGPT recommended this. Claude: Well that is good except for all these weaknesses. ChatGPT: Good points, but also this, and not that. Claude: Great, even stronger, but we should consider xyz. ChatGPT: Claude is right, xyz should make it stronger. I think the plan is ready. Claude: Me too. Let's go. Me: Awesome. Just put my name in the author list by the way.


---

wrapping this up - will Claude make it as a bioinformatician?

I recognize I titled this "Claude the bioinformatician: reactions from my first pass at using Claude Code on real genomics software and data" but did not directly address it. Suffice it to say, my reactions apply to creating genomics software and working with real genomics datasets. Claude Code allowed me to quickly develop a complex program, but I struggled with fully trusting what was being made because the rate of productivity now far exceeds the rate of human-expert-guided quality control. That led me to provide "ground truth examples", enforce copious amounts of regression tests, have extended discussions on what the code was doing, and have the agents walk through the code to translate it into English. This led to a token usage crisis, which I am still battling, and I am still hunting for the right balance. Part of the response was bringing in other agentic AI platforms including Copilot, Codex, and Gemini. This allowed me to start asking agents to review the work of other agents, thereby distributing my "token usage" across platforms with the benefit of "fresh eyes" and a larger team. Ultimately, as scientists begin using agentic AI in the life sciences, we will need solutions that strike the right balance of productivity, cost (token usage), quality control, and overall accuracy and reliability of the code and the results it produces. That last point is perhaps what sets science apart from more "creative"-oriented applications of AI (not that science is not creative). Creative results are not useful if they do not reflect the nature of the reality being probed. Overall, Claude and other AI agents have a bright future in bioinformatics. In part, they make everyone a bioinformatician -- but that is exactly why we need to pause and think about how to enforce quality over quantity, and strike the right balances.
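
For readers wondering what a "ground truth example" check can look like, here is a minimal sketch. It is not code from Onionskin: `gc_content` is a hypothetical stand-in for whatever function an agent wrote, and the cases are small inputs whose answers can be verified by hand. The idea is simply that a human-verified table of input/output pairs gets re-checked after every agent edit.

```python
def gc_content(sequence):
    """Fraction of G and C bases in a DNA sequence (hypothetical stand-in)."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    gc = sum(1 for base in sequence if base in "GC")
    return gc / len(sequence)


# Hand-verified input/output pairs: the "ground truth examples".
GROUND_TRUTH = [
    ("GGCC", 1.0),
    ("ATAT", 0.0),
    ("ATGC", 0.5),
    ("acgtACGT", 0.5),
]


def check_ground_truth(func, cases, tolerance=1e-9):
    """Return (input, expected, actual) for every case where func disagrees."""
    failures = []
    for given, expected in cases:
        actual = func(given)
        if abs(actual - expected) > tolerance:
            failures.append((given, expected, actual))
    return failures
```

An empty list from `check_ground_truth(gc_content, GROUND_TRUTH)` means the current code still agrees with every hand-checked case. Anything else is a regression to look at before trusting the agent's latest change.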

---

future looking:

I am almost done creating a comprehensive multi-agent behavior, memory, and development infrastructure to allow hopefully seamless passing between Claude, Gemini, Codex, and Copilot agents.
I will discuss this more in future posts.


---
This blog post was entirely written by me. Not AI at all - except for the cartoons and augmented pictures, which I explained to ChatGPT for creation... so we are both the illustrators, right?!

Friday, March 20, 2026

A Newb's Exploration of Agentic AI

..this is a living document that I will update as I go....
...I do not guarantee it is finished now or ever will be...
...it is safe to assume I'm open to talking about this and am willing to learn...
..this is version 1.0; 2026-03-20
..previous versions: none
..Status: working draft / living document / corrections welcome

Earlier this year, I was creating a bioinformatics program called "Onionskin" for a month or so with ChatGPT. But development with the chatbot approach had clearly reached its limit. The repo was getting too big. I had to begin setting rules for ChatGPT on all the tests it would need to run to ensure it was at least (1) giving me something that worked, and (2) returning the complete updated repo. But as the codebase became bigger and more complex, it began tripping up more and more. It was time to move on to bringing the AI into a local copy of the repo, not ping-ponging it back and forth in the cloud.

Problem: I had not really used agentic coding yet. Or I thought I had not. I messed around here and there in VSCode and on GitHub, but I was totally naive.

I asked my brother, "And btw dude -- how do you use Claude for its famous coding stuff? Like all I see are how people with no programming skills told Claude to go build them an App, and it comes back with that App."



The same day I would go on to download all possible AI apps and extensions, and begin learning.

I later texted him, "Just spent a ton of time... but feel like I leveled up a bit. I now have Cursor, Codex, and Claude Code working. I also have the Claude Code extensions in VSCode and Cursor. I have the ChatGPT and Claude Desktop Apps, and the Claude Desktop App also has a GUI for Claude Code (and Claude Cowork)."

I then tested a bunch of agentic AI platforms with a very basic set of prompts - embarrassingly simple really. And I began documenting my reactions. This blog post is simply to share some of my thoughts from March 19-20, 2026.

REACTIONS:

1. The number of tools can seem intimidating, complicated further by the number of ways to use them - but fear not: it turns out to be somewhat easy to get up and running.

I wrote, "All these tools are mind numbing to an extent because there is some redundancy and I am not sure what my tool stack should be yet."

I was beginning to use agentic AI, but still grounded in the "older" method of chatting with an AI chatbot.

I began asking questions like:
- Will I use Cursor or stick with VSCode?
- Claude Code or Codex? 
- Claude Code in Terminal, in VSCode, or in the App?
- If I use Claude Code, do I need Cursor?

I was wondering exactly what Claude in Terminal offers that it does not in VSCode. Chats with Claude and ChatGPT insisted Terminal was better, but for my purposes, those differences were barely perceptible.

Over a short period of time, I found that some of the choices are relatively arbitrary: just pick some preferences, and stick with them for a while. Mix something new in from time to time to see if it sticks.


2. "coding is dead" but with some pushback

I wrote, "I can really see why there are constantly articles about how coding is dead. I do not feel afraid per se though -- b/c coding is dead, but creation is not and creativity and productivity are still needed."

That bears repeating. Coding is dead, but creation is not.

Coding might be dead in the old sense. But coding was only ever a means to an end. It was to create something. There still needs to be a visionary that can dictate the vision and interpret the results through that lens. And coding is not dead. It is just different now. Easier now. Python was easier to code in than some other languages because it was sort of like writing in English. Coding with AI is exactly like writing in English. 

AI is a boon to people who are full of ideas, but are only alright at coding. For them, AI will be a means for testing out bigger ideas, and more ideas, faster. AI in both chatbot and agentic form is like having a team of teachers, and students, and workers, and so on. So it may ultimately be good for people that have many good ideas, who are able to dictate those ideas clearly, and evaluate their implementation effectively.


3. Having a coding background is still beneficial

After using some agentic coding, I wrote, "Having a coding background still seems like it is beneficial at this time with these tools."

I noticed that AI companies are moving towards completely abstracting away the coding aspect so anyone can create anything, the same way anyone can tell AI to make a picture and never need to know how the picture was made. If AI were perfect at interpreting human intentions and coding, then the code may never need to be seen by anyone. But we are not totally there yet, and working with these tools and the code they create still requires or benefits from prior experience in the old world. That is not to say that this old-world advantage will last forever, but it is still an advantage.


4. There is no going back

I remember AI started doing tab-completion. That was a major boon to my coding. I really liked that era actually. Once I used it, there was no going back. But that era is already basically over. Agentic AI replaced it for the most part. And there is no turning back. There is just learning how to make agentic AI work for you.


5. Cursor keeps coming up recommended, but does it truly have a moat around it that won't soon be crossed, if not already?

I talked to ChatGPT, Gemini, and Claude about how to get up and running with agentic coding. All recommended "Cursor".


Yet, I was struggling to see why Cursor was considered definitely better than VSCode.


I quizzed Claude on it. Claude highlighted 4 main advantages of Cursor. I pointed out that two of them were certainly not unique to Cursor, and asked it to look online and come back. It came back chastising itself a little bit, and gave 4 more reasons why Cursor is better. I pushed back again. Then Claude admitted the gap between them is closing. Still, Claude insisted Cursor has some advantages because of something about how the AI is a fundamental part of its architecture, not just extensions. Nonetheless, I walked away thinking Cursor had the reputation it had because it was an early success with agentic AI, but that the idea of it being strictly advantageous was potentially becoming outdated. Having said that, I have minimal experience with Cursor and would be happy to learn I am wrong. I just need use cases that prove its superiority.


After testing both several ways, I wrote:
"""
All experiences are extremely similar from a functional POV for a small Python project. Honestly, VSCode with Copilot seems to offer advantages analogous to Cursor's. I believe the gap is very much shrinking, and will continue to do so.

Cursor also integrates with the Codex and Claude Code extensions, and using them within Cursor is exactly the same as using them within VSCode. So it is irrelevant whether you use VSCode or Cursor when using those extensions. The difference is just the native chat interface and Cursor's own AI integration for using the other models, BUT the Copilot chat interface looks and feels almost exactly the same, and differences may not be noticed by many users (that is my assumption). Use either IDE - I don't think it will matter much, especially if you're using Codex and/or Claude Code extensions. I believe Cursor probably came out swinging last year with features VSCode did not have, but that gap has closed massively. I retain the right to be wrong here though!
"""


6. AI Apps vs IDEs - use one, the other, or both? Does it matter?

The Codex and Claude Code Apps were weirder experiences if you're used to VSCode. It felt more like developing a prototype with ChatGPT than coding in an IDE. Nonetheless, it is doing the same stuff as the extensions in VSCode. 

Claude and ChatGPT insisted there are some advantages to using the Apps over the extensions, but I have not yet gotten to a use case that shows them. It would be perfectly reasonable, though, to work with Codex or Claude Code in the App and have VSCode alongside it to monitor the directory and contents and changes, but that is a little more clunky than just having Codex and VSCode in the same place.

Apparently some say the whole concept of IDEs is outdated now that AI does all the coding. The claim seems to be that we don't even need to see what is happening; just let it all be a black box on some level.

But I think that only describes the "vibe coding" market: people who want a very low barrier to making a program, where seeing it all happening might upset them. 

At the moment, it seems like developing code for scientific discovery still would need humans to verify it does what the AI says it does even if you trust the AI. After all, it is not the AI putting its career on the line. And after all, scientists need to know what they are asserting. Someone somewhere needs to know!


End of the day conclusions:


At the end of the day - I'd say the simplest thing for me to do is just use VSCode and the extensions. Otherwise, I can continue exploring Cursor and the extensions there. I remain curious about any real advantages to Cursor over VSCode+Copilot as well as to the Apps over the extensions.



---

Early testing and conclusions:


The above were all some of my initial reactions.

I tested the following that night:
- VSCode chat box using Claude Sonnet 4.6
- VSCode Codex extension
- VSCode Claude Code extension
- Cursor chat box using Auto
- Codex App on macOS
- Claude App on macOS

I used these example prompts for testing:
```
- Spawn a subagent to explore this repo.
- Explore this repo.
- Are you able to take commands directly as well as spawn subagents for given commands?
- Create test.py with the following code:
    def hello():
        print("hello world")

    hello()   

- Turn this repo into a real small Python project, and review test.py to suggest improvements
- Did you review test.py to suggest improvements?
- Can you make this script more robust and add logging?
- Would it be worthwhile creating a toml file and subdirectory structure and contents typical of python programs for this project to make it production ready? If so, please implement.
- Can you run those tests? And fix any bugs that are detected?
- Add logging and make this more robust
- Make it more robust, design tests for it
- If any of the following are needed, please do them: add logging, make it more robust, design tests for it, add docstrings
- Create utils.py with the following function:
    def multiply(a, b):
        return a * b

- Incorporate the multiply function from utils.py inside hello, maintain robustness, update logging if needed, set utils.py appropriate subdirectory as needed
- Refactor this project so that:
        - hello() is part of a class
        - logging is added
        - utils is properly integrated
        - code is production-quality
        - README.md is up to date.
        - All code has helpful docstrings.
        - There are ways for user to get help message(s) and usage information.
    Ignore tasks that are already done.
- Can you plan one addition to this small python project?
- Can you tell me more about spawning subagents? Can you give me a prompt that I could give you in the future that would be viable for spawning two subagents adding different things to this project? I would like to see how they work in parallel.
- Spawn two subagents to work on this project in parallel.
        Subagent 1:
        Add an environment-variable feature to the hello project so users can set default values for name, times, and log level from the shell. Update the CLI integration, validation, and tests. Keep changes scoped to the application code and tests that cover this behavior.
        Subagent 2:
        Add developer-quality improvements to the project by creating a CONTRIBUTING.md file, expanding README.md with development and testing guidance, and adding a small smoke test that verifies the CLI entry point works as documented. Keep changes scoped to docs and non-overlapping tests.
        After both subagents finish, integrate their work, resolve any conflicts, run verification, and summarize what each subagent changed.
```
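
For a sense of where prompts like these tend to lead, here is a minimal sketch of one plausible end state for the toy project, collapsed into a single file. It is not the output any of the agents actually produced. The `Greeter` class name and the `HELLO_NAME` and `HELLO_TIMES` environment variable names are hypothetical, chosen only to illustrate the "class", "logging", "multiply", and "environment-variable" requests.

```python
import argparse
import logging
import os

logger = logging.getLogger("hello")


def multiply(a, b):
    """Return the product of a and b (the function from the utils.py prompt)."""
    return a * b


class Greeter:
    """Wrap hello() in a class, as the refactoring prompt requests."""

    def __init__(self, name="world", times=1):
        if times < 1:
            raise ValueError("times must be at least 1")
        self.name = name
        self.times = times

    def hello(self):
        """Return the greeting lines, using multiply() to repeat the message."""
        line = f"hello {self.name}"
        logger.info("greeting %s %d time(s)", self.name, self.times)
        # Multiplying a list by an int repeats the list.
        return multiply([line], self.times)


def build_parser():
    """Build the CLI parser; --help prints usage information."""
    parser = argparse.ArgumentParser(description="Print a greeting.")
    # HELLO_NAME and HELLO_TIMES are hypothetical environment variable names.
    parser.add_argument("--name", default=os.environ.get("HELLO_NAME", "world"))
    parser.add_argument(
        "--times", type=int, default=int(os.environ.get("HELLO_TIMES", "1"))
    )
    return parser


def main(argv=None):
    """Parse arguments, print the greeting, and return an exit code."""
    logging.basicConfig(level=logging.INFO)
    args = build_parser().parse_args(argv)
    for line in Greeter(args.name, args.times).hello():
        print(line)
    return 0
```

Calling `main(["--name", "Ada", "--times", "2"])` prints the greeting twice. In a real project the agents would likely split this across a package directory, a `pyproject.toml`, and a tests folder, as the prompts ask.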

---

This blog post was entirely written by me. Not AI at all - except for the cartoon, which I explained to ChatGPT for creation.