Thursday, April 30, 2026

Gemini: reactions to integrating Gemini into a multi-agent development system for genomics software

..this is a living document that I will update as I go....
...I do not guarantee it is finished now or ever will be...
...it is safe to assume I'm open to talking about this and am willing to learn...
..this is version 1.0; 2026-04-30
..previous versions: none
..Status: working draft / living document / corrections welcome
---

Gemini has been integrated across most Google products, and I like a lot of it. I love talking to Gemini Live inside Google Maps when traveling, asking about the route as well as talking about anything I want. In general, I've used "live mode" in the Gemini App more than in ChatGPT or Claude Apps. Gemini also lives inside my Gmail, Google Drive, and Chrome Browser. So it is becoming omnipresent. I've been figuring out more ways to use it, and it is mostly fantastic for those applications.

Gemini is also a strong chatbot for discussing bioinformatics. Prior to using the Gemini CLI for agentic coding, I would use the chatbot as a second or third "voice" for reasoning out various features to add to "onionskin", a program I began developing with ChatGPT followed by Claude Code and other agentic AI, touched on previously here and here. Since it was the AI that I talked with most in Live mode, I began having live conversations with Gemini on my car rides. Gemini even taught me about Claude Code, and how to do things like "slash commands" - the most useful one I learned being "/remote-control". 




My fondest memory of Gemini Live conversations was discussing a recent feature I added to "onionskin" with Claude Code (perhaps ChatGPT advised on it as well). The new feature involved scoring genomic coverage profiles corresponding to candidate re-replication domains according to shapes like rectangles or triangles. The purpose was to classify those candidates as either collapsed repeats (rectangles) or true re-replication domains (triangles). It involved computing shape scores, then using a Bayesian Information Criterion (BIC) approach to see whether triangle modeling performed substantially better than rectangle modeling. In a single car ride, Gemini helped me work out exactly how the shapes were being modeled, scored, and compared -- without either of us ever looking at the code. The next chance I got, I found all the pertinent code - and there it was exactly as we had surmised. So Gemini Live was fantastic for conversation, and fantastic for tossing around ideas.
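To make the rectangle-versus-triangle idea concrete, here is a minimal sketch of a BIC comparison. It is not the actual onionskin code. It assumes the profile is excess coverage above the flanking baseline, Gaussian residuals, a triangle that falls to zero at both edges of the domain, and a cutoff of 10 BIC units (`min_delta`) for "substantially better". All function names and parameters are hypothetical.

```python
import math

def bic(rss, n, k):
    """Gaussian-error BIC, up to a constant shared by both models."""
    return n * math.log(max(rss, 1e-12) / n) + k * math.log(n)

def fit_rectangle(y):
    """Flat-top model: one free parameter (the height)."""
    h = sum(y) / len(y)
    return sum((v - h) ** 2 for v in y), 1

def fit_triangle(y):
    """Peaked model: two free parameters (peak position and height)."""
    n = len(y)
    best_rss = float("inf")
    for p in range(n):  # grid search over the peak position
        # Template rises linearly from 0 to 1 at bin p, then falls back to 0.
        t = [(i / p if p else 1.0) if i <= p else (n - 1 - i) / (n - 1 - p)
             for i in range(n)]
        # Least-squares height for this template.
        h = sum(a * b for a, b in zip(t, y)) / sum(a * a for a in t)
        rss = sum((v - h * a) ** 2 for v, a in zip(y, t))
        best_rss = min(best_rss, rss)
    return best_rss, 2

def classify_profile(y, min_delta=10.0):
    """Call 'triangle' only if it beats 'rectangle' by more than min_delta BIC units."""
    n = len(y)
    rss_r, k_r = fit_rectangle(y)
    rss_t, k_t = fit_triangle(y)
    delta = bic(rss_r, n, k_r) - bic(rss_t, n, k_t)
    label = "triangle" if delta > min_delta else "rectangle"
    return label, delta
```

With this sketch, `classify_profile([0, 1, 2, 3, 4, 3, 2, 1, 0])` is labeled a triangle and a flat profile such as `[4.0] * 9` is labeled a rectangle. The extra parameter in the triangle model is what the `k * log(n)` penalty accounts for.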

Overall, Gemini is a strong chatbot for discussing bioinformatics. That is why I was surprised at the relatively poor performance of Gemini CLI for agentic coding in early April 2026 (underlined because this opinion is probably already "stale"). 



Because Gemini is so thoroughly integrated into the Google ecosystem I've been using for years, I have often posited that if Google "gets it right" with Gemini, I could see a world where it is the only AI I need. But so far, that world does not exist. I find tremendous value in using other AIs.

Let's be more specific. I was working with Gemini three ways during April:
1. Gemini Code Assist extension in VSCode
2. Gemini CLI agentic coding in VSCode
3. Gemini CLI in Terminal

Moreover, the model I was using almost exclusively was "Gemini 3.1 Pro Preview", though I also tried other models available around this time.

I wrote the following to my brother on April 16th, 2026: 
Gemini CLI in VSCode is barely usable for agentic coding. It can take like 30 minutes to answer a question like “How are you?”. It just hangs forever - probably because it is building or reading its massive context window. But that is problematic too - it takes a snapshot to memory and then never checks again basically without forcing it to at gun point. So if you are toggling between agents and doing active development, it quickly becomes “stale” and far “adrift” from the current reality of the code base. That was tolerable b/c there are ways around it - but the chronic slowness is insane.

Let's break that down. There were some important bits that bear repeating.

1. Ultra slow responses - "It can take like 30 minutes to answer a question like “How are you?”. It just hangs forever - probably because it is building or reading its massive context window."

2. Large context fails if it is quickly stale - "it takes a snapshot to memory and then never checks again basically without forcing it to at gun point. So if you are toggling between agents and doing active development, it quickly becomes “stale” and far “adrift” from the current reality of the code base."
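The staleness problem in point 2 can be sketched in a few lines: record a fingerprint of every file when the agent builds its snapshot, then compare against the repo before trusting that snapshot again. This is only an illustration of the idea, not how Gemini CLI (or any other agent) works internally. The function names and the `*.py` pattern are assumptions.

```python
import hashlib
from pathlib import Path

def snapshot(root, pattern="*.py"):
    """Map each matching file (relative path) to a SHA-256 digest of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob(pattern))
        if p.is_file()
    }

def stale_files(old, new):
    """Return files that were changed, added, or removed between two snapshots."""
    changed = {f for f in old.keys() & new.keys() if old[f] != new[f]}
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    return sorted(changed | added | removed)
```

An agent that took `before = snapshot(repo)` when it loaded its context could call `stale_files(before, snapshot(repo))` before answering and re-read only the files that another agent touched in the meantime.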


Fortunately, I found that the ultra slow response problem went away when I used Gemini CLI in the VSCode Terminal (not the VSCode extension) or just the regular Terminal. Later the same day, I said to my brother:

Update on Gemini - using Gemini CLI from the command-line is a whole different story than the Agent in VSCode (not through copilot, the regular Gemini interface in VSCode). It was fast and responsive, and more enjoyable. I only have N=1 time using it, and I can’t be sure the VSCode agent wouldn’t have also been flying. But this was a different class of experience than I had been having. Btw - I am using Gemini CLI in Terminal, but in the VSCode Terminal, so it still interacts with VSCode just fine - including showing diffs and all that.

I then asked Gemini CLI directly about the performance difference between the extension and Terminal:

[Screenshot of the Gemini CLI conversation not reproduced here. White background = Gemini; grey background = me.]

Unfortunately, the issues with a giant stale context window were present across Gemini Code Assist (GCA, used for Q&A) as well as Gemini CLI (GCLI) in all its forms. GCA and GCLI would answer questions in a way that would have been accurate for a prior state of the codebase but was now outdated. This meant Gemini could not be used reliably in the multi-agent architecture I was constructing during April 2026. Moreover, it tended to be bad at coding in the complex repo. I said to my brother on April 21, 2026:
It is crazy how bad even the latest Gemini agent can be at coding in a complex codebase or maybe at all. It is analogous to a chicken kicking all the chess pieces over and thinking it is winning the game. I just had to do a git revert.


In that story, when my next 5-hour Claude session started, I had to ask Claude how badly Gemini had screwed up the repo. After investigating the git history, Claude came back seemingly "flustered", with a report essentially condemning the work done by Gemini. Claude then helped with the "git revert" and went on to address my original needs.

Gemini was sometimes amazing at auditing coding done by other agents, and sometimes terrible. For amazing results, the context window almost certainly had to be fresh and current with the repo. Then it seemed able to pick up on things Claude, Codex, and Copilot did not or could not pick up on. It never gave very extensive audit reports, but it would add value. Thus, the "fresh eyes" concept of using multiple agents was validated. But other times, perhaps when the context window was stale (though not necessarily), it would give very shallow and vapid reports compared to other agents.


So will Gemini make it as a bioinformatician?

Obviously, yes - it is only a matter of time. I do not think Google will sit back and give up. Nonetheless, as recently as April 29, I was still making notes that Gemini was not up to the task for agentic coding.

What Gemini taught me is that a "giant context window" alone guarantees nothing. Not even good context. There were agents with context windows 5x smaller running circles around Gemini. And agents like Claude did not seem to trust their context windows anyway: Claude tries to find the pertinent files and code, read them directly, and answer honestly.

Giant context windows, nevertheless, are likely better than smaller context windows, provided certain conditions are met. Those will certainly include continuously updating the context -- adding new and pruning old in an intelligent way. This is basically what the human brain is already very good at. Human brains have massive continuously updated context windows.
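As a toy illustration of "adding new and pruning old", here is a minimal sketch of a context store with a fixed budget. Updating an entry replaces its stale copy, and the least recently updated entries are dropped first when the budget is exceeded. The class name, the character-count budget (a crude stand-in for tokens), and the least-recently-updated pruning rule are all assumptions chosen for simplicity. A genuinely "intelligent" policy would need something smarter than recency.

```python
from collections import OrderedDict

class RollingContext:
    """Keeps the freshest entries within a size budget (hypothetical sketch)."""

    def __init__(self, budget):
        self.budget = budget        # max total characters to keep
        self.items = OrderedDict()  # key -> text, oldest first

    def size(self):
        return sum(len(text) for text in self.items.values())

    def update(self, key, text):
        """Add new content or replace a stale copy, then prune the oldest entries."""
        self.items.pop(key, None)   # drop the stale version, if any
        self.items[key] = text      # newest entries live at the end
        while self.size() > self.budget and len(self.items) > 1:
            self.items.popitem(last=False)  # evict the least recently updated

    def keys(self):
        return list(self.items)
```

For example, with `RollingContext(10)`, updating "a" and "b" with five characters each and then "c" with three characters evicts "a", because it is the oldest entry and the total would otherwise exceed the budget.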

Gemini is already great at discussing bioinformatics and genomics as a chatbot and Live conversation companion. It just needs to catch up in the agentic space. Even now, it is great for one-off scripts, tab completion, code review (with a fresh context window, or in chat), and so on. I would predict that in time, Gemini CLI will catch up, and it is not impossible that it could one day lead the pack. What Google has going for it is a massive user base ready to adopt it. Their emphasis has probably been on integrating Gemini across their already expansive ecosystem (Gmail, GDrive, Maps, Search, etc). It is just a matter of time before Gemini CLI proves to be as useful as other agentic AI already is, and when that happens it can easily be widely adopted and integrated.


But at the time of writing this: Gemini CLI is not playing as well as other agents inside a multi-agent development architecture, likely because its giant context window quickly becomes stale when other agents do work, even when it is forced to read handoff files specifying the new work.


---
This blog post was entirely written by me. Not AI at all. However, ChatGPT was used to make the cartoons and augmented pictures. I had the ideas though ... so we both get credit, right?! 


