<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://bennettwaxse.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://bennettwaxse.com/" rel="alternate" type="text/html" /><updated>2026-02-08T22:49:04-05:00</updated><id>https://bennettwaxse.com/feed.xml</id><title type="html">bennettwaxse.com</title><subtitle>EHR informatics fellow exploring how frontier AI unlocks computational methods for biomedical research</subtitle><author><name>Bennett Waxse, MD, PhD</name></author><entry><title type="html">What Claude Code’s /insights Taught Me About My Workflow</title><link href="https://bennettwaxse.com/blog/bioinformatics/tools/insights/" rel="alternate" type="text/html" title="What Claude Code’s /insights Taught Me About My Workflow" /><published>2026-02-08T22:15:00-05:00</published><updated>2026-02-08T22:15:00-05:00</updated><id>https://bennettwaxse.com/blog/bioinformatics/tools/insights</id><content type="html" xml:base="https://bennettwaxse.com/blog/bioinformatics/tools/insights/"><![CDATA[<p>It will surprise no one that I love data, and the <code class="language-plaintext highlighter-rouge">/insights</code> command from Claude Code delivered.</p>

<p>1,295 messages across 325 files (+60,192/-13,058 lines of code) over 82 sessions spanning five weeks.</p>

<h2 id="the-summary-felt-spot-on">The Summary Felt Spot-On</h2>

<p>Claude Code characterized my workflow as “a deeply hands-on, iterative debugger who uses Claude Code as a persistent collaborator.” The key pattern it identified:</p>

<blockquote>
  <p>You drive complex, multi-file changes through rapid iterative cycles—letting Claude attempt solutions, catching errors and wrong approaches in real time, and steering toward simpler implementations with hands-on corrective feedback.</p>
</blockquote>

<p>It felt good to have what feels like growing competency validated, but I also learned something real about how I use the tool: I rarely give exhaustive upfront specifications. Instead, I describe a goal and steer Claude through implementation with real-time feedback—a “try it and fix it” philosophy.</p>

<h2 id="but-the-diagnostic-section-hit-harder">But the Diagnostic Section Hit Harder</h2>

<p>What surprised me was the surgical precision of the diagnostics. Claude Code didn’t just tell me what I was doing; it told me where things were breaking down and why.</p>

<h3 id="where-things-broke-down">Where Things Broke Down</h3>

<p><strong>Incorrect Initial Implementations Requiring Multiple Fix Cycles</strong></p>

<p>Claude’s first-pass code frequently contained bugs—wrong API access patterns, broken migrations, incorrect UI wiring. The report cited specific examples:</p>

<blockquote>
  <p>Stripe subscription date extraction failed three times (direct attribute access, wrong <code class="language-plaintext highlighter-rouge">.get()</code> level, then finally <code class="language-plaintext highlighter-rouge">items.data[0]</code>), and credits still weren’t granted afterward—burning an entire session on what should have been a straightforward API integration.</p>
</blockquote>

<p>The fix: “Providing more explicit specifications upfront (e.g., expected Stripe object structures, exact DB constraints) or asking Claude to outline its approach before coding could reduce these costly iteration rounds.”</p>

<p><strong>Wrong Solution Architecture Before User Correction</strong></p>

<p>Claude repeatedly proposed overly complex solutions—wrapper abstractions, symlink chains, /tmp hacks—when I already knew a simpler approach existed. Example from my genomics pipeline work:</p>

<blockquote>
  <p>For GCS file localization, Claude cycled through symlinks, gsutil in-script, and dirname approaches before you steered it to the straightforward FUSE mount solution you wanted from the start.</p>
</blockquote>

<p>The insight: State architectural preferences upfront. “Use FUSE mounts, not gsutil” or “no wrapper classes” would have short-circuited wasted iteration.</p>

<p><strong>Aggressive Edits Without Verification</strong></p>

<p>Claude often applied changes too eagerly—editing documents without pausing for review, adding more content than requested, or committing unwanted files. I interrupted tool calls twice during document editing because Claude was applying changes without letting me review intermediate results.</p>

<p>The data quantified this: 20 instances of “wrong approach” friction and 19 instances of “buggy code,” but only 3 rejected actions and 2 user interruptions. I prefer to let Claude attempt things and then correct rather than pre-empt.</p>

<h2 id="the-improvements">The Improvements</h2>

<p>Then came the actionable section—not generic advice, but specific refinements drawn from actual sessions:</p>

<h3 id="1-exact-prompts-for-claudemd">1. Exact Prompts for CLAUDE.md</h3>

<p>The report generated precise additions for my project documentation based on pain points from our sessions:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre>When working with Stripe API objects, always access nested data through 
<span class="sb">`items.data[0]`</span> for subscription details. Never use <span class="sb">`.get()`</span> or direct 
attribute access on top-level Stripe objects for period dates.
</pre></td></tr></tbody></table></code></pre></div></div>

<p>This wasn’t general strategy. It was extracted from three failed attempts at the same integration pattern.</p>

<h3 id="2-lifecycle-hooks">2. Lifecycle Hooks</h3>

<p>The report suggested hooks to auto-run linting or tests after edits, catching buggy first-pass implementations before I see them. Example:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre><span class="na">hooks</span><span class="pi">:</span>
  <span class="na">post-edit</span><span class="pi">:</span>
    <span class="c1"># Auto-run linter after Python edits</span>
    <span class="pi">-</span> <span class="na">if</span><span class="pi">:</span> <span class="s2">"</span><span class="s">file.endsWith('.py')"</span>
      <span class="na">run</span><span class="pi">:</span> <span class="s2">"</span><span class="s">ruff</span><span class="nv"> </span><span class="s">check</span><span class="nv"> </span><span class="s">{file}"</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<h3 id="3-better-prompt-patterns">3. Better Prompt Patterns</h3>

<p>Example prompts showing how to add context up front, guide iterative fixes, and spawn parallel agents. One particularly useful pattern for my genomics pipeline work:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="rouge-code"><pre>Context: KIR-mapper pipeline using dsub + Google Batch. Prefer FUSE mounts 
over gsutil. No wrapper abstractions around dsub functions.

Task: Debug why pipeline stalls at sample 199/200...
</pre></td></tr></tbody></table></code></pre></div></div>

<h2 id="the-meta-learning">The Meta-Learning</h2>

<p>A tool that accelerates my data processing is also teaching me to use it better. The meta-analysis of my own workflow is as valuable as the code it helped me write.</p>

<p>The report revealed patterns I hadn’t yet noticed:</p>
<ul>
  <li>My median response time to Claude is 98.6 seconds (average 316.3s)—I’m actively engaged, not fire-and-forget</li>
  <li>24 of 82 sessions were “Iterative Refinement” type—my most common workflow</li>
  <li>Multi-file changes were Claude’s most helpful capability (20 instances), followed by correct code edits (13)</li>
</ul>

<h2 id="what-im-implementing">What I’m Implementing</h2>

<p>Based on these diagnostics:</p>

<ol>
  <li><strong>Enhanced CLAUDE.md files</strong> with tech stack preferences, data processing constraints, and architectural patterns drawn from actual failure modes</li>
  <li><strong>Pre-commit hooks</strong> for automatic linting and basic test runs</li>
  <li><strong>More upfront context</strong> in prompts—especially example data structures for API integrations and explicit architectural constraints for infrastructure work</li>
</ol>

<p>The report also suggested I’m ready for more ambitious workflows as models improve:</p>
<blockquote>
  <p>Your most painful sessions—multi-round pipeline debugging and Stripe integration fixes—should become single-prompt workflows where Claude autonomously iterates against your test suite until everything passes.</p>
</blockquote>

<p>That’s the future I’m building toward: test coverage comprehensive enough that future Claude can self-correct without intervention.</p>

<h2 id="try-it-yourself">Try It Yourself</h2>

<p>If you’re using Claude Code, type <code class="language-plaintext highlighter-rouge">/insights</code> and see what workflow patterns it identifies. The Scott Cunningham method is also worth trying: ask for a slide deck, your place in the taxonomy of how people use Claude Code, and a translation of engineering-speak into concepts you understand.</p>

<p>Still slightly offended it called me an ICU doc though…</p>]]></content><author><name>Bennett Waxse, MD, PhD</name></author><category term="Blog" /><category term="Bioinformatics" /><category term="Tools" /><category term="Claude Code" /><category term="Bioinformatics" /><category term="AI" /><category term="LLM" /><category term="Research Informatics" /><category term="Productivity" /><summary type="html"><![CDATA[Claude Code analyzed 1,295 messages across 325 files and diagnosed my workflow bottlenecks with surgical precision.]]></summary></entry><entry><title type="html">Using Claude Code for EHR Informatics: Getting Started, Part I</title><link href="https://bennettwaxse.com/blog/bioinformatics/getting%20started/claude-code-ehr-informatics-part-1/" rel="alternate" type="text/html" title="Using Claude Code for EHR Informatics: Getting Started, Part I" /><published>2026-01-29T17:15:00-05:00</published><updated>2026-01-29T17:15:00-05:00</updated><id>https://bennettwaxse.com/blog/bioinformatics/getting%20started/claude-code-ehr-informatics-part-1</id><content type="html" xml:base="https://bennettwaxse.com/blog/bioinformatics/getting%20started/claude-code-ehr-informatics-part-1/"><![CDATA[<p>I’ve been using Claude Code to create cohorts, diagnose bugs, and really accelerate my research workflows. Before getting into the fun stuff though, I want to share how I set up my environment. If you’re new to Claude Code or curious about what goes into my CLAUDE.md files, this post is for you.</p>

<div class="notice--info">

<p><strong>Coming up in this series:</strong></p>
<ul>
  <li>Part II: Skills, plugins, and MCP servers</li>
  <li>Part III: Building a cohort in <em>All of Us</em> with Claude Code</li>
</ul>

</div>

<p>If you’d like to follow along, the GitHub repo is <a href="https://github.com/bwaxse/research-code">research-code</a>. All files referenced here are available for you to use.</p>

<h2 id="why-context-matters">Why Context Matters</h2>

<p><em>But Bennett, can’t I just give Claude a snippet of code and ask it to do something?</em> Sometimes that works well, but the key is context.</p>

<p>If LLM context windows are big enough to consider an entire notebook or project, why limit them? Don’t hide how you derived a cohort when that derivation can inform what you’re asking the model to do next.</p>

<p>At the same time, how do you provide information efficiently without bloating the context window? The <code class="language-plaintext highlighter-rouge">.ipynb</code> format includes structural metadata that wastes tokens. References to irrelevant methods add noise without value, and you don’t want to hit your usage limit prematurely.</p>
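<p>To keep context lean, one option—a sketch assuming the standard nbformat JSON layout—is to feed Claude only the code cell sources and drop everything else:</p>

```python
import json

def extract_code(nb_json: str) -> str:
    """Return only the code-cell sources from a Jupyter notebook's JSON,
    dropping outputs, execution counts, and other structural metadata."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # nbformat allows source as a list of lines or a plain string
        if isinstance(src, list):
            src = "".join(src)
        chunks.append(src)
    return "\n\n".join(chunks)

# A minimal two-cell notebook for illustration
nb = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Cohort derivation"]},
        {"cell_type": "code", "source": ["df = load_cohort()"],
         "outputs": [], "execution_count": 1},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})
print(extract_code(nb))  # df = load_cohort()
```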

<p>Claude Code already enables users to review an entire codebase, offering bug fixes and deep understanding. Why not do the same for research? Here’s how I approach it.</p>

<h2 id="1-centralize-reference-information">1. Centralize Reference Information</h2>

<p><img src="/assets/images/posts/2026-01-29-claude-code-0-starting-files.png" alt="Starting file structure" class="align-right" style="max-width: 300px;" /></p>

<p>The models have been trained on what the internet knows about <em>All of Us</em>, which is both good and bad. They have a general sense of what’s available in the biobank, but for rapidly evolving systems, that information may already be outdated.</p>

<p>For my work, I asked Claude what would be helpful to know. Publicly available data dictionaries exist online, but what details matter most? I downloaded the data dictionary and pared it down to essentials: Table Name, Field Name, OMOP CDM Standard or Custom Field, Description, and Field Type. Claude then generated SQL to provide counts for each table in the current CDR. The result: 9 <code class="language-plaintext highlighter-rouge">.tsv</code> files of schema structure and data counts, adhering to the &lt;20 censoring required by <em>All of Us</em> (<code class="language-plaintext highlighter-rouge">_reference/all_of_us_tables/</code>).</p>
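<p>As a sketch of that censoring step (the reporting format here is my own illustration, not an official template), small cells can be suppressed before any counts are written out:</p>

```python
def censor_counts(counts, threshold=20):
    """Suppress any table count below the threshold, per the All of Us
    small-cell policy (no counts under 20 may be reported)."""
    return {table: str(n) if n >= threshold else f"<{threshold}"
            for table, n in counts.items()}

print(censor_counts({"condition_occurrence": 125000, "rare_table": 12}))
# {'condition_occurrence': '125000', 'rare_table': '<20'}
```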

<h2 id="2-centralize-phecode-lists">2. Centralize Phecode Lists</h2>

<p><a href="https://phewascatalog.org/">Phecodes</a> are manually curated groupings of ICD codes designed to capture clinically meaningful concepts for research. Lisa Bastarache has an excellent <a href="https://pubmed.ncbi.nlm.nih.gov/34465180/">review</a> if you want to learn more. Not every phecode is perfect, but if clinicians and researchers have already worked to group billing codes into meaningful categories, why not start there? We all know about the reproducibility crisis in biomedical research, and random unvalidated ICD groupings aren’t going to help.</p>

<p>This is why I include CSV files mapping phecodes to ICD codes (<code class="language-plaintext highlighter-rouge">_reference/phecode/</code>).</p>
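<p>A minimal sketch of how such a mapping file gets used (the column names and rows here are illustrative; the actual phecode CSVs from phewascatalog.org use their own layout):</p>

```python
import csv
import io

# Illustrative phecode-to-ICD rows; real mapping files differ in layout
mapping_csv = """icd_code,vocabulary_id,phecode,phecode_string
A40,ICD10CM,038.0,Streptococcal sepsis
A41.9,ICD10CM,038.9,Sepsis
038.9,ICD9CM,038.9,Sepsis
"""

# Group every (ICD code, vocabulary) pair under its phecode
by_phecode = {}
for row in csv.DictReader(io.StringIO(mapping_csv)):
    by_phecode.setdefault(row["phecode"], []).append(
        (row["icd_code"], row["vocabulary_id"]))

print(by_phecode["038.9"])  # [('A41.9', 'ICD10CM'), ('038.9', 'ICD9CM')]
```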

<h2 id="3-identify-trusted-queries">3. Identify Trusted Queries</h2>

<p>I also brought in a trusted ICD query from my labmate, Tam Tran. He’s the force behind <a href="https://github.com/nhgritctran/PheTK">PheTK</a>, a fast Python library for Phenome Wide Association Studies (PheWAS) that includes Cox regression for incident analyses, dsub integration for distributed computing, and more.</p>

<p>In developing PheTK, Tam discovered some peculiarities worth noting:</p>

<div class="notice--warning">

<p><strong>V-code ambiguity:</strong> While most ICD-9 and ICD-10 codes differ structurally, V codes exist in both. V01-V09 means “Persons With Potential Health Hazards Related To Communicable Diseases” in ICD-9-CM but “Pedestrian injured in transport accident” in ICD-10-CM. His query always joins the concept table and matches <code class="language-plaintext highlighter-rouge">vocabulary_id</code>.</p>

</div>

<p><strong>Dual identifiers:</strong> ICD codes appear as both <code class="language-plaintext highlighter-rouge">concept_id</code> and <code class="language-plaintext highlighter-rouge">concept_code</code> (e.g., 1567285 and A40 for Streptococcal sepsis), and both aren’t always present. His query checks for both.</p>
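<p>A hedged sketch of that defensive pattern (the helper is illustrative, not Tam’s actual query; the column names follow the OMOP concept table):</p>

```python
def icd_condition_filter(concept_ids, concept_codes,
                         vocabularies=("ICD9CM", "ICD10CM")):
    """Build an OMOP WHERE clause that matches ICD concepts by BOTH
    concept_id and concept_code, pinned to explicit vocabularies so
    ICD-9/ICD-10 V codes can't collide."""
    ids = ", ".join(str(i) for i in concept_ids)
    codes = ", ".join(f"'{c}'" for c in concept_codes)
    vocabs = ", ".join(f"'{v}'" for v in vocabularies)
    return (f"(c.concept_id IN ({ids}) OR c.concept_code IN ({codes})) "
            f"AND c.vocabulary_id IN ({vocabs})")

print(icd_condition_filter([1567285], ["A40"]))
```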

<p>By keeping these queries in <code class="language-plaintext highlighter-rouge">_reference/trusted_queries/</code>, I carry forward these lessons in my code.</p>

<h2 id="4-collect-your-code">4. Collect Your Code</h2>

<p>Finally, the important part—your actual code. I work in <em>All of Us</em>, a cloud-based environment where researchers cannot download individual data. To export notebooks safely, I created <code class="language-plaintext highlighter-rouge">upload_safe.sh</code>, a script that syncs with my GitHub repo, copies selected notebooks, and strips them of output, bucket paths, and secrets. This way, Claude only sees code—not data.</p>
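<p>The core of that sanitization step can be sketched in a few lines (<code class="language-plaintext highlighter-rouge">upload_safe.sh</code> itself is shell; this is a Python illustration of the output-stripping piece only, and <code class="language-plaintext highlighter-rouge">gs://</code> is assumed as the bucket scheme):</p>

```python
import json
import re

def strip_notebook(nb_json: str, bucket_pattern: str = r"gs://[^\s'\"]+") -> str:
    """Drop cell outputs and redact bucket paths so only code -- never
    data -- leaves the workspace. (The real script also handles secrets;
    this sketch covers outputs and paths only.)"""
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
            cell["source"] = [
                re.sub(bucket_pattern, "gs://REDACTED", line)
                for line in cell.get("source", [])
            ]
    return json.dumps(nb)

# A toy cell with an output and a bucket path
nb = json.dumps({"cells": [{
    "cell_type": "code",
    "source": ["df = pd.read_csv('gs://my-bucket/cohort.csv')"],
    "outputs": [{"output_type": "stream", "text": ["rows: 12345"]}],
    "execution_count": 7,
}]})
clean = json.loads(strip_notebook(nb))
print(clean["cells"][0]["source"][0])  # df = pd.read_csv('gs://REDACTED')
```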

<div class="notice--danger">

<p><strong>This was critical for me.</strong> In Claude Code, it’s easy to share something unintentionally. I never want to share data I don’t have permission to share.</p>

</div>

<p>In this public repo, I’ve included a few published projects:</p>
<ul>
  <li><strong>genomics</strong>: Genomic analysis pipelines for <em>All of Us</em> genetic data</li>
  <li><strong>hpv</strong>: HPV research cohorts using OMOP CDR data</li>
  <li><strong>nc3</strong>: N3C RECOVER Long COVID phenotyping algorithm adapted for <em>All of Us</em></li>
</ul>

<h2 id="orienting-claude-code">Orienting Claude Code</h2>

<p>Once all files are in place, you’re ready to initialize. Open terminal, navigate to your working folder, and type <code class="language-plaintext highlighter-rouge">claude</code>.</p>

<p><img src="/assets/images/posts/2026-01-29-claude-code-1-opening.png" alt="Opening Claude Code" /></p>

<p><img src="/assets/images/posts/2026-01-29-claude-code-2-trust.png" alt="Trust files prompt" /></p>

<p>Type <code class="language-plaintext highlighter-rouge">/init</code> to create a <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> file for the repository root.</p>

<p><img src="/assets/images/posts/2026-01-29-claude-code-3-init.png" alt="Running /init" /></p>

<p><img src="/assets/images/posts/2026-01-29-claude-code-4-init-output.png" alt="Init analyzing codebase" /></p>

<p><code class="language-plaintext highlighter-rouge">CLAUDE.md</code> files define coding standards, review criteria, user preferences, and project-specific rules. Each time you start Claude Code, this document loads into context and guides your session. Anthropic recommends keeping it focused and concise, updating as the repository evolves.</p>

<p>Claude does the heavy lifting. It produces a solid first draft:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
</pre></td><td class="rouge-code"><pre>This repository contains computational methods for research informatics
and genomics research, primarily focused on analyzing data from the NIH's
<span class="gs">**_All of Us_ Research Program**</span>. Code examples are shared from
bennettwaxse.com and include analysis tools for:
<span class="p">
-</span> <span class="gs">**Genomics**</span>: Variant analysis, ancestry inference, PCA workflows using PLINK2 and Hail
<span class="p">-</span> <span class="gs">**HPV Research**</span>: Cohort construction and analysis
<span class="p">-</span> <span class="gs">**N3C/RECOVER**</span>: Long COVID phenotyping algorithms adapted from PySpark to Python/pandas
<span class="p">-</span> <span class="gs">**Reference Materials**</span>: _All of Us_ table schemas, PheCode mappings, and Verily Workbench helpers
</pre></td></tr></tbody></table></code></pre></div></div>

<p>It included other useful sections: Platform, Key Environment Variables, Code Structure, Project Organization, Data Handling, Common Libraries, and Workflow Patterns.</p>

<p><img src="/assets/images/posts/2026-01-29-claude-code-5-claudemd.png" alt="CLAUDE.md created" /></p>

<h2 id="editing-claudemd">Editing CLAUDE.md</h2>

<p>The first draft captured the general intent, but I edited line by line. I added a <strong>Development Philosophy</strong> section describing the importance of:</p>

<ul>
  <li><strong>Data validation throughout processing</strong>: Check frequently that data transforms as expected</li>
  <li><strong>Code clarity over abstraction</strong>: These scripts serve a dual purpose—analysis and teaching EHR/genomics informatics. I want trainees to see exactly how processes work.</li>
</ul>

<p><img src="/assets/images/posts/2026-01-29-claude-code-6-revision.png" alt="Revising CLAUDE.md" /></p>

<p>I also added a section about <em>All of Us</em> rules, including count censoring for all values &lt;20 to minimize problematic reporting.</p>

<p>Some things Claude got wrong. One section referenced code for setting environment variables in the new Verily Workbench. Claude assumed this was required for every project, so I clarified it’s only for new Verily Workbench notebooks, a work-in-progress for <em>All of Us</em>.</p>

<p><strong>As with everything AI-generated, my mantra: review it line by line.</strong> Then iterate. Claude refined my verbose writing and kept things focused.</p>

<h2 id="creating-claudeignore">Creating .claudeignore</h2>

<p>Next, I asked Claude to create subdirectory CLAUDE.md files. These only load when Claude works in those directories—a good way to reveal specifics only when relevant. Remember, it’s all about efficient context usage.</p>

<p><img src="/assets/images/posts/2026-01-29-claude-code-7-claudeignore.png" alt="Creating .claudeignore" /></p>

<p>I created a <code class="language-plaintext highlighter-rouge">.claudeignore</code> file to:</p>
<ul>
  <li>Prevent reading <code class="language-plaintext highlighter-rouge">.ipynb</code> files (redundant with <code class="language-plaintext highlighter-rouge">.py</code> scripts)</li>
  <li>Block files containing secrets like <code class="language-plaintext highlighter-rouge">.env</code></li>
  <li>Exclude raw data files (<code class="language-plaintext highlighter-rouge">.bam</code>, <code class="language-plaintext highlighter-rouge">.fastq</code>)—Claude isn’t analyzing actual data</li>
  <li>Skip Python cache, build artifacts, and IDE files</li>
</ul>

<p>I did keep <code class="language-plaintext highlighter-rouge">.csv</code> and <code class="language-plaintext highlighter-rouge">.tsv</code> off this list since I share mappings and references with Claude.</p>
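<p>For reference, a <code class="language-plaintext highlighter-rouge">.claudeignore</code> matching those rules might look like this (gitignore-style patterns; entries beyond the ones named above are illustrative):</p>

```text
# Notebooks: redundant with the exported .py scripts
*.ipynb

# Secrets and credentials
.env
*.pem

# Raw data -- Claude reads code, not data
*.bam
*.fastq
*.fastq.gz

# Python cache, build artifacts, IDE files
__pycache__/
*.pyc
build/
.idea/
.vscode/
```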

<p><img src="/assets/images/posts/2026-01-29-claude-code-8-readme.png" alt="Claude asking about README" /></p>

<p>Claude also asked clarifying questions. It offered to expand the README and suggested a few other improvements.</p>

<h2 id="the-result">The Result</h2>

<p>What we have is a meaningfully structured folder with source material that mirrors my typical workflows and resources. CLAUDE.md files orient Claude to the project, and <code class="language-plaintext highlighter-rouge">.claudeignore</code> tells it what to avoid.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
</pre></td><td class="rouge-code"><pre>research-code/
├── CLAUDE.md              # Root instructions
├── .claudeignore          # Files to skip
├── _reference/
│   ├── CLAUDE.md          # Reference-specific context
│   ├── all_of_us_tables/  # CDR schemas
│   ├── phecode/           # Phecode mappings
│   └── trusted_queries/   # Vetted SQL patterns
├── genomics/
│   ├── CLAUDE.md          # Genomics-specific context
│   └── *.py               # Analysis scripts
├── hpv/
│   └── ...
└── nc3/
    └── ...
</pre></td></tr></tbody></table></code></pre></div></div>

<h2 id="whats-next">What’s Next</h2>

<p>If you’re excited to see what Claude can do with this foundation, you’re in the right place. Next time, I’ll introduce skills, plugins, and MCP servers—components that extend what Claude Code can do!</p>

<p>Soon, you’ll see how Claude Code is supercharging my data analysis in <em>All of Us</em>. If you’re already using Claude Code, I’d love to learn how you’re using it too!</p>

<p>Until then, sciencespeed.</p>

<p><img src="/assets/images/posts/2026-01-29-claude-code-9-bye.png" alt="Setup complete" /></p>]]></content><author><name>Bennett Waxse, MD, PhD</name></author><category term="Blog" /><category term="Bioinformatics" /><category term="Getting Started" /><category term="Claude Code" /><category term="Bioinformatics" /><category term="All of Us" /><category term="Tutorial" /><category term="AI" /><category term="LLM" /><category term="Research Informatics" /><category term="OMOP" /><category term="Getting Started" /><summary type="html"><![CDATA[How I set up Claude Code for research informatics work in All of Us—from CLAUDE.md files to .claudeignore—and why context matters more than clever prompts.]]></summary></entry><entry><title type="html">Using Claude Code for Genomic Pipeline Optimization: A KIR Mapper Case Study</title><link href="https://bennettwaxse.com/blog/bioinformatics/dsub-scale-up/" rel="alternate" type="text/html" title="Using Claude Code for Genomic Pipeline Optimization: A KIR Mapper Case Study" /><published>2025-01-23T14:00:00-05:00</published><updated>2025-01-23T14:00:00-05:00</updated><id>https://bennettwaxse.com/blog/bioinformatics/dsub-scale-up</id><content type="html" xml:base="https://bennettwaxse.com/blog/bioinformatics/dsub-scale-up/"><![CDATA[<p>As a physician-scientist with clinical and lab experience, I know how to ask answerable questions, structure experiments, and evaluate results carefully. But I didn’t have the computing skills to do everything I wanted. Two years ago, large language models changed that. By pasting code into Claude.ai, I could generate entire classes and SQL queries that did exactly what I needed. It was inefficient at times—for instance, Claude didn’t know much about polars initially—but it worked, and I knew this was the work I wanted to do.</p>

<p>At the end of last year, I discovered Claude Code. I was blown away. I built a command-line tool to efficiently evaluate primary literature using LLMs, which eventually became <a href="https://scholia.fyi">Scholia.fyi</a>—releasing in the next few weeks. When I returned to work in January, it was clear Claude Code would also take my informatics to the next level.</p>

<p>This is the first in a series of posts about using Claude Code for data analysis and scientific work. I found plenty of resources on building <a href="https://www.youtube.com/@avtharai">apps</a> or <a href="https://www.youtube.com/@PatrickOakleyEllis">websites</a> with <a href="https://www.youtube.com/watch?v=gv0WHhKelSE">Claude Code</a>, but not much on bioinformatics and science. My hope is to create a community of Claude Code-literate scientists who keep humans in the loop and amplify what we can do.</p>

<p>This first example involves scaling a KIR gene mapping pipeline from 40 samples to 200,000+. Beyond optimizing machine types (Claude Code helped me find a 30% cheaper parallelized setup), I consistently hit a wall around 100-200 samples. The tool worked fine at small scale but failed predictably in production. This is a story about debugging something outside my expertise and learning where Claude Code genuinely helps.</p>

<!--more-->

<h2 id="the-setup">The Setup</h2>

<p>I’m working with <a href="https://github.com/erickcastelli/kir-mapper">kir-mapper</a>, which aligns KIR genes from whole-genome sequencing. The pipeline orchestrates several tools:</p>

<ol>
  <li>GATK PrintReads converts CRAM to BAM, extracting the KIR region</li>
  <li>kir-mapper map aligns reads to KIR references</li>
  <li>(Later) ncopy, genotype, haplotype call variants</li>
</ol>

<p>The strategy was simple: run 5 samples in parallel using GNU parallel, processing them in 100-200 sample batches. This should have been efficient. Instead, the pipeline stalled reliably at sample 99/100 or 199/200.</p>

<h2 id="the-pattern">The Pattern</h2>

<p>The failure was consistent but puzzling:</p>

<ul>
  <li>40-sample run: succeeded</li>
  <li>64-sample run: succeeded</li>
  <li>100-sample run: stalled at 99/100</li>
  <li>199-sample run: stalled at 198/199</li>
  <li>200-sample run: stalled at 199/200</li>
</ul>

<p>Always at or near the last sample. This suggested something systematic—not random failure, not resource exhaustion, but something about how the parallel queue itself was behaving.</p>

<p>I first analyzed memory and disk usage throughout the runs with Claude Code. We never exhausted resources. That ruled out the obvious culprits. Beyond that, I was stuck. File conflicts in parallel execution were one hypothesis, but debugging GNU parallel internals isn’t my area, and I’d never written enough low-level code to know where to look.</p>

<h2 id="where-claude-code-helped">Where Claude Code Helped</h2>

<p>I gave Claude the problem, the kir-mapper source code, and my base and parallelization scripts. It did something I couldn’t have done efficiently: read through thousands of lines of C++ and identify where temporary files were created.</p>

<p>Here’s what it found in map_dna.cpp (lines 1473-1474):</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre><span class="n">string</span> <span class="n">v_sai1</span> <span class="o">=</span> <span class="n">v_output</span> <span class="o">+</span> <span class="s">"tmp1.sai "</span><span class="p">;</span>
<span class="n">string</span> <span class="n">v_sai2</span> <span class="o">=</span> <span class="n">v_output</span> <span class="o">+</span> <span class="s">"tmp2.sai "</span><span class="p">;</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Compare with how SAM files are named (line 1134):</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="n">string</span> <span class="n">outsamtmp</span> <span class="o">=</span> <span class="n">v_output</span> <span class="o">+</span> <span class="n">v_sample</span> <span class="o">+</span> <span class="o">*</span><span class="n">i</span> <span class="o">+</span> <span class="s">".tmp.sam"</span><span class="p">;</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The difference is stark: SAM files include the sample name. The temporary BWA index files do not. When five samples run in parallel, they all write to the same <code class="language-plaintext highlighter-rouge">tmp1.sai</code> and <code class="language-plaintext highlighter-rouge">tmp2.sai</code> files. File locks accumulate. After a certain number of jobs queue up, GNU parallel hits an internal limit and stops accepting new ones.</p>

<p>This explained an interesting observation: GATK worked fine, even though it also runs in parallel. The difference isn’t parallelization strategy—both GATK and kir-mapper use GNU parallel with 5 samples at a time. The difference is how they handle temporary files: GATK apparently names its temp files in a way that avoids collisions (or writes them elsewhere), whereas kir-mapper creates those shared <code class="language-plaintext highlighter-rouge">tmp1.sai</code> and <code class="language-plaintext highlighter-rouge">tmp2.sai</code> files that all samples contend for.</p>

<p>Could I have figured this out alone? Eventually, probably. But it would have taken substantially longer. Could I have abandoned parallelization and forfeited the 30% savings? Also yes. The key was that Claude could quickly parse unfamiliar C++ and identify the pattern.</p>

<h2 id="two-fixes">Two Fixes</h2>

<p>With the diagnosis in hand, I implemented two complementary fixes:</p>

<p><strong>Fix 1: Per-sample output directories.</strong> Instead of all samples writing to a shared output directory:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="rouge-code"><pre><span class="nb">local </span><span class="nv">sample_output</span><span class="o">=</span><span class="s2">"./kir_output/</span><span class="k">${</span><span class="nv">person_id</span><span class="k">}</span><span class="s2">"</span>
<span class="nb">mkdir</span> <span class="nt">-p</span> <span class="s2">"</span><span class="nv">$sample_output</span><span class="s2">"</span>

kir-mapper map <span class="se">\</span>
    <span class="nt">-bam</span> <span class="s2">"</span><span class="k">${</span><span class="nv">person_id</span><span class="k">}</span><span class="s2">_chr19.bam"</span> <span class="se">\</span>
    <span class="nt">-sample</span> <span class="s2">"</span><span class="k">${</span><span class="nv">person_id</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
    <span class="nt">-output</span> <span class="s2">"</span><span class="nv">$sample_output</span><span class="s2">"</span> <span class="se">\</span>
    <span class="nt">-threads</span> <span class="nv">$THREADS_PER_SAMPLE</span> 2&gt;&amp;1
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Each sample now gets its own subdirectory. The <code class="language-plaintext highlighter-rouge">tmp1.sai</code> and <code class="language-plaintext highlighter-rouge">tmp2.sai</code> files for sample A don’t collide with those for sample B.</p>
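<p>The fix is easy to sanity-check with a toy version of the layout. A minimal sketch (illustrative paths, not kir-mapper's real output): each sample's fixed-name temp files now live under their own subdirectory, so the same filename maps to distinct paths.</p>

```shell
#!/usr/bin/env bash
# Per-sample directories: the same fixed temp name no longer collides.
for person_id in A B; do
    sample_output="./kir_output/${person_id}"
    mkdir -p "$sample_output"
    echo "alignment for sample ${person_id}" > "${sample_output}/tmp1.sai"
done

# Two distinct files, one per sample -- no shared path to contend for.
ls kir_output/A/tmp1.sai kir_output/B/tmp1.sai
```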

<p><strong>Fix 2: 25-sample sub-batches.</strong> Even with per-sample directories, sending 100 jobs through GNU parallel at once keeps you near that queue limit. So instead of:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="nb">seq </span>1 100 | parallel <span class="nt">-j</span> 5 <span class="s1">'run_kirmap {}'</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>We process in sub-batches:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
</pre></td><td class="rouge-code"><pre><span class="nv">BATCH_SIZE</span><span class="o">=</span>25
<span class="nv">idx</span><span class="o">=</span>1

<span class="k">while</span> <span class="o">[</span> <span class="nv">$idx</span> <span class="nt">-le</span> <span class="nv">$NUM_SAMPLES</span> <span class="o">]</span><span class="p">;</span> <span class="k">do
    </span><span class="nv">BATCH_END</span><span class="o">=</span><span class="k">$((</span>idx <span class="o">+</span> BATCH_SIZE <span class="o">-</span> <span class="m">1</span><span class="k">))</span>
    <span class="k">if</span> <span class="o">[</span> <span class="nv">$BATCH_END</span> <span class="nt">-gt</span> <span class="nv">$NUM_SAMPLES</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
        </span><span class="nv">BATCH_END</span><span class="o">=</span><span class="nv">$NUM_SAMPLES</span>
    <span class="k">fi

    </span><span class="nb">seq</span> <span class="nv">$idx</span> <span class="nv">$BATCH_END</span> | parallel <span class="nt">-j</span> 5 <span class="s1">'run_kirmap_with_env {}'</span>
    <span class="nv">idx</span><span class="o">=</span><span class="k">$((</span>BATCH_END <span class="o">+</span> <span class="m">1</span><span class="k">))</span>
<span class="k">done</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>A sub-batch of 25 jobs stays well below the queue limit and is in line with the batch sizes that had completed successfully.</p>
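<p>The batching arithmetic is easy to verify in isolation. A standalone sketch of the same loop, using an arbitrary example of 103 samples (not a number from the actual run):</p>

```shell
#!/usr/bin/env bash
# Print the sub-batch boundaries the loop produces for 103 samples.
NUM_SAMPLES=103   # arbitrary example, not from the real pipeline
BATCH_SIZE=25
idx=1
while [ "$idx" -le "$NUM_SAMPLES" ]; do
    BATCH_END=$((idx + BATCH_SIZE - 1))
    if [ "$BATCH_END" -gt "$NUM_SAMPLES" ]; then
        BATCH_END=$NUM_SAMPLES   # clamp the final, partial batch
    fi
    echo "batch: samples ${idx}-${BATCH_END}"
    idx=$((BATCH_END + 1))
done
# Prints five batches: 1-25, 26-50, 51-75, 76-100, 101-103.
```

<p>The clamp on the last iteration is what lets the sample count be any number, not just a multiple of the batch size.</p>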

<h2 id="why-this-matters">Why This Matters</h2>

<p>Many of us build bioinformatics pipelines with deep domain expertise but without systems-level knowledge. You understand your science (genomics, epidemiology, medicine) but orchestrating multiple tools, managing parallel job queues, and debugging temporary file handling is a different domain entirely.</p>

<p>Claude Code was useful here specifically because:</p>

<ol>
  <li>It could read large codebases quickly and extract relevant sections</li>
  <li>It could reason about file I/O patterns and concurrency issues</li>
  <li>It helped me iterate through multiple approaches and settle on the best one</li>
</ol>

<p>What it didn’t do: it didn’t magically know the answer. I had to provide the logs, frame hypotheses, and verify its analysis myself. The diagnosis made sense conceptually, so I checked the source code to confirm. That verification step is critical, but we’re already used to this in science.</p>

<h2 id="when-its-actually-useful">When It’s Actually Useful</h2>

<p>If you’re considering Claude Code for bioinformatics, the wins come from:</p>

<ul>
  <li><strong>Code analysis</strong>: Reading unfamiliar source code to find issues</li>
  <li><strong>Prototyping approaches</strong>: Testing multiple strategies before committing</li>
  <li><strong>Architecture</strong>: Designing cleaner pipeline structure (mine went from 6 stages to 3)</li>
  <li><strong>Getting through the tedium</strong>: I’ve also used it to construct new cohorts, an easy but tedious process</li>
</ul>

<p>The important caveats:</p>

<ul>
  <li>You still need domain knowledge to evaluate suggestions</li>
  <li>Verify diagnoses yourself, especially for system behavior</li>
  <li>Use it as a thought partner, not a substitute for critical thinking</li>
  <li>If something doesn’t make sense, push back and ask for more detail</li>
</ul>

<h2 id="the-result">The Result</h2>

<p>The pipeline now processes all samples in a batch successfully, maintains parallelization efficiency, and scales to multiple ancestries. More importantly, I understand why it failed and what the fixes address. That understanding is what matters for extending the code later and for approaching similar tasks in the future.</p>

<hr />

<p>If you’re building scientific pipelines and hit problems outside your expertise, try Claude Code. Frame questions clearly, give it access to relevant code, and validate its analysis. The combination of clear problem statements, access to source code, and iterative refinement creates a solid workflow for technical debugging.</p>

<p>I’ll be writing more about how I’m using Claude Code for data analysis work. If you have questions or want to share how you’re using it, I’d be interested to hear.</p>]]></content><author><name>Bennett Waxse, MD, PhD</name></author><category term="Blog" /><category term="Bioinformatics" /><category term="Claude Code" /><category term="Bioinformatics" /><category term="Pipeline Optimization" /><category term="KIR Mapping" /><category term="Parallel Processing" /><category term="Debugging" /><summary type="html"><![CDATA[As a physician-scientist with clinical and lab experience, I know how to ask answerable questions, structure experiments, and evaluate results carefully. But I didn’t have the computing skills to do everything I wanted. Two years ago, large language models changed that. By pasting code into Claude.ai, I could generate entire classes and SQL queries that did exactly what I needed. It was inefficient at times—for instance, Claude didn’t know much about polars initially—but it worked, and I knew this was the work I wanted to do.]]></summary></entry></feed>