<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://bennettwaxse.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://bennettwaxse.com/" rel="alternate" type="text/html" /><updated>2026-04-03T10:44:50-04:00</updated><id>https://bennettwaxse.com/feed.xml</id><title type="html">bennettwaxse.com</title><subtitle>EHR informatics fellow exploring how frontier AI unlocks computational methods for biomedical research</subtitle><author><name>Bennett Waxse, MD, PhD</name></author><entry><title type="html">What I’m Reading: LLMs Ace Medical Benchmarks, But Users Don’t</title><link href="https://bennettwaxse.com/blog/reading/llm-benchmarks-vs-human-utility/" rel="alternate" type="text/html" title="What I’m Reading: LLMs Ace Medical Benchmarks, But Users Don’t" /><published>2026-04-03T10:00:00-04:00</published><updated>2026-04-03T10:00:00-04:00</updated><id>https://bennettwaxse.com/blog/reading/llm-benchmarks-vs-human-utility</id><content type="html" xml:base="https://bennettwaxse.com/blog/reading/llm-benchmarks-vs-human-utility/"><![CDATA[<p>A 1,298-person RCT on LLM-assisted medical reasoning landed in <em>Nature Medicine</em> this week, and the gap between benchmark performance and real-world utility is striking.</p>

<p>Models scored 90–99% on standalone evaluations for identifying relevant conditions. Real-world users with access to the same models hit 34.5%.</p>

<p>The gap exposes that helping a real user is a fundamentally different task from acing a benchmark.</p>

<!--more-->

<p>The middle ground makes the story richer: models mentioned relevant conditions in 65–73% of the conversations—already a step down from benchmark performance—but users then failed to recognize or retain those conditions in their final answers. So the information was often <em>there</em> (though not 90–99% of the time), just not <em>transferred</em>.</p>

<p><img src="/assets/images/posts/2026-04-03-llm-medical-benchmark.png" alt="Evaluation gap between model benchmarks and real-world user performance" class="align-center" style="max-width: 500px;" /></p>

<p>I keep thinking about the methodology. Standalone benchmarks feed models a full clinical vignette and ask for an answer. A back-and-forth dialogue is structurally different—it distributes information across turns, requires active synthesis, and introduces retrieval demands on the human side. We already know LLMs perform better with complete context windows. This paper shows that the difference matters in practice.</p>

<p>Two questions this raises for me:</p>

<p><strong>Do we need benchmarks that account for the human-in-the-loop performance hit?</strong> Frontier capability clearly doesn’t translate automatically to clinical utility, and the gap here is large enough that it should change how we think about deployment readiness.</p>

<p><strong>What does this mean for clinical AI interfaces?</strong> If users are missing conditions that models mention, the bottleneck isn’t the model. Is it the interaction design, the information architecture, or what we’re asking users to do with AI-generated output?</p>

<p>Worth reading if you’re thinking about AI in clinical settings.</p>

<p><a href="https://www.nature.com/articles/s41591-025-04074-y">Paper</a> · <a href="https://github.com/bwaxse/scholia-oss/tree/main">Read with Scholia</a></p>]]></content><author><name>Bennett Waxse, MD, PhD</name></author><category term="Blog" /><category term="Reading" /><category term="LLM" /><category term="Clinical AI" /><category term="Benchmarks" /><category term="Human-AI Interaction" /><category term="Scholia" /><summary type="html"><![CDATA[A 1,298-person RCT reveals a dramatic gap between standalone model performance and real-world human utility, and raises questions about what we're even measuring.]]></summary></entry><entry><title type="html">Using Claude Code for EHR Informatics: Skills, Plugins, and MCP Servers, Part II</title><link href="https://bennettwaxse.com/blog/bioinformatics/getting%20started/claude-code-ehr-informatics-part-2/" rel="alternate" type="text/html" title="Using Claude Code for EHR Informatics: Skills, Plugins, and MCP Servers, Part II" /><published>2026-03-07T17:00:00-05:00</published><updated>2026-03-07T17:00:00-05:00</updated><id>https://bennettwaxse.com/blog/bioinformatics/getting%20started/claude-code-ehr-informatics-part-2</id><content type="html" xml:base="https://bennettwaxse.com/blog/bioinformatics/getting%20started/claude-code-ehr-informatics-part-2/"><![CDATA[<p><a href="/blog/bioinformatics/getting%20started/claude-code-ehr-informatics-part-1/">Part I</a> was about context — orienting Claude to your work, your project. CLAUDE.md files, <code class="language-plaintext highlighter-rouge">.claudeignore</code>, reference tables, trusted queries. That foundation was important, but after a while, I noticed I was spending a lot of time re-explaining or re-orienting Claude: the same ICD query pitfalls, the same dsub machine type constraints, the same plotting conventions. CLAUDE.md files are a good way to bring some of this information into context, but context is also a finite resource. 
How do I efficiently bring in tools and expertise just when I need it?</p>

<p>Skills, plugins, hooks, and MCP servers are the way. This post walks through what I’ve built and what I’ve found useful from the community.</p>

<div class="notice--info">

<p><strong>The series so far:</strong></p>
<ul>
  <li><a href="/blog/bioinformatics/getting%20started/claude-code-ehr-informatics-part-1/">Part I: Getting started — CLAUDE.md, .claudeignore, and why context matters</a></li>
  <li>Part II: Skills, plugins, hooks, and MCP servers ← you are here</li>
  <li>Part III: Building a cohort in <em>All of Us</em> with Claude Code</li>
</ul>

</div>

<h2 id="1-skills-stop-re-explaining-yourself">1. Skills: Stop Re-Explaining Yourself</h2>

<p>A <strong>skill</strong> is a <code class="language-plaintext highlighter-rouge">SKILL.md</code> file — structured knowledge and workflows Claude can pull in when a situation calls for it. Unlike CLAUDE.md, which loads every session, skills load on demand. The tradeoff is intentional: CLAUDE.md gives Claude its baseline orientation; skills give it deep expertise in specific situations without loading everything at once.</p>

<p>Skills live in <code class="language-plaintext highlighter-rouge">~/.claude/skills/</code> for global use, or inside a project’s <code class="language-plaintext highlighter-rouge">.claude/</code> directory for project-specific knowledge. The project-level location is where I put most of mine: it lets me titrate Claude’s abilities depending on whether I’m working on <a href="http://scholia.fyi">Scholia.fyi</a>, the website I built to evaluate primary literature, or <a href="https://github.com/bwaxse/research-code">research-code</a>, my repo for research tools.</p>

<p>Here’s a sample of what I’ve built recently.</p>

<h3 id="my-voice">My voice</h3>

<p><code class="language-plaintext highlighter-rouge">bjw-voice-modeling</code> is one of the rare global skills (<code class="language-plaintext highlighter-rouge">~/.claude/skills/</code>) that isn’t research-specific. Whether it’s a drafted email or a notebook describing my work, I want it to sound like me. The skill includes four real writing samples across different contexts: an email to a colleague sharing null results, a methodological clarification correspondence, a speech I gave for a former senior resident, and a personal statement. Claude uses these samples to calibrate to my voice before writing anything on my behalf.</p>

<p><img src="/assets/images/posts/2026-03-07-claude-code-1-voice-skill.png" alt="bjw-voice-modeling SKILL.md" /></p>

<p>Anyone can build this, especially with Claude to help. In addition to a bounty of helpful tools, Anthropic has a <a href="https://github.com/anthropics/skills/tree/main/skills/skill-creator">skill-creator skill</a> that walks you through compiling your own. For my voice modeling skill, I provided the four writing samples and iterated with Claude until it became something I use almost daily.</p>

<p>Hopefully you’ll see this theme come up again and again: use Claude to make Claude better.</p>

<h3 id="icd-queries">ICD queries</h3>

<p><code class="language-plaintext highlighter-rouge">aou-icd-query</code> builds on something I learned from a colleague. ICD code extraction in <em>All of Us</em> has a pitfall that isn’t obvious — one my labmate Tam Tran first identified when building <a href="https://github.com/nhgritctran/PheTK">PheTK</a>: ICD-9 and ICD-10 both have codes beginning with “V,” but they mean entirely different things. Both share the same <code class="language-plaintext highlighter-rouge">concept_code</code> in the database, so a text join on <code class="language-plaintext highlighter-rouge">condition_source_value</code> returns both rows. (I mentioned this in <a href="/blog/bioinformatics/getting%20started/claude-code-ehr-informatics-part-1/">Part I</a>.)</p>

<p>Tam’s correct query has three stages: extract all ICD events from both <code class="language-plaintext highlighter-rouge">condition_occurrence</code> and <code class="language-plaintext highlighter-rouge">observation</code> (across both source value and source concept ID columns), then resolve V-code vocabulary ambiguity by tracing through <code class="language-plaintext highlighter-rouge">concept_relationship</code>, then union the results. Every time I started a new cohort, I was explaining this from scratch — or hoping Claude would reconstruct it correctly.</p>
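<p>In sketch form (table and column names are real OMOP CDM names, but the CTE layout is a simplification of mine, not Tam’s exact SQL):</p>

```python
# Simplified sketch of the three-stage structure; not the exact PheTK query.
ICD_EVENTS_SQL = """
-- Stage 1: candidate ICD events from BOTH tables, checking both the
-- source value and the source concept ID columns
WITH all_events AS (
    SELECT person_id, condition_source_value AS code,
           condition_source_concept_id AS src_id
    FROM condition_occurrence
    UNION ALL
    SELECT person_id, observation_source_value AS code,
           observation_source_concept_id AS src_id
    FROM observation
),
-- Stage 2: resolve V-code ambiguity. A text join on the code alone
-- returns both ICD-9 and ICD-10 rows; the full query disambiguates by
-- tracing concept_relationship, approximated here via the source
-- concept ID, which is vocabulary-unambiguous.
resolved AS (
    SELECT e.person_id, e.code, c.vocabulary_id
    FROM all_events e
    JOIN concept c ON c.concept_id = e.src_id
    WHERE c.vocabulary_id IN ('ICD9CM', 'ICD10CM')
)
-- Stage 3: union and deduplicate the results
SELECT DISTINCT person_id, code, vocabulary_id FROM resolved
"""
```

<p>Keeping the stages visible as SQL, rather than hiding them behind a function call, is exactly the legibility point the skill insists on.</p>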

<p>Now it’s in a skill, including a note I care about:</p>

<div class="notice--info">

<p><strong>From the skill</strong>: “Prefer using the full SQL rather than importing the function — the three-stage structure is intentionally visible so users understand the V-code resolution and dual-table logic rather than treating it as a black box.”</p>

</div>

<p>The goal is legibility, not abstraction. These scripts let my notebooks teach as well as analyze.</p>

<h3 id="dsub-infrastructure">dsub infrastructure</h3>

<p><code class="language-plaintext highlighter-rouge">aou-dsub-infrastructure</code> is institutional memory. Distributed computing on <em>All of Us</em> runs through <code class="language-plaintext highlighter-rouge">dsub</code>, which submits jobs to Google Cloud Batch. dsub has a few constraints that are painful to rediscover.</p>

<p>When we migrated to Google Batch last summer, one was particularly painful: <strong>c4 machine types didn’t work.</strong> c4 requires <code class="language-plaintext highlighter-rouge">hyperdisk-balanced</code> boot disks, and dsub couldn’t set boot disk types. I learned this by submitting a job, waiting for it to fail, and debugging the error: exactly the kind of knowledge that belongs in a skill rather than in someone’s head.</p>

<p>The skill includes a constraint table (machine families, provider requirements, network flags, logging paths), the <code class="language-plaintext highlighter-rouge">dsub_script()</code> function pattern I use across SAIGE GWAS and METAL meta-analysis workflows, job monitoring utilities, and a machine type selection guide drawn from actual projects. When I or a trainee opens a new genomics notebook, this doesn’t need to be re-established.</p>
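<p>A hypothetical <code>dsub_script()</code> in that spirit (the flags are standard dsub flags, but the guard, defaults, and function body are my own illustration, not the skill’s actual code):</p>

```python
# Illustrative dsub wrapper with a machine-type guard; not the skill's code.
import shlex

def dsub_script(script, machine_type, logging_path, image="ubuntu:22.04"):
    family = machine_type.split("-")[0]
    if family == "c4":
        # c4 needs hyperdisk-balanced boot disks, which dsub could not
        # set at the time of the Google Batch migration
        raise ValueError("c4 machine types are unsupported; pick another family")
    cmd = [
        "dsub",
        "--provider", "google-batch",
        "--machine-type", machine_type,
        "--logging", logging_path,
        "--image", image,
        "--script", script,
    ]
    return " ".join(shlex.quote(c) for c in cmd)
```

<p>The guard encodes the constraint once, so the failure mode above becomes an immediate, explained error instead of a submitted job that dies minutes later.</p>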

<h3 id="table-schemas">Table schemas</h3>

<p><code class="language-plaintext highlighter-rouge">aou-table-schema</code> is an interesting case. In Part I, I described putting <em>All of Us</em> CDR data dictionaries in <code class="language-plaintext highlighter-rouge">_reference/all_of_us_tables/</code> as flat tsv files. Claude could read them when needed — but it had no guidance about <em>when</em> they were relevant or <em>which file</em> to use.</p>

<p>Wrapping those same files in a skill keeps the information out of context when it’s not needed, and adds routing logic: when looking up table columns, load <code class="language-plaintext highlighter-rouge">table_schemas.tsv</code>; when estimating query size, load <code class="language-plaintext highlighter-rouge">table_row_counts.tsv</code>; when filtering by care setting, load <code class="language-plaintext highlighter-rouge">visit_concepts.tsv</code>. For the larger files, the skill instructs Claude to grep first, then load. This keeps token usage down, but still makes the data available even when I forget to direct Claude to the reference material.</p>
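<p>The grep-first instruction amounts to something like this (a sketch with a made-up stand-in for <code>table_schemas.tsv</code>; the real skill greps the actual reference files):</p>

```python
# Grep-first: scan a TSV for matching rows before anything enters context,
# so a large schema file contributes only the handful of relevant lines.
import re

def grep_tsv(text, pattern, max_rows=20):
    """Return the header line plus up to max_rows lines matching pattern."""
    lines = text.splitlines()
    rx = re.compile(pattern, re.IGNORECASE)
    hits = [ln for ln in lines[1:] if rx.search(ln)][:max_rows]
    return [lines[0]] + hits

# Stand-in contents; in practice this is read from the reference file
schemas = (
    "table\tcolumn\ttype\n"
    "condition_occurrence\tcondition_source_value\tSTRING\n"
    "observation\tobservation_date\tDATE\n"
)
```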

<p>The shift from “here are files” to “here is a skill with files” is subtle. It turns passive storage into active guidance, and it draws on what makes skills so powerful: Claude sees every skill’s metadata in every session, so it knows where to look and when.</p>

<p><img src="/assets/images/posts/2026-03-07-claude-code-2-claude-directory.png" alt="skill folders, hookify rule" /></p>

<h3 id="plotting-conventions">Plotting conventions</h3>

<p><code class="language-plaintext highlighter-rouge">bjw-plotting</code> is the smallest skill but saves real time. I care a lot about making figures that tell a story and look good. This skill carries forward my 11-color colorblind-friendly palette, the same seaborn whitegrid setup, consistent figure sizes by plot type. The <em>All of Us</em> count suppression rule also applies to plots: any annotation showing participant counts between 1 and 19 must display as <code class="language-plaintext highlighter-rouge">"&lt; 20"</code>. The skill encodes all of this, including the <code class="language-plaintext highlighter-rouge">format_count()</code> function, so I don’t establish it at the top of every notebook.</p>
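<p>A minimal version of that rule (my actual <code>format_count()</code> lives in the skill; this body is a sketch of the same behavior):</p>

```python
# All of Us suppression rule: participant counts between 1 and 19
# must never be displayed exactly; everything else gets thousands commas.
def format_count(n: int) -> str:
    if 1 <= n <= 19:
        return "< 20"
    return f"{n:,}"
```

<p>A plot annotation then becomes <code>ax.annotate(format_count(len(cohort)), ...)</code> instead of an inline f-string that someone might forget to suppress.</p>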

<p>Here’s a taste from my <a href="https://www.nature.com/articles/s41598-025-02183-9">last publication</a>:</p>

<p><img src="/assets/images/posts/2026-03-07-claude-code-3-figure-5.png" alt="Figure 5" /></p>

<h2 id="2-plugins-skills-agents-hooks-and-more">2. Plugins: Skills, Agents, Hooks, and More</h2>

<p><strong>Plugins</strong> are packaged distributions that “<a href="https://code.claude.com/docs/en/plugins">extend Claude Code with skills, agents, hooks, and MCP servers</a>.” You install them once and they’re available across projects — no copying files between repos. They live in <code class="language-plaintext highlighter-rouge">~/.claude/plugins/</code>.</p>

<p>The two sources I use:</p>

<h3 id="example-skills"><code class="language-plaintext highlighter-rouge">example-skills</code></h3>

<p>Anthropic ships a set of example skills as a plugin: document creation (docx, pptx, xlsx, pdf), frontend design, web artifacts, and more. These are genuinely useful for the non-code parts of research — formatted reports, slides, analysis dashboards. More importantly, this plugin is the reference implementation for <em>how to write skills</em>. If you want to build your own, reading through these teaches you the structure. The <code class="language-plaintext highlighter-rouge">skill-creator</code> skill (included here) guides you through the process.</p>

<h3 id="the-life-sciences-plugin-marketplace">The Life Sciences Plugin Marketplace</h3>

<p>A more exciting set for researchers is the <a href="https://github.com/anthropics/life-sciences">life sciences plugin marketplace</a>. A few plugins worth knowing:</p>

<p><strong><code class="language-plaintext highlighter-rouge">biorender</code></strong>: Integrates BioRender for scientific figure creation.</p>

<p><strong><code class="language-plaintext highlighter-rouge">biorxiv</code></strong>: Access to preprints from bioRxiv and medRxiv. Useful when you want Claude to engage with recent work before it’s indexed in PubMed.</p>

<p><strong><code class="language-plaintext highlighter-rouge">open-targets</code></strong>: This is my recent favorite. Open Targets aggregates SNP data from gnomAD, GWAS associations, and QTL credible sets all in one place. For the question “what else is known about this locus?” — which I ask constantly — this replaces a round trip through multiple browser tabs with a structured answer in the same session where I’m already working.</p>

<p><strong><code class="language-plaintext highlighter-rouge">nextflow-development</code></strong>: Guides users through nf-core pipelines with a structured checklist from data acquisition to output verification. I run genomics pipelines through dsub and PLINK2 on <em>All of Us</em> Researcher Workbench, so I haven’t used this one myself, but it targets exactly the bench scientist who has FASTQ files and needs to run a standardized pipeline.</p>

<h2 id="3-hooks-guardrails-for-your-code">3. Hooks: Guardrails For Your Code</h2>

<p>CLAUDE.md instructions are soft constraints. Claude reads them, but in a long session with many steps, they can be forgotten or reasoned around. <strong>Hooks</strong> are hard constraints. They’re event-driven rules that intercept tool calls before they execute and warn or block based on patterns you define.</p>

<h3 id="blocking-recursive-gcs-deletion">Blocking recursive GCS deletion</h3>

<p><code class="language-plaintext highlighter-rouge">gsutil rm -r</code> deletes everything at a GCS path, recursively. In an <em>All of Us</em> workspace, that could mean months of GWAS summary statistics, processed genotype files, or model outputs. There’s no recycle bin.</p>

<p>One hook I have intercepts any Bash command matching <code class="language-plaintext highlighter-rouge">gsutil (-m )?rm -r</code> and blocks it:</p>

<div class="notice--danger">

<p>🛑 <strong>Recursive GCS deletion blocked!</strong></p>

<p>You’re trying to recursively delete from Google Cloud Storage.</p>

<p><strong>Why blocked</strong>: This can delete entire result directories.</p>

<p><strong>Alternative</strong>: Move to a backup location instead — <code class="language-plaintext highlighter-rouge">gsutil -m mv -r gs://bucket/old-results/ gs://bucket/trash/old-results-$(date +%Y%m%d)/</code></p>

</div>

<p>If I genuinely need to delete something, I can disable the rule temporarily, which makes deletion an intentional extra step. Ten lines of YAML protecting months of work.</p>
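<p>The trigger itself is just a regex over the Bash tool call. In Python terms (the harness around it is illustrative; the pattern is the one above):</p>

```python
# The hook's matching condition, expressed as a plain regex check.
import re

RECURSIVE_RM = re.compile(r"gsutil (-m )?rm -r")

def should_block(command: str) -> bool:
    return bool(RECURSIVE_RM.search(command))
```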

<h3 id="creating-hooks-with-hookify">Creating hooks with hookify</h3>

<p>I didn’t write that rule file by hand. <strong>hookify</strong> is a plugin that creates hooks from conversation — I ran <code class="language-plaintext highlighter-rouge">/hookify</code>, described what I wanted to prevent, and it generated the file. Rules take effect immediately, and if in practice I find the pattern to be too broad or too narrow, I edit the file directly. It’s worth installing early — once you see what hooks can do, you’ll want more of them.</p>

<h2 id="4-mcp-servers-real-time-external-connections">4. MCP Servers: Real-Time External Connections</h2>

<p>CLAUDE.md, skills, and reference files all provide <em>static</em> context — information that was true when you wrote it. (I of course should update my CLAUDE.md files, but this isn’t a strength of mine. Perhaps I need the right hook…) <strong>MCP servers</strong> connect Claude to <em>live data</em>. Claude can call external tools, get current results, and reason about them in the same context where you’re working.</p>

<p>Two I use somewhat regularly:</p>

<h3 id="pubmed">PubMed</h3>

<p>The PubMed MCP gives Claude access to <code class="language-plaintext highlighter-rouge">search_articles</code>, <code class="language-plaintext highlighter-rouge">get_full_text_article</code>, <code class="language-plaintext highlighter-rouge">find_related_articles</code>, and <code class="language-plaintext highlighter-rouge">get_article_metadata</code>. While I do most of my literature search and review these days with <a href="https://elicit.com/">Elicit</a> or <a href="https://www.scholia.fyi/">Scholia.fyi</a>, this is still a great resource. Instead of opening yet another tab, I ask Claude to retrieve relevant papers and reason about methodology in the same session where I’m writing code.</p>

<h3 id="context7">context7</h3>

<p>context7 retrieves current documentation for software libraries. When I’m working with a library I haven’t used recently — or one that’s changed since Claude’s training (Polars, once upon a time) — I ask Claude to pull the current docs before writing code. Library APIs shift, deprecated functions stick around in training data, and the difference between working code and code-that-looks-like-it-should-work is often one argument that changed between versions. context7 is a small habit that cuts time on avoidable debugging.</p>

<h2 id="building-your-own">Building Your Own</h2>

<p>If you want to go further:</p>

<p><strong>Skills</strong>: Start with something specific — a query pattern you explain repeatedly, a constraint you keep rediscovering, a set of conventions you want consistent across projects. The <code class="language-plaintext highlighter-rouge">skill-creator</code> skill (in example-skills) guides you through the structure. Supporting reference files go in a <code class="language-plaintext highlighter-rouge">references/</code> subdirectory alongside your <code class="language-plaintext highlighter-rouge">SKILL.md</code>.</p>

<p><strong>Hooks</strong>: Run <code class="language-plaintext highlighter-rouge">/hookify</code> and describe what you want to prevent.</p>

<p><strong>MCP servers</strong>: MCP servers are external processes that expose tools via a standardized protocol. The <code class="language-plaintext highlighter-rouge">mcp-builder</code> skill walks through creating one in Python or TypeScript.</p>

<h2 id="what-this-adds-up-to">What This Adds Up To</h2>

<p>Part I was about giving Claude a map. Part II is about giving it tools. Together: Claude knows my project structure, my domain-specific query patterns, my coding conventions, my voice — and now it won’t delete my results directory, can search the literature without breaking context, and always uses the right palette for my figures.</p>

<p>That’s not replacing judgment. It’s scaffolding that takes the repetitive off my plate so judgment is what’s left. That’s the version of this I find worth using, and most of these files are in my <a href="https://github.com/bwaxse/research-code">research-code repo</a>. Hope they help!</p>

<p>In Part III, I’ll show what all of this looks like in practice: building a cohort from scratch in <em>All of Us</em>.</p>

<p>See you then.</p>]]></content><author><name>Bennett Waxse, MD, PhD</name></author><category term="Blog" /><category term="Bioinformatics" /><category term="Getting Started" /><category term="Claude Code" /><category term="Bioinformatics" /><category term="All of Us" /><category term="Tutorial" /><category term="AI" /><category term="LLM" /><category term="Research Informatics" /><category term="Getting Started" /><summary type="html"><![CDATA[Part I gave Claude a project map. Part II gives it tools—skills and plugins that encode expertise, hooks that prevent mistakes, and MCP servers that connect Claude to real data.]]></summary></entry><entry><title type="html">What Claude Code’s /insights Taught Me About My Workflow</title><link href="https://bennettwaxse.com/blog/bioinformatics/tools/insights/" rel="alternate" type="text/html" title="What Claude Code’s /insights Taught Me About My Workflow" /><published>2026-02-08T22:15:00-05:00</published><updated>2026-02-08T22:15:00-05:00</updated><id>https://bennettwaxse.com/blog/bioinformatics/tools/insights</id><content type="html" xml:base="https://bennettwaxse.com/blog/bioinformatics/tools/insights/"><![CDATA[<p>It will surprise no one that I love data, and the <code class="language-plaintext highlighter-rouge">/insights</code> command from Claude Code delivered.</p>

<p>1,295 messages across 325 files (+60,192/-13,058 lines of code) over 82 sessions spanning five weeks.</p>

<h2 id="the-summary-felt-spot-on">The Summary Felt Spot-On</h2>

<p>Claude Code characterized my workflow as “a deeply hands-on, iterative debugger who uses Claude Code as a persistent collaborator.” The key pattern it identified:</p>

<blockquote>
  <p>You drive complex, multi-file changes through rapid iterative cycles—letting Claude attempt solutions, catching errors and wrong approaches in real time, and steering toward simpler implementations with hands-on corrective feedback.</p>
</blockquote>

<p>It felt good to see my growing competency validated, but I also learned a reality of how I use the tool: I rarely give exhaustive upfront specifications. Instead, I describe a goal and steer Claude through implementation with real-time feedback—a “try it and fix it” philosophy.</p>

<h2 id="but-the-diagnostic-section-hit-harder">But the Diagnostic Section Hit Harder</h2>

<p>What surprised me was the surgical precision of the diagnostics. Claude Code didn’t just tell me what I was doing; it told me where things were breaking down and why.</p>

<h3 id="where-things-broke-down">Where Things Broke Down</h3>

<p><strong>Incorrect Initial Implementations Requiring Multiple Fix Cycles</strong></p>

<p>Claude’s first-pass code frequently contained bugs—wrong API access patterns, broken migrations, incorrect UI wiring. The report cited specific examples:</p>

<blockquote>
  <p>Stripe subscription date extraction failed three times (direct attribute access, wrong <code class="language-plaintext highlighter-rouge">.get()</code> level, then finally <code class="language-plaintext highlighter-rouge">items.data[0]</code>), and credits still weren’t granted afterward—burning an entire session on what should have been a straightforward API integration.</p>
</blockquote>

<p>The fix: “Providing more explicit specifications upfront (e.g., expected Stripe object structures, exact DB constraints) or asking Claude to outline its approach before coding could reduce these costly iteration rounds.”</p>

<p><strong>Wrong Solution Architecture Before User Correction</strong></p>

<p>Claude repeatedly proposed overly complex solutions—wrapper abstractions, symlink chains, /tmp hacks—when I already knew a simpler approach existed. Example from my genomics pipeline work:</p>

<blockquote>
  <p>For GCS file localization, Claude cycled through symlinks, gsutil in-script, and dirname approaches before you steered it to the straightforward FUSE mount solution you wanted from the start.</p>
</blockquote>

<p>The insight: State architectural preferences upfront. “Use FUSE mounts, not gsutil” or “no wrapper classes” would have short-circuited wasted iteration.</p>

<p><strong>Aggressive Edits Without Verification</strong></p>

<p>Claude often applied changes too eagerly—editing documents without pausing for review, adding more content than requested, or committing unwanted files. I interrupted tool calls twice during document editing because Claude was applying changes without letting me review intermediate results.</p>

<p>The data quantified this: 20 instances of “wrong approach” friction and 19 instances of “buggy code,” but only 3 rejected actions and 2 user interruptions. I prefer to let Claude attempt things and then correct rather than pre-empt.</p>

<h2 id="the-improvements">The Improvements</h2>

<p>Then came the actionable section—not generic advice, but specific refinements drawn from actual sessions:</p>

<h3 id="1-exact-prompts-for-claudemd">1. Exact Prompts for CLAUDE.md</h3>

<p>The report generated precise additions for my project documentation based on pain points from our sessions:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre>When working with Stripe API objects, always access nested data through 
<span class="sb">`items.data[0]`</span> for subscription details. Never use <span class="sb">`.get()`</span> or direct 
attribute access on top-level Stripe objects for period dates.
</pre></td></tr></tbody></table></code></pre></div></div>

<p>This wasn’t general strategy. It was extracted from three failed attempts at the same integration pattern.</p>
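<p>In code, with a toy dict standing in for the real Stripe object (the actual schema has far more fields than this; only the nesting pattern is the point):</p>

```python
# Toy structure mirroring where subscription item data lives on a
# Stripe subscription object; illustrative, not the full schema.
subscription = {
    "id": "sub_123",
    "items": {
        "data": [
            {"current_period_end": 1767225600},
        ],
    },
}

# The access pattern the report prescribes -- items.data[0] -- versus
# the failed attempts (top-level attribute access, a .get() at the
# wrong level):
period_end = subscription["items"]["data"][0]["current_period_end"]
```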

<h3 id="2-lifecycle-hooks">2. Lifecycle Hooks</h3>

<p>The report suggested hooks to auto-run linting or tests after edits, catching buggy first-pass implementations before I see them. Example:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre><span class="na">hooks</span><span class="pi">:</span>
  <span class="na">post-edit</span><span class="pi">:</span>
    <span class="c1"># Auto-run linter after Python edits</span>
    <span class="pi">-</span> <span class="na">if</span><span class="pi">:</span> <span class="s2">"</span><span class="s">file.endsWith('.py')"</span>
      <span class="na">run</span><span class="pi">:</span> <span class="s2">"</span><span class="s">ruff</span><span class="nv"> </span><span class="s">check</span><span class="nv"> </span><span class="s">{file}"</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<h3 id="3-better-prompt-patterns">3. Better Prompt Patterns</h3>

<p>Example prompts showing how to add context up front, guide iterative fixes, and spawn parallel agents. One particularly useful pattern for my genomics pipeline work:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="rouge-code"><pre>Context: KIR-mapper pipeline using dsub + Google Batch. Prefer FUSE mounts 
over gsutil. No wrapper abstractions around dsub functions.

Task: Debug why pipeline stalls at sample 199/200...
</pre></td></tr></tbody></table></code></pre></div></div>

<h2 id="the-meta-learning">The Meta-Learning</h2>

<p>A tool that accelerates my data processing is also teaching me to use it better. The meta-analysis of my own workflow is as valuable as the code it helped me write.</p>

<p>The report revealed patterns I hadn’t yet noticed:</p>
<ul>
  <li>My median response time to Claude is 98.6 seconds (average 316.3s)—I’m actively engaged, not fire-and-forget</li>
  <li>24 of 82 sessions were “Iterative Refinement” type—my most common workflow</li>
  <li>Multi-file changes were Claude’s most helpful capability (20 instances), followed by correct code edits (13)</li>
</ul>

<h2 id="what-im-implementing">What I’m Implementing</h2>

<p>Based on these diagnostics:</p>

<ol>
  <li><strong>Enhanced CLAUDE.md files</strong> with tech stack preferences, data processing constraints, and architectural patterns drawn from actual failure modes</li>
  <li><strong>Pre-commit hooks</strong> for automatic linting and basic test runs</li>
  <li><strong>More upfront context</strong> in prompts—especially example data structures for API integrations and explicit architectural constraints for infrastructure work</li>
</ol>

<p>The report also suggested I’m ready for more ambitious workflows as models improve:</p>
<blockquote>
  <p>Your most painful sessions—multi-round pipeline debugging and Stripe integration fixes—should become single-prompt workflows where Claude autonomously iterates against your test suite until everything passes.</p>
</blockquote>

<p>That’s the future I’m building toward: test coverage comprehensive enough that future Claude can self-correct without intervention.</p>

<h2 id="try-it-yourself">Try It Yourself</h2>

<p>If you’re using Claude Code, type <code class="language-plaintext highlighter-rouge">/insights</code> and see what workflow patterns it identifies. The Scott Cunningham method is also worth trying: ask for a slide deck, your place in the taxonomy of how people use Claude Code, and a translation of engineering-speak into concepts you understand.</p>

<p>Still slightly offended it called me an ICU doc though…</p>]]></content><author><name>Bennett Waxse, MD, PhD</name></author><category term="Blog" /><category term="Bioinformatics" /><category term="Tools" /><category term="Claude Code" /><category term="Bioinformatics" /><category term="AI" /><category term="LLM" /><category term="Research Informatics" /><category term="Productivity" /><summary type="html"><![CDATA[Claude Code analyzed 1,295 messages across 325 files and diagnosed my workflow bottlenecks with surgical precision.]]></summary></entry><entry><title type="html">Using Claude Code for EHR Informatics: Getting Started, Part I</title><link href="https://bennettwaxse.com/blog/bioinformatics/getting%20started/claude-code-ehr-informatics-part-1/" rel="alternate" type="text/html" title="Using Claude Code for EHR Informatics: Getting Started, Part I" /><published>2026-01-29T17:15:00-05:00</published><updated>2026-01-29T17:15:00-05:00</updated><id>https://bennettwaxse.com/blog/bioinformatics/getting%20started/claude-code-ehr-informatics-part-1</id><content type="html" xml:base="https://bennettwaxse.com/blog/bioinformatics/getting%20started/claude-code-ehr-informatics-part-1/"><![CDATA[<p>I’ve been using Claude Code to create cohorts, diagnose bugs, and really accelerate my research workflows. Before getting into the fun stuff though, I want to share how I set up my environment. If you’re new to Claude Code or curious about what goes into my CLAUDE.md files, this post is for you.</p>

<div class="notice--info">

<p><strong>Coming up in this series:</strong></p>
<ul>
  <li><a href="/blog/bioinformatics/getting%20started/claude-code-ehr-informatics-part-2/">Part II: Skills, plugins, and MCP servers</a></li>
  <li>Part III: Building a cohort in <em>All of Us</em> with Claude Code</li>
</ul>

</div>

<p>If you’d like to follow along, the GitHub repo is <a href="https://github.com/bwaxse/research-code">research-code</a>. All files referenced here are available for you to use.</p>

<h2 id="why-context-matters">Why Context Matters</h2>

<p><em>But Bennett, can’t I just give Claude a snippet of code and ask it to do something?</em> Sometimes that works well, but the key is context.</p>

<p>If LLM context windows are big enough to consider an entire notebook or project, why limit them? Don’t hide how you derived a cohort when that derivation can inform what you’re asking the model to do next.</p>

<p>At the same time, how do you provide information efficiently without bloating the context window? The <code class="language-plaintext highlighter-rouge">.ipynb</code> format includes structural metadata that wastes tokens. References to irrelevant methods add noise without value, and you don’t want to hit your usage limit prematurely.</p>

<p>Claude Code already enables users to review an entire codebase and offer bug fixes or deep understanding. Why not do the same for research? Here’s how I approach it.</p>

<h2 id="1-centralize-reference-information">1. Centralize Reference Information</h2>

<p><img src="/assets/images/posts/2026-01-29-claude-code-0-starting-files.png" alt="Starting file structure" class="align-right" style="max-width: 300px;" /></p>

<p>The models have been trained on what the internet knows about <em>All of Us</em>, which is both good and bad. They have a general sense of what’s available in the biobank, but for rapidly evolving systems, that information may already be outdated.</p>

<p>For my work, I asked Claude what would be helpful to know. Publicly available data dictionaries exist online, but what details matter most? I downloaded the data dictionary and pared it down to essentials: Table Name, Field Name, OMOP CDM Standard or Custom Field, Description, and Field Type. Claude then generated SQL to provide counts for each table in the current CDR. The result: 9 <code class="language-plaintext highlighter-rouge">.tsv</code> files of schema structure and data counts, adhering to the &lt;20 censoring required by <em>All of Us</em> (<code class="language-plaintext highlighter-rouge">_reference/all_of_us_tables/</code>).</p>
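<p>The censoring step itself is simple enough to sketch. Here is a minimal, hypothetical Python version (the helper name and the counts are mine, not from the repo) of the policy applied to table counts before export:</p>

```python
# Hypothetical helper (my own naming, not from the research-code repo):
# apply the All of Us small-cell policy by suppressing any count below 20.
def censor_counts(counts, threshold=20):
    """Replace counts under `threshold` with None so they are never reported."""
    return {name: (n if n >= threshold else None) for name, n in counts.items()}

# Illustrative numbers only
table_counts = {"person": 413_457, "rare_custom_table": 12}
safe_counts = censor_counts(table_counts)  # rare_custom_table is suppressed
```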

<h2 id="2-centralize-phecode-lists">2. Centralize Phecode Lists</h2>

<p><a href="https://phewascatalog.org/">Phecodes</a> are manually curated groupings of ICD codes designed to capture clinically meaningful concepts for research. Lisa Bastarache has an excellent <a href="https://pubmed.ncbi.nlm.nih.gov/34465180/">review</a> if you want to learn more. Not every phecode is perfect, but if clinicians and researchers have already worked to group billing codes into meaningful categories, why not start there? We all know about the reproducibility crisis in biomedical research, and random unvalidated ICD groupings aren’t going to help.</p>

<p>This is why I include CSV files mapping phecodes to ICD codes (<code class="language-plaintext highlighter-rouge">_reference/phecode/</code>).</p>
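<p>To make the idea concrete, here is a minimal sketch of how such a mapping can be loaded and keyed. The column names and rows below are illustrative, not the actual contents of <code class="language-plaintext highlighter-rouge">_reference/phecode/</code>:</p>

```python
import csv
import io

# Illustrative rows in the shape of a phecode-to-ICD mapping file;
# the real files live in _reference/phecode/ and may differ.
mapping_csv = """icd,vocabulary_id,phecode,phecode_string
A40,ICD10CM,038.0,Streptococcal septicemia
038.0,ICD9CM,038.0,Streptococcal septicemia
"""

# Key on (code, vocabulary): the same string can mean different things in
# ICD-9-CM vs ICD-10-CM, so the vocabulary must be part of the lookup key.
phecode_map = {
    (row["icd"], row["vocabulary_id"]): row["phecode"]
    for row in csv.DictReader(io.StringIO(mapping_csv))
}
```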

<h2 id="3-identify-trusted-queries">3. Identify Trusted Queries</h2>

<p>I also brought in a trusted ICD query from my labmate, Tam Tran. He’s the force behind <a href="https://github.com/nhgritctran/PheTK">PheTK</a>, a fast Python library for Phenome Wide Association Studies (PheWAS) that includes Cox regression for incident analyses, dsub integration for distributed computing, and more.</p>

<p>In developing PheTK, Tam discovered some peculiarities worth noting:</p>

<div class="notice--warning">

<p><strong>V-code ambiguity:</strong> While most ICD-9 and ICD-10 codes differ structurally, V codes exist in both. V01-V09 means “Persons With Potential Health Hazards Related To Communicable Diseases” in ICD-9-CM but “Pedestrian injured in transport accident” in ICD-10-CM. His query always joins the concept table and matches <code class="language-plaintext highlighter-rouge">vocabulary_id</code>.</p>

</div>

<p><strong>Dual identifiers:</strong> ICD codes appear as both <code class="language-plaintext highlighter-rouge">concept_id</code> and <code class="language-plaintext highlighter-rouge">concept_code</code> (e.g., 1567285 and A40 for Streptococcal sepsis), and not every record populates both. His query checks for both.</p>

<p>By keeping these queries in <code class="language-plaintext highlighter-rouge">_reference/trusted_queries/</code>, I carry forward these lessons in my code.</p>
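<p>To illustrate the shape of such a query (a simplified sketch, not Tam’s actual PheTK query), matching on <code class="language-plaintext highlighter-rouge">vocabulary_id</code> together with the code looks something like this:</p>

```python
# Simplified sketch in the spirit of the trusted query (not the PheTK
# original). Joining to the OMOP concept table means every match carries a
# vocabulary_id, which guards against the V-code ambiguity described above.
def build_icd_query(codes, vocabulary_ids=("ICD9CM", "ICD10CM")):
    code_list = ", ".join(f"'{c}'" for c in codes)
    vocab_list = ", ".join(f"'{v}'" for v in vocabulary_ids)
    return f"""
    SELECT co.person_id, c.concept_code, c.vocabulary_id
    FROM condition_occurrence co
    JOIN concept c
      ON c.concept_id IN (co.condition_concept_id,
                          co.condition_source_concept_id)
    WHERE c.vocabulary_id IN ({vocab_list})
      AND c.concept_code IN ({code_list})
    """

query = build_icd_query(["A40", "038.0"])
```

<p>Checking both <code class="language-plaintext highlighter-rouge">condition_concept_id</code> and <code class="language-plaintext highlighter-rouge">condition_source_concept_id</code> mirrors the dual-identifier lesson: the code may surface through either field.</p>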

<h2 id="4-collect-your-code">4. Collect Your Code</h2>

<p>Finally, the important part—your actual code. I work in <em>All of Us</em>, a cloud-based environment where researchers cannot download individual-level data. To export notebooks safely, I created <code class="language-plaintext highlighter-rouge">upload_safe.sh</code>, a script that syncs with my GitHub repo, copies selected notebooks, and strips them of outputs, bucket paths, and secrets. This way, Claude only sees code—not data.</p>

<div class="notice--danger">

<p><strong>This was critical for me.</strong> In Claude Code, it’s easy to share something unintentionally. I never want to share data I don’t have permission to share.</p>

</div>
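<p>To show what the stripping step amounts to, here is a hypothetical Python re-implementation of the output-scrubbing portion (the real <code class="language-plaintext highlighter-rouge">upload_safe.sh</code> is a shell script and also handles bucket paths and secrets):</p>

```python
import json

# Hypothetical sketch of the output-stripping step: drop outputs and
# execution counts so only code, never results, leaves the workbench.
def strip_notebook(nb_json: str) -> str:
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(nb)

# A toy notebook with one code cell whose output contains row-level data
raw = json.dumps({
    "cells": [{
        "cell_type": "code",
        "source": ["cohort.head()"],
        "outputs": [{"output_type": "stream", "text": ["person_id 123"]}],
        "execution_count": 7,
    }],
    "nbformat": 4,
    "nbformat_minor": 5,
})
cleaned = strip_notebook(raw)  # the code survives; the output does not
```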

<p>In this public repo, I’ve included a few published projects:</p>
<ul>
  <li><strong>genomics</strong>: Genomic analysis pipelines for <em>All of Us</em> genetic data</li>
  <li><strong>hpv</strong>: HPV research cohorts using OMOP CDR data</li>
  <li><strong>nc3</strong>: N3C RECOVER Long COVID phenotyping algorithm adapted for <em>All of Us</em></li>
</ul>

<h2 id="orienting-claude-code">Orienting Claude Code</h2>

<p>Once all files are in place, you’re ready to initialize. Open a terminal, navigate to your working folder, and type <code class="language-plaintext highlighter-rouge">claude</code>.</p>

<p><img src="/assets/images/posts/2026-01-29-claude-code-1-opening.png" alt="Opening Claude Code" /></p>

<p><img src="/assets/images/posts/2026-01-29-claude-code-2-trust.png" alt="Trust files prompt" /></p>

<p>Type <code class="language-plaintext highlighter-rouge">/init</code> to create a <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> file for the repository root.</p>

<p><img src="/assets/images/posts/2026-01-29-claude-code-3-init.png" alt="Running /init" /></p>

<p><img src="/assets/images/posts/2026-01-29-claude-code-4-init-output.png" alt="Init analyzing codebase" /></p>

<p><code class="language-plaintext highlighter-rouge">CLAUDE.md</code> files define coding standards, review criteria, user preferences, and project-specific rules. Each time you start Claude Code, this document loads into context and guides your session. Anthropic recommends keeping it focused and concise, updating as the repository evolves.</p>

<p>Claude does the heavy lifting. It produces a solid first draft:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
</pre></td><td class="rouge-code"><pre>This repository contains computational methods for research informatics
and genomics research, primarily focused on analyzing data from the NIH's
<span class="gs">**_All of Us_ Research Program**</span>. Code examples are shared from
bennettwaxse.com and include analysis tools for:
<span class="p">
-</span> <span class="gs">**Genomics**</span>: Variant analysis, ancestry inference, PCA workflows using PLINK2 and Hail
<span class="p">-</span> <span class="gs">**HPV Research**</span>: Cohort construction and analysis
<span class="p">-</span> <span class="gs">**N3C/RECOVER**</span>: Long COVID phenotyping algorithms adapted from PySpark to Python/pandas
<span class="p">-</span> <span class="gs">**Reference Materials**</span>: _All of Us_ table schemas, PheCode mappings, and Verily Workbench helpers
</pre></td></tr></tbody></table></code></pre></div></div>

<p>It included other useful sections: Platform, Key Environment Variables, Code Structure, Project Organization, Data Handling, Common Libraries, and Workflow Patterns.</p>

<p><img src="/assets/images/posts/2026-01-29-claude-code-5-claudemd.png" alt="CLAUDE.md created" /></p>

<h2 id="editing-claudemd">Editing CLAUDE.md</h2>

<p>The first draft captured the general intent, but I edited line by line. I added a <strong>Development Philosophy</strong> section describing the importance of:</p>

<ul>
  <li><strong>Data validation throughout processing</strong>: Check frequently that data transforms as expected</li>
  <li><strong>Code clarity over abstraction</strong>: These scripts serve a dual purpose—analysis and teaching EHR/genomics informatics. I want trainees to see exactly how processes work.</li>
</ul>

<p><img src="/assets/images/posts/2026-01-29-claude-code-6-revision.png" alt="Revising CLAUDE.md" /></p>

<p>I also added a section about <em>All of Us</em> rules, including count censoring for all values &lt;20 to minimize problematic reporting.</p>

<p>Some things Claude got wrong. One section referenced code for setting environment variables in the new Verily Workbench. Claude assumed this was required for every project, so I clarified it’s only for new Verily Workbench notebooks, a work-in-progress for <em>All of Us</em>.</p>

<p><strong>As with everything AI-generated, my mantra: review it line by line.</strong> Then iterate. Claude refined my verbose writing and kept things focused.</p>

<h2 id="creating-claudeignore">Creating .claudeignore</h2>

<p>Next, I asked Claude to create subdirectory CLAUDE.md files. These only load when Claude works in those directories—a good way to reveal specifics only when relevant. Remember, it’s all about efficient context usage.</p>

<p><img src="/assets/images/posts/2026-01-29-claude-code-7-claudeignore.png" alt="Creating .claudeignore" /></p>

<p>I created a <code class="language-plaintext highlighter-rouge">.claudeignore</code> file to:</p>
<ul>
  <li>Prevent reading <code class="language-plaintext highlighter-rouge">.ipynb</code> files (redundant with <code class="language-plaintext highlighter-rouge">.py</code> scripts)</li>
  <li>Block files containing secrets like <code class="language-plaintext highlighter-rouge">.env</code></li>
  <li>Exclude raw data files (<code class="language-plaintext highlighter-rouge">.bam</code>, <code class="language-plaintext highlighter-rouge">.fastq</code>)—Claude isn’t analyzing actual data</li>
  <li>Skip Python cache, build artifacts, and IDE files</li>
</ul>

<p>I did keep <code class="language-plaintext highlighter-rouge">.csv</code> and <code class="language-plaintext highlighter-rouge">.tsv</code> off this list since I share mappings and references with Claude.</p>
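<p>Put together, the file might look like the sketch below. I’m assuming gitignore-style glob patterns here; adapt it to whatever syntax your Claude Code version supports:</p>

```
# .claudeignore (illustrative; gitignore-style patterns assumed)

# Notebooks are redundant with the exported .py scripts
*.ipynb
.ipynb_checkpoints/

# Secrets
.env

# Raw data files (Claude isn't analyzing actual data)
*.bam
*.fastq

# Caches, build artifacts, IDE files
__pycache__/
*.pyc
build/
.vscode/
.idea/
```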

<p><img src="/assets/images/posts/2026-01-29-claude-code-8-readme.png" alt="Claude asking about README" /></p>

<p>Claude also asked clarifying questions. It offered to expand the README and suggested a few other improvements.</p>

<h2 id="the-result">The Result</h2>

<p>What we have is a meaningfully structured folder with source material that mirrors my typical workflows and resources. CLAUDE.md files orient Claude to the project, and <code class="language-plaintext highlighter-rouge">.claudeignore</code> tells it what to avoid.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
</pre></td><td class="rouge-code"><pre>research-code/
├── CLAUDE.md              # Root instructions
├── .claudeignore          # Files to skip
├── _reference/
│   ├── CLAUDE.md          # Reference-specific context
│   ├── all_of_us_tables/  # CDR schemas
│   ├── phecode/           # Phecode mappings
│   └── trusted_queries/   # Vetted SQL patterns
├── genomics/
│   ├── CLAUDE.md          # Genomics-specific context
│   └── *.py               # Analysis scripts
├── hpv/
│   └── ...
└── nc3/
    └── ...
</pre></td></tr></tbody></table></code></pre></div></div>

<h2 id="whats-next">What’s Next</h2>

<p>If you’re excited to see what Claude can do with this foundation, you’re in the right place. Next time, I’ll introduce skills, plugins, and MCP servers—components that extend what Claude Code can do!</p>

<p>Soon, you’ll see how Claude Code is supercharging my data analysis in <em>All of Us</em>. If you’re already using Claude Code, I’d love to learn how you’re using it too!</p>

<p>Until then, sciencespeed.</p>

<p><img src="/assets/images/posts/2026-01-29-claude-code-9-bye.png" alt="Setup complete" /></p>]]></content><author><name>Bennett Waxse, MD, PhD</name></author><category term="Blog" /><category term="Bioinformatics" /><category term="Getting Started" /><category term="Claude Code" /><category term="Bioinformatics" /><category term="All of Us" /><category term="Tutorial" /><category term="AI" /><category term="LLM" /><category term="Research Informatics" /><category term="OMOP" /><category term="Getting Started" /><summary type="html"><![CDATA[How I set up Claude Code for research informatics work in All of Us—from CLAUDE.md files to .claudeignore—and why context matters more than clever prompts.]]></summary></entry><entry><title type="html">Using Claude Code for Genomic Pipeline Optimization: A KIR Mapper Case Study</title><link href="https://bennettwaxse.com/blog/bioinformatics/dsub-scale-up/" rel="alternate" type="text/html" title="Using Claude Code for Genomic Pipeline Optimization: A KIR Mapper Case Study" /><published>2025-01-23T14:00:00-05:00</published><updated>2025-01-23T14:00:00-05:00</updated><id>https://bennettwaxse.com/blog/bioinformatics/dsub-scale-up</id><content type="html" xml:base="https://bennettwaxse.com/blog/bioinformatics/dsub-scale-up/"><![CDATA[<p>As a physician-scientist with clinical and lab experience, I know how to ask answerable questions, structure experiments, and evaluate results carefully. But I didn’t have the computing skills to do everything I wanted. Two years ago, large language models changed that. By pasting code into Claude.ai, I could generate entire classes and SQL queries that did exactly what I needed. It was inefficient at times—for instance, Claude didn’t know much about polars initially—but it worked, and I knew this was the work I wanted to do.</p>

<p>At the end of last year, I discovered Claude Code. I was blown away. I built a command-line tool to efficiently evaluate primary literature using LLMs, which eventually became <a href="https://scholia.fyi">Scholia.fyi</a>—releasing in the next few weeks. When I returned to work in January, it was clear Claude Code would also take my informatics to the next level.</p>

<p>This is the first in a series of posts about using Claude Code for data analysis and scientific work. I found plenty of resources on building <a href="https://www.youtube.com/@avtharai">apps</a> or <a href="https://www.youtube.com/@PatrickOakleyEllis">websites</a> with <a href="https://www.youtube.com/watch?v=gv0WHhKelSE">Claude Code</a>, but not much on bioinformatics and science. My hope is to create a community of Claude Code-literate scientists who keep humans in the loop and amplify what we can do.</p>

<p>This first example involves scaling a KIR gene mapping pipeline from 40 samples to 200,000+. Beyond optimizing machine types (Claude Code helped me find a 30% cheaper parallelized setup), I consistently hit a wall around 100-200 samples. The tool worked fine at small scale but failed predictably in production. This is a story about debugging something outside my expertise and learning where Claude Code genuinely helps.</p>

<!--more-->

<h2 id="the-setup">The Setup</h2>

<p>I’m working with <a href="https://github.com/erickcastelli/kir-mapper">kir-mapper</a>, which aligns KIR genes from whole-genome sequencing. The pipeline orchestrates several tools:</p>

<ol>
  <li>GATK PrintReads converts CRAM to BAM, extracting the KIR region</li>
  <li>kir-mapper map aligns reads to KIR references</li>
  <li>(Later) kir-mapper ncopy, genotype, and haplotype call variants</li>
</ol>

<p>The strategy was simple: run 5 samples in parallel using GNU parallel, processing them in 100–200-sample batches. This should have been efficient. Instead, the pipeline stalled reliably at sample 99/100 or 199/200.</p>

<h2 id="the-pattern">The Pattern</h2>

<p>The failure was consistent but puzzling:</p>

<ul>
  <li>40-sample run: succeeded</li>
  <li>64-sample run: succeeded</li>
  <li>100-sample run: stalled at 99/100</li>
  <li>199-sample run: stalled at 198/199</li>
  <li>200-sample run: stalled at 199/200</li>
</ul>

<p>Always at or near the last sample. This suggested something systematic—not random failure, not resource exhaustion, but something about how the parallel queue itself was behaving.</p>

<p>I first analyzed memory and disk usage throughout the runs with Claude Code. We never exhausted resources. That ruled out the obvious culprits. Beyond that, I was stuck. File conflicts during parallel execution were one hypothesis, but debugging GNU parallel internals isn’t my area, and I’d never written low-level code to know where to look.</p>

<h2 id="where-claude-code-helped">Where Claude Code Helped</h2>

<p>I gave Claude the problem, access to the kir-mapper source code, and access to my base and parallelization scripts. It did something I couldn’t have done efficiently: read through thousands of lines of C++ and identify where temporary files were created.</p>

<p>Here’s what it found in map_dna.cpp (lines 1473-1474):</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre><span class="n">string</span> <span class="n">v_sai1</span> <span class="o">=</span> <span class="n">v_output</span> <span class="o">+</span> <span class="s">"tmp1.sai "</span><span class="p">;</span>
<span class="n">string</span> <span class="n">v_sai2</span> <span class="o">=</span> <span class="n">v_output</span> <span class="o">+</span> <span class="s">"tmp2.sai "</span><span class="p">;</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Compare with how SAM files are named (line 1134):</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="n">string</span> <span class="n">outsamtmp</span> <span class="o">=</span> <span class="n">v_output</span> <span class="o">+</span> <span class="n">v_sample</span> <span class="o">+</span> <span class="o">*</span><span class="n">i</span> <span class="o">+</span> <span class="s">".tmp.sam"</span><span class="p">;</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The difference is stark: SAM files include the sample name. The temporary BWA index files do not. When five samples run in parallel, they all write to the same <code class="language-plaintext highlighter-rouge">tmp1.sai</code> and <code class="language-plaintext highlighter-rouge">tmp2.sai</code> files. File locks accumulate. After a certain number of jobs queue up, GNU parallel hits an internal limit and stops accepting new ones.</p>

<p>This explained an interesting observation: GATK worked fine, even though it also runs in parallel. The difference isn’t parallelization strategy—both GATK and kir-mapper use GNU parallel with 5 samples at a time. The difference is how they handle temporary files: GATK names its temp files in a way that avoids collisions (or writes them elsewhere), whereas kir-mapper creates the shared <code class="language-plaintext highlighter-rouge">tmp1.sai</code> and <code class="language-plaintext highlighter-rouge">tmp2.sai</code> files that every sample contends for.</p>
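<p>The naming difference is easy to demonstrate. Here is a minimal Python sketch of the principle (paths and helper names are mine, for illustration):</p>

```python
from pathlib import Path

# Sketch of the naming difference at the heart of the bug: a fixed temp
# name is shared by every concurrent worker; a per-sample path is not.
def shared_temp(output_dir):
    # kir-mapper style: every sample writes to the same file
    return Path(output_dir) / "tmp1.sai"

def per_sample_temp(output_dir, sample):
    # collision-free: the sample id is part of the path
    return Path(output_dir) / sample / "tmp1.sai"
```

<p>Giving each sample its own output directory achieves the same separation without patching the C++.</p>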

<p>Could I have figured this out alone? Eventually, probably. But it would have taken substantially longer. Could I have abandoned parallelization and forfeited the 30% savings? Also yes. The key was that Claude could quickly parse unfamiliar C++ and identify the pattern.</p>

<h2 id="two-fixes">Two Fixes</h2>

<p>With the diagnosis in hand, I implemented two complementary fixes:</p>

<p><strong>Fix 1: Per-sample output directories.</strong> Instead of all samples writing to a shared output directory:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="rouge-code"><pre><span class="nb">local </span><span class="nv">sample_output</span><span class="o">=</span><span class="s2">"./kir_output/</span><span class="k">${</span><span class="nv">person_id</span><span class="k">}</span><span class="s2">"</span>
<span class="nb">mkdir</span> <span class="nt">-p</span> <span class="s2">"</span><span class="nv">$sample_output</span><span class="s2">"</span>

kir-mapper map <span class="se">\</span>
    <span class="nt">-bam</span> <span class="s2">"</span><span class="k">${</span><span class="nv">person_id</span><span class="k">}</span><span class="s2">_chr19.bam"</span> <span class="se">\</span>
    <span class="nt">-sample</span> <span class="s2">"</span><span class="k">${</span><span class="nv">person_id</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
    <span class="nt">-output</span> <span class="s2">"</span><span class="nv">$sample_output</span><span class="s2">"</span> <span class="se">\</span>
    <span class="nt">-threads</span> <span class="nv">$THREADS_PER_SAMPLE</span> 2&gt;&amp;1
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Each sample now gets its own subdirectory. The <code class="language-plaintext highlighter-rouge">tmp1.sai</code> and <code class="language-plaintext highlighter-rouge">tmp2.sai</code> files for sample A don’t collide with those for sample B.</p>

<p><strong>Fix 2: 25-sample sub-batches.</strong> Even with per-sample directories, sending 100 jobs through GNU parallel at once keeps you near that queue limit. So instead of:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="nb">seq </span>1 100 | parallel <span class="nt">-j</span> 5 <span class="s1">'run_kirmap {}'</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>We process in sub-batches:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="rouge-code"><pre><span class="nv">BATCH_SIZE</span><span class="o">=</span>25

<span class="k">while</span> <span class="o">[</span> <span class="nv">$idx</span> <span class="nt">-le</span> <span class="nv">$NUM_SAMPLES</span> <span class="o">]</span><span class="p">;</span> <span class="k">do
    </span><span class="nv">BATCH_END</span><span class="o">=</span><span class="k">$((</span>idx <span class="o">+</span> BATCH_SIZE <span class="o">-</span> <span class="m">1</span><span class="k">))</span>
    <span class="k">if</span> <span class="o">[</span> <span class="nv">$BATCH_END</span> <span class="nt">-gt</span> <span class="nv">$NUM_SAMPLES</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
        </span><span class="nv">BATCH_END</span><span class="o">=</span><span class="nv">$NUM_SAMPLES</span>
    <span class="k">fi

    </span><span class="nb">seq</span> <span class="nv">$idx</span> <span class="nv">$BATCH_END</span> | parallel <span class="nt">-j</span> 5 <span class="s1">'run_kirmap_with_env {}'</span>
    <span class="nv">idx</span><span class="o">=</span><span class="k">$((</span>BATCH_END <span class="o">+</span> <span class="m">1</span><span class="k">))</span>
<span class="k">done</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>A sub-batch of 25 jobs stays well below the queue limit, in the same range as the runs that succeeded.</p>

<h2 id="why-this-matters">Why This Matters</h2>

<p>Many of us build bioinformatics pipelines with deep domain expertise but without systems-level knowledge. You understand your science (genomics, epidemiology, medicine) but orchestrating multiple tools, managing parallel job queues, and debugging temporary file handling is a different domain entirely.</p>

<p>Claude Code was useful here specifically because:</p>

<ol>
  <li>It could read large codebases quickly and extract relevant sections</li>
  <li>It could reason about file I/O patterns and concurrency issues</li>
  <li>It helped me iterate through multiple approaches and find the best</li>
</ol>

<p>What it didn’t do: it didn’t magically know the answer. I had to provide the logs, frame hypotheses, and verify its analysis myself. The diagnosis made sense conceptually, so I checked the source code to confirm. That verification step is critical, but we’re already used to this in science.</p>

<h2 id="when-its-actually-useful">When It’s Actually Useful</h2>

<p>If you’re considering Claude Code for bioinformatics, the wins come from:</p>

<ul>
  <li><strong>Code analysis</strong>: Reading unfamiliar source code to find issues</li>
  <li><strong>Prototyping approaches</strong>: Testing multiple strategies before committing</li>
  <li><strong>Architecture</strong>: Designing cleaner pipeline structure (mine went from 6 stages to 3)</li>
  <li><strong>Getting through the tedium</strong>: I’ve also used it to construct new cohorts, an easy but tedious process</li>
</ul>

<p>The important caveats:</p>

<ul>
  <li>You still need domain knowledge to evaluate suggestions</li>
  <li>Verify diagnoses yourself, especially for system behavior</li>
  <li>Use it as a thought partner, not a substitute for critical thinking</li>
  <li>If something doesn’t make sense, push back and ask for more detail</li>
</ul>

<h2 id="the-result">The Result</h2>

<p>The pipeline now processes all samples in a batch successfully, maintains parallelization efficiency, and scales to multiple ancestries. More importantly, I understand why it failed and what the fixes address. That understanding is what matters for extending the code later, and approaching similar tasks in the future.</p>

<hr />

<p>If you’re building scientific pipelines and hit problems outside your expertise, try Claude Code. Frame questions clearly, give it access to relevant code, and validate its analysis. The combination of clear problem statements, access to source code, and iterative refinement creates a solid workflow for technical debugging.</p>

<p>I’ll be writing more about how I’m using Claude Code for data analysis work. If you have questions or want to share how you’re using it, I’d be interested to hear.</p>]]></content><author><name>Bennett Waxse, MD, PhD</name></author><category term="Blog" /><category term="Bioinformatics" /><category term="Claude Code" /><category term="Bioinformatics" /><category term="Pipeline Optimization" /><category term="KIR Mapping" /><category term="Parallel Processing" /><category term="Debugging" /><summary type="html"><![CDATA[As a physician-scientist with clinical and lab experience, I know how to ask answerable questions, structure experiments, and evaluate results carefully. But I didn’t have the computing skills to do everything I wanted. Two years ago, large language models changed that. By pasting code into Claude.ai, I could generate entire classes and SQL queries that did exactly what I needed. It was inefficient at times—for instance, Claude didn’t know much about polars initially—but it worked, and I knew this was the work I wanted to do.]]></summary></entry></feed>