Linked sequences by colinvwood · Pull Request #190 · qiime2/q2-dada2

colinvwood · 2026-03-30T23:26:39Z

Description

Allow denoise-paired to rescue and return unmerged sequences

AI Disclosure

NO AI USED.
AI USED.

AI Usage Details

Co-written with Codex


colinvwood · 2026-03-30T23:28:12Z

~~needs qiime2/q2-types#388~~

colinvwood · 2026-03-30T23:28:54Z

@ebolyen

colinvwood · 2026-03-30T23:39:26Z

Disclaimer: the run_dada.R diff is 99% Codex, and the test_denoise.py diff is about 50/50 Codex and me.

colinvwood · 2026-04-03T00:16:04Z

One outstanding decision is how we want to restructure the stats table to accommodate these changes.

Oddant1

@colinvwood's comments came out of the review. I needed to look it over with him to make sure I understood what the R code was doing.

colinvwood · 2026-06-12T00:16:55Z

related to #129, #93, #189, qiime2/q2-feature-classifier#219

colinvwood · 2026-06-15T22:12:23Z

~~Needs qiime2/q2cli#401~~

ebolyen

Some grouped suggestions.

Co-authored-by: Evan Bolyen <ebolyen@gmail.com>

nbokulich · 2026-06-26T10:29:30Z

Hey, drive-by comment: I am testing this locally with some real data to see how it compares to single and paired, and it's looking great! I like that the output stats report the number of concatenated seqs that are rescued; this is really useful. Could we also have a sum total of merged non-chimeric + concatenated? For simplicity we could just call this column "output" (plus an additional column of output / input). Very cool to finally have this functionality in q2-dada2, thanks @colinvwood!

ebolyen

I think this looks good, but I did get lost in one spot.

ebolyen · 2026-06-26T10:35:04Z

+
+    def _to_sequence(sequence, metadata):
+        if retain_unmerged:
+            return skbio.Sequence(sequence, metadata=metadata)


I am 99% sure this will still work even with the new LinkedDNA class.

You mean the __init__ signature?

ebolyen · 2026-06-26T10:43:26Z

+      duplicated(unmerged.id.map$temporary, fromLast=TRUE)
+  ]
+  if(length(ambiguous.ids) > 0){
+    errQuit("Unable to uniquely map retained unmerged sequences from the temporary DADA2-compatible representation to linked sequences.", status=1)


I am quite confused how this would be possible. Can you explain a little more in a comment?

Say we have two linked sequences like so:

>seq1
ACN TG
>seq2
AC NTG

Then the unmerged.id.map becomes (imagine we're using a single N separator instead of ten):

ACNNTG -> ACN TG
ACNNTG -> AC NTG

Now it's ambiguous what ACNNTG maps to.
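The collision can be reproduced with a small sketch. It assumes a linked sequence is modeled as a plain tuple of segment strings and that the temporary DADA2-compatible form simply joins the segments with a run of `N`s (one `N` to match the example, ten in the PR). The names `to_temporary` and `ambiguous_temporary_ids` are hypothetical and are not the PR's code; they only mirror the idea behind the R check, which flags temporary keys that more than one linked sequence maps to. (A common R idiom for "every element involved in a duplicate" is `duplicated(x) | duplicated(x, fromLast=TRUE)`; the hunk above shows only the `fromLast=TRUE` part, so the exact expression in the PR is not reproduced here.)

```python
def to_temporary(segments, n_separator=1):
    """Join the segments of a linked sequence with a run of N characters.

    Hypothetical stand-in for the temporary DADA2-compatible representation.
    """
    return ("N" * n_separator).join(segments)


def ambiguous_temporary_ids(linked, n_separator=1):
    """Return temporary strings that more than one distinct linked sequence
    maps to, i.e. keys that cannot be mapped back uniquely."""
    id_map = {}
    for segments in linked.values():
        temporary = to_temporary(segments, n_separator)
        # A set, so identical linked sequences are not counted as ambiguous.
        id_map.setdefault(temporary, set()).add(tuple(segments))
    return sorted(t for t, originals in id_map.items() if len(originals) > 1)


# The two linked sequences from the example above.
linked = {
    "seq1": ("ACN", "TG"),
    "seq2": ("AC", "NTG"),
}

for name, segments in linked.items():
    print(to_temporary(segments), "->", " ".join(segments))
print(ambiguous_temporary_ids(linked))
```

A longer separator does not remove the problem: with ten `N`s both sequences become `AC` followed by eleven `N`s and then `TG`, because a segment boundary next to an `N` can shift without changing the joined string. That is why the mapping has to be checked for uniqueness rather than assumed.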

colinvwood · 2026-06-29T17:33:34Z

Needs qiime2/q2-types#397. (Need to merge it and use those changes.)

colinvwood · 2026-06-29T17:52:58Z


Thank you for the feedback; I agree the stats table needs polishing. Currently, your "output" column is actually just the "non-chimeric" column. (This is confusing and needs to change.)

Do you think it's useful to see chimera-filtering stats stratified by merged/concatenated? If not, we could just drop the last two columns (non-chimeric concatenated and its %). That would leave us with only two new columns compared to the version before the concatenation changes.

If we do want to see the stratified output, we need to add merged-only non-chimeric and merged-only % columns for chimera filtering. This would leave us with six new columns compared to the pre-concatenation version, which starts to make the stats table pretty dense. But I can also see the extra stats being useful.

What do you all think @ebolyen, @cherman2?

nbokulich · 2026-06-30T04:32:06Z

> Do you think it's useful to see chimera filtering stats stratified by merged/concatenated? If not we could just drop the last two columns (non-chimeric concatenated & its %).

I don't think we need chimera-filtering stats for the concatenated and merged reads separately; we could just lump them together into one. I agree, otherwise this table becomes very dense.
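For illustration, the lumped layout agreed on here could be derived per sample roughly as follows. This is a sketch only: the column names (`input`, `merged`, `concatenated`, `non-chimeric`, and the derived columns) and the counts are hypothetical placeholders, not the PR's actual stats table or real data. It assumes chimera removal runs once on the combined merged + concatenated reads, so a single non-chimeric count is already the "output" total, reported alongside its percentage of input.

```python
def summarize(row):
    """Add combined columns for one sample's (hypothetical) read counts.

    Chimera filtering is assumed to run once on merged + concatenated
    reads together, so 'non-chimeric' is already the combined output.
    """
    summary = dict(row)
    # Reads that reached the chimera filter: merged pairs plus rescued,
    # concatenated (unmerged) pairs.
    summary["merged + concatenated"] = row["merged"] + row["concatenated"]
    # Guard against empty samples to avoid division by zero.
    if row["input"] > 0:
        pct = 100 * row["non-chimeric"] / row["input"]
    else:
        pct = 0.0
    summary["percentage of input non-chimeric"] = round(pct, 2)
    return summary


# Hypothetical counts for a single sample (not real data).
example = {"input": 1000, "merged": 700, "concatenated": 150,
           "non-chimeric": 800}
print(summarize(example))
```

Stratifying instead would mean carrying separate non-chimeric counts and percentages for merged and concatenated reads, which is the denser six-column variant discussed above.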

colinvwood added 3 commits March 30, 2026 15:53

dada2 wrapper changes for linked sequence support & denoise_paired action changes

833c591

retain_unmerged parameter, type map

364d2bf

tests

14ac4e4

q2d2 added this to QIIME 2 - Triage 🚑 Mar 30, 2026

github-project-automation Bot moved this to Needs Triage in QIIME 2 - Triage 🚑 Mar 30, 2026

colinvwood removed this from QIIME 2 - Triage 🚑 Mar 30, 2026

colinvwood added this to 2026.4 🌱 Mar 30, 2026

github-project-automation Bot moved this to Backlog in 2026.4 🌱 Mar 30, 2026

colinvwood moved this from Backlog to In Development in 2026.4 🌱 Mar 30, 2026

colinvwood added the stat:blocked label ("This cannot be resolved until something else has changed.") Mar 30, 2026

colinvwood self-assigned this Mar 31, 2026

Oddant1 moved this from In Development to In Review in 2026.4 🌱 Apr 3, 2026

Oddant1 assigned Oddant1 and unassigned colinvwood Apr 3, 2026

Oddant1 self-requested a review April 3, 2026 17:33