Skip to content

Linked sequences#190

Open
colinvwood wants to merge 10 commits into
qiime2:devfrom
colinvwood:linked-sequences
Open

Linked sequences#190
colinvwood wants to merge 10 commits into
qiime2:devfrom
colinvwood:linked-sequences

Conversation

@colinvwood

@colinvwood colinvwood commented Mar 30, 2026

Copy link
Copy Markdown
Contributor

Description

Allow denoise-paired to rescue and return unmerged sequences

AI Disclosure

  • NO AI USED.
  • AI USED.

AI Usage Details

Co-written with Codex

@github-project-automation github-project-automation Bot moved this to Backlog in 2026.4 🌱 Mar 30, 2026
@colinvwood colinvwood moved this from Backlog to In Development in 2026.4 🌱 Mar 30, 2026
@colinvwood

colinvwood commented Mar 30, 2026

Copy link
Copy Markdown
Contributor Author

needs qiime2/q2-types#388

@colinvwood colinvwood added the stat:blocked This cannot be resolved until something else has changed. label Mar 30, 2026
@colinvwood

Copy link
Copy Markdown
Contributor Author

@ebolyen

@colinvwood

Copy link
Copy Markdown
Contributor Author

disclaimer: the run_dada.R diff is 99% codex and the test_denoise.py diff is about 50/50 codex/de moi

@colinvwood colinvwood self-assigned this Mar 31, 2026
@colinvwood

Copy link
Copy Markdown
Contributor Author

One outstanding decision is how we want to restructure the stats table to accommodate these changes.

@Oddant1 Oddant1 moved this from In Development to In Review in 2026.4 🌱 Apr 3, 2026
@Oddant1 Oddant1 assigned Oddant1 and unassigned colinvwood Apr 3, 2026
@Oddant1 Oddant1 self-requested a review April 3, 2026 17:33
Comment thread q2_dada2/assets/run_dada.R Outdated
Comment thread q2_dada2/assets/run_dada.R
Comment thread q2_dada2/assets/run_dada.R
Comment thread q2_dada2/assets/run_dada.R Outdated
Comment thread q2_dada2/assets/run_dada.R

@Oddant1 Oddant1 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colinvwood's comments came from the review. Needed to look it over with him to make sure I knew what the R was doing.

@Oddant1 Oddant1 assigned colinvwood and unassigned Oddant1 Apr 6, 2026
@Oddant1 Oddant1 moved this from In Review to In Development in 2026.4 🌱 Apr 6, 2026
@lizgehret lizgehret removed this from 2026.4 🌱 Apr 23, 2026
@github-project-automation github-project-automation Bot moved this to Backlog in 2026.7 🐐 Apr 23, 2026
@lizgehret lizgehret moved this from Backlog to In Development in 2026.7 🐐 Apr 23, 2026
@colinvwood colinvwood removed the stat:blocked This cannot be resolved until something else has changed. label Jun 11, 2026
@colinvwood

colinvwood commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

related to #129, #93, #189, qiime2/q2-feature-classifier#219

@colinvwood colinvwood removed this from 2026.7 🐐 Jun 12, 2026
@colinvwood

colinvwood commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Needs qiime2/q2cli#401

@ebolyen ebolyen assigned ebolyen and unassigned colinvwood Jun 16, 2026

@ebolyen ebolyen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some grouped suggestions.

Comment thread q2_dada2/plugin_setup.py Outdated
Comment thread q2_dada2/plugin_setup.py Outdated
Comment thread q2_dada2/plugin_setup.py Outdated
colinvwood and others added 3 commits June 17, 2026 09:46
Co-authored-by: Evan Bolyen <ebolyen@gmail.com>
Co-authored-by: Evan Bolyen <ebolyen@gmail.com>
Co-authored-by: Evan Bolyen <ebolyen@gmail.com>
@nbokulich

Copy link
Copy Markdown
Member

hey drive-by comment: I am testing this with some real data locally to see how it compares to single and paired, looking great! I like how the output stats report the number of concatenated seqs that are rescued, this is really useful. Could we also have a sum total of merged non-chimeric + concatenated, for simplicity we could just call this column "output" (and an additional column of output / input). Very cool to finally have this functionality in q2-dada2, thanks @colinvwood !

@ebolyen ebolyen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, but I did get lost in one spot.

Comment thread q2_dada2/_denoise.py

def _to_sequence(sequence, metadata):
if retain_unmerged:
return skbio.Sequence(sequence, metadata=metadata)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am 99% sure this will still work even with the new LinkedDNA class.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the __init__ signature?

duplicated(unmerged.id.map$temporary, fromLast=TRUE)
]
if(length(ambiguous.ids) > 0){
errQuit("Unable to uniquely map retained unmerged sequences from the temporary DADA2-compatible representation to linked sequences.", status=1)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am quite confused how this would be possible. Can you explain a little more in a comment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say we have two linked sequences like so:

>seq1
ACN TG
>seq2
AC NTG

The the unmerged.id.map becomes (imagine we're using a single N separator instead of ten):

ACNNTG -> ACN TG
ACNNTG -> AC NTG

Now it's ambiguous what ACNNTG maps to.

@ebolyen ebolyen assigned colinvwood and unassigned ebolyen Jun 26, 2026
@colinvwood

Copy link
Copy Markdown
Contributor Author

Needs qiime2/q2-types#397. (Need to merge and use those changes).

@colinvwood

Copy link
Copy Markdown
Contributor Author

hey drive-by comment: I am testing this with some real data locally to see how it compares to single and paired, looking great! I like how the output stats report the number of concatenated seqs that are rescued, this is really useful. Could we also have a sum total of merged non-chimeric + concatenated, for simplicity we could just call this column "output" (and an additional column of output / input). Very cool to finally have this functionality in q2-dada2, thanks @colinvwood !

Thank you for the feedback, I agree the stats table needs polishing. Currently, your "output" column is actually just the "non-chimeric" column. (This is confusing and needs to change.)

Do you think it's useful to see chimera filtering stats stratified by merged/concatenated? If not we could just drop the last two columns (non-chimeric concatenated & its %). That would leave us with only two new columns compared to the pre-concatenation changes.

If we do want to see the stratified output we need to add a merged-only & its % columns for chimera filtering. This would leave us with six new columns compared to the pre-concatenation version, which feels like it's starting to make the stats table pretty dense. But I can also see the extra stats being useful.

What do you all think @ebolyen, @cherman2?

@colinvwood colinvwood added the stat:DO-NOT-MERGE Please do not merge this until this label has been removed. label Jun 29, 2026
@nbokulich

Copy link
Copy Markdown
Member

Do you think it's useful to see chimera filtering stats stratified by merged/concatenated? If not we could just drop the last two columns (non-chimeric concatenated & its %).

I don't think we need chimera filtering stats on the concatenated and merged reads separately... could just lump these together into one. I agree, otherwise this table becomes very dense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stat:DO-NOT-MERGE Please do not merge this until this label has been removed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants