Linked sequences#190
disclaimer: the
One outstanding decision is how we want to restructure the stats table to accommodate these changes.
Oddant1
left a comment
@colinvwood's comments came from the review. Needed to look it over with him to make sure I knew what the R was doing.
related to #129, #93, #189, qiime2/q2-feature-classifier#219
Co-authored-by: Evan Bolyen <ebolyen@gmail.com>
Hey, drive-by comment: I am testing this with some real data locally to see how it compares to single and paired, and it's looking great! I like how the output stats report the number of concatenated seqs that are rescued; this is really useful. Could we also have a sum total of merged non-chimeric + concatenated? For simplicity we could just call this column "output" (plus an additional column of output / input). Very cool to finally have this functionality in q2-dada2, thanks @colinvwood!
ebolyen
left a comment
I think this looks good, but I did get lost in one spot.
    def _to_sequence(sequence, metadata):
        if retain_unmerged:
            return skbio.Sequence(sequence, metadata=metadata)
I am 99% sure this will still work even with the new LinkedDNA class.
You mean the __init__ signature?
        duplicated(unmerged.id.map$temporary, fromLast=TRUE)
    ]
    if(length(ambiguous.ids) > 0){
        errQuit("Unable to uniquely map retained unmerged sequences from the temporary DADA2-compatible representation to linked sequences.", status=1)
I am quite confused how this would be possible. Can you explain a little more in a comment?
Say we have two linked sequences like so:
>seq1
ACN TG
>seq2
AC NTG
Then the unmerged.id.map becomes (imagine we're using a single N separator instead of ten):
ACNNTG -> ACN TG
ACNNTG -> AC NTG
Now it's ambiguous what ACNNTG maps to.
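To make the collision concrete, here is a minimal Python sketch of the same check. It is not the R code from this PR: `to_temporary` and `find_ambiguous` are hypothetical helpers, the link between the two reads is assumed to be a single space, and the separator is a single N as in the example above rather than ten.

```python
from collections import defaultdict

# Assumption: one N for readability; the thread mentions ten in practice.
SEPARATOR = "N"


def to_temporary(linked, separator=SEPARATOR):
    """Join the two halves of a linked sequence with the N separator.

    Assumes the link is written as a single space, as in the example above.
    """
    forward, reverse = linked.split(" ")
    return forward + separator + reverse


def find_ambiguous(linked_sequences):
    """Return temporary sequences that map back to more than one linked sequence."""
    by_temporary = defaultdict(set)
    for linked in linked_sequences:
        by_temporary[to_temporary(linked)].add(linked)
    return {
        temporary: sorted(originals)
        for temporary, originals in by_temporary.items()
        if len(originals) > 1
    }


# "ACN TG" and "AC NTG" both collapse to "ACNNTG"; "AA TT" stays unique.
ambiguous = find_ambiguous(["ACN TG", "AC NTG", "AA TT"])
print(ambiguous)
```

With these inputs the only ambiguous key is ACNNTG, which is the case the errQuit guards against. Real reads ending or starting with N next to the link are what make this possible.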
Needs qiime2/q2-types#397. (Need to merge and use those changes.)
Thank you for the feedback, I agree the stats table needs polishing. Currently, your "output" column is actually just the "non-chimeric" column. (This is confusing and needs to change.) Do you think it's useful to see chimera filtering stats stratified by merged/concatenated? If not, we could just drop the last two columns (non-chimeric concatenated and its %). That would leave us with only two new columns compared to the pre-concatenation version. If we do want to see the stratified output, we need to add a merged-only column and its % for chimera filtering. This would leave us with six new columns compared to the pre-concatenation version, which feels like it's starting to make the stats table pretty dense. But I can also see the extra stats being useful.
I don't think we need chimera filtering stats on the concatenated and merged reads separately; we could just lump these together into one. I agree, otherwise this table becomes very dense.
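For reference, the proposed "output" column and its output / input ratio could be computed roughly like this. This is only a sketch: the dictionary keys (`input`, `merged_non_chimeric`, `concatenated_non_chimeric`) and the `percent_output` name are hypothetical and not the actual column names in the stats table.

```python
def add_output_columns(row):
    """Add a lumped "output" count and its percentage of input to one sample's stats.

    All key names here are hypothetical placeholders for the real stats columns.
    """
    output = row["merged_non_chimeric"] + row["concatenated_non_chimeric"]
    # Guard against empty samples so we never divide by zero.
    percent = 100.0 * output / row["input"] if row["input"] else 0.0
    return {**row, "output": output, "percent_output": round(percent, 2)}


sample = {
    "input": 1000,
    "merged_non_chimeric": 700,
    "concatenated_non_chimeric": 50,
}
print(add_output_columns(sample))
```

Lumping the two counts this way keeps the table at one extra count column and one extra percentage column, which matches the preference above for not stratifying chimera filtering by merged/concatenated.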
Description
Allow denoise-paired to rescue and return unmerged sequences
AI Disclosure
AI Usage Details
Co-written with Codex