Parameters optimal to produce COI exact variants. #2015

Open
aimirza opened this issue Sep 10, 2024 · 4 comments

Comments

@aimirza

aimirza commented Sep 10, 2024

Hi,

What parameters do you suggest changing when working on Illumina short-amplicon reads of COI genes?
For example, would you:

  • Set selfConsist = TRUE instead of providing a 16 × 41 matrix of transition probabilities for COI (if one exists)? Or do both?
  • Leave BAND_SIZE alone, or decrease it slightly since the COI gene has few indels? If so, to what value?
  • Increase KDIST_CUTOFF because the COI gene varies more between species than 16S rRNA does? If so, what value do you recommend?
@benjjneb
Owner

I wouldn't recommend any parameter changes. Parameter settings should be appropriate for the sequencing technology, because the errors from PCR and sequencing are what DADA2 models. The settings don't change between amplicon targets (outside of the filterAndTrim stage, anyway).

@aimirza
Author

aimirza commented Sep 10, 2024

Not even the alignment parameters, given that alignment is done before error modelling? I am worried about important sequences not aligning to each other because the COI gene is more variable than 16S. For example, the paper says:

> Both heuristics [BAND_SIZE and KDIST_CUTOFF] can be disabled by the user, and the default values should be re-examined if the algorithm is applied to genetic regions with significantly different characteristics, such as the indel-rich ITS region

@benjjneb
Owner

> I am worried about important sequences not aligning to each other because the COI gene is more variable than 16S

That's fine. If they don't align because they are so different, then they will be split into different ASVs as they should be.

> the default values should be re-examined if the algorithm is applied to genetic regions with significantly different characteristics, such as the indel-rich ITS region

We now realize that isn't the right advice. The alignment parameters should be reconsidered when the sequencing tech has different characteristics (e.g. high indels).
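
The point above about sequences that are too different to align can be illustrated with a toy k-mer screen. This is a minimal Python sketch, not DADA2's code: it assumes the screen compares 5-mer profiles and skips alignment above a cutoff of 0.42 (which, as far as I know, are the package defaults for KMER_SIZE and KDIST_CUTOFF), and the function names and toy sequences are hypothetical. It also assumes both sequences are at least k bases long.

```python
from collections import Counter

K = 5           # k-mer size (assumed to match the DADA2 default, KMER_SIZE)
CUTOFF = 0.42   # assumed to match the DADA2 default for KDIST_CUTOFF


def kmer_counts(seq, k=K):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=K):
    """Return 1 minus the fraction of k-mers the two sequences share."""
    # Counter & Counter keeps the minimum count of each shared k-mer.
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    n_kmers = min(len(a), len(b)) - k + 1
    return 1.0 - shared / n_kmers


def worth_aligning(a, b, cutoff=CUTOFF, k=K):
    """Hypothetical helper: True if the pair passes the k-mer screen."""
    return kmer_distance(a, b, k) <= cutoff


# Toy sequences, not real COI data.
center = "ACGTTGCAAGCTTAGGCATCGATCGGATTACAGCCTAGGA"
# Same sequence with a single substitution at position 20.
one_error = center[:20] + ("A" if center[20] != "A" else "C") + center[21:]
# A sequence that shares no 5-mers with the center.
divergent = "T" * len(center)

for name, seq in [("one_error", one_error), ("divergent", divergent)]:
    print(name, round(kmer_distance(center, seq), 3), worth_aligning(center, seq))
```

A read that differs by one substitution stays well under the cutoff and is compared against the center as a possible error. A read that shares no k-mers is screened out, so it cannot be explained as an error from that center and ends up as its own ASV, which is the desired outcome for genuinely different COI sequences.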

@aimirza
Author

aimirza commented Sep 10, 2024

Got it, thank you!
