MMseqs2 with NCBI nr database taxonomy issue #856

Open
feixiang1209 opened this issue Jun 27, 2024 · 5 comments

@feixiang1209

Dear MMseqs2 team,

I am trying to create a taxonomy database for the NCBI nr database. First I downloaded nr.fa, taxdump, and prot.accession2taxid. Then I ran the commands below:

mmseqs createdb nr.fa nrDB

mmseqs createtaxdb nrDB tmp --ncbi-tax-dump ./taxdump/ --tax-mapping-file ./prot.accession2taxid

After a few hours, the run completed without error. However, the file nrDB_mapping is empty. Could you please advise where I went wrong?

Thanks a lot

@AndrazMarinc

AndrazMarinc commented Jul 8, 2024

You have probably solved this by now, but just in case: have you tried running mmseqs nrtotaxmapping after createtaxdb? I saw your comment in the other issue, and I think nrtotaxmapping is the solution. I'm not 100% sure, though.

@feixiang1209
Author

Thanks for your reply. I ran mmseqs nrtotaxmapping after "mmseqs createtaxdb nrDB tmp --ncbi-tax-dump ./taxdump/ --tax-mapping-file ./prot.accession2taxid", using the command "mmseqs nrtotaxmapping accession2taxid/prot.accession2taxid nrDB output.tsv". The resulting output.tsv is shown below. Should I replace nrDB_mapping with this file?

0 1047168
1 185202
2 412384
3 3072323
4 150340
5 1573704
6 2517205
7 286
8 1307
9 2635419
10 34041
11 2212474
12 1487711
13 1871050

@AndrazMarinc

AndrazMarinc commented Jul 10, 2024

I think so. I'm looking at the code the mmseqs devs linked in the previous issue and it seems that's what their script does. You'll basically create the ${OUTDB}_mapping file by renaming your tsv.

"${MMSEQS}" nrtotaxmapping "${TMP_PATH}/pdb.accession2taxid" "${TMP_PATH}/prot.accession2taxid" "${OUTDB}" "${OUTDB}_mapping" ${THREADS_PAR}

@milot-mirdita
Member

Thanks a lot @AndrazMarinc!

That looks correct! After you replace the _mapping file, you still have to call createtaxdb to create the _taxonomy file, which contains all the taxdump information.
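Putting the steps from this thread together, the sequence might look like the sketch below. It is not taken from the MMseqs2 scripts: the function name build_nr_taxdb is hypothetical, the file names are the ones used earlier in the thread, and writing the nrtotaxmapping output straight to nrDB_mapping (instead of renaming output.tsv afterwards) mirrors the script line quoted above. It also assumes that createtaxdb keeps an existing _mapping file when no --tax-mapping-file is passed, so please check that against your MMseqs2 version.

```shell
# Sketch only. Override MMSEQS (e.g. MMSEQS=echo) to preview the commands
# without running them.
MMSEQS="${MMSEQS:-mmseqs}"

# Hypothetical helper, not part of MMseqs2.
# $1: FASTA file, $2: sequence DB name, $3: accession2taxid file, $4: taxdump directory
build_nr_taxdb() {
    fasta="$1"; db="$2"; acc2taxid="$3"; taxdump="$4"

    # 1. Build the sequence database from the nr FASTA file.
    "$MMSEQS" createdb "$fasta" "$db" || return 1

    # 2. Map internal sequence keys to taxids and write the result as <db>_mapping.
    "$MMSEQS" nrtotaxmapping "$acc2taxid" "$db" "${db}_mapping" || return 1

    # 3. Build the _taxonomy file from the local taxdump.
    #    Assumption: the existing <db>_mapping is reused because no
    #    --tax-mapping-file is given.
    "$MMSEQS" createtaxdb "$db" tmp --ncbi-tax-dump "$taxdump" || return 1
}

# Usage with the file names from this thread:
# build_nr_taxdb nr.fa nrDB ./prot.accession2taxid ./taxdump/
```

A quick sanity check afterwards is to confirm that nrDB_mapping is no longer empty, for example with `head nrDB_mapping`.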

@feixiang1209
Author

Thanks a lot @AndrazMarinc and @milot-mirdita. It worked. I also found another solution: the "prot.accession2taxid" file downloaded from NCBI needs to be modified. Only two columns (accession.version and taxid) are needed to run createtaxdb.
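For anyone trying this second route, here is a minimal sketch of the trimming step. It assumes the usual NCBI layout of prot.accession2taxid (tab-separated columns accession, accession.version, taxid, gi, with one header line). The helper name trim_accession2taxid and the output file name are made up for illustration, and dropping the header line is my assumption rather than something confirmed in this thread.

```shell
# Hypothetical helper: keep only accession.version (column 2) and taxid (column 3).
# $1: original prot.accession2taxid, $2: two-column output file
trim_accession2taxid() {
    # NR > 1 skips the header line (assumed to be present, as in NCBI's download).
    awk -F'\t' 'NR > 1 { print $2 "\t" $3 }' "$1" > "$2"
}

# Usage with the file names from this thread (output name is illustrative):
# trim_accession2taxid ./prot.accession2taxid ./prot.accession2taxid.2col
# mmseqs createtaxdb nrDB tmp --ncbi-tax-dump ./taxdump/ --tax-mapping-file ./prot.accession2taxid.2col
```

Writing to a new file instead of editing in place keeps the original download intact, which matters given how long the full file takes to fetch.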
