How to download large Clinvar tabular table #182

Open
manburst opened this issue Jul 20, 2022 · 2 comments

Comments

@manburst

Dear Sir,
I am working on a project that requires looking up a lot of variant data in ClinVar. This is time-consuming because ClinVar seems to limit each search to 20 queries.
For example, to get the data for 2 specific variants I search like this:
(2[chr] AND 47168774[chrpos37] AND TTC7A[gene] AND p.Leu32Val[varname] AND c.94C>G[varname]) OR (3[chr] AND 9974774[chrpos37] AND IL17RC[gene] AND p.Leu625Val[varname] AND c.1873C>G[varname])
and I get a result like this:
[image: screenshot of the ClinVar search results]
Note the Download option, which can export the results as tabular data (a table file, as in the image).
Can the rentrez package download this tabular file, and could you please give an example of a command that fetches the data?
Also, how do I control the limit on the number of queries per search from the command?
Thanks in advance,
JK

@allenbaron

The package maintainer isn't currently available to reply. You may be able to find what you need in the Entrez Utilities documentation. The ClinVar docs do say that ClinVar is available via E-utilities.

Apologies that I don't have more time to assist myself.

@vestalgd

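library(rentrez)
library(tidyverse)  # provides %>%, as_tibble(), rownames_to_column(), unnest_wider(), unnest_longer()

# Each variant clause is wrapped in its own parentheses so that OR combines whole clauses
q1 <- "(2[chr] AND 47168774[chrpos37] AND TTC7A[gene] AND p.Leu32Val[varname] AND c.94C>G[varname]) OR (3[chr] AND 9974774[chrpos37] AND IL17RC[gene] AND p.Leu625Val[varname] AND c.1873C>G[varname])"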
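
For more than a couple of variants, typing the query by hand gets error-prone. A minimal sketch of assembling the same kind of query from a table of variants follows. The helper name `build_clinvar_query` and the column names (`chr`, `pos37`, `gene`, `protein`, `cdna`) are made up for illustration and are not part of rentrez; only the field tags come from the query in the question.

```r
# Hypothetical helper (not part of rentrez): build one parenthesised clause
# per variant and join the clauses with OR.
build_clinvar_query <- function(variants) {
  clauses <- sprintf(
    "(%s[chr] AND %s[chrpos37] AND %s[gene] AND %s[varname] AND %s[varname])",
    variants$chr, variants$pos37, variants$gene,
    variants$protein, variants$cdna
  )
  paste(clauses, collapse = " OR ")
}

# The two variants from the question; positions are kept as character
# strings so they are never printed in scientific notation.
variants <- data.frame(
  chr     = c("2", "3"),
  pos37   = c("47168774", "9974774"),
  gene    = c("TTC7A", "IL17RC"),
  protein = c("p.Leu32Val", "p.Leu625Val"),
  cdna    = c("c.94C>G", "c.1873C>G"),
  stringsAsFactors = FALSE
)

q_many <- build_clinvar_query(variants)
```

The resulting string can be passed as `term` to `entrez_search()`. Very long variant lists may produce queries that are too long for a single request, so splitting the table into batches and searching each batch separately is probably safer.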

# Search for the gene or topic of interest; use_history = TRUE keeps the result set on the NCBI server
search <- entrez_search(db = "clinvar",
                        term = q1,
                        use_history = TRUE)

# With retmode = "xml", entrez_summary() returns at most 9999 ClinVar variants; if you have more than 9999, you will need to fetch them in chunks.
summary <- entrez_summary(db = "clinvar", web_history = search$web_history, retmode = "xml")
summary
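
One way the chunking could look, as a rough sketch rather than a tested recipe: E-utilities accepts `retstart` (a zero-based offset) and `retmax` (records per request), and rentrez passes extra arguments like these through to the API. The helper names `chunk_starts` and `fetch_summary_chunks`, the chunk size of 500, and the half-second pause are my own assumptions. The 20-record limit mentioned in the question may simply be `retmax`, which defaults to 20 in `entrez_search()`.

```r
# Zero-based start offsets for each chunk (E-utilities' retstart counts from 0).
chunk_starts <- function(total, chunk_size) {
  if (total <= 0) return(numeric(0))
  seq(0, total - 1, by = chunk_size)
}

# Hypothetical helper: fetch summaries for a stored search one chunk at a time.
# `search` is the object returned by entrez_search(..., use_history = TRUE).
fetch_summary_chunks <- function(search, chunk_size = 500, pause = 0.5) {
  lapply(chunk_starts(search$count, chunk_size), function(start) {
    Sys.sleep(pause)  # assumed pause to stay under NCBI's request-rate limit
    rentrez::entrez_summary(
      db = "clinvar",
      web_history = search$web_history,
      retmode = "xml",
      retstart = start,
      retmax = chunk_size,
      always_return_list = TRUE  # keep a list even if a chunk holds one record
    )
  })
}
```

Calling `chunks <- fetch_summary_chunks(search)` should give a list with one set of summaries per chunk, and each element can then go through `extract_from_esummary()` in the same way as a single result.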
summary_cv <- extract_from_esummary(summary, c("obj_type", "accession", "accession_version", "title", "variation_set",
                                               "trait_set", "supporting_submissions", "clinical_significance", "record_status",
                                               "gene_sort", "chr_sort", "location_sort", "variation_set_name", "variation_set_id",
                                               "genes", "protein_change", "fda_recognized_database"))

# extract_from_esummary() returns a matrix; this code transposes it with t() and turns the result into a tibble. From there you can use unnest_wider() or unnest_longer() to pull out the list columns.

cv_extract_final <- summary_cv %>%
  t() %>%
  as_tibble(rownames = NA) %>%
  rownames_to_column(var = "ID")
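
To illustrate the unnesting step without hitting NCBI, here is a small sketch on a toy tibble. The data are placeholders I made up (`GENE_A`, `change_1` and so on), and the real ClinVar list columns are likely nested differently, so the column names will need adjusting.

```r
library(tibble)
library(tidyr)

# Toy stand-in for the extracted table: two list columns, invented values.
toy <- tibble(
  ID = c("1", "2"),
  genes = list(
    list(symbol = "GENE_A", strand = "+"),
    list(symbol = "GENE_B", strand = "-")
  ),
  protein_change = list(c("change_1"), c("change_2", "change_3"))
)

# unnest_wider(): each named element of `genes` becomes its own column.
wide <- unnest_wider(toy, genes, names_sep = "_")

# unnest_longer(): each element of `protein_change` becomes its own row.
long <- unnest_longer(wide, protein_change)
```

`unnest_wider()` suits list columns whose elements are named records, while `unnest_longer()` suits columns holding a variable number of values per variant.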
