Title: | An R Interface to 'Phylotastic' Web Services |
---|---|
Description: | This wraps the 'Phylotastic' services APIs described on Web Services at <www.phylotastic.org>. The main use case is to return a phylogenetic tree for a set of species, but the services also include ways to extract species names from web pages, perform taxonomic name resolution, retrieve a list of all descendant species of a taxon, find images of a species, and more. |
Authors: | Brian O'Meara [aut, cre], Abu Saleh Md Tayeen [aut], Luna L. Sanchez Reyes [aut] |
Maintainer: | Brian O'Meara <[email protected]> |
License: | GPL-2 |
Version: | 0.0.8.1 |
Built: | 2024-11-01 03:25:12 UTC |
Source: | https://github.com/phylotastic/rphylotastic |
See 'vignette("rphylotastic", package = "rphylotastic")' for an overview of the package and its utilities.
Maintainer: Brian O'Meara [email protected]
Authors:
Abu Saleh Md Tayeen [email protected]
Luna L. Sanchez Reyes [email protected]
Useful links:
Report bugs at https://github.com/phylotastic/rphylotastic/issues
Add arc labels to tips of a phylogeny; works for non-monophyletic groups and single tip lineages.
arclabels(phy, tips, ...)

## Default S3 method:
arclabels(
  phy = NULL,
  tips,
  text,
  plot_singletons = TRUE,
  ln.offset = 1.02,
  lab.offset = 1.06,
  cex = 1,
  orientation = "horizontal",
  ...
)
phy | An object of class phylo. |
tips | A character vector with the names of the tips that belong to the clade or group. To plot multiple groups, give tips as a list. |
... | optional arguments for |
text | A character vector indicating the desired text to label the arcs. |
plot_singletons | Boolean. If TRUE (default), arcs and labels are also added to single-tip lineages. If FALSE, no arc or label is plotted over those tips. |
ln.offset | line offset (as a function of total tree height) for |
lab.offset | label offset for |
cex | character expansion factor. |
orientation | orientation of the text. Can be |
NULL
This uploads a file (a PDF, Microsoft Word document, plain text file, etc.) and extracts all scientific names from it. For example, you can input a PDF of a scientific article and it will return all the scientific names in that article.
file_get_scientific_names(file_name, search_engine = 0, above_species = FALSE)
file_name | The file path and name to extract names from |
search_engine | 1 to use TaxonFinder, 2 to use NetiNeti, 0 to use both |
above_species | Boolean. Defaults to FALSE. If TRUE, only scientific names above the species level are returned. |
It requires that curl is installed on your system.
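Since the upload depends on a system curl, it can help to check for one before calling the function. The following is a minimal sketch in base R; the helper name `has_curl` is hypothetical and not part of rphylotastic, and it assumes the `curl` executable is looked up on the PATH.

```r
# Hypothetical helper (not part of rphylotastic): TRUE if a `curl`
# executable can be found on the PATH, FALSE otherwise.
has_curl <- function() {
  unname(nzchar(Sys.which("curl")))
}

if (!has_curl()) {
  message("curl was not found on the PATH; install it before uploading files.")
}
```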
A vector of scientific names
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription
This uploads a file (a PDF, Microsoft Word document, plain text file, etc.) and extracts all scientific names from it. For example, you can input a PDF of a scientific article and it will return all the scientific names in that article.
file_get_scientific_names_from_GNRD( file_name, search_engine = 0, above_species = FALSE )
file_name | The file path and name to extract names from |
search_engine | 1 to use TaxonFinder, 2 to use NetiNeti, 0 to use both |
above_species | Boolean. Defaults to FALSE. If TRUE, only scientific names above the species level are returned. |
It requires that curl is installed on your system.
A vector of scientific names
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription
Flowering plant families from Open Tree Taxonomy
flower_plant_fams
A character vector
flower_plant_fams <- datelife::get_ott_children(ott_ids = 99252, ott_rank = "family")
flower_plant_fams <- flower_plant_fams[[1]]
flower_plant_fams <- rownames(flower_plant_fams)[as.character(flower_plant_fams[, "rank"]) == "family"]
usethis::use_data(flower_plant_fams, overwrite = TRUE)
Luna L. Sanchez-Reyes [email protected] Brian O'Meara [email protected]
https://tree.opentreeoflife.org/about/taxonomy-version/ott3.0
Get the Phylotastic base URL. Returns the URL for the Phylotastic server.
get_base_url()
Return Species List server URL
get_list_server_url()
Get existing list/lists of species
get_species_from_list( userid, access_token, list_id, verbose = FALSE, content = TRUE )
userid | A valid gmail address of the user |
access_token | Access token of the gmail address |
list_id | An integer id of the list to retrieve |
verbose | (optional) FALSE by default, which shows minimal metadata of the list. |
content | (optional) TRUE by default, which shows the species collection of the list. |
An existing list with metadata and content based on parameters
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription
# This gives you the syntax, but since the access token expires after one hour,
# this particular example will not work.
## Not run:
userid <- "[email protected]"
access_token <- "ya29..zQLmLjbyujJjwV6RVSM2sy-mkeaKu-9"
list_id <- 12
verbose <- TRUE
content <- FALSE
get_species_from_list(userid, access_token, list_id, verbose, content)
## End(Not run)
## End(Not run)
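Because the access token is short-lived, one option is to keep it out of scripts and read it from the environment at run time. This is a sketch only: `read_credentials` and both environment variable names are hypothetical and not part of rphylotastic.

```r
# Hypothetical helper: read the user id and access token from environment
# variables (the variable names are assumptions, pick your own).
read_credentials <- function(user_var = "PHYLOTASTIC_USERID",
                             token_var = "PHYLOTASTIC_ACCESS_TOKEN") {
  userid <- Sys.getenv(user_var, unset = "")
  access_token <- Sys.getenv(token_var, unset = "")
  # Fail early with a clear message instead of sending empty credentials.
  if (!nzchar(userid) || !nzchar(access_token)) {
    stop("Set ", user_var, " and ", token_var,
         " before calling the list services.")
  }
  list(userid = userid, access_token = access_token)
}

## Not run:
# creds <- read_credentials()
# get_species_from_list(creds$userid, creds$access_token, list_id = 12)
## End(Not run)
```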
Insert list of species
insert_species_in_list(userid, listObj)
userid | A valid gmail address of the user |
listObj | A list object |
A list with the id of the new list created
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription
userid <- "[email protected]"
listObj <- list(
  list_extra_info = "",
  list_description = "A sublist on the bird species added",
  list_keywords = c("bird", "endangered species", "Everglades"),
  list_curator = "HD Laughinghouse",
  list_origin = "webapp",
  list_curation_date = "02-24-2016",
  list_source = "des",
  list_focal_clade = "Aves",
  list_title = "Bird Species List",
  list_author = c("Bass", "O. & Cunningham", "R."),
  list_date_published = "01-01-2017",
  is_list_public = TRUE,
  list_species = list(
    list(family = "", scientific_name = "Aix sponsa",
         scientific_name_authorship = "", vernacular_name = "Wood Duck",
         phylum = "", nomenclature_code = "ICZN", order = "Anseriformes",
         class = ""),
    list(family = "", scientific_name = "Anas strepera",
         scientific_name_authorship = "", vernacular_name = "Gadwall",
         phylum = "", nomenclature_code = "ICZN", order = "Anseriformes",
         class = "")
  )
)
insert_species_in_list(userid, listObj)
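Each element of list_species in the example above carries the same eight fields, so a small constructor can cut down on repetition. This is a sketch: `make_species_entry` is a hypothetical helper, not part of rphylotastic, and the field names are copied from the example rather than from a formal schema.

```r
# Hypothetical helper: build one species entry with the eight fields used in
# the example above. Fields left unspecified default to empty strings.
make_species_entry <- function(scientific_name, vernacular_name = "",
                               order = "", family = "", class = "",
                               phylum = "", nomenclature_code = "",
                               scientific_name_authorship = "") {
  list(family = family,
       scientific_name = scientific_name,
       scientific_name_authorship = scientific_name_authorship,
       vernacular_name = vernacular_name,
       phylum = phylum,
       nomenclature_code = nomenclature_code,
       order = order,
       class = class)
}

# The two entries from the example, rebuilt with the helper.
list_species <- list(
  make_species_entry("Aix sponsa", "Wood Duck", order = "Anseriformes",
                     nomenclature_code = "ICZN"),
  make_species_entry("Anas strepera", "Gadwall", order = "Anseriformes",
                     nomenclature_code = "ICZN")
)
```

The resulting list_species can then be placed in listObj in place of the hand-written nested lists.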
Remove an existing list of species
remove_species_from_list(userid, access_token, list_id)
userid | A valid gmail address of the user |
access_token | Access token of the gmail address |
list_id | An integer id of the list to remove |
A list with the id of the list removed
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription
# This gives you the syntax, but since the access token expires after one hour,
# this particular example will not work.
## Not run:
userid <- "[email protected]"
access_token <- "ya29..zQLmLjbyujJjwV6RVSM2sy-mkeaKu-9"
list_id <- 12
remove_species_from_list(userid, access_token, list_id)
## End(Not run)
## End(Not run)
Replace a list of species
replace_species_in_list(userid, access_token, list_id, speciesObj)
userid | A valid gmail address of the user |
access_token | Access token of the gmail address |
list_id | An integer id of the list to be modified |
speciesObj | A species object to replace with |
A list with the old species and new species list
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription
# This gives you the syntax, but since the access token expires after one hour,
# this particular example will not work.
## Not run:
userid <- "[email protected]"
access_token <- "ya29..zQLmLjbyujJjwV6RVSM2sy-mkeaKu-9"
list_id <- 12
speciesObj <- list(
  list(family = "", scientific_name = "Aix sponsa",
       scientific_name_authorship = "", vernacular_name = "Wood Duck",
       phylum = "", nomenclature_code = "ICZN", order = "Anseriformes",
       class = "")
)
replace_species_in_list(userid, access_token, list_id, speciesObj)
## End(Not run)
## End(Not run)
Get image metadata of a list of species
species_get_image_data(species)
species | A vector of names |
A data frame of image metadata (image URLs, license info, etc.) of species
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription or https://eol.org/api/
Get information from Encyclopedia of Life of a list of species
species_get_info(species)
species | A vector of names |
A data frame of species information from Encyclopedia of Life. Type of information available varies among species.
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription or https://eol.org/api/
Convert common names to scientific names.
taxa_common_to_scientific(taxa, service = "NCBI", multiple = FALSE)
taxa | A character vector of common names. Binomials can be spaced with underscore or white space. |
service | Which service to use: NCBI, ITIS, or TROPICOS |
multiple | If TRUE, then the service will return multiple matches (if available) for each common name in the input list. |
A vector of scientific names. Output order may not correspond to input order.
taxize package for name resolution in general and its sci2comm function.
taxa <- c("blue_whale", "swordfish", "killer whale")
scientific <- taxa_common_to_scientific(taxa)
print(scientific)
Get OToL induced subtree
taxa_get_otol_tree(taxa)
taxa | The vector of names, already resolved to match OToL taxa |
A phylo object
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription or the rotl package, another interface to Open Tree of Life
taxa <- c("Crabronidae", "Ophiocordyceps", "Megalyridae", "Formica polyctena",
          "Tetramorium caespitum", "Pseudomyrmex", "Carebara diversa", "Formicinae")
phy <- taxa_get_otol_tree(taxa)
plot(phy)
Get phylomatic subtree
taxa_get_phylomatic_tree(taxa)
taxa | The vector of names, already resolved |
A phylo object
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription or the interface of phylomatic http://phylodiversity.net/phylomatic/
phy <- taxa_get_phylomatic_tree(c("Panthera leo", "Panthera onca", "Panthera tigris", "Panthera uncia"))
plot(phy)
Resolve Scientific Names with GNR TNRS
taxa_resolve_names_with_gnr(taxa)
taxa | The vector of names |
Misspelled or incorrect names will be dropped.
A vector of correct names. THE ORDER MAY NOT CORRESPOND TO YOUR INPUT ORDER.
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription or the rotl package, another interface to Open Tree of Life, or the taxize package for name resolution in general.
Resolve Scientific Names with Open Tree TNRS
taxa_resolve_names_with_otol(taxa)
taxa | The vector of names |
A vector of corrected names. THE ORDER MAY NOT CORRESPOND TO YOUR INPUT ORDER.
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription or the rotl package, another interface to Open Tree of Life, or the taxize package for name resolution in general.
my.species.raw <- c("Formica polyctena", "Formica exsectoides", "Farmica pacifica")
my.species.corrected <- taxa_resolve_names_with_otol(my.species.raw)
print(my.species.corrected)
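Because the output order may differ from the input order, the two vectors should not be paired by position. The sketch below compares them by value using base R set operations. The `resolved` vector is a made-up stand-in for illustration, not real output from the service.

```r
raw <- c("Formica polyctena", "Formica exsectoides", "Farmica pacifica")

# Stand-in for the value returned by taxa_resolve_names_with_otol(raw);
# made up for illustration, not real service output.
resolved <- c("Formica exsectoides", "Formica polyctena")

# Inputs that came back verbatim, regardless of output order.
unchanged <- raw[raw %in% resolved]

# Inputs with no verbatim match: either corrected to another spelling or
# dropped, so these are the ones worth checking by hand.
needs_review <- setdiff(raw, resolved)
```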
Keep the first part of the binomial from a vector of taxon names that includes species binomial names
taxa_toss_binomials(taxa)
taxa | A character vector of taxon names. |
A character vector of lineage names above the species level.
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription or the rotl package, another interface to Open Tree of Life
taxa_toss_binomials("Vulpes_vulpes")
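To make "keep the first part of the binomial" concrete, here is one plausible base R reading of that description. It is a sketch, not the package's implementation: `toss_binomials_sketch` is a hypothetical name, and treating underscores as spaces is an assumption.

```r
# Hypothetical sketch: keep the first word of each name. Names that are
# already a single word (e.g. a family) pass through unchanged.
toss_binomials_sketch <- function(taxa) {
  taxa <- gsub("_", " ", trimws(taxa))  # assumption: "_" separates like " "
  sub(" .*$", "", taxa)                 # drop everything after the first space
}

toss_binomials_sketch(c("Vulpes_vulpes", "Vulpes lagopus", "Canidae"))
```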
Get all species from a taxon from Open Tree of Life taxonomy.
taxon_get_species(taxon, filters = c("environmental", "sp\\.", "cf\\."))
taxon | A character vector of length 1 giving the taxon name to get all species for. If the vector is longer than 1, only the first element is used and the rest are ignored. |
filters | A character vector of strings to exclude. |
A character vector of species names.
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription or the rotl package, another interface to Open Tree of Life
print(taxon_get_species("Vulpes"))
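The escaped dots in the default filters suggest they are treated as regular expressions. The sketch below assumes that, and shows how such filters would exclude names and why the escaping matters. `apply_filters` is a hypothetical helper, and the informal names in the input vector are made up for illustration.

```r
# Hypothetical helper: drop every name that matches any of the filters,
# treating each filter as a regular expression (an assumption).
apply_filters <- function(species, filters) {
  if (length(filters) == 0) return(species)
  pattern <- paste(filters, collapse = "|")
  species[!grepl(pattern, species)]
}

species <- c("Vulpes vulpes", "Vulpes sp. 1", "Vulpes cf. zerda",
             "environmental samples", "Crocuta spelaea", "Vulpes lagopus")
filters <- c("environmental", "sp\\.", "cf\\.")

kept <- apply_filters(species, filters)

# Unescaped, "sp." means "sp" followed by any character, so it would also
# match the valid epithet "spelaea"; the escaped form only matches "sp.".
grepl("sp.", "Crocuta spelaea")
grepl("sp\\.", "Crocuta spelaea")
```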
Get all species filtered by country from a taxon
taxon_get_species_from_country( taxon, country, filters = c("environmental", "sp\\.", "cf\\.") )
taxon | A character vector of length 1. Specify the taxon name to get a subset of species that are established in a particular country |
country | A country name where species of the input taxon are established. |
filters | A character vector of strings to exclude |
A vector of names
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription or the rotl package, another interface to Open Tree of Life
Get all species that have genome sequence in NCBI from a taxon
taxon_get_species_with_genome( taxon, filters = c("environmental", "sp\\.", "cf\\.") )
taxon | A character vector of length 1. Specify the taxon name to get a subset of species having genome sequence |
filters | A character vector of strings to exclude |
A vector of names
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription or the rotl package, another interface to Open Tree of Life
Separate dark from known taxa on another database
taxon_separate_dark_taxa_using_genbank( taxon, filters = c("environmental", "sp\\.", "cf\\.", "uncultured", "unidentified", " clone", " enrichment"), verbose = TRUE, sleep = 0 )
taxon | A taxon to get all species for |
filters | A character vector of strings to exclude |
verbose | If TRUE, print updates on how many are done |
sleep | How many seconds to sleep between calls (on top of rentrez's defaults) |
A list containing a vector of dark names, a vector of known names, and fraction.dark
Separate dark from known taxa on OpenTree of Life
taxon_separate_dark_taxa_using_otol( taxon, filters = c("environmental", "sp\\.", "cf\\.", "uncultured", "unidentified", " clone", " enrichment") )
taxon | A taxon to get all species for |
filters | A character vector of strings to exclude |
A list containing a vector of dark names, a vector of known names, and fraction.dark
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription or the rotl package, another interface to Open Tree of Life
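The returned list can be pictured as a partition of the species names plus a proportion. The sketch below assumes that a name counts as "dark" when it matches any of the filters and that fraction.dark is the share of such names; neither is confirmed by this page. `separate_dark_sketch` and the element names `dark` and `known` are hypothetical, and the input names are made up.

```r
# Hypothetical sketch: split names into dark (matching any filter) and known,
# and report the proportion that is dark.
separate_dark_sketch <- function(species, filters) {
  is_dark <- grepl(paste(filters, collapse = "|"), species)
  list(dark = species[is_dark],
       known = species[!is_dark],
       fraction.dark = mean(is_dark))
}

# Default filters copied from the usage above.
filters <- c("environmental", "sp\\.", "cf\\.", "uncultured", "unidentified",
             " clone", " enrichment")
# Made-up input names for illustration.
species <- c("Bacillus subtilis", "uncultured Bacillus sp.",
             "Bacillus sp. clone 12", "Bacillus cereus")

result <- separate_dark_sketch(species, filters)
```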
Terrestrial plant orders from Open Tree Taxonomy
terrestrial_plant_orders
A character vector
terrestrial_plant_orders <- datelife::get_ott_children(ott_ids = 56610, ott_rank = "order")
terrestrial_plant_orders <- terrestrial_plant_orders[[1]]
terrestrial_plant_orders <- rownames(terrestrial_plant_orders)[as.character(terrestrial_plant_orders[, "rank"]) == "order"]
usethis::use_data(terrestrial_plant_orders, overwrite = TRUE)
Luna L. Sanchez-Reyes [email protected] Brian O'Meara [email protected]
https://tree.opentreeoflife.org/about/taxonomy-version/ott3.0
This takes a string of text and extracts any scientific names in the text. Other words in the text are ignored.
text_get_scientific_names(text)
text | The text string to extract names from |
A data.frame of scientific names and other data from GNRD
text <- "Formica polyctena is a species of European red wood ant in the genus Formica. The pavement ant, Tetramorium caespitum is an ant native to Europe."
print(text_get_scientific_names(text))
A common use case is having a set of traits for a set of species and wanting a tree for those species. This involves resolving the names to a common taxonomy, getting a tree for those species, and then optionally pruning the traits and the tree to the same set of taxa. Pruning is optional because there are approaches that impute missing traits or place taxa phylogenetically in the absence of information.
traits_get_tree( traits, tnrs_source = "otol", tree_source = "otol", prune = TRUE, summary_format = "phylo_biggest", ... )
traits | Data.frame with species names as rownames |
tnrs_source | Source for taxonomic name resolution; options are "otol" and "gnr". If set to NULL, assumes names are fine as is |
tree_source | Source for tree; options are "otol", "phylomatic", "datelife". |
prune | If TRUE, prune the traits and the tree to the taxa present in both |
summary_format | What format to return from datelife |
... | Other options to pass to datelife::datelife_search |
For sources of trees, besides the standard ones (Open Tree of Life, Phylomatic), datelife is also an option. To use it, you must have installed the datelife package (which is only suggested, not required, by this package). By default it returns the chronogram in the tree store with the most overlap with your set of taxa, but you can change this with summary_format; see ?datelife::datelife_search for more information.
A list with a phy and a traits object, both pruned to the same taxon set, as well as citation information for the sources of the taxonomic resolution and phylogeny (also cite this package and, if you use it, datelife).
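The pruning step amounts to keeping only the taxa that appear in both the trait table and the tree. The sketch below shows that idea in base R, using a plain character vector as a stand-in for the tip labels of a phylo object. The trait values are made up, and this is not the package's own code.

```r
# Made-up trait values for illustration; species names are the row names.
traits <- data.frame(
  mass_kg = c(6, 190, 220, 45),
  row.names = c("Vulpes vulpes", "Panthera leo", "Panthera tigris",
                "Panthera uncia")
)

# Stand-in for phy$tip.label of a tree returned by one of the tree sources.
tip_labels <- c("Panthera leo", "Panthera onca", "Panthera tigris",
                "Panthera uncia")

# Taxa present in both the traits and the tree.
shared <- intersect(rownames(traits), tip_labels)

# Keep only the shared rows; drop = FALSE keeps a one-column data frame intact.
traits_pruned <- traits[shared, , drop = FALSE]

# Tips to remove from the tree, e.g. with ape::drop.tip(phy, tips_to_drop).
tips_to_drop <- setdiff(tip_labels, shared)
```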
Update metadata of a list of species
update_species_in_list(userid, access_token, list_id, listObj)
userid | A valid gmail address of the user |
access_token | Access token of the gmail address |
list_id | An integer id of the list to be modified |
listObj | A list object to update with |
A list with modified list metadata
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription
# This gives you the syntax, but since the access token expires after one hour,
# this particular example will not work.
## Not run:
userid <- "[email protected]"
access_token <- "ya29..zQLmLjbyujJjwV6RVSM2sy-mkeaKu-9"
list_id <- 12
listObj <- list(
  list_description = "A sublist on the bird species",
  list_keywords = c("bird", "Everglades")
)
update_species_in_list(userid, access_token, list_id, listObj)
## End(Not run)
## End(Not run)
Function to pull scientific names from web pages
url_get_scientific_names(URL, search_engine = 0, above_species = FALSE)
URL | The URL to extract names from. Can be a pdf url. |
search_engine | 1 to use TaxonFinder, 2 to use NetiNeti, 0 to use both |
above_species | Boolean. Defaults to FALSE. If TRUE, only scientific names above the species level are returned. |
A vector of scientific names. It returns unique matches.
https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription
# get scientific names from a wikipedia web page:
url_get_scientific_names(URL = "https://en.wikipedia.org/wiki/Plain_pigeon")
# get scientific names from a pdf URL:
url_get_scientific_names(URL = "http://darwin-online.org.uk/converted/pdf/1897_Insectivorous_F1229.pdf")
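Calls like these depend on remote pages and web services being reachable, so a script may want to degrade gracefully when one is not. The following is a generic base R sketch; `with_fallback` is a hypothetical wrapper, not part of rphylotastic, and whether a given failure surfaces as an R error is an assumption.

```r
# Hypothetical wrapper: evaluate a service call and, if it raises an error,
# emit a warning and return `default` instead of stopping the script.
with_fallback <- function(expr, default = character(0)) {
  tryCatch(expr, error = function(e) {
    warning("Service call failed: ", conditionMessage(e), call. = FALSE)
    default
  })
}

## Not run:
# names_found <- with_fallback(
#   url_get_scientific_names(URL = "https://en.wikipedia.org/wiki/Plain_pigeon")
# )
## End(Not run)
```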