Minor release to fix failing tests on CRAN.
Zachary Foster is now the maintainer of taxize.
tnrs() and tnrs_sources() are defunct. The service has been unreliable for years now and, as far as we can tell, is down for good. Associated changes have been made throughout the package; e.g., resolve() no longer has an option for tnrs (#841) (#842)
tol_resolve() test updated following the new version of the rotl package on CRAN (#816)
class2tree() documentation now explains in more detail how the function works (#849) (#851)
worms_downstream(), children(..., db="worms") and downstream(..., db="worms"): now paginate automatically so the user gets all results, and allow the parameter marine_only to be passed through the high-level functions children()/downstream() down to worrms::wm_children(), where it toggles whether only marine results are returned (#848); thanks @oharac!
ncbi_downstream() (which cascades up to downstream(..., db="ncbi")): removed an unneeded line of code that was throwing an error in some cases (#850)
worms_downstream(), children(..., db="worms") and downstream(..., db="worms"): added the ranks epifamily and infraphylum. In addition, when a rank is missing in data returned from WORMS, the missing rank is now changed to "no rank" (#847)
worms_downstream() docs: made it clear that users can pass parameters down to worrms::wm_children() (#831)
get_pow_() docs: added a section on rate limits, covering what the rate limits are for Kew's POW and how users can resolve hitting them (#836)
rank_ref: eight rank names added to the package's rank reference data: biotype, forma specialis, isolate, pathogroup, series, serogroup, serotype, and strain. Queries from downstream() and other functions that rely on relative rank information should no longer fail when they contain these 8 rank names (#830)
rank_ref_zoo: a reference data.frame specifically for zoological rank types, currently only used for WORMS. The main difference is that in rank_ref_zoo section/subsection are nested between order and family, whereas in rank_ref (used for all other data sources) section/subsection are at the genus level (#833)
class2tree(): problem sorted out now (#835) (#838) (#839) (#840)
sci always accepts only a scientific name; com accepts only a common name; id accepts a taxonomic identifier; sci_com accepts a scientific or common name; sci_id accepts a scientific name or taxonomic identifier. In most cases we have retained the old parameter name, and you can still use it, but you get a warning with information. The replaced parameters will be removed completely in a future package version. See https://github.com/ropensci/taxize/issues/723 for tables covering the affected functions and their old and new parameter names (#723) (#829)
apg_families and apg_orders datasets updated to v14 (from July 2017) (#827)
worms_downstream(): three rank names were not accounted for in our internal set of ranks (supertribe, subterclass, parvorder) (#824)
classification.gbifid was returning a duplicate last taxon, i.e., the last two rows in the output data.frame were the same; fixed (#825)
lowest_common(): fix for a problem in classification.uid() when a taxon UID was merged into another taxon (#828)
ELEMENT_GLOBAL.2. part is redundant for every identifier (#823)
rankagg() and tax_agg() fixes: rankagg() examples are now conditional on availability of vegan, as they should be, and now use real abundance data. tax_agg() fixes species name ordering in dune data (#822); work by @jarioksa
class2tree() (#818) (#820); thanks to @adriangeerre for the report and @trvinh for the fix
worms_downstream(): a user encountered a rank name ("phylum (division)") we hadn't dealt with yet for WORMS (#821); thanks @msweetlove for the report
bold_children(), bold_downstream(), and new S3 methods for boldid: children.boldid and downstream.boldid. Beware that these new methods are built on top of a function that scrapes BOLD's website (their API doesn't provide access to taxonomic children, only parents), so we've taken the liberty of trying to liberate that data and make it easy to access (#817)
tol_resolve() test: the bug was in the upstream package rotl; its maintainer has been told and will submit a new version soon; the affected test is commented out for now (#814)
synonyms() gains a method for Plants of the World Online (synonyms.pow), and a new associated helper function pow_synonyms() used within synonyms.pow (#812)
iucn_summary() now allows get_iucn() failures and still proceeds, for a better experience when passing in more than one name (#810)
species_plantarum_binomials dataset
classification() for data source GBIF wasn't working when the queried taxon rank was below species (e.g., subspecies or variety); GBIF doesn't return the same fields for ranks below species, so we now add that information with a bit of extra code (#809)
classification() with data source GBIF: fixed a bug, introduced at some point, in how results were sorted (#811)
use_eol() is now defunct; EOL no longer requires an API key (#749) (#803); thanks @padpadpadpad
vascan_search(), taxize_cite(), all *_ping() functions, get_wormsid(), get_pow(), get_eolid(), get_gbifid(), get_boldid(), gbif_name_usage(); and in various places in documentation (#799)
classification.uid() now does batch HTTP requests. The NCBI Entrez web service allows requests with up to 50 identifiers; @zachary-foster did the work to make this method use batch queries, so it's much faster (#678) (#798)
class2tree(): improved taxonomy rank indexing (#805); work by @trvinh
taxon_state_messages parameter in the taxize_options help file (#806)
ncbi_children() now accepts numeric and character class ids (#800)
classification.gbifid() was failing because GBIF changed the order of results (#802)
class2tree() fix: the problem was ultimately due to a bug in classification.gbifid() (see above) (#801)
tax_rank() fix: for db="ncbi" it was not giving correct ranks for queried names, due to a change in classification.uid (#804)
get_eolid(): fix for when filtering by data source led to no results (#808)
ncbi_downstream() (and thereby downstream() with db="ncbi"): for some taxa a query to NCBI returned the children as well as the queried name itself, and the next query would give the same results, leading to an endless while loop; we now remove the queried taxon itself to prevent this (#807)
COL introduced rate limiting in 2019, which has made the API essentially unusable. CoL+ is coming soon and we'll incorporate it here when it's stable; see https://github.com/ropensci/colpluz for the in-development R client (#796)
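The endless-loop fix for ncbi_downstream() (#807) comes down to never re-querying a taxon that the service echoes back as one of its own children. Below is a minimal Python sketch of that idea (not taxize's R code). It assumes a hypothetical fetch_children callable in place of the NCBI query, and the seen set is an extra safeguard added only in this sketch.

```python
from collections import deque
from typing import Callable, Iterable, List


def downstream(root: str, fetch_children: Callable[[str], Iterable[str]]) -> List[str]:
    """Breadth-first walk below `root`, returning each descendant once.

    `fetch_children` is a hypothetical stand-in for one web-service query.
    """
    found: List[str] = []
    seen = {root}
    queue = deque([root])
    while queue:
        taxon = queue.popleft()
        for child in fetch_children(taxon):
            # Skip the queried taxon if the service echoes it back among its
            # children; re-queueing it is what made the loop endless (#807).
            # `seen` additionally guards against any other repeated name.
            if child == taxon or child in seen:
                continue
            seen.add(child)
            found.append(child)
            queue.append(child)
    return found


# Fake service that, like the problem case, includes the queried name itself.
TREE = {
    "Apis": ["Apis", "Apis mellifera", "Apis cerana"],
    "Apis mellifera": ["Apis mellifera", "Apis mellifera ligustica"],
}
print(downstream("Apis", lambda taxon: TREE.get(taxon, [])))
# ['Apis mellifera', 'Apis cerana', 'Apis mellifera ligustica']
```

Without the skip, "Apis" would be queued again every time it was queried, and the loop would never empty.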
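The rank_ref (#830) and rank_ref_zoo (#833) changes matter because downstream() and similar functions compare ranks by their relative position. Below is a Python sketch of such a comparison. It uses hypothetical, heavily abbreviated orderings rather than the real tables, which are far longer. The "genus level" placement of section/subsection in rank_ref is simplified here to "just below genus".

```python
from typing import List

# Hypothetical, abbreviated orderings (highest rank first), for illustration only.
RANK_REF = ["kingdom", "phylum", "class", "order", "family", "genus",
            "section", "subsection", "species"]
RANK_REF_ZOO = ["kingdom", "phylum", "class", "order", "section",
                "subsection", "family", "genus", "species"]


def is_below(rank: str, target: str, ranks: List[str]) -> bool:
    """True if `rank` sits strictly below `target` in the given ordering."""
    order = {name: i for i, name in enumerate(ranks)}
    missing = {rank, target} - set(order)
    if missing:
        # A rank absent from the reference list has no relative position,
        # so any comparison involving it has to fail.
        raise KeyError(f"unknown rank(s): {sorted(missing)}")
    return order[rank] > order[target]


print(is_below("section", "family", RANK_REF))      # True
print(is_below("section", "family", RANK_REF_ZOO))  # False
```

The same query gives opposite answers under the two orderings, which is why zoological data gets its own reference table. A rank missing from the list raises instead of returning an answer.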
gn_parse() accesses the Global Names scientific name parser, a very fast parser. See the section on name parsers (https://docs.ropensci.org/taxize/reference/index.html#section-name-parsers) for the three functions that do name parsing (#794)
get_wormsid() gains two new parameters, fuzzy and marine_only; both are passed through to worrms::wm_records_name()/worrms::wm_records_names() (#790)
worrms_ranks is used to apply rank names in cases where WORMS fails to return rank names in its data
get_tpsid() example that passed in names as factors; get_* functions no longer accept factors
classification.tpsid(): a change to an internal function changed its output; fixed (#797)
get_boldid(): when filtering (e.g., with rank, division, parent) returned no match, get_boldid was failing on downstream parsing; it now returns NA
get_wormsid_(): was missing the marine_only and fuzzy parameters
pow_search(): an if statement was receiving a condition of length > 1
synonyms(): an if statement in the internal function process_syn_ids was receiving a condition of length > 1
classification.gbifid: now selects columns only if they exist, instead of failing when plucking non-existent columns
get_ids() gains a new parameter suppress (default: FALSE) to toggle cli package messages stating which database is being worked on (#719)
taxize::downstream(): rank_ref, theplantlist, apg_families, apg_orders (#777) (#781)
get_* output can be passed to functions (children, classification, comm2sci, sci2comm, downstream, id2name, synonyms, upstream) with S3 methods that dispatch on the get_* output classes. However, you can still pass in a db parameter, which is IGNORED when dispatching on the input class; db is used (not ignored) when passing in a taxon id as character/numeric/etc. These functions now warn when the user passes a db value that will be ignored (#780)
http_version=2L across all Entrez requests (#783)
col_search(): COL now does rate limiting (if you make too many requests within a time period, they will stop allowing requests from your IP address/computer); documented rate limiting, as far as I know it; changed checklist parameter behavior: years 2014 and earlier don't provide JSON, so we now return xml_document objects for those years that the user can parse themselves (#786)
tax_rank somehow (my bad) had two .default methods; behavior in this version is the same as before (#784)
ncbi_children(): fixed a regex that was supposed to flag only ambiguous taxa: it was supposed to flag sp. and spp., but was also including subsp., which we didn't want (#777) (#781)
ncbi_children(): when an ID is passed rather than a name, we now set id=NULL after switching internally to the equivalent taxonomic name, to avoid getting duplicate data back (#777) (#781)
eubon_search() gains new parameters limit and page; other eubon functions have no pagination (#766)
ipni_search(): switched from http to https (#773)
synonyms() now always returns NA for a name not found, and always returns a zero-row data.frame when the name is found but no synonyms are; updated docs to indicate better what's returned (#763) (#765)
xml2 package, so we have to remove them using regex; we throw a message when we're doing this so the user knows (#768)
classification() docs: new EOL section discussing that EOL does not have good failure behavior, and what to expect from it (#775)
taxize::downstream(): rank_ref, theplantlist, apg_families, apg_orders (#777)
sci2comm() and comm2sci() improvements: for db="ncbi" we no longer stop with an error when there are no results for a query; instead we return character(0). In addition, all data source options for both functions now return character(0) when there are no results for a query (#778)
id2name.uid() now actually passes on ... internally for curl options
get_nbnid(): was returning non-taxon entities; we now add idxtype:TAXON to the fq query (#761)
as.eolid() and as.colid(): no longer run through a helper function that raised an error on HTTP 404 etc., as we don't want them to fail (#762)
class2tree(): root node name set to NA if it does not exist, as ITIS does not set a root node (#767) (#769); work by @gpli
ipni_search(): IPNI changed parameter names, fixes for that; now returns tibbles instead of data.frames (#773); thanks @joelnitta!
ncbi_children(): fixed a regex that was supposed to flag only ambiguous taxa: it was supposed to flag sp. and spp., but was also including subsp., which we didn't want (#777)
ncbi_children(): when an ID is passed rather than a name, we now set id=NULL after switching internally to the equivalent taxonomic name, to avoid getting duplicate data back (#777)
get_* functions gain some new features (associated new functions are taxon_last and taxon_clear): a) nicer messages printed to the console when iterating through taxa, and a summary at the end of what was done; and b) state is now saved when running get_* functions. That is, we keep track of what happened in an object external to the get_* function call, so that if an error is encountered you can easily restart where you left off; this is especially useful when dealing with a large number of inputs to a get_* function. To use it, pass the output of taxon_last() to a get_* function call. Associated with these changes are new package imports: R6, crayon, and cli (#736) (#757)
taxize_options() sets options when using taxize. Its first purpose is to set two options for the get_* features in the item above: taxon_state_messages toggles taxon state tracking messages in get_* functions, and quiet=TRUE quiets output from taxize_options() itself
id2name() and worms_downstream() now use worrms::wm_record instead of worrms::wm_record_ for the newest version of worrms (#760)
get_* functions and col_downstream(): parameter verbose changed to messages, so as not to conflict with a verbose curl option passed in to crul
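The ncbi_children() regex fix (#777) is about flagging names that contain sp. or spp. without also catching subsp. The Python sketch below shows one pattern that behaves that way. It is an illustration, not the pattern taxize actually uses.

```python
import re

# "sp." or "spp." only where "sp" starts a word. The \b before "sp" needs a
# non-word character (or the start of the string) in front, so the "sp" in
# "subsp." (preceded by "b") is not matched.
AMBIGUOUS = re.compile(r"\bspp?\.")


def is_ambiguous(name: str) -> bool:
    """True for names using sp./spp. placeholders, False for subsp. names."""
    return AMBIGUOUS.search(name) is not None


for name in ["Bacillus sp.", "Bacillus spp.", "Poa annua subsp. annua", "Aspergillus niger"]:
    print(f"{name}: {is_ambiguous(name)}")
# Bacillus sp.: True
# Bacillus spp.: True
# Poa annua subsp. annua: False
# Aspergillus niger: False
```

The word boundary also keeps genus names that merely contain "sp", such as Aspergillus, from being flagged.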
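The state tracking added to get_* functions (#736) (#757) lets a long run resume where it stopped after an error. The Python sketch below illustrates the general idea with a hypothetical TaxonState class and lookup callable. It is not taxize's R6-based implementation, and it only loosely mirrors what taxon_last() and taxon_clear() are for.

```python
from typing import Callable, Dict, List, Optional


class TaxonState:
    """Holds results outside the lookup call so an interrupted run can resume."""

    def __init__(self) -> None:
        self.done: Dict[str, Optional[str]] = {}

    def clear(self) -> None:
        """Forget everything, e.g. before starting an unrelated batch."""
        self.done.clear()


def get_ids(names: List[str], lookup: Callable[[str], Optional[str]],
            state: TaxonState) -> List[Optional[str]]:
    """Resolve names one by one, skipping any already recorded in `state`."""
    for name in names:
        if name in state.done:
            continue  # resolved in an earlier, interrupted run
        state.done[name] = lookup(name)  # may raise; earlier results are kept
    return [state.done[name] for name in names]


# Usage: a lookup that fails once on "C", then succeeds when the run is resumed.
calls: List[str] = []


def flaky_lookup(name: str) -> str:
    calls.append(name)
    if name == "C" and calls.count("C") == 1:
        raise ConnectionError("service hiccup")
    return name.lower()


state = TaxonState()
try:
    get_ids(["A", "B", "C", "D"], flaky_lookup, state)
except ConnectionError:
    pass  # "A" and "B" are already stored in `state`
ids = get_ids(["A", "B", "C", "D"], flaky_lookup, state)
print(ids)    # ['a', 'b', 'c', 'd']
print(calls)  # ['A', 'B', 'C', 'C', 'D']: A and B were not looked up again
```

Keeping the state object outside the function call is what makes the restart possible: the exception unwinds the call, but the recorded results survive.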
gbif_downstream(): GBIF in some cases returns a rank of "unranked", which we hadn't accounted for in internal rank processing code (#758); thanks @ocstringham
class2tree() gains node labels when present (#644) (#748); thanks @gpli
get_pow(), get_pow_(), as.pow(), classification.pow(), pow_search(), and pow_lookup() (#598) (#739)
taxize. The string will look something like r-curl/3.3 crul/0.7.0 rOpenSci(taxize/0.9.6), including the versions of the curl R package, the crul package, and the taxize package (#662)
get_colid functionality: we weren't paginating for the user when there were more than 50 results for a query; we now paginate for the user using async HTTP requests. This means that some requests with more than 50 results will take longer than before, but it's a good change given that you now get all the results for your query (#743)
get_* functions: some of the get_* functions tried for a direct match (e.g., "Poa" == "Poa") and, if one was found, returned that record. However, we didn't deploy the same logic across all get_* functions. Now all get_* functions check for a direct match. Of course, if a direct match has more than one result, you still get the prompt asking which name you want (#631) (#734)
taxize-authentication manual file covering authentication information across the package (#681)
gnr_resolve() docs: added information about the age of datasets used in the Global Names Resolver, and how to access it (#737)
get_eolid() fixes: gains new attribute pageid; URIs given are updated to EOL's new URL format; the rank and datasource parameters were not documented, now they are; we no longer use short names for data sources within EOL, but instead use their full names (#702) (#742)
col_search() now returns attributes on the output data.frames with the number of results found and returned, and other metadata about the search
gnr_datasources() loses the todf parameter; it now always returns a data.frame with all the columns, whereas the default call returned a limited set of columns in previous versions
get_wormsid() was failing when a direct match was found with more than one result (#740)
get_* functions: linting of the input to the rows parameter was failing with a vector of values in some cases (#741)
iucn_summary(): we weren't passing on the API key internally correctly (#735); thanks @PrincessPi314 for the report
iucn_summary_id() is defunct; use iucn_summary() instead
col_downstream() gains parameter extant_only (logical) to optionally keep extant taxa only (#714); thanks @ArielGreiner for the inquiry
downstream() gains another db option: WORMS. You can now set db="worms" to use WORMS to get taxa downstream from a target taxon. In addition, taxize gains the new function worms_downstream(), which is used under the hood in downstream(..., db="worms") (#713) (#715)
id2name() with db options for tol, itis, ncbi, worms, gbif, col, and bold. The function converts taxonomic IDs to names; it's sort of the inverse of the get_*() family of functions (#712) (#716)
tax_rank() gains new parameter rows so that one can pass rows down to get_*() functions
synonyms(): warning from an internal cbind() call now fixed (#704) (#705); thanks @vijaybarve
taxize function calls in messages thrown when notifying users about API keys (e.g., taxize::use_tropicos()), to make it very clear where the functions live (to avoid confusion with usethis) (#724) (#725); thanks @maelle
iucn_summary() now outputs the same structure when no match is found as when a match is found, so that behavior is the same when the output is passed to iucn_status() (#708); thanks @Rekyt
tax_name() tests on CRAN (#728)
httr replaced by crul throughout (#590)
vcr, making tests much faster and not prone to errors from remote services being down (#729)
eol_dataobjects() gains new parameter language. eol_pages() loses the iucn, images, videos, sounds, maps, and text parameters, and gains images_per_page, videos_per_page, sounds_per_page, maps_per_page, texts_per_page, and texts_page. Please do let us know if you find any problems with any EOL functions (#717) (#718)
The default db value for comm2sci() and sci2comm() is now ncbi instead of eol
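get_colid() now pages through queries with more than 50 results (#743). The core offset-based loop is sketched below in Python. The sketch fetches pages one after another, whereas the entry above says taxize uses async HTTP requests. fetch_page is a hypothetical stand-in for a single COL request.

```python
from typing import Callable, List


def fetch_all(fetch_page: Callable[[int, int], List], limit: int = 50) -> List:
    """Collect every record by requesting successive pages of size `limit`."""
    results: List = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        results.extend(page)
        if len(page) < limit:  # a short or empty page means no more records
            return results
        offset += limit


# Fake service holding 120 records, served at most 50 per request.
RECORDS = list(range(120))
print(len(fetch_all(lambda offset, limit: RECORDS[offset:offset + limit])))  # 120
```

Stopping on a short page costs one extra, empty request when the total is an exact multiple of the page size. That is a common trade-off when the service does not report a total count.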
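The direct-match rule now shared by all get_* functions (#631) (#734) can be sketched as follows. The record shape and the choose callback, which stands in for the interactive prompt, are hypothetical and are not taxize's data structures.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


def pick_record(query: str, candidates: List[Record],
                choose: Callable[[List[Record]], Optional[Record]]) -> Optional[Record]:
    """Prefer an exact name match; fall back to asking only when ambiguous."""
    direct = [c for c in candidates if c["name"] == query]
    if len(direct) == 1:
        return direct[0]  # one direct match: done, no prompt
    pool = direct or candidates  # several direct matches still need a choice
    if len(pool) <= 1:
        return pool[0] if pool else None
    return choose(pool)  # stands in for the interactive prompt


hits = [{"name": "Poa", "id": 1}, {"name": "Poa annua", "id": 2}]
print(pick_record("Poa", hits, choose=lambda pool: None))  # {'name': 'Poa', 'id': 1}
```

When several records share the exact name, only those records are offered in the prompt, which matches the behavior described in the entry.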
get_*() functions changed parameter verbose to messages, so as not to conflict with verbose passed down to crul::HttpClient
ncbi_ping() reworked to allow use of your API key, either as a parameter or pulled from your environment; eol_ping() now uses https instead of http, and parses JSON instead of XML.
get_eolid() was erroring when no results were found for a query, due to not assigning an internal variable (#701) (#709); thanks for the fix @taddallas
get_tolid() was erroring when values were NULL; we now replace all NULL with NA_character_ to make data.table::rbindlist() happy (#710) (#711); thanks @gpli for the fix
rank_ref data.frame of taxonomic ranks: added species subgroup, forma, varietas, clade, megacohort, supercohort, cohort, subcohort, infracohort. When there's no matched rank, errors can result in many of the downstream functions. The data.frame now has 43 rows (#720) (#727)
downstream() and ncbi_get_taxon_summary(): change in ncbi_get_taxon_summary to break up queries into smaller chunks to avoid HTTP 414 errors ("URI too long") (#727) (#730); thanks for reporting @fischhoff and @benjaminschwetz
use_entrez(), use_eol(), use_iucn() (which internally uses rredlist::rl_use_iucn()), and use_tropicos() (#682) (#691) (#693); by @maelle
tropicos_ping()
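ncbi_get_taxon_summary() now breaks queries into smaller chunks to avoid HTTP 414 ("URI too long") errors (#727) (#730). Similarly, classification.uid() batches up to 50 identifiers per Entrez request (#798). The Python sketch below chunks a list of IDs so that each request URL stays under a length budget. The esummary URL and the 2,000-character budget are illustrative assumptions, not documented NCBI limits. The 50-ID cap comes from the classification.uid() entry.

```python
from typing import Iterator, List

# Illustrative assumptions: an esummary-style URL and a made-up length budget.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=taxonomy&id="
MAX_URL_LEN = 2000  # assumed budget, not a documented NCBI value
MAX_IDS = 50        # per-request cap mentioned in the classification.uid() entry


def chunk_ids(ids: List[str], base: str = BASE, max_len: int = MAX_URL_LEN,
              max_ids: int = MAX_IDS) -> Iterator[List[str]]:
    """Yield groups of IDs whose comma-joined request URL stays within max_len.

    A single ID that is too long on its own is still yielded alone.
    """
    chunk: List[str] = []
    for id_ in ids:
        candidate = chunk + [id_]
        too_long = len(base + ",".join(candidate)) > max_len
        if chunk and (too_long or len(candidate) > max_ids):
            yield chunk  # close the current request and start a new one
            chunk = [id_]
        else:
            chunk = candidate
    if chunk:
        yield chunk


ids = [str(9606 + i) for i in range(120)]
chunks = list(chunk_ids(ids))
print([len(c) for c in chunks])  # [50, 50, 20]
```

Checking both the count and the URL length means short numeric IDs hit the count cap first, while long identifiers are split earlier by the length check.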
downstream() and gbif_downstream(): some results don't have a canonicalName, so we now safely try to get that field (#673)
as.uid() was erroring when passing in a taxon ID (#674) (#675); by @zachary-foster
get_boldid() (and by extension classification(..., db = "bold")): was failing when no parent taxon was found; now just fills in with NA (#680)
synonyms(): was failing for some TSNs with db="itis" (#685)
tax_name(): rows argument wasn't being passed on internally (#686)
gnr_resolve() and gnr_datasources(): problems were caused by the http scheme; switched to https instead of http (#687)
class2tree(): organisms with a unique rank lower than non-unique ranks were giving extra, wrong rows (#689) (#690); thanks @gpli
ncbi_get_taxon_summary(): changes in the NCBI API most likely led to HTTP 414 (URI Too Long) errors; we now loop internally for the user. By extension this helps with problems upstream in downstream()/ncbi_downstream()/ncbi_children() (#698)
class2tree(): was erroring when name strings contained pound signs (e.g., #) (#699) (#700); thanks @gpli
Sys.sleep for NCBI requests if the user has an API key (#667)
?taxize-authentication
verbose changed to messages across the package, so that suppressing calls to message() does not conflict with curl options passed in
genbank2uid() and ncbi_get_taxon_summary() now use crul instead of httr for HTTP requests
get_tolid(): it was missing assignment of the att attribute internally, causing failures in some cases (#663) (#672)
ncbi_children() (and thus children() when requesting NCBI data) no longer fails when there is an empty result from the internal call to classification() (#664); thanks @arendsee
class2tree() gets a major overhaul thanks to @gedankenstuecke and @trvinh (!!). The function now takes unnamed ranks into account when clustering, which fixes a problem where trees were unresolved for many splits because the named taxonomy levels were shared between them. It now makes full use of the NCBI Taxonomy string, including the unnamed ranks, leading to higher-resolution trees with fewer multifurcations (#611) (#634)
See ?taxize-authentication for help. Importantly, note that API key names (both R options and environment variables) have changed. They are now the same for R options and env vars: TROPICOS_KEY, EOL_KEY, PLANTMINER_KEY, ENTREZ_KEY. You no longer need an API key for Plantminer (#640) (#646)
crul and zoo
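Entry #667 concerns Sys.sleep for NCBI requests when the user has an API key. For context, NCBI's E-utilities allow up to 3 requests per second without an API key and up to 10 per second with one. The key-aware throttle below is a Python sketch of that rule. The class and its parameters are hypothetical and are not taxize's implementation.

```python
import time
from typing import Callable, Optional


class NcbiThrottle:
    """Spaces out calls to stay within NCBI E-utilities rate limits.

    Limits: 3 requests/second without an API key, 10 with one.
    """

    def __init__(self, api_key: Optional[str] = None,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1 / 10 if api_key else 1 / 3
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block just long enough to keep the minimum interval between calls."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last = now
```

Call throttle.wait() immediately before each request. With a key, the enforced gap shrinks from about 0.33 s to 0.1 s. The injectable clock and sleep functions exist only to make the behavior easy to test.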
downstream(): we now pass the limit and start parameters on to gbif_downstream(), which we weren't doing before; the two parameters control pagination (#638)
genbank2uid() now returns the correct ID when there are multiple possibilities, and invalid IDs no longer make whole batches fail (#642); thanks @zachary-foster
children() outputs made more consistent for certain cases when no results are found for searches (#648) (#649); thanks @arendsee
downstream() now passes ... (additional parameters) down to the internally used ncbi_children(). This allows, e.g., use of the ambiguous parameter in ncbi_children(), which lets you remove ambiguously named nodes (#653) (#654); thanks @arendsee
httr swapped for crul in EOL and Tropicos functions; note that this won't affect you unless you're passing curl options. See package crul for help on curl options. Along with this change, the parameter verbose has changed to messages (for toggling printing of information messages)
CONTRIBUTING.md file for how to contribute to the test suite (#635)
downstream(): passing numeric taxon ids to the function while using db="ncbi" wasn't working (#641); thanks @arendsee
children(): passing numeric taxon ids to the function while using db="worms" wasn't working (#650) (#651); thanks @arendsee
synonyms_df(), which attempts to combine many outputs from the synonyms() function, now removes NA/NULL/empty outputs before attempting the combination (#636)
gnr_resolve(): previously, if preferred_data_sources was used, you would get the preferred data but only a few columns of the response. We now return all fields; however, we only return the preferred data part when that parameter is used (#656)
children(): it was returning unexpected results for ambiguous taxonomic names (e.g., some insects are returned when searching within Bacteria). It was also failing when one tried to get the children of a root taxon (e.g., the children of the NCBI id 131567) (#639) (#647); fixed via PR (#659); thanks @arendsee and @zachary-foster
get*() functions had NaN as the default rows parameter value; those have all changed to NA
rows parameter value given
get_*() functions
get_*() functions now behave the same with ask = FALSE, rows = 1 and ask = TRUE, rows = 1, as these should result in the same outcome (#627); thanks @zachary-foster!
NA with no indication that there were multiple matches.
comm2sci() moved to an S3 setup with methods for character, uid, and tsn (#621)
iucn_status() now has an S3 setup with a single method that only handles output from the iucn_summary() function.
key parameter added to iucn_id() (#633)
sci2comm(): docs now indicate how to get non-simplified output (which includes the language the common name is in) vs. simplified output (#623); thanks @glaroc!
sci2comm() is no longer case sensitive when looking for matches (#625); thanks @glaroc!
eol_search(): link and content
eol_search(): docs now describe the returned data.frame
bold_ping() now uses the new base URL for BOLD's API
rank_ref, see ?rank_ref
downstream(): fixed via the rank_ref dataset, which now includes "infraspecies" and makes "unspecified" and "no rank" equivalent. Fix to col_downstream() to properly remove ranks lower than allowed (#620); thanks @cdeterman!
iucn_summary: changed to using the rredlist package internally. The sciname parameter changed to x. iucn_summary_id() is now deprecated in favor of iucn_summary(). iucn_summary() now has an S3 setup, with methods for character and iucn (#622)
rank_ref dataset, as that rank is sometimes used at NCBI (from a bug reported in ncbi_downstream()) (#626)
sci2comm(): added tryCatch() to internals to catch failed requests for specific pageids (#624); thanks @glaroc!
get_nbnid() (#632)
ape::neworder_phylo object, which is not used anymore in taxize
ncbi_downstream(), and NCBI is now an option in the function downstream() (#583); thanks for the push @andzandz11
wikitaxa, with contributions from @ezwelty (#317)
scrapenames() gains a boolean parameter return_content to optionally return the OCR content as a text string with the results (#614); thanks @fgabriel1891
get_iucn() gets IUCN Red List ids for taxa. In addition, new S3 methods synonyms.iucn and sci2comm.iucn; no other methods could be made to work with IUCN Red List ids, as they do not share their taxonomic classification data (#578); thanks @diogoprov
bold is now an option in the classification() function (#588)
genbank2uid() can give back more than one taxon matched to a given GenBank accession number. The function can now return more than one match for each query, e.g., try genbank2uid(id = "AM420293") (#602); thanks @sariya
cbind() usage to include ... for method consistency (#612)
tax_rank() used to support only ncbi and itis; it now supports many more data sources: ncbi, itis, eol, col, tropicos, gbif, nbn, worms, natserv, bold (#587)
classification() docs: a section "Lots of results" adds a note about how to deal with results when there are A LOT of them (#596); thanks @ahhurlbert for raising the issue
tnrs() now returns the resulting data.frame in the order of the names passed in by the user (#613); thanks @wpetry
gnr_resolve() now strips out taxonomic names submitted by the user that are NA, zero-length strings, or not of class character (#606)
gnr_resolve() (#610); thanks @kamapu
tnrs() docs: the service doesn't provide any information about homonyms (#610); thanks @kamapu
parvorder (used by NCBI) added to the taxize rank_ref dataset: if taxa were returned with that rank, some functions in taxize were failing because the rank was missing from our reference dataset rank_ref (#615)
get_colid(), via a problem in parsing within col_search() (#585)
gbif_downstream() (and thus downstream()): there were two rows with form in our rank_ref reference dataset of rank names, causing more than one result in some cases, which then caused vapply to fail, as it expects a length-1 result (#599); thanks @andzandz11
genbank2uid(): was failing when getting more than one result back; works now (#603), and fails better, giving back more informative warnings/error messages (see also #602); thanks @sariya
synonyms.tsn(): in some cases a TSN has more than one accepted name. We get accepted names first from the TSN, then look for synonyms, and hadn't accounted for more than one accepted name. Fixed now (#607); thanks @tdjames
sci2comm(): was not passing on the simplify parameter internally (#616)
worrms package on CRAN. Adds functions as.wormsid(), get_wormsid(), get_wormsid_(), children.wormsid(), classification.wormsid(), sci2comm.wormsid(), comm2sci.wormsid(), and synonyms.wormsid() (#574) (#579)
as.natservid, get_natservid, get_natservid_, and classification.natservid (#126)
rankagg() with respect to the vegan package, to work with both older and newer versions of vegan; thanks @jarioksa (#580) (#581)
get_tolid(), get_tolid_(), and as.tolid() (#517)
classification() gains a new method for TOL data
lowest_common() gains a new method for TOL data
ritis package, an external dependency for ITIS taxonomy data. Note that a large number of ITIS functions were removed and are now available via the package ritis. However, there are still many high-level functions for working with ITIS data (see functions prefixed with itis_), and get_tsn(), classification.tsn(), and similar high-level functions remain unchanged (#525)
eubon() is now eubon_search(), although both still work; eubon() will be made defunct in the next version of this package. New functions were also added: eubon_capabilities(), eubon_children(), and eubon_hierarchy() (#567)
lowest_common() gains two new data source options: COL (Catalogue of Life) and TOL (Tree of Life) (#505)
synonyms_df() as a slim wrapper around data.table::rbindlist() to make it easy to combine many outputs from synonyms() for a single data source. There is a lot of heterogeneity among data sources in how they report synonyms data, so we don't attempt to combine data across sources (#533)
https from http (#571)
tax_name(): when an invalid taxon was searched for, classification() returned no data and caused an error. Fixed now (#560); thanks @ljvillanueva for reporting it!
gnr_resolve(): the order of input names to the function was not retained; fixed now (#561); thanks @bomeara for reporting it!
gbif_parse(): the data format coming back from GBIF changed, so NULL needed to be replaced with NA (#568); thanks @ChrKoenig for reporting it!
get_*() functions now have new attributes to further help the user: multiple_matches (logical), indicating whether there were multiple matches or not, and pattern_match (logical), indicating whether a pattern match was made or not (#550), from the (#547) discussion; thanks @ahhurlbert! See also (#551)
xml2::xml_find_one() changed to xml2::xml_find_first() for the new xml2 version (#546)
gnr_resolve() now retains user-supplied taxa that had no matches; this could affect your code, so make sure to check your existing code (#558)
gnr_resolve(): the output data.frame is no longer sorted, so the order of its rows is now the same as the user's input vector/list (#559)
sub_rows() inside most get_*() functions no longer fails when the data.frame has fewer rows than requested by the user in the rows parameter (#556)
get_gbifid(), as calls sometimes failed because we now return numeric IDs but used to return character IDs (#555)
get_*() functions now call the internal sub_rows() function later in the function flow, so as not to interfere with taxonomy-based filtering (e.g., user filtering by a taxonomic rank) (#555)
gnr_resolve() no longer fails on parsing when no data are returned for a specified preferred data source (#557)
iucn_summary() (#543); thanks @mcsiple
ncbi_get_taxon_summary(): suggesting breaking up the ids into chunks (#541); thanks @daattali
itis_acceptname() now accepts multiple names (#534) and gives back the same output regardless of whether a match is found or not (#531)
tax_name() for some queries that return no classification data via the internal call to classification() (#542); thanks @daattali
tax_name() (#530); thanks @ibartomeus
rankagg() function: use requireNamespace() in examples to make sure the user has vegan installed (#529)
eol_invasive() and gisd_invasive() now point to their new location in the originr package. Also cleaned out code in those functions, as it's not available here anymore (#494)
get_gbifid() uses new internal code to provide two ways to search the GBIF taxonomy API, either via /species/match or via /species/search, instead of /species/suggest, which we used previously; the suggest route was too coarse. get_gbifid() also gains a parameter method to toggle whether you search for names using /species/match or /species/search (#528)
col_search() now handles COL returning a value of misapplied name, which a switch() statement didn't handle yet (#511); thanks @JoStaerk!
get_colid() and col_search() (#523); thanks @zachary-foster!
bold, which fixes taxize::bold_search(), so no actual changes in taxize for this, but take note (#521)
gnr_resolve(), where we indexed into data incorrectly; added tests to account for this problem. Thanks @raredd! (#519) (#520)
iucn_summary(): problem introduced in the last version. iucn_summary() now uses the package rredlist, which requires an API key, and I didn't document how to use the key. The function now allows the user to pass the key in as a parameter, and documents how to get a key and save it in either .Renviron or .Rprofile (#522)
lowest_common() for obtaining the lowest common taxon and rank for a given taxon name or ID. Methods so far for ITIS, NCBI, and GBIF (#505)
rredlist
iucn_summary_id(): same as iucn_summary(), except it takes IUCN IDs as input instead of taxonomic names (#493)
iucn_summary() fixes, long story short: a number of bug fixes, and it uses the new IUCN API via the newish package rredlist when IDs are given as input, but the old IUCN API when taxonomic names are given. Also gains new parameter distr_details (#174) (#472) (#487) (#488)
XML replaced with xml2 for XML parsing (#499)
httr::content now explicitly states encoding="UTF-8" (#498)
gnr_resolve() now outputs a column (user_supplied_name) with the exact input taxon name, which facilitates merging data back to the original data inputs (#486); thanks @Alectoria
eol_dataobjects() gains new parameter taxonomy to toggle whether to return any taxonomy details from different data providers (#497)
classification() was giving back rank values in mixed case from different data providers (e.g., class vs. Class). All rank values are now lowercase (#504)
get_gbifid to 50 from 20. This gives back more results, so you are more likely to get the thing searched for (#513)
gni_search() now makes all output columns of class character
iucn_id(), tpl_families(), and tpl_get() all gain a new parameter ... to pass on curl options to httr::GET()
get_eolid(): the returned URI now always has the pageid and goes to the right place; an API key, if passed in, is now actually used, woopsy (#484)
get_uid(): when a taxon was not found, the "match" attribute sometimes said found anyway - that is now fixed; additionally, fixed docs to correctly state that we give back 'NA due to ask=FALSE' when ask = FALSE (#489). Additionally, made this doc fix in other get_*() function docs
apgOrders() function (#490)
tp_search(), which fixes get_tpsid(): Tropicos doesn't allow periods (.) in query strings, so those are URL encoded now; Tropicos doesn't like sub-specific rank names in name query strings, so we warn when those are found, but don't alter user inputs; and improved docs to be clearer about how the function fails (#491) thanks @scelmendorf !
classification(db = "itis") to fail better when no taxa found (#495) thanks @ashenkin !
eol_pages() fixes: the EOL API route for this method gained a new parameter taxonomy, which caused this fxn to fail; this function now gains that parameter, so it is fixed. Also, parameter subject changed to subjects (#500)
col_search() due to cases where misapplied name comes back as a data slot. There was previously no parser for that type. Now there is, and it works (#512)
R >= 3.2.1. Good idea to update your R installation anyway (#476)
ion() for obtaining data from Index of Organism Names (#345)
eubon() for obtaining data from EU (European Union) BON taxonomy (#466). Note that you may only get partial results for some requests, as paging isn't implemented yet in the EU BON API (#481)
fg_*() for obtaining data from Index Fungorum. More work remains to be done on this data source, but these initial functions allow some Index Fungorum data access (#471)
gbif_downstream() for obtaining downstream names from GBIF's backbone taxonomy. Also available in downstream(), where you can request downstream names from GBIF, along with other data sources (#414)
db parameters to warn users that if they provide the wrong db value for the given taxon ID, they can get data back, but it will be wrong. That is, all taxonomic data sources available in taxize use their own unique IDs, so a single ID value can be in multiple data sources, even though the ID refers to different taxa in each data source. There is no way we can think of to prevent this from happening, so be cautious. (#465)
gnr_resolve() to capitalize, by default, the first name of a name string passed to the function. GNR is case sensitive, so case matters (#469)
phylomatic_tree() and phylomatic_format() are defunct. They were deprecated in recent versions, but are now gone. See the new package brranching for Phylomatic data (#479)
stripauthority argument in gnr_resolve() has been renamed to canonical to better match what it actually does (#451)
gnr_resolve() now returns a single data.frame in output, or NULL when no data found. The input taxa that have no match at all are returned in an attribute with name not_known (#448)
vascan_search() changed the callopts parameter to ... to pass in curl options to the request.
ipni_search() changed the callopts parameter to ... to pass in curl options to the request. In addition, better http error handling, and added a test suite for this function. (#458)
stringsAsFactors=FALSE now used for gbif_parse() (https://github.com/ropensci/taxize/commit/c0c4175d3a0b24d403f18c057258b67d3fbf17f0)
get_uid() to make clearer how to use the various parameters to get the desired result, and how to avoid certain pitfalls (#436)
asdf from the function eol_dataobjects() - now returning data.frames only.
get_eolid() via tryCatch() to fail better when names are not found.
openssl as a package dependency. Not needed anymore because uBio was dropped.
gnr_resolve() failed when no canonical form was found.
gnr_resolve() when no results are found with best_match_only=TRUE (#432)
itisdf() to give back an empty data.frame when no results are found, often with subspecific taxa. Helps solve errors reported in use of downstream(), itis_downstream(), and gethierarchydownfromtsn() (#459)
gnr_resolve() gains new parameter with_canonical_ranks (logical) to choose whether infraspecific ranks are returned or not.
iucn_id() to get the IUCN ID for a taxon from its name. (#431)
ubio_classification(), ubio_classification_search(), ubio_id(), ubio_search(), ubio_synonyms(), get_ubioid(), ubio_ping(). In addition, ubio has been removed as an option in the synonyms() function, and references for uBio have been removed from the taxize_cite() utility function. (#449)
rankagg() doesn't depend on data.table anymore (fixes issue with CRAN checks)
RCurl::base64Decode() with openssl::base64_decode(), needed for ubio_*() functions (#447)
importFrom used across all imports now (#446). In addition, importFrom for all non-base R pkgs, including graphics, methods, stats and utils packages (#441)
query parameter in GET(), but can pass NULL (#445)
gni_*() functions, including code tidying, some DRYing out, and the ability to pass in curl options (#444)
taxize_cite()
classification() where numeric IDs as input got converted to ITIS IDs just because they were numeric. Fixed now. (#434)
synonyms function to get name synonyms. (#430)
apgFamilies and apgOrders. (#418)
col_search() gains parameters response to get a terse or full response, and ... to pass in curl options.
eol_dataobjects() gains parameter ... to pass in curl options, and parameter returntype renamed to asdf (for "as data.frame").
ncbi_get_taxon_summary() gains parameter ... to pass in curl options.
children() function gains the rows parameter passed on to get_*() functions, supported for data sources ITIS and Catalogue of Life, but not for NCBI.
upstream() function gains the rows parameter passed on to get_*() functions, supported for both data sources ITIS and Catalogue of Life.
classification() function gains the rows parameter passed on to get_*() functions, for all sources used in the function.
downstream() function gains the rows parameter passed on to get_*() functions, for all sources used in the function.
get_*() functions gain new parameters to help filter results (e.g., division, phylum, class, family, parent, rank, etc.). These parameters allow direct matching or regex filters (e.g., .a to match any character followed by an a). (#410) (#385)
get_*() functions now give back more information (mostly higher taxonomic data) to help in the interactive decision process. (#327)
synonyms() function: Catalogue of Life. (#430)
vegan package, used in class2tree() function, moved from Imports to Suggests. (#392)
taxize_cite() a lot - get URLs and sometimes citation information for data sources available in taxize. (#270)
apg_lookup() function. (#422)
apg_families() function. (#418)
callopts parameter in eol_pages(), eol_search(), gnr_resolve(), tp_accnames(), tp_dist(), tp_search(), tp_summary(), tp_synonyms(), ubio_search() changed to ...
accepted parameter in get_tsn() changed to FALSE by default. (#425)
db parameter in resolve() changed to gnr, as tnrs is often quite slow.
tpl_families() and tpl_get(). (#424)
ncbi_getbyname(), ncbi_getbyid(), ncbi_search(), eol_invasive(), gisd_isinvasive(). These functions are available in the traits package. (#382)
phylomatic_tree() is deprecated, but will be defunct in an upcoming version.
taxize. E.g., itis_ping() pings ITIS and returns a logical, indicating if the ITIS API is working or not. You can also do a very basic test to see whether content returned matches what's expected. (#394)
status_codes() to get a vector of HTTP status codes. (#394)
itis_ping(), and all *_ping() functions.
\donttest into \dontrun.
genbank2uid() to get an NCBI taxonomic id (i.e., a uid) from either a GenBank accession number or GI number. (#375)
get_nbnid() to get a UK National Biodiversity Network taxonomic id (i.e., an nbnid). (#332)
nbn_classification() to get a taxonomic classification for a UK National Biodiversity Network taxonomic id. Using this new function, generic method classification() gains method for nbnid. (#332)
nbn_synonyms() to get taxonomic synonyms for a UK National Biodiversity Network taxonomic id. Using this new function, generic method synonyms() gains method for nbnid. (#332)
nbn_search() to search for taxa in the UK National Biodiversity Network. (#332)
ncbi_children() to get direct taxonomic children for an NCBI taxonomic id. Using this new function, generic method children() gains method for ncbi. (#348) (#351) (#354)
upstream() to get taxa upstream of a taxon. E.g., getting families upstream from a genus gets all families within the taxonomic group one rank higher than family. (#343)
as.*() to coerce numeric/alphanumeric codes to taxonomic identifiers for various databases. There are methods on this function for each of itis, ncbi, tropicos, gbif, nbn, bold, col, eol, and ubio. By default as.*() functions make a quick check that the identifier is a real one by making a GET request against the identifier URI - this can be toggled off by setting check=FALSE. There are methods for returning itself, character, numeric, list, and data.frame. In addition, if the as.*.data.frame() function is used, a generic method exists to coerce the data.frame back to an identifier object. (#362)
get_tsn_() (the underscore is the only difference from the previous function name). These functions don't do the normal interactive process of prompts that, e.g., get_tsn() does, but instead return a list of all ids, or a subset via the rows parameter. (#237)
ncbi_get_taxon_summary() to get taxonomic name and rank for 1 or more NCBI uids. (#348)
assertthat removed from package imports, replaced with stopifnot(), to reduce dependency load. (#387)
eol_hierarchy() now defunct (no longer available) (#228) (#381)
tp_classification() now defunct (no longer available) (#228) (#381)
col_classification() now defunct (no longer available) (#228) (#381)
?fxn-name.
get_*() functions gain a new parameter rows to allow selection of particular rows. For example, rows=1 to select the first row, or rows=1:3 to select rows 1 through 3. (#347)
classification() now by default returns taxonomic identifiers for each of the names. This can be toggled off by setting return_id=FALSE. (#359) (#360)
switch() on the db parameter, which helps give a better error message when a db value is not possible or spelled incorrectly. (#379)
children(), which is a single interface to various data sources to get immediate children from a given taxonomic name. (#304)
bold_search() that searches for taxa in the BOLD database of barcode data; get_boldid() to search for a BOLD taxon identifier. (#301)
get_ubioid() to get a uBio taxon identifier. (#318)
taxize: taxize_cite(). (#270)
jsonlite instead of RJSONIO throughout taxize.
get_ids() gains new option to search for a uBio ID, in addition to the others: itis, ncbi, eol, col, tropicos, and gbif.
stripauthority parameter in gnr_resolve(). (#325)
iplant_resolve() now outputs a data.frame structure instead of a list. (#306)
seqrange in ncbi_getbyname() and ncbi_search() (#328)
synonyms() gains a new data source: it can now get synonyms from uBio (#319)
vascan_search() giving back more useful results now.
tnrs() function, including more meaningful error messages on failures (#323) (#331)
getpublicationsfromtsn() that caused the function to fail on data.frames with no data on name assignment (#297)
sci2comm() that sometimes caused the function to fail when using db=itis (#293)
scrapenames(). Sending a text blob via the text parameter now works.
resolve() so that the function now works for all 3 data sources. (#337)
iplant_resolve() to do name resolution using the iPlant name resolution service. Note, this is different from http://taxosaurus.org/, which is wrapped in the tnrs() function.
ipni_search() to search for names in the International Plant Names Index (IPNI).
resolve() that unifies name resolution services from iPlant's name resolution service (via iplant_resolve()), Taxosaurus' TNRS (via tnrs()), and GNR's name resolution service (via gnr_resolve()).
get_*() functions now return a new uri attribute that is a link to the taxon on the web. If NA is given back (e.g. nothing found), the uri attribute is blank. You can go directly to the uri in your default browser by doing, for example: browseURL(attr(result, "uri")).
get_eolid() now returns an attribute provider because EOL collates taxonomic data from a lot of sources, then gives back IDs that are internal EOL ids, not those matching the id of the source they pull from. This should help with provenance, and should help if there is confusion about why the id given back by this function does not match that from the original source.
get_tsn() function, now using the function itis_terms(), which gives back the accepted status of the taxa. This allows a new parameter in the function (accepted, logical) that lets the user choose to give back only accepted names (accepted=TRUE) or all names (accepted=FALSE).
gnr_resolve() gains two new parameters, best_match_only (logical, to return best match only) and preferred_data_sources (to return preferred data sources), plus callopts to pass in curl options.
tnrs(), tp_accnames(), tp_refs(), tp_summary(), and tp_synonyms() gain new parameter callopts to pass in curl options.
class2tree() can now handle NA in classification objects.
classification.eolid() and classification.colid() now return the submitted name along with the classification.
plyr functions, see #275.
verbose parameter to many more functions to allow suppression of help messages.
httr, now manually parsing JSON to a list then to another data format instead of allowing internal httr parsing - in addition, added checks on content type and encoding in many functions.
match.arg internally in get_ids() for the db parameter so that a) unique short abbreviations of possible values are accepted, and b) a meaningful warning is given if unsupported values are given.
getexpertsfromtsn, getgeographicdivisionsfromtsn gain parameter curlopts to pass in curl options.
stringsAsFactors=FALSE to all data.frame creations to eliminate factor variables.
classification.gbifid() did not return the correct result when taxon not found.
classification() used to fail when it was passed a subset of a vector of ids, in which case the class information was stripped off. Now works (#284)