Methods using a large number of parameters risk being overfit. This
usually translates in poor fitting with data and trees other than the
those originally used. With RRphylo methods this risk is usually very
low. However, the user can assess how robust the results got by applying
search.shift
, search.trend
, search.conv
, or
PGLS_fossil
are by running overfitRR
. The
basic idea of overfitRR
is using alternative tree
topologies to test for both phylogenetic and sampling uncertainty at the
same time. Such alternative phylogenies can be provided by the user as a
multiPhylo
object, otherwise are automatically generated
within overfitRR
. In this latter case, the original tree
and data are subsampled by specifying a s
parameter, that
is the proportion of tips to be removed from the tree, and species
positions are shuffled by using the function swapONE
.
overfitRR
always takes an object generated by RRphylo
and all the data used to
produce it (besides necessary phenotypic data, any other argument such
as covariate, predictor, and so on, passed to RRphylo
). If
no phylo.list
is available, the arguments s
and swap.args
can be used to set the intensity of
subsampling and phylogenetic alterations to be applied. Depending on
which tool is under testing, the user supplies to the function one or
more among trend.args
, shift.args
,
conv.args
, and pgls.args
each of them being a
list of arguments specific to the namesake function (see the examples below).
The output of overfitRR
is a RRphyloList
object whose elements depend on the case under testing (see below).
In some cases, removing as many tips as imposed by s
would delete too many tips right in clades and/or states under testing.
In these cases, the function maintains no less than 5 species at least
in each clade/state under testing (or all species if there is less),
reducing the sampling parameter s
if necessary. Thus, the
first element of the output list ($mean.sampling
) is the
mean proportion of species actually removed over the iterations.
In any case, the function returns a multiPhylo
and a
RRphyloList
object including the modified phylogenies
($tree.list
) and the outputs of RRphylo
performed on them ($RR.list
), respectively. Both objects
are treated as regular lists. overfitRR
also derives the
95% confidence interval around the original phenotypic value estimated
at the tree root ($rootCI
) and the regression parameters
describing the relation between the original values at internal nodes
and the corresponding figure after subsampling and swapping
($ace.regressions
). A regression slope close to one
indicates a better matching between original and subsampled values,
suggesting the estimation is robust to phylogenetic uncertainty and
subsampling.
When the robustness of search.shift
is tested, the
function returns separate results for clade
and sparse
conditions
($shift.results
). The first (clade)
includes the proportion of simulations producing significant and
positive (p.shift+) or significant and negative
(p.shift-) rate shifts for each single node, and for
all the clades taken as a whole (see Testing rate shifts pertaining to
entire clades for further details). Under the
sparse
condition (sparse), the same
figures as before are reported for each state category compared to the
rest of the tree and for all possible pair of categories (see Testing rate shifts pertaining to
phylogenetically unrelated species for further details).
When testing for search.trend
robustness,
overfitRR
returns results for both the entire tree and
specific clades if indicated ($trend.results
). Results for
the entire tree (tree) summarize the proportion of
simulations producing positive (slope+) or negative
(slope-) slopes significantly higher
(p.up) or lower (p.down) than BM
simulations for either phenotypes or rescaled rates versus time
regressions. Such evaluations is based on p.random only
(see Temporal trends on the entire
tree,for further details). When specific clades are under
testing, the same set of results as for the whole tree is returned for
each node (node). In this case, for phenotype versus
age regression through nodes, the proportion of significant and
positive/negative slopes
(slope+p.up,slope+p.down,slope-p.up,slope-p.down)
is accompanied by the same figures for the estimated marginal mean
differences (p.emm+ and p.emm-). As
for the temporal trend in absolute rates through node, the proportion of
significant and positive/negative estimated marginal means differences
(p.emm+ and p.emm-) and the same
figure for slope difference (p.slope+ and
p.slope-) are reported (see Temporal trends at clade
level). Finally when more than one node is tested, the
$trend.results
object also includes results for the
pairwise comparison between nodes.
Results for robustness of search.conv
($conv.results
) include separate objects for convergence
between clades
or
between/within states
.
Under the first case (clade), the proportion of
simulations producing significant instance of convergence
(p.ang.bydist) or convergence and parallelism
(p.ang.conv) between selected clades are returned (see
Morphological convergence between
clades for further details). As for convergence between/within
discrete categories (state), overfitRR
reports the proportion of simulations producing significant instance of
convergence either accounting (p.ang.state.time) or not
accounting (p.ang.state) for the time intervening
between the tips in the focal state Morphological convergence
within/between categories for explanations).
Results for robustness of PGLS_fossil
($pgls.results
) include separate objects for the pgls
performed on the original tree ($tree
) or on the tree
rescaled according to RRphylo
rates (i.e. tree branches
rescaled to the absolute branch-wise rate values while keeping the total
evolutionary time constant; $RR
).
library(ape)
# load the RRphylo example dataset including Ornithodirans tree and data
DataOrnithodirans$treedino->treedino # phylogenetic tree
DataOrnithodirans$massdino->massdino # body mass data
DataOrnithodirans$statedino->statedino # locomotory type data
### Testing search.shift
# perform RRphylo Ornithodirans tree and data
RRphylo(tree=treedino,y=massdino)->dinoRates
# perform search.shift under both "clade" and "sparse" condition
search.shift(RR=dinoRates, status.type= "clade")->SSnode
search.shift(RR=dinoRates, status.type= "sparse", state=statedino)->SSstate
# test the robustness of search.shift results
overfitRR(RR=dinoRates,y=massdino,swap.args =list(si=0.2,si2=0.2),
shift.args = list(node=rownames(SSnode$single.clades),state=statedino),
nsim=10)
### Testing search.trend
# Extract Pterosaurs tree and data
extract.clade(treedino,748)->treeptero # phylogenetic tree
massdino[match(treeptero$tip.label,names(massdino))]->massptero # body mass data
massptero[match(treeptero$tip.label,names(massptero))]->massptero
# perform RRphylo and search.trend on Pterosaurs tree and data
# by specifying a clade to be tested
RRphylo(tree=treeptero,y=log(massptero))->RRptero
search.trend(RR=RRptero, y=log(massptero),node=143,cov=NULL,ConfInt=FALSE)->STnode
# test the robustness of search.trend results
overfitRR(RR=RRptero,y=log(massptero),trend.args = list(node=143),nsim=10)
### Applying overfitRR on multiple RRphylo
# load the RRphylo example dataset including Cetaceans tree and data
data("DataCetaceans")
DataCetaceans$treecet->treecet # phylogenetic tree
DataCetaceans$masscet->masscet # logged body mass data
DataCetaceans$brainmasscet->brainmasscet # logged brain mass data
DataCetaceans$aceMyst->aceMyst # known phenotypic value for the most recent
# common ancestor of Mysticeti
# cross-reference the phylogenetic tree and body and brain mass data. Remove from
# both the tree and vector of body sizes the species whose brain size is missing
drop.tip(treecet,treecet$tip.label[-match(names(brainmasscet),
treecet$tip.label)])->treecet1
masscet[match(treecet1$tip.label,names(masscet))]->masscet1
# peform RRphylo on the variable (body mass) to be used as additional predictor
RRphylo(tree=treecet1,y=masscet1)->RRmass
RRmass$aces[,1]->acemass1
# create the predictor vector: retrieve the ancestral character estimates
# of body size at internal nodes from the RR object ($aces) and collate them
# to the vector of species' body sizes to create
c(acemass1,masscet1)->x1.mass
# peform RRphylo and search.trend on the brain mass
# by using the body mass as additional predictor
RRphylo(tree=treecet1,y=brainmasscet,x1=x1.mass)->RRmulti
search.trend(RR=RRmulti, y=brainmasscet,x1=x1.mass)->STcet
# test the robustness of search.trend results
overfitRR(RR=RRmulti,y=brainmasscet,trend.args = list(),x1=x1.mass,nsim=10)
### Testing PGLS_fossil
# peform RRphylo on cetaceans brain mass
RRphylo(tree=treecet1,y=brainmasscet)->RRbrain
# perform PGLS_fossil by using the original tree
PGLS_fossil(y~x,data=list(y=brainmasscet,x=masscet1),tree=treecet1)->pgls_noRR
# perform PGLS_fossil rescaling the tree according to RRphylo rates
PGLS_fossil(y~x,data=list(y=brainmasscet,x=masscet1),tree=RRbrain$tree,RR=RRbrain)->pgls_RR
# test the robustness of PGLS_fossil results
overfitRR(RR=RRbrain,y=brainmasscet,
pgls.args=list(modform=y~x,data=list(y=brainmasscet,x=masscet1),tree=TRUE,RR=TRUE),
nsim=10)
### Testing search.conv
# load the RRphylo example dataset including Felids tree and data
data("DataFelids")
DataFelids$PCscoresfel->PCscoresfel # mandible shape data
DataFelids$treefel->treefel # phylogenetic tree
DataFelids$statefel->statefel # conical-toothed or saber-toothed condition
# perform RRphylo on Felids tree and data
RRphylo(tree=treefel,y=PCscoresfel)->RRfel
# search for morphologicl convergence between clades (automatic mode)
# and within the category
search.conv(RR=RRfel, y=PCscoresfel, min.dim=5, min.dist="node9"))->SC.clade
as.numeric(c(rownames(SC.clade[[1]])[1],as.numeric(as.character(SC.clade[[1]][1,1]))))->conv.nodes
search.conv(tree=treefel, y=PCscoresfel, state=statefel)->SC.state
# test the robustness of seach.conv results
overfitRR(RR=RRfel, y=PCscoresfel,conv.args=
list(node=conv.nodes,state=statefel,declust=TRUE),nsim=10)