This vignette describes how the package stores and outputs information about species taxonomy via the taxonomy object, and the options available for simulating species taxonomy.
A phylogenetic tree object, in the format used by most software
packages, does not typically contain information about species over
time. A timescaled phylogeny is informative about the minimum number of
co-existing lineages at a given moment, but not about the start and end
points of species or the mode of speciation. Three modes of speciation that have been widely discussed in the palaeobiological literature are shown in Fig. 1 and can be modelled using FossilSim: budding (asymmetric), bifurcating (symmetric) and anagenetic speciation.
The distinction between these processes is important because the mode of speciation affects the parameter estimates obtained using different methods.
In molecular phylogenetics, we typically draw trees as fully
bifurcating structures (i.e. each node gives rise to two descendant
branches). This is also how trees are stored in computer memory by most
phylogenetic software packages, including ape. However, it
may not be the case that each internal edge represents a unique species.
Instead, some speciation events may be budding rather than bifurcating
or anagenesis may have occurred at some point along internal edges. To
explore the impact of different speciation modes it is valuable to know
the relationship between species through time (the taxonomy)
and the corresponding tree object.
FossilSim contains options for simulating species evolution under different speciation modes, described in the section Simulating taxonomy. Information about the true species taxonomy is stored in the taxonomy object, which is associated with the corresponding phylo object used to simulate species evolution.
The object contains a dataframe detailing species taxonomy, which contains the following six fields for each edge and species in the corresponding phylo object:
sp: the true species identity label. If all species originated via budding or bifurcation, this will always correspond to the terminal-most edge label (i.e. the youngest node) associated with each species. This is not the case if the data set also contains anagenetic species, when multiple species may be associated with a single edge.
edge: the edge label of the branch in the corresponding phylo object. Note that some species may be associated with multiple edges.
parent: the ancestor of species sp. Parent labels follow the same convention as species. The label assigned to the parent of the origin or root will be zero.
start: the start time of the corresponding edge and/or the origin time of the species. If the corresponding edge is also the oldest edge associated with the species, this value will equal the species origination time. If the speciation mode is asymmetric or symmetric, the speciation time will match the start time of the corresponding edge. If the speciation mode is anagenetic, the speciation time will be younger than the start time of the corresponding edge.
end: the end time of the corresponding edge and/or the end time of the species. If the corresponding edge is also the youngest edge associated with the species, this value will equal the species end time. Unless the species end time coincides with an anagenetic speciation event, the species end time will match the end time of the corresponding edge. If the species end time coincides with an anagenetic speciation event, the end time will be older than the end time of the corresponding edge.
mode: the speciation mode. “o” = origin or “r” = root (the edge/species that began the process). “b” = asymmetric or budding speciation. “s” = symmetric or bifurcating speciation. “a” = anagenetic speciation.
A new taxonomy object can be created by passing the above information to the taxonomy() function.
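As a minimal sketch of what such input might look like, the snippet below builds the six fields by hand for a hypothetical two-tip tree with a root edge (tips 1 and 2, root node 3) under budding speciation. The tree, the times and the name tax_df are invented for illustration. The sketch assumes that edges are labelled by the node at their younger end and that times are ages before the present, as in the simulated output later in this vignette, and that taxonomy() accepts a data frame with these columns. The call to taxonomy() only runs if FossilSim is installed.

```r
# Hypothetical two-tip tree with a root edge: tips 1 and 2, root node 3.
# Assumption: each edge is labelled by the node at its younger end.
tax_df <- data.frame(
  sp     = c(1, 1, 2),       # species 1 spans two edges; species 2 buds off it
  edge   = c(3, 1, 2),       # root edge, then the two tip edges
  parent = c(0, 0, 1),       # 0 marks the parent of the origin
  start  = c(2, 1, 1),       # ages before the present
  end    = c(1, 0, 0),
  mode   = c("o", "o", "b"), # origin species, then one budding event
  stringsAsFactors = FALSE
)

# Every edge segment should start before (i.e. be older than) it ends
stopifnot(all(tax_df$start > tax_df$end))

# Assumption: taxonomy() takes the data frame as its only argument
if (requireNamespace("FossilSim", quietly = TRUE)) {
  tax <- FossilSim::taxonomy(tax_df)
  print(tax)
}
```

Species 1 appears in two rows because it persists through the branching event: it occupies the root edge and then the edge leading to tip 1.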
Cryptic species can also be simulated. Under cryptic speciation the descendant species will be indistinguishable from the ancestor, and the taxonomy object can incorporate this information using the additional optional fields:
cryptic: TRUE if the speciation event was cryptic. If this information is not passed to taxonomy(), the function assumes cryptic = FALSE for all species.
cryptic.id: the cryptic species identity, i.e. the ancestral species that the current species will be identified as. If cryptic = TRUE, cryptic.id will differ from the true species identity sp.
The relationship between the information provided by the taxonomy object and the corresponding phylo object will become clear when you start simulating species and exploring the output.
The taxonomy object can be used as the starting point for simulating fossils or other downstream analyses that require information about discrete species units.
Special note: The birth and death rates used as parameters in the birth-death process typically do not correspond to the rates of appearance and extinction of species under the above processes, unless speciation occurs entirely via budding.
Species evolution or taxonomy can be simulated in FossilSim under a mixed model of speciation that can incorporate three modes of speciation (budding or asymmetric, bifurcating or symmetric, and anagenetic) in addition to cryptic speciation (Stadler et al. 2018).
This model of mixed speciation was described previously by Bapst (2013) and is also available as part of the paleotree package (Bapst 2012).
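In a birth-death process λ is the rate at which branching events occur, while μ is the rate at which lineage termination occurs. These are the parameters typically used to simulate birth-death trees (see the code snippet below for an example). The mixed model of speciation includes three additional parameters, β, λa and κ, which are defined as follows:
β: the probability that a branching event is bifurcating (symmetric) rather than budding (asymmetric)
λa: the rate of anagenetic speciation
κ: the probability that a speciation event is cryptic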
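The special note above, that λ and μ are not species-level rates unless all speciation is budding, can be illustrated with some simple bookkeeping. The sketch below assumes that β is the probability that a branching event is bifurcating and that λa is the rate of anagenetic speciation, and it ignores cryptic speciation. Under those assumptions each budding event adds one species, each bifurcation adds two species and ends one, and each anagenetic event adds one species and ends one. The function species_rates is hypothetical and is not part of FossilSim.

```r
# Per-lineage rates at which species appear and disappear under the mixed
# model, derived by counting events (an illustrative assumption, not a
# FossilSim function). Cryptic speciation (kappa) is ignored here.
species_rates <- function(lambda, mu, beta = 0, lambda.a = 0) {
  stopifnot(lambda >= 0, mu >= 0, lambda.a >= 0, beta >= 0, beta <= 1)
  c(
    # budding: +1 species; bifurcation: +2 species; anagenesis: +1 species
    origination = lambda * (1 - beta) + 2 * lambda * beta + lambda.a,
    # lineage termination, plus the ancestor ending at each bifurcation
    # and at each anagenetic event
    extinction  = mu + lambda * beta + lambda.a
  )
}

# Budding only: species-level rates equal the birth-death parameters
species_rates(lambda = 1, mu = 0.2)

# Mixed speciation: both rates exceed lambda and mu
species_rates(lambda = 1, mu = 0.2, beta = 0.5, lambda.a = 1)
```

With β = 0 and λa = 0 the function returns λ and μ unchanged, which is the budding-only case singled out in the note.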
The relationship between these processes and the underlying tree structure in the corresponding phylo object is described in the section The taxonomy object.
Simulating species evolution or taxonomy for any user-specified tree in FossilSim is straightforward.
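# load the package (the code below assumes FossilSim and TreeSim are installed)
library(FossilSim)
# set the random number generator seed so that the same code reproduces the same results
set.seed(123)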
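# simulate a tree using TreeSim conditioned on the number of extant tips
# (extinct lineages are retained by default, so the tree can have more tips than this)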
lambda = 1
mu = 0.2
tips = 8
t = TreeSim::sim.bd.taxa(n = tips, numbsim = 1, lambda = lambda, mu = mu)[[1]]
# t is an object of class phylo
t
#>
#> Phylogenetic tree with 10 tips and 9 internal nodes.
#>
#> Tip labels:
#> t2, t5, t8, t4, t9, t10, ...
#>
#> Rooted; includes branch length(s).
# use t$edge, t$edge.length, t$root.edge to see the tree attributes
# simulate under complete budding speciation
# (equivalent to using the default parameters beta = 0, lambda.a = 0, kappa = 0)
s = sim.taxonomy(tree = t)
# s is an object of class taxonomy
s
#> sp edge parent start end mode cryptic cryptic.id
#> 1 1 1 8 0.38710924 0.00000000 b 0 1
#> 2 2 2 4 0.14461369 0.00000000 b 0 2
#> 3 3 3 6 0.08786013 0.00000000 b 0 3
#> 4 4 14 10 0.84708717 0.52482777 b 0 4
#> 5 4 17 10 0.52482777 0.14461369 b 0 4
#> 6 4 4 10 0.14461369 0.00000000 b 0 4
#> 7 5 5 8 0.10233497 0.00000000 b 0 5
#> 8 6 19 8 0.44503437 0.08786013 b 0 6
#> 9 6 6 8 0.08786013 0.00000000 b 0 6
#> 10 7 7 4 0.52482777 0.00000000 b 0 7
#> 11 8 15 10 1.53730579 0.44503437 b 0 8
#> 12 8 16 10 0.44503437 0.38710924 b 0 8
#> 13 8 18 10 0.38710924 0.10233497 b 0 8
#> 14 8 8 10 0.10233497 0.00000000 b 0 8
#> 15 9 9 10 1.17167856 0.09161934 b 0 9
#> 16 10 11 0 3.34383891 1.53730579 o 0 10
#> 17 10 12 0 1.53730579 1.17167856 o 0 10
#> 18 10 13 0 1.17167856 0.84708717 o 0 10
#> 19 10 10 0 0.84708717 0.34120366 o 0 10
#> Taxonomy representing 10 species across 19 edges.
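One way to explore output like this is to collapse the per-edge rows into one row per species. The sketch below does so in base R for a few rows copied from the budding-only output above (species 4 and 7). It assumes that, with no anagenetic species, a species originates at the oldest start time and ends at the youngest end time among its edges. The names tax and ranges are hypothetical.

```r
# A few rows copied from the budding-only output above (species 4 and 7)
tax <- data.frame(
  sp    = c(4, 4, 4, 7),
  edge  = c(14, 17, 4, 7),
  start = c(0.84708717, 0.52482777, 0.14461369, 0.52482777),
  end   = c(0.52482777, 0.14461369, 0, 0)
)

# One row per species: oldest start, youngest end, number of edges spanned.
# tapply() and table() both order their results by sorted species label.
ranges <- data.frame(
  sp    = sort(unique(tax$sp)),
  start = as.numeric(tapply(tax$start, tax$sp, max)),
  end   = as.numeric(tapply(tax$end, tax$sp, min)),
  edges = as.integer(table(tax$sp))
)
ranges
```

In these rows species 7 starts at the time at which edge 14 ends, which is the branching event where it budded off its parent, species 4.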
Note that the way sim.taxonomy assigns budding species is deterministic: at each branching event, the “left” descendant edge will always be assigned the ancestral species label and the “right” descendant edge will always be the new species. This means that, even without setting the random number seed, sim.taxonomy will always produce the same output when β = 0.
FossilSim contains various options for plotting the package output. The plot.taxonomy function will produce a plot highlighting species taxonomy along with the corresponding bifurcating tree object.
Note that R detects what type of object you are trying to plot, in this case taxonomy, and will automatically call plot.taxonomy when you apply the function plot to an object of class taxonomy. Use ?plot.taxonomy to view further options that can be used to change the appearance of the figure.
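The dispatch mechanism described above is R's S3 system: a generic function looks for a method named generic.class. As a minimal sketch, the toy class below stands in for taxonomy and the base generic format() stands in for plot(), so that no graphics device is needed. The class toy_taxonomy and its method are hypothetical and are not part of FossilSim.

```r
# A toy object standing in for a taxonomy (hypothetical class name)
toy <- structure(list(n_species = 3), class = "toy_taxonomy")

# A method for the existing generic format(), named <generic>.<class>
format.toy_taxonomy <- function(x, ...) {
  paste("toy taxonomy with", x$n_species, "species")
}

# Calling the generic dispatches on class(toy) and runs the method above
format(toy)
```

In the same way, plot(s) reaches plot.taxonomy because s has class taxonomy.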
# simulate under complete bifurcating speciation
s = sim.taxonomy(tree = t, beta = 1)
plot(s, tree = t, legend.position = "topleft")
# simulate under mixed speciation
s = sim.taxonomy(tree = t, beta = 0.5, lambda.a = 1, kappa = 0.1)
plot(s, tree = t, legend.position = "topleft")
Note that plot.taxonomy only plots the true species, and ignores any information about cryptic speciation (i.e. the function uses the sp labels and ignores cryptic.id).
Given an existing taxonomy object you can also add anagenetic or cryptic species downstream using the functions sim.anagenetic.species and sim.cryptic.species. These functions also return a taxonomy object.
# simulate taxonomy without anagenetic or cryptic species
s1 = sim.taxonomy(tree = t, beta = 0.5)
# simulate anagenetic species
# note this function also requires the corresponding tree object
s2 = sim.anagenetic.species(tree = t, species = s1, lambda.a = 1)
# simulate cryptic species
s3 = sim.cryptic.species(species = s2, kappa = 0.1)
These functions cannot be used with taxonomy objects that already contain anagenetic or cryptic species.
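A quick check before calling either function can avoid this situation. The helper below is a hypothetical sketch, not part of FossilSim. It assumes that anagenetic species are marked by mode "a" and that cryptic events are flagged by a non-zero or TRUE value in the cryptic column, as in the field descriptions and the output above.

```r
# Hypothetical helper: TRUE if a taxonomy-like data frame already contains
# anagenetic species (mode "a") or cryptic speciation events
has_anagenetic_or_cryptic <- function(tax) {
  any(tax$mode == "a") || any(as.logical(tax$cryptic))
}

# Toy data frames using the mode and cryptic columns described above
budding_only <- data.frame(sp = c(1, 2), mode = c("o", "b"), cryptic = c(0, 0))
with_anagen  <- data.frame(sp = c(1, 2), mode = c("o", "a"), cryptic = c(0, 0))

has_anagenetic_or_cryptic(budding_only)
has_anagenetic_or_cryptic(with_anagen)
```

A taxonomy for which the helper returns TRUE would need to be re-simulated without those species before anagenetic or cryptic species are added.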
See the paleotree vignette for how FossilSim objects can be converted into paleotree objects.