The most recent version may be installed from github using the devtools package:
Installing from github
library(devtools)
install_github("coleoguy/evobiR", build_vignettes = TRUE)
##
## Thank you for using evobiR!
##
## To acknowledge our work, please cite the package:
## Michelle M. Jonika, Maximos Chin, Nathan Anderson, Richard H. Adams, Jeffery P. Demuth, and Heath Blackmon. (2025).
## EvobiR: Tools for comparative analyses and teaching evolutionary biology.
## coleoguy/evobiR: EvobiR version 2.2
This function uses stochastic mapping and ancestral state reconstruction to determine if the derived state of a binary trait originates when a continuous trait has an extreme value.
The four steps in the ancestral condition test. A) Ancestral states are estimated for the continuous character assuming a Brownian motion model of evolution. B) Possible evolutionary histories for the discrete trait are generated through stochastic mapping C) Nodes are categorized as either ancestral or derived, and ancestral nodes that subtend an origin of the derived state of the discrete character are annotated. D) Depiction of the null distribution and the observed mean of producing nodes estimated from the data. In this example the producing nodes have a lower continuous value than expected if there is no relationship between the traits.
A more rigorous version of AncCond that builds its null
distribution using parametric bootstrapping instead of simple node
resampling. Rather than shuffling real node values, this function
simulates entire neutral evolutionary histories on the tree via
sim.history, puts each through the same stochastic-mapping
pipeline, and compares the real Grand Mean to the distribution of
neutral Grand Means.
library(ape)
library(phytools)
tree <- rcoal(30)
data <- data.frame(
species = tree$tip.label,
body_size = rnorm(30, 10, 2),
has_horn = sample(1:2, 30, replace = TRUE)
)
# Run parametric bootstrap with 50 null simulations, 50 maps each
result <- AncCondFast(tree, data, n_iter = 50, n_maps = 50)
hist(result$null_dist, main = "Null Distribution")
abline(v = result$obs_stat, col = "red", lwd = 2)
Estimates how fast a discrete character has been evolving at each tip of a phylogenetic tree. The rate for a tip is calculated as the absolute difference between the tip state and the reconstructed parent state divided by the branch length connecting them.
library(ape)
library(castor)
tree <- rcoal(10)
Q <- matrix(c(-0.2, 0.1, 0.1,
0.1, -0.2, 0.1,
0.1, 0.1, -0.2), 3, 3)
colnames(Q) <- rownames(Q) <- c("1", "2", "3")
tip.states <- setNames(sample(1:3, 10, replace = TRUE), tree$tip.label)
rates <- GetTipRates(tree, Q, tip.states)
When assembling data from different sources typos can sometimes cause a loss of perfect matches between trees and datasets. This function helps you find these close matches that can be hand curated to keep as many species as possible in your analysis.
data(hym.tree)
names <- c("Pepsis_elegans", "Plagiolepis_alluaudi", "Pheidele_lucreti", "Meliturgula_scriptifronsi", "Andrena_afimbriat")
FuzzyMatch(tree = hym.tree, data = names, max.dist=3)## Found 3 names that were close but imperfect matches
## name.in.data name.in.tree differences
## 1 Pheidele_lucreti Pheidole_lucretii 2
## 2 Meliturgula_scriptifronsi Meliturgula_scriptifrons 1
## 3 Andrena_afimbriat Andrena_afimbriata 1
SampleTrees(trees = system.file("trees.nex", package = "evobiR"),
burnin = .1, final.number = 20, format = 'new', prefix = 'sample')This function takes as its input a large collection of trees from a program like MrBayes or BEAST and allows the user to select the number of randomly drawn trees they wish to retrieve and to save them in either Newick or Nexus format.
Classifies each tree in a collection against a set of reference topologies and counts how many trees match each reference. This is useful after a Bayesian phylogenetic analysis when you want to know what fraction of posterior trees support each possible branching arrangement.
result <- countTrees("posterior_trees.nwk", "candidate_topologies.nwk")
result[[1]] # counts per reference topology
result[[2]] # which topology each individual tree matched
SuperMatrix(missing = "N", prefix = "DATASET2", save = TRUE)
This function reads all fasta format alignments in the working directory and constructs a single supermatrix that includes all taxa present in any of the fasta files and inserts missing symbols for taxa that are missing sequences for some loci. A list with two elements is returned. The first element contains partition data that records the alignment positions of each input fasta file in the combined supermatrix. The second element is a dataframe version of the supermatrix. If the argument save is set to TRUE then both of these files are also saved to the working directory.
The scaleTreeRates function analyzes heterogeneity in
rates of discrete character evolution across a phylogeny. It estimates a
scalar multiplier for each edge that modifies the overall transition
rate matrix, allowing rates to vary across the tree.
library(ape)
library(phytools)
## Fit rate heterogeneity model
tree <- rtree(20)
tip.states <- sample(1:2, 20, replace = TRUE)
names(tip.states) <- tree$tip.label
result <- scaleTreeRates(tree, tip.states, model = "ER", nbins = 3)
## Visualize with scaled branch lengths
plot(result, method = "multiply")
## Visualize with colored branches
plot(result, method = "color")The plot.phyloscaled function provides two visualization
methods: "multiply" scales branch lengths by their
associated scalar, and "color" assigns colors from a
diverging palette to indicate rate variation.
The make.simmap2 function is an extension of phytools’
make.simmap that adds rejection monitoring and a maximum
rejection limit per edge. When the rejection limit is exceeded, the edge
is assigned a “fail” state rather than stalling the simulation. The
fix.simmap function uses graph theory to resolve these
failed edges.
library(ape)
library(phytools)
tree <- rtree(20)
x <- sample(c("a","b"), 20, replace = TRUE)
names(x) <- tree$tip.label
## Run stochastic mapping with rejection limit
result <- make.simmap2(tree, x, model = "ER", nsim = 10,
monitor = FALSE, rejmax = 100000)
The functions CalcD and CalcPopD are implementations of the algorithm for detecting introgression in genomic data. Significance of the D-statistic can be calculated either through bootstrapping or jackknifing. Bootstrapping is appropriate for datasets where SNPs are unlinked, for instance unmapped RADSeq data. Jackknifing is the appropriate approach when SNPs are potentially in linkage, for instance gene alignments or genome alignments. The WinCalcD function is identical to CalcD but performs testing in windows along an alignment.
Durand, Eric Y., et al. Testing for ancient admixture between closely related populations. Molecular biology and evolution 28.8 (2011): 2239-2252.
Eaton, D. A. R., and R. H. Ree. 2013. Inferring phylogeny and introgression using RADseq data: An example from flowering plants (Pedicularis: Orobanchaceae). Syst. Biol. 62:689-706
##
## Sites in alignment = 1000
## Number of sites with ABBA pattern = 342
## Number of sites with BABA pattern = 520
##
## D raw statistic = -0.2064965
## [1] -0.2064965
## [1] "Sites in alignment = 416"
## [1] "Number of sites with ABBA or BABA patterns = 358"
## [1] "Number of ABBA or BABA sites that are still segregating in at least one population = 13"
## [1] "D statistic = -0.192012635379061"
## $d.stat
## [1] -0.1920126
##
## $align.length
## [1] 416
##
## $useful.sites
## [1] 358
##
## $seg.sites
## [1] 13
The WinCalcD function is identical to CalcD
but performs testing in sliding windows along the alignment, letting you
see how introgression signal varies across the genome.
results <- WinCalcD("my_alignment.fasta",
win.size = 500, step.size = 250)
plot(1:nrow(results), as.numeric(results$d), type = "l",
xlab = "Window", ylab = "D-statistic")
abline(h = 0, lty = 2, col = "red")
Calculate the variance effective population size due to unequal sex ratio. Formulas are available for both autosomal loci and X chromosome loci.
## [1] 266.6667
## [1] 225
This function computes the probability that a random chromosomal fusion joins a sex chromosome to an autosome. Two null models are available: an occurrence model (Anderson et al. 2020) and a fixation-weighted model that accounts for the different effective copy numbers of autosomes, X/Z, and Y/W chromosomes.
# Fixation-weighted probability of SA fusion for an XY system
# with 13 autosome pairs (Da = 26)
pSAF(n_auto = 13, n_x = 1, n_y = 1, mu = 0.4)## [1] 0.2622951
## [1] 0.1428571
Reads a FASTA alignment and writes a new file with sequences reordered by their starting position (where the first non-gap character appears). This is helpful when sequences cover different regions of a gene and you want to visualize the alignment more clearly. Optionally, reference sequences can be pinned to the top.
ReOrderAlignment("my_alignment.fasta", "reordered.fasta")
# Pin a reference sequence to the top
ReOrderAlignment("my_alignment.fasta", "reordered.fasta",
ref = "reference_seq")
Applies a function within a sliding window of a numeric vector. Both the step size and the window size can be set by the user. Sliding window analyses are important tools particularly during data exploration. Often we can find patterns at scales that we might miss otherwise. As an example lets look at the sunspot data included in R.
data(sunspot.month)
foo <- as.vector(sunspot.month)
plot(foo, ylab="Number of sunspots", xlab="record number")When we look at the data at this scale the pattern that really catches our attention is the 25 peaks that we see across these 260 years. This is the well documented 11-year sunspot cycle. However, our sun has longer cycles that are less evident in this graphing.
# first lets use the sliding window function to get the number sun spots
# average over 11 years and do this moving in 1 year steps through time
sunspots <- SlidingWindow(FUN = "mean",
data = foo,
window = 132,
step = 12,
strict = F)
# generate year labels using the length of the data
n.months <- length(foo)
start.year <- 1749
year.seq <- rep(start.year:(start.year + ceiling(n.months/12) - 1), each = 12)[1:n.months]
years <- round(SlidingWindow("mean",
data = year.seq,
window = 132,
step = 12,
strict=F))
{plot(x=years,y=sunspots, type="l", lwd=3)
abline(v=1810, col="red",lwd=3)
text(y=90, x=1810, "Dalton Min.", col="red",pos=4)}Now we can easily spot just how exceptional the Dalton minimum of the early 1800s was.
This function takes measurements of multiple traits and performs a linear regression and identifies those records with the largest and smallest residual. Originally it was written to perform a regression of horn size on body size allowing for high and low selection lines. It allows users to choose the trait to select on and the trait to control for. It also lets the user choose the number of individuals selected (strength of selection).
The first column of the data file should contain the identifier i.e. the specimen ID or vial that the measurement is from while the traits should be in the next two columns.
## Item Body.Length Horn.Length
## 1 1 3449.63 469.62
## 2 4 3113.88 217.51
## 3 7 3559.51 424.19
## 4 10 2941.22 118.12
## 5 13 3504.98 306.53
## 6 16 3449.12 401.48
## 7 19 3814.26 472.35
## 8 22 3264.59 256.03
## 9 25 3165.55 255.22
## 10 28 3364.54 256.32
We can then run the residual selection function and it will provide us with both a visual depiction of the data and will return a list with elements (high line and low line) providing us with ID numbers of selected individuals.
## $`high line`
## [1] 76 91 103 112 127 139 151 172 175 199 229 235 238 250 256
##
## $`low line`
## [1] 4 10 13 22 25 28 37 40 43 61 106 178 181 286
For questions or comments contact Heath Blackmon