Installing

The most recent version may be installed from github using the devtools package:

Installing from github

library(devtools)
install_github("coleoguy/evobiR", build_vignettes = TRUE)

## 
## Thank you for using evobiR!

## 
## To acknowledge our work, please cite the package:

##  Michelle M. Jonika, Maximos Chin, Nathan Anderson, Richard H. Adams, Jeffery P. Demuth, and Heath Blackmon. (2023).

##  EvobiR: Tools for comparative analyses and teaching evolutionary biology.

##  coleoguy/evobiR: EvobiR version 2.1

Comparative Methods

AncCond:

data(mite.trait)
data(trees.mite)
AncCond(trees[[1]], mite.trait)

This function uses stochastic mapping and ancestral state reconstruction to determine if the derived state of a binary trait originates when a continuous trait has an extreme value.

The four steps in the ancestral condition test. A) Ancestral states are estimated for the continuous character assuming a Brownian motion model of evolution. B) Possible evolutionary histories for the discrete trait are generated through stochastic mapping C) Nodes are categorized as either ancestral or derived, and ancestral nodes that subtend an origin of the derived state of the discrete character are annotated. D) Depiction of the null distribution and the observed mean of producing nodes estimated from the data. In this example the producing nodes have a lower continuous value than expected if there is no relationship between the traits.

FuzzyMatch

When assembling data from different sources typos can sometimes cause a loss of perfect matches between trees and datasets. This function helps you find these close matches that can be hand curated to keep as many species as possible in your analysis.

data(hym.tree)
names <- c("Pepsis_elegans", "Plagiolepis_alluaudi", "Pheidele_lucreti", "Meliturgula_scriptifronsi", "Andrena_afimbriat")
FuzzyMatch(tree = hym.tree, data = names, max.dist=3)

## Found 3 names that were close but imperfect matches

##                name.in.data             name.in.tree differences
## 1          Pheidele_lucreti        Pheidole_lucretii           2
## 2 Meliturgula_scriptifronsi Meliturgula_scriptifrons           1
## 3         Andrena_afimbriat       Andrena_afimbriata           1

SampleTrees

SampleTrees(trees = system.file("trees.nex", package = "evobiR"), 
            burnin = .1, final.number = 20, format = 'new', prefix = 'sample')

This function takes as its input a large collection of trees from a program like MrBayes or Beast and allows the user to select the number of randomly drawn trees they wish to retrieve and to save them in either newick or nexus format.

SuperMatrix

SuperMatrix(missing = "N", prefix = "DATASET2", save = T)

This function reads all fasta format alignments in the working directory and constructs a single supermatrix that includes all taxa present in any of the fasta files and inserts missing symbols for taxa that are missing sequences for some loci. A list with two elements is returned. The first element contains partition data that records the alignment positions of each input fasta file in the combined supermatrix. The second element is a dataframe version of the supermatrix. If the argument save is set to True then both of these files are also saved to the working directory.

Population Genetics

CalcD & WinCalcD

The functions CalcD and CalcPopD are implementations of the algorithm introgression in genomic data. Significance of the D-stat can be calculated either through bootstrapping or jackknifing. Bootstrapping is appropriate for datasets where SNPs are unlinked for instance unmapped RADSeq data. Jackknifing is the appropriate approach when SNPs are potentially in linkage for instance gene alignments or genome alignments. The WinCalcD function is identical to CalcD but performs testing in windows along an alignment.

Durand, Eric Y., et al. Testing for ancient admixture between closely related populations. Molecular biology and evolution 28.8 (2011): 2239-2252.

Eaton, D. A. R., and R. H. Ree. 2013. Inferring phylogeny and introgression using RADseq data: An example from flowering plants (Pedicularis: Orobanchaceae). Syst. Biol. 62:689-706

CalcD(alignment = system.file("1.fasta", package = "evobiR"), sig.test = "N")

## 
## Sites in alignment = 1000
## Number of sites with ABBA pattern = 342
## Number of sites with BABA pattern = 520
## 
## D raw statistic =  -0.2064965

## [1] -0.2064965

CalcPopD(alignment = system.file("3.fasta", package = "evobiR"), sig.test = "N")

## [1] "Sites in alignment = 416"
## [1] "Number of sites with ABBA or BABA patterns = 358"
## [1] "Number of ABBA or BABA sites that are still segregating in at least one population = 13"
## [1] "D statistic = -0.192012635379061"

Utility/Misc. Functions

Sliding window

Applies a function within a sliding window of a numeric vector. Both the step size and the window size can be set by the user. Sliding window analyses are important tools particularly during data exploration. Often we can find patterns at scales that we might miss otherwise. As an example lets look at the sunspot data included in R.

data(sunspot.month)
foo <- as.vector(sunspot.month)
plot(foo, ylab="Number of sunspots", xlab="record number")

When we look at the data at this scale the pattern that really catches our attention is the 25 peaks that we see across these 260 years. This is the well documented 11-year sunspot cycle. However, our sun has longer cycles that are less evident in this graphing.

# first lets use the sliding window function to get the number sun spots 

# average over 11 years and do this moving in 1 year steps through time
sunspots <- SlidingWindow(FUN = "mean", 
                          data = foo, 
                          window = 132, 
                          step = 12, 
                          strict = F)

# we repeat this on the years so we can plot agains a sensible x axis
years <- round(SlidingWindow("mean", 
                             data = rep(1749:2013, each=12)[1:3177], 
                             window = 132, 
                             step = 12, 
                             strict=F))
{plot(x=years,y=sunspots, type="l", lwd=3)
abline(v=1810, col="red",lwd=3)
text(y=90, x=1810, "Dalton Min.", col="red",pos=4)}

now we can easily spot just how exceptional the Dalton minimum of the early 1800s was.

ResSel

This function takes measurements of multiple traits and performs a linear regression and identifies those records with the largest and smallest residual. Originally it was written to perform a regression of horn size on body size allowing for high and low selection lines. It allows users to choose the trait to select on and the trait to control for. It also lets the user choose the number of individuals selected (strength of selection).

data <- read.csv(file = system.file("horn.beetle.csv", package = "evobiR"))

The first column of the data file should contain the identifier i.e. the specimen ID or vial that the measurement is from while the traits should be in the next two columns.

data[1:10,]

##    Item Body.Length Horn.Length
## 1     1     3449.63      469.62
## 2     4     3113.88      217.51
## 3     7     3559.51      424.19
## 4    10     2941.22      118.12
## 5    13     3504.98      306.53
## 6    16     3449.12      401.48
## 7    19     3814.26      472.35
## 8    22     3264.59      256.03
## 9    25     3165.55      255.22
## 10   28     3364.54      256.32

We can then run the residual selection function and it will provide us with both a visual depiction of the data and will return a list with elements (high line and low line providing us with ID numbers of selected individuals.

ResSel(data = data, traits = c(2,3), percent = 15, identifier = 1, model = "linear")

## $`high line`
##  [1]  76  91 103 112 127 139 151 172 175 199 229 235 238 250 256
## 
## $`low line`
##  [1]   4  10  13  22  25  28  37  40  43  61 106 178 181 286

For questions or comments contact Heath Blackmon

Evolutionary biology in R (evobiR) tutorial