Title: | Evolutionary Biology in R |
---|---|
Description: | Comparative analysis of continuous traits influencing discrete states, and utility tools to facilitate comparative analyses. Implementations of ABBA/BABA type statistics to test for introgression in genomic data. |
Authors: | Michelle M. Jonika, Maximos Chin, Nathan Anderson, Richard H. Adams, Jeffery P. Demuth, and Heath Blackmon |
Maintainer: | Heath Blackmon <[email protected]> |
License: | GPL (>=2) |
Version: | 2.1 |
Built: | 2024-10-12 06:57:49 UTC |
Source: | https://github.com/coleoguy/evobir |
evobiR is a collection of tools for use in evolutionary biology. Some of the functions manipulate data in a way not implemented by other functions while others calculate sequence statistics or perform simulations, either of data across trees or genetic and genomic simulations.
Package: | evobiR |
Type: | Package |
Version: | 1.1 |
Date: | 2017-5-05 |
License: | GPL (>=2) |
More information on evobiR is available at https://github.com/coleoguy/evobir
Heath Blackmon
Maintainer: Heath Blackmon <[email protected]>
This file contains simulated SNP data
Heath Blackmon
http://coleoguy.github.io/
This function uses ancestral state estimations for a discrete character based on stochastic mapping under an Mk2 model and ancestral state estimates for a continuous trait under a Brownian motion model to determine if transitions in the binary trait coincide with extreme values of a continuous trait.
AncCond(tree, data, mc = 1000, drop.state=NULL, mat=c(0,2,1,0), pi="equal", n.tails = 1, message = TRUE)
AncCond(tree, data, mc = 1000, drop.state=NULL, mat=c(0,2,1,0), pi="equal", n.tails = 1, message = TRUE)
tree |
tree of class phylo |
data |
a dataframe with 3 columns. The first should match the taxa names in the tree, the second should have the continuous trait values and the third the states for the binary character. The binary trait should be coded as 1 and 2 if one is ancestral then it must be coded as 1. |
mc |
the number of iterations to use in building the null distribution. |
drop.state |
|
mat |
a vector describing the possible transition for the discrete trait. The default is equivelant to APE's |
pi |
The probabilities for the binary trait at the root of the tree. The values possible are |
n.tails |
numeric value of 1 or 2 to indicate whether a 1 or 2 tailed p-value should be calculated |
message |
Logical value if |
This function uses ancestral state estimates to determine if the transitions in the binary trait are associated with extreme values of the continuous trait. This test can incorporate the possibility that the derived state of a binary character may lead to correlated selection in the continuous trait. If this is desired then the drop.state argument should be used to specify the derived state of the binary character that should not be used in the ancestral state estimation of the continuous trait.
Returns a list of length 4:
OriginatingVals |
the mean value(s) for the continuous trait at the transition points of the binary character |
NTrans |
the number of transitions in the binary character |
NullDist |
the null distribution(s) produced by simulation |
pval |
pvalue |
Nathan Anderson, Jeffery P. Demuth, Richard H. Adams, and Heath Blackmon
http://coleoguy.github.io/
## Not run: data(mite.trait) data(trees.mite) AncCond(trees[[1]], mite.trait) ## End(Not run)
## Not run: data(mite.trait) data(trees.mite) AncCond(trees[[1]], mite.trait) ## End(Not run)
These functions calculate Patterson's D-statistic to compare the frequencies of discordant SNP genealogies. These tests assume equal substitution rates and unlinked loci, D-statistics significantly different from 0 suggest that introgression has occurred.
CalcD(alignment = "alignment.fasta", sig.test = "N", ambig="D", block.size = 1000, replicate = 1000, align.format='fasta') CalcPopD(alignment = "alignment.fasta", sig.test = "N", ambig="D", block.size = 1000, replicate = 1000, align.format='fasta')
CalcD(alignment = "alignment.fasta", sig.test = "N", ambig="D", block.size = 1000, replicate = 1000, align.format='fasta') CalcPopD(alignment = "alignment.fasta", sig.test = "N", ambig="D", block.size = 1000, replicate = 1000, align.format='fasta')
alignment |
This is an alignment in fasta format. Sequences should be in the order: P1, P2, P3, Outgroup. In the case of CalcPopD sequence from each populations should have identical names see file 3.fasta for an example |
sig.test |
This indicates whether or if to test for significance. Options are "B" bootstrap, "J" jackknife, or "N" none. |
ambig |
This flag indicate how to deal with sequence ambiguities. Options are "D" drop all ambigous loci, "R" resolve each biallelic ambiguity, or "I" ignore ambiguity and perform analysis without checking sequences. If the argument "R"" is chosen there is becomes a degree of stochasticity in the analysis and the user should run the analysis more than once. Also it would be wise to compare this result to setting the argument to "D". |
block.size |
The number of sites to be dropped in the jackknife approach |
replicate |
Number of replicates to be used in estimating variance |
align.format |
a character string specifying the format of the alignment file : mase, clustal, phylip, fasta or msf |
The functions CalcD and CalcPopD are implementations of the algorithm described in Durand et al. 2011. Significance of the D-stat can be calculated either through bootstrapping or jackknifing. Bootstrapping is appropriate for datasets where SNPs are unlinked for instance unmapped RADSeq data. Jackknifing is the appropriate approach when SNPs are potentially in linkage for instance gene alignments or genome alignments.
Returns the number of each type of site, Z scores and p-values
Heath Blackmon
http://coleoguy.github.io/
Durand, Eric Y., et al. Testing for ancient admixture between closely related populations. Molecular biology and evolution 28.8 (2011): 2239-2252.
Eaton, D. A. R., and R. H. Ree. 2013. Inferring phylogeny and introgression using RADseq data: An example from flowering plants (Pedicularis: Orobanchaceae). Syst. Biol. 62:689-706
CalcD(alignment = system.file("1.fasta", package = "evobiR"), sig.test = "N") CalcPopD(alignment = system.file("3.fasta", package = "evobiR"), sig.test = "N")
CalcD(alignment = system.file("1.fasta", package = "evobiR"), sig.test = "N") CalcPopD(alignment = system.file("3.fasta", package = "evobiR"), sig.test = "N")
This function counts the number of times that a set of topologies is present in a collection of trees.
countTrees(collection = NULL, ref = NULL, classes = T, verbose=T)
countTrees(collection = NULL, ref = NULL, classes = T, verbose=T)
collection |
path to a collection of trees in a Newick format file |
ref |
path to a Newick format file with the topologies of interest |
classes |
if T then will return a vector with classification of each tree |
verbose |
returns intermediate progress messages if TRUE |
If classes is T returns a list with the first element being a numeric vector of the same length as the number of trees in the ref file. The elements of the returned vector correspond to the occurences of trees in the collection file that match the topologies supplied in the ref file. The second element is a vector the same length as the input tree collection and each tree is assigned a number based on the topology it matches.
Heath Blackmon
http://coleoguy.github.io/
This function facilitates automated resolution of failed edges in a modified stochastic map produced by make.simmap2
through application of graph theory implemented in igraph.
fix.simmap(hists, tips, transition.matrix)
fix.simmap(hists, tips, transition.matrix)
hists |
an object of class |
tips |
two column dataframe with first column listing species name as shown in tree tips and second column listing model state for each species |
transition.matrix |
square matrix describing possible transitions between states |
A object of class simmap
or multiSimmap
, see make.simmap
.
All edges which failed in the original stochastic maps should be resolved.
Maximos Chin, Matthew Marano, and Heath Blackmon
http://coleoguy.github.io/
When assembling data from different sources typos can sometimes cause a loss of perfect matches between trees and datasets. This function helps you find these close matches that can be hand curated to keep as many species as possible in your analysis.
FuzzyMatch(tree, data, max.dist)
FuzzyMatch(tree, data, max.dist)
tree |
a phylogenetic tree of the class "phylo". |
data |
character vector with the names from your dataset. |
max.dist |
This is the maximum number of characters that can differ between your tree and data and still be recognized as a close match. |
A dataframe with the following rows:
Name in data
Name in tree
Number of differences
Heath Blackmon
http://coleoguy.github.io/
data(hym.tree) names <- c("Pepsis_elegans", "Plagiolepis_alluaudi", "Pheidele_lucreti", "Meliturgula_scriptifronsi", "Andrena_afimbriat") FuzzyMatch(tree = hym.tree, data = names, max.dist=3)
data(hym.tree) names <- c("Pepsis_elegans", "Plagiolepis_alluaudi", "Pheidele_lucreti", "Meliturgula_scriptifronsi", "Andrena_afimbriat") FuzzyMatch(tree = hym.tree, data = names, max.dist=3)
This function calculates the variance effective population size due to unequal sex ratio. Formula are available for both autosomal loci and X chromosome loci.
getNe(males, females, locus)
getNe(males, females, locus)
males |
number of males |
females |
number of females |
locus |
"A" or "X" to denote the population size of interest is for autosomal locus or X chromosome locus respectively. |
Returns a numeric vector of length 1 which contains the variance effective population size.
Heath Blackmon
http://coleoguy.github.io/
getNe(males=100, females=200)
getNe(males=100, females=200)
This function calculates the rate of change from parent nodes to extant tips of a phylogeny.
GetTipRates(tree, Q, tip.states, hyper, p.mat)
GetTipRates(tree, Q, tip.states, hyper, p.mat)
tree |
an ultrametric tree of class phylo |
Q |
transition matrix containing estimated rates with column names |
tip.states |
An integer vector with a length equal to the number of species on the phylogeny. It should have values of 1 to N with N being the number of total states. Elements of the vector should match the tip names for the phylogeny. |
hyper |
logical vector of length one. TRUE indicates the model includes a binary hyperstate. Default is FALSE and indicates no binary hyperstate |
p.mat |
a probability matrix where each column represent a discrete state and each row is a species. Values in the matrix describe the probability of being in any of these states |
A named numeric vector with the rate for each tip in the phylogeny.
Michelle M. Jonika and Heath Blackmon
http://coleoguy.github.io/
A csv file containing measurements of horn and body size for the beetle Gnatocerus cornutus.
This function is an extension of the make.simmap
function of phytools which allows users to monitor rejections at specific edges during the simulation proces and set an upper limit on the number of rejections permitted per edge.
make.simmap2(tree, x, model="SYM", nsim=1 , monitor = FALSE, rejmax = NULL, rejint = 1000000, ...)
make.simmap2(tree, x, model="SYM", nsim=1 , monitor = FALSE, rejmax = NULL, rejint = 1000000, ...)
tree |
see |
x |
see |
model |
see |
nsim |
see |
monitor |
|
rejmax |
|
rejint |
|
... |
optional arguments. see |
A object of class simmap
or multiSimmap
, see make.simmap
.
In addition to the states present in the model, an additional state fail
in the maps
and mapped.edge
elements is assigned to edges which are skipped due to exceeding the rejection limit.
Matthew Marano, Maximos Chin, and Heath Blackmon
http://coleoguy.github.io/
A dataframe of sexual system and chromosome number data for mites. The first column has species names, the second column has diploid numbers, and the third column contains 0 and 1 to indicate diplodiploidy or haplodiploidy sex determination.
Thes three functions, Pfsa, Pfaa, and Pfss use sex chromosome systems and diploid autosome number to calculate the proportion of all fusions that are expected to be sex chromosome autosome fusions (Pfsa), autosome autosome fusions (Pfaa), and sex chromosome sex chromosome fusions (Pfss).
Pfsa(Da, scs, mud) Pfaa(Da, scs, mud) Pfss(Da, scs, mud)
Pfsa(Da, scs, mud) Pfaa(Da, scs, mud) Pfss(Da, scs, mud)
Da |
the diploid autosome count |
scs |
a text string describing the sex chromosome system for instance: "XO", "XXO", "XY", "ZO"", or "ZWW" |
mud |
the proportion of fusions that originate in females. |
This approach assumes that all chromosomes have equal probability of being involved in a fusion and that X and Y (or Z and W) chromosomes are not able to fuse. It will provide accurate results for any sex chromosome system that has any number of one sex chromosomes and either zero or one of the other. For instance it will work for XO, XY, XXY, XYYYY, but not for XXYY. It is applicable to both male and female heterogameic systems.
Returns a numeric vector of length 1 which contains the proportion of fusions expected to be of the specified type.
Nathan Anderson and Heath Blackmon
http://coleoguy.github.io/
Pfsa(Da = 26, scs = "XY", mud=0.4)
Pfsa(Da = 26, scs = "XY", mud=0.4)
This function provides two methods for visualizing a phyloscaled
tree produced by scaleTreeRates
.
## S3 method for class 'phyloscaled' plot(tree, method = "multiply", palette = "RdYlGn", edge.width = 1, cex = 1, show.tip.label = T)
## S3 method for class 'phyloscaled' plot(tree, method = "multiply", palette = "RdYlGn", edge.width = 1, cex = 1, show.tip.label = T)
tree |
a tree of class |
method |
a |
palette |
a |
edge.width |
numeric value that determines branch width |
cex |
numeric value for size |
show.tip.label |
logical indicating whether to print tip labels |
The two plotting methods currently supported are "multiply"
and "color"
.
multiply
takes the length of each edge on the phylogeny and multiplies it by the scalar associated with the edge before plotting the scaled tree.
color
assigns a color from a diverging palette to each edge depending on it's associated scalar. Currently supported palettes are any of the RColorBrewer palettes or the standard viridis palette (specified by string "viridis"
). Because RColorBrewer sets the maximum number of distinct colors for divergent palettes to be 11, any phylogeny which has greater than 11 unique scalar bins represented within it's edges must use a viridis palette. The ultrametric phylogeny is then plotted with each edge colored accordingly
A plotted phylogeny with edges either multiplied or colored by their associated scalars
Maximos Chin and Heath Blackmon
http://coleoguy.github.io/
This function re-orders aligned sequences with sequences based on the site of the alignment that they first have data. It also allows user to select reference sequences that will stay at the top of the alignmment regardless of starting position.
ReOrderAlignment(file, newfile, ref = "")
ReOrderAlignment(file, newfile, ref = "")
file |
the name of the fasta file which has sequences need to be re-ordered |
newfile |
name of the output file |
ref |
Either an integer or a characrter vector. Elements of the vector should match the sequences that should be kept at the top of the alignment. |
A fasta file
Sean Chien and Heath Blackmon
http://coleoguy.github.io/
This function takes measurements of multiple traits and performs a linear regression and identifies those records with the largest and smallest residual. Originally it was written to perform a regression of horn size on body size allowing for high and low selection lines.
ResSel(data, traits, percent = 10, identifier = 1, model = "linear")
ResSel(data, traits, percent = 10, identifier = 1, model = "linear")
data |
this is a dataframe with subject identifiers and phenotypic trait values |
traits |
a numeric vector indicating the column containing the predictor and response variables in that order |
percent |
the percentage of highest and lowest residuals that should be identified |
identifier |
the column which contains the record numbers to identify individuals |
model |
currently this is not used |
This function returns a list
high line |
the ID numbers for the individuals selected for the high line |
low line |
the ID numbers of the individuals selected for the low line |
Heath Blackmon
data <- read.csv(file = system.file("horn.beetle.csv", package = "evobiR")) ResSel(data = data, traits = c(2,3), percent = 15, identifier = 1, model = "linear")
data <- read.csv(file = system.file("horn.beetle.csv", package = "evobiR")) ResSel(data = data, traits = c(2,3), percent = 15, identifier = 1, model = "linear")
This function takes as its input a large collection of trees from a program like MrBayes or Beast and allows the user to select the number of randomly drawn trees they wish to retrieve
SampleTrees(trees, burnin, final.number, format, prefix)
SampleTrees(trees, burnin, final.number, format, prefix)
trees |
a nexus format file containing trees that the user wants to sample from |
burnin |
the proportion of trees to remove as burnin |
final.number |
the number of trees desired |
format |
options are "new" or "nex" indicating to save the trees in newick format or nexus format |
prefix |
a text string to assing to the new treefile name |
an object of the class "multiPhylo" is returned
Heath Blackmon
SampleTrees(trees = system.file("trees.nex", package = "evobiR"), burnin = .1, final.number = 20, format = 'new', prefix = 'sample')
SampleTrees(trees = system.file("trees.nex", package = "evobiR"), burnin = .1, final.number = 20, format = 'new', prefix = 'sample')
This function performs the phylogenetic methods for analysis of heterogenity in rates of discrete character evolution described in Jonika et al. (2023).
scaleTreeRates(tree, tip.states, model, fixedQ = NULL, max.ratio = 2, nbins = 10, max.transition = 1, var.start = FALSE, pi = "fitzjohn")
scaleTreeRates(tree, tip.states, model, fixedQ = NULL, max.ratio = 2, nbins = 10, max.transition = 1, var.start = FALSE, pi = "fitzjohn")
tree |
a tree of class |
tip.states |
a named vector of tip states for some discrete character which is associated with the phylogeny. Order can differ from order of tips on phylogeny |
model |
the model which should be used to perform likelihod calculations. This can either be a |
fixedQ |
optional argument to be used when Q-matrix with pre-estimated rates is available. Deafults to |
max.ratio |
|
nbins |
|
max.transition |
|
var.start |
|
pi |
|
A phylogeny of class phylo
and phyloscaled
. Phylogeny has all elements normally included in an object of class phylo
, with an additional element:
scalar |
a |
Maximos Chin and Heath Blackmon
http://coleoguy.github.io/
Applies a function within a sliding window of a numeric vector or matrix. Both the step size and the window size can be set by the user. For the matrix impelentation the step size and window size is constrained to be the same in both the X and Y dimensions.
SlidingWindow(FUN, data, window, step, strict)
SlidingWindow(FUN, data, window, step, strict)
FUN |
a function to be applied within each window. |
data |
a numerical vector or matrix |
window |
an integer setting the size of the window |
step |
an integer setting the size of step between windows |
strict |
TRUE or FALSE indicating whether validation testing should be performed |
If the input data is a vector then returns a vector of numeric values representing the application of the selected function within each window. If the input data is a matrix then returns a matrix of numeric values representing the application fo the selected function within each window.
Heath Blackmon
# vector example x1 <- rnorm(100, sd=3) z1 <- SlidingWindow(FUN="mean", data=x1, window=10, step=5, strict=TRUE) # matrix example x2 <- matrix(rnorm(10000),100,100) z2 <- SlidingWindow(FUN="mean", data=x2, window=10, step=5, strict=TRUE)
# vector example x1 <- rnorm(100, sd=3) z1 <- SlidingWindow(FUN="mean", data=x1, window=10, step=5, strict=TRUE) # matrix example x2 <- matrix(rnorm(10000),100,100) z2 <- SlidingWindow(FUN="mean", data=x2, window=10, step=5, strict=TRUE)
combines all alignments in a folder into a single supermatrix
SuperMatrix(missing = "-", prefix = "concatenated", save = T, input = "", format.in = "f", format.out = "f", concatenate = T)
SuperMatrix(missing = "-", prefix = "concatenated", save = T, input = "", format.in = "f", format.out = "f", concatenate = T)
missing |
the character to use when no data is available for a taxa |
prefix |
prefix for the resulting supermatrix |
save |
if True then supermatrix and partition file will be saved |
input |
a regular expression determining which files will be read in for example "*.fasta" will read in all files which end in ".fasta". Default is blank and will result in all files in the working directory being read in. |
format.in |
A character string specifying the format of the alignments to be read in. The argument is passed to read.dna in APE: "interleaved", "sequential", "clustal", or "fasta", or abbreviations of these are available. |
format.out |
A character string specifying the format for the supermatrix to be saved to. The argument is passed to write.dna in APE: "interleaved", "sequential", "clustal", or "fasta", or abbreviations of these are available. |
concatenate |
logical value when TRUE sequences are concatenated into a single fasta file. When set to FALSE sequences are saved as individual fasta files but are expanded to include all taxa in combined dataset. |
This function reads all alignments in the working directory and constructs a single supermatrix that includes all taxa present in any of the files and inserts missing symbols for taxa that are missing sequences for some loci.
A list with two elements is returned. The first element contains partition data that records the alignment positions of each input alignment file in the combined supermatrix. The second element is a dataframe version of the supermatrix. If the argument save is set to TRUE then both of these files are also saved to the working directory.
Heath Blackmon
## Not run: SuperMatrix(missing = "N", prefix = "DATASET2", save = T, format = "f") ## End(Not run)
## Not run: SuperMatrix(missing = "N", prefix = "DATASET2", save = T, format = "f") ## End(Not run)
These are trees from a previously published work on mite sexual system evolution.
This is a collection of 100 simulated phylogenetic trees with 10 tips each.
This functions calculate Patterson's D-statistic in windows.
WinCalcD(alignment = "alignment.fasta", win.size = 100, step.size=50, boot = F, replicate = 1000)
WinCalcD(alignment = "alignment.fasta", win.size = 100, step.size=50, boot = F, replicate = 1000)
alignment |
This is an alignment in fasta format |
win.size |
This is the size of the window used |
step.size |
This is the size of steps in the sliding window |
boot |
This indicates whether or not bootstrapping should be performed to estimate variance |
replicate |
Number of replicates to be used in estimating variance |
This function is just an extension of CalcD and calculates D statistic for windows.
Returns a table with the number of each type of site, Z scores and p-values for each window in the genome
Heath Blackmon
http://coleoguy.github.io/
Durand, Eric Y., et al. Testing for ancient admixture between closely related populations. Molecular biology and evolution 28.8 (2011): 2239-2252.
Eaton, D. A. R., and R. H. Ree. 2013. Inferring phylogeny and introgression using RADseq data: An example from flowering plants (Pedicularis: Orobanchaceae). Syst. Biol. 62:689-706
WinCalcD(alignment = system.file("1.fasta", package = "evobiR"), win.size=100, step.size=50, boot = TRUE, replicate=10)
WinCalcD(alignment = system.file("1.fasta", package = "evobiR"), win.size=100, step.size=50, boot = TRUE, replicate=10)