Title: | Interfaces to Phylogenetic Software in R |
---|---|
Description: | Functions that wrap popular phylogenetic software for sequence alignment, masking of sequence alignments, and estimation of phylogenies and ancestral character states. |
Authors: | Christoph Heibl [aut, cre], Natalie Cusimano [aut], Franz-Sebastian Krah [aut] |
Maintainer: | Christoph Heibl <[email protected]> |
License: | GPL-3 |
Version: | 0.0.12 |
Built: | 2024-11-21 04:20:16 UTC |
Source: | https://github.com/heibl/ips |
This package presents a set of functions that were formerly included in the phyloch package and which wrap popular phylogenetic software for sequence alignment, masking of sequence alignments, and estimation of phylogenies and ancestral character states.
Package: | ips |
Type: | Package |
Version: | 0.0.12 |
Date: | 2019-11-14 |
License: | GPL (>= 2) |
There are several functions for reading and writing DNA sequences in FASTA, PHYLIP, and NEXUS format: read.fas
, read.phy
, read.nex
, write.fas
, write.phy
, and write.nex
.
Some functions are available for integrating BEAST with R. XML input files for BEAST can be generated with rbeauti
. Two functions are designed to read TreeAnnotator output: read.beast
will render an object of class phylo
with additional node statistics appended as list elements. These additional node statistics will be lost be the subsequent use of ladderize
or rotate
(or similar functions that change the ordering of internal nodes).read.beast.table
also parses the TreeAnnotator output, but returns a matrix of node statistics.
This package itself does not implement techniques for phylogenetic analyses, but provides a series of wrappers for commonly used software packages. Sequence alignment can be done with the mafft
and prank
; cleaning of sequences with gblocks
and aliscore
. The function raxml
and mrbayes
are intended for phylogenetic tree search. Running mrbayes
with argument run = FALSE
can be used to create MrBayes-executable NEXUS files. Finally, wrappers is provided for Multistate
in the BayesTraits
package (see multistateML
and multistateMCMC
).
Several plotting functions (HPDbars
, clade.bars
, box.clades
, box.tips
, tip.color
, edge.color
have been moved to the viper package.
Natalie Cusimano, Christoph Heibl, Franz-Sebastian Krah, Maintainer: Christoph Heibl ([email protected])
Provides a interface to Aliscore, in order to remove problematic regions of a DNA sequence alignment.
aliscore(x, gaps = "5state", w = 6, r, t, l, s, o, exec)
aliscore(x, gaps = "5state", w = 6, r, t, l, s, o, exec)
x |
DNA sequences of class |
gaps |
A vector of mode |
w |
An integer giving the size of the sliding window. |
r |
An integer giving the number of random pairwise sequence
comparisons; defaults to |
t |
Not yet implemented. |
l |
Not yet implemented. |
s |
Not yet implemented. |
o |
A vector of mode |
exec |
A character string, giving the path to the Aliscore script. |
A matrix
of class "DNAbin"
.
This function was developed with ALISCORE version 2.
Misof, B. and K. Misof. 2009. A Monte Carlo approach successfully identifies randomness in multiple sequence alignments: a more objective means of data exclusion. Syst. Biol. 58: 21–34.
Kueck, P., K. Meusemann, J. Dambach, B. Thormann, B.M. von Reumont, J.W. Waegele and B. Misof. 2010. Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees. Frontiers in Zoology 7: 10.
Aliscore website: https://bonn.leibniz-lib.de/en/research/research-centres-and-groups/aliscore
mafft
and prank
for multiple sequence
alignment; gblocks
for another alignment masking algorithm.
data(ips.28S) ## Not run: aliscore(ips.28S)
data(ips.28S) ## Not run: aliscore(ips.28S)
Provides an interface to BLASTN
blastn(query, db)
blastn(query, db)
query |
An object of class |
db |
An object of class |
code.simple.gaps
takes an aligned DNA sequence matrix and codes the simple gaps, i.e. gaps that do not overlap with other gaps. The gapped positions are excluded from the matrix and the coded gap characters are appended to the matrix.
code.simple.gaps(x, append = TRUE)
code.simple.gaps(x, append = TRUE)
x |
An object of class |
append |
Logical. |
An object of class DNAbin.
Christoph Heibl
Simmons, M.P. & H. Ochoterena. 2000. Gaps as characters in sequence-based phylogenetic analyses. Systematic Biology 49(2): 369–381.
deleteGaps
, deleteEmptyCells
, trimEnds
Given a set of node support values (e.g., bootstrap proportions, posterior probabilities) and a certain threshold, all edges receiving less support than the threshold will be collapsed.
collapseUnsupportedEdges(phy, value = "node.label", cutoff)
collapseUnsupportedEdges(phy, value = "node.label", cutoff)
phy |
An object of class |
value |
A character string giving the name of the list element
that contains the support values; default is |
cutoff |
A numeric value giving the threshold below which edges will be collapsed. |
An object of class phylo
.
## phylogeny of bark beetles data(ips.tree) ## non-parametric bootstrap proportions (BP) ips.tree$node.label ## collapse clades with < 70 BP tr <- collapseUnsupportedEdges(ips.tree, "node.label", 70) ## show new topology par(mfrow = c(1, 2)) plot(ips.tree, no.margin = TRUE) nodelabels(ips.tree$node.label, cex = .5, frame = "n", adj = c(0, .5)) plot(tr, no.margin = TRUE)
## phylogeny of bark beetles data(ips.tree) ## non-parametric bootstrap proportions (BP) ips.tree$node.label ## collapse clades with < 70 BP tr <- collapseUnsupportedEdges(ips.tree, "node.label", 70) ## show new topology par(mfrow = c(1, 2)) plot(ips.tree, no.margin = TRUE) nodelabels(ips.tree$node.label, cex = .5, frame = "n", adj = c(0, .5)) plot(tr, no.margin = TRUE)
Graft polytomies on the tips of a class phylo
object.
combMyTree(phy, data, brlen = 0, annotate = FALSE)
combMyTree(phy, data, brlen = 0, annotate = FALSE)
phy |
An object of class |
data |
A data frame containing two columns. The entries of one column must be identical to the tip labels of the phylogeny; the other column contains the new tip labels. The column are matched automatically. |
brlen |
A numeric giving the branch lengths for the polytomies. |
annotate |
Logical, if |
An object of class phylo
with nrow(data)
tips.
data(ips.tree) ## Simulate varying number of intraspecific observations s <- sapply(1:Ntip(ips.tree), sample, x = 1:3, size = 1) x <- rep(ips.tree$tip.label, times = s) x <- data.frame(x, paste0(x, unlist(lapply(s, function(z) 1:z)))) ## Create polytomies tre <- combMyTree(ips.tree, x) plot(tre, no.margin = TRUE, cex =.5)
data(ips.tree) ## Simulate varying number of intraspecific observations s <- sapply(1:Ntip(ips.tree), sample, x = 1:3, size = 1) x <- rep(ips.tree$tip.label, times = s) x <- data.frame(x, paste0(x, unlist(lapply(s, function(z) 1:z)))) ## Create polytomies tre <- combMyTree(ips.tree, x) plot(tre, no.margin = TRUE, cex =.5)
Remove gaps ("-") and/or missing and ambiguous data ("N", "?") from a sample of DNA sequences.
del.miss(x)
del.miss(x)
x |
A matrix, a list, or a vector of class |
A list or a vector of class DNAbin
.
Remove indel positions (or gaps) from a DNA sequence
alignment. For faster execution, deleteGaps
handles sequences
in ape's bit-level coding scheme.
deleteGaps(x, gap.max = nrow(x) - 4)
deleteGaps(x, gap.max = nrow(x) - 4)
x |
An object of class |
gap.max |
An integer, which gives the maximum number of
gap characters ("-") that will be tolerated at any given alignment
position (column). Only values between |
The default, nmax = nrow(x) - 4
, removes all those
positions from the alignment, which contain at least four non-gap
characters, which is the minimum number of sequences needed to
produce a non-trivial unrooted topology. All gaps will be excluded
by selecting nmax = 0
and half of all gaps with nmax =
nrow(x) / 2
.
In contrast, del.gaps
removes all gap characters
from the alignment, so most probably the result will not be a set of
sequences of equal length and the matrix will be coerced to a list.
An object of class DNAbin.
code.simple.gaps
for coding of simple gaps,
del.gaps
for removal of all gap symbols from an
alignment, gblocks
and aliscore
for more
sophisticated methods of cleaning/masking alignments.
For any given internal node of a phylogeny, the function returns a vector containing the node numbers descending from that node.
descendants(phy, node, type = "t", ignore.tip = TRUE, labels = FALSE)
descendants(phy, node, type = "t", ignore.tip = TRUE, labels = FALSE)
phy |
an object of class |
node |
an integer giving the number of the internal node. |
type |
a character string, may be |
ignore.tip |
logical, if |
labels |
logical, determines if node labels are returned instead of node number, currently ignored unless |
A vector containing terminal node numbers or tip labels.
Christoph Heibl
# generate a random tree with 12 terminal and 11 internal nodes: tree <- rtree(12) # get the descendants of internal node 15: x <- descendants(tree, 15)
# generate a random tree with 12 terminal and 11 internal nodes: tree <- rtree(12) # get the descendants of internal node 15: x <- descendants(tree, 15)
Extract the indices of non-empty positions in a sample of DNA sequences to
DNAbin2index(x)
DNAbin2index(x)
x |
A matrix of class |
After subsetting (see e.g. DNAbin
), DNA sequence
alignments can contain rows and columns that consist entirely of missing
and/or ambiguous character states. identifyEmptyCells
will identify
and deleteEmptyCells
will delete all such rows (taxa) and columns
(characters) from a DNA sequence alignment.
deleteEmptyCells( DNAbin, margin = c(1, 2), nset = c("-", "n", "?"), quiet = FALSE ) identifyEmptyCells( DNAbin, margin = c(1, 2), nset = c("-", "n", "?"), quiet = FALSE )
deleteEmptyCells( DNAbin, margin = c(1, 2), nset = c("-", "n", "?"), quiet = FALSE ) identifyEmptyCells( DNAbin, margin = c(1, 2), nset = c("-", "n", "?"), quiet = FALSE )
DNAbin |
An object of class |
margin |
A vector giving the subscripts the function will be applied
over: |
nset |
A vector of mode character; rows or columns that consist
only of the characters given in |
quiet |
Logical: if set to |
For faster execution, deleteEmptyCells
handles sequences in
ape's bit-level coding scheme.
An object of class DNAbin
.
Cornish-Bowden, A. 1984. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. 13: 3021–3030.
# COX1 sequences of bark beetles data(ips.cox1) # introduce completely ambiguous rows and colums x <- as.character(ips.cox1[1:6, 1:60]) x[3, ] <- rep("n", 60) x[, 20:24] <- rep("-", 6) x <- as.DNAbin(x) image(x) # identify those rows and colums (id <- identifyEmptyCells(x)) xx <- x[-id$row, -id$col] # delete those rows and colums x <- deleteEmptyCells(x) image(x) identical(x, xx)
# COX1 sequences of bark beetles data(ips.cox1) # introduce completely ambiguous rows and colums x <- as.character(ips.cox1[1:6, 1:60]) x[3, ] <- rep("n", 60) x[, 20:24] <- rep("-", 6) x <- as.DNAbin(x) image(x) # identify those rows and colums (id <- identifyEmptyCells(x)) xx <- x[-id$row, -id$col] # delete those rows and colums x <- deleteEmptyCells(x) image(x) identical(x, xx)
noi
(node of interest)
identifies the most recent common ancestor (MRCA) and eoi
(edge of interest) its subtending stem-lineage
edge of one or more sets of taxa/tips.
eoi(phy, node, group, regex = FALSE, stem = FALSE, monophyletic = FALSE) noi(phy, group, regex = FALSE, stem = FALSE, monophyletic = FALSE)
eoi(phy, node, group, regex = FALSE, stem = FALSE, monophyletic = FALSE) noi(phy, group, regex = FALSE, stem = FALSE, monophyletic = FALSE)
phy |
An object of class |
node |
A vector of mode |
group |
A vector or list of vectors of mode |
regex |
A logical, if |
stem |
Logical, ... |
monophyletic |
Logical, ... |
A vector of mode "numeric"
containing node numbers.
mrca
; descendants
for the contrary
operation to noi
.
# molecular phylogeny of Ips bark beetles # --------------------------------------- data(ips.tree) ips.tree <- ladderize(ips.tree) ips.tree <- fixNodes(ips.tree) clade1 <- descendants(ips.tree, 44, labels = TRUE) mrca <- noi(ips.tree, clade1) stem_lineage <- eoi(ips.tree, mrca) ecol <- rep("black", Nedge(ips.tree)) ecol[stem_lineage] <- "red" plot(ips.tree, no.margin = TRUE, edge.color = ecol) nodelabels(node = mrca, pch = 22, col = "blue") #gen <- sapply(viperidae$tip.label, function(x) unlist(strsplit(x, "_"))[1]) #tax <- data.frame(genus = gen, species = viperidae$tip.label, row.names = NULL) # group can be a list # ------------------- #myclades <- split(tax$species, tax$genus) #nds <- noi(viperidae, myclades) #plot(viperidae) #nodeInfo(nds) # group might contain tip numbers # ------------------------------- #group <- list(c(17, 22), c(13, 1)) #plot(viperidae) #append2tips(phy, tips = unlist(group), pch = 19) #nds <- noi(viperidae, myclades) #nodeInfo(nds) # the 'group' argument can also take regular expressions # ------------------------------------------------------ #rex <- "aspis" #node <- noi(viperidae, rex, regex = TRUE) #plot.phylo(viperidae, tip.color = 0, edge.color = 0) #box.clades(viperidae, nodes = node, col = "#D2A6A7", align = "all") #plot.phylo.upon(viperidae) #nodelabels(node = node, pch = 21, cex = 1.2, col = "red", bg = "#D2A6A7") # if the 'group' argument is a list of elements of length 2, # n = length(group) nodes of interest will be returned # ---------------------------------------------------- #group <- list( # c("Vipera_berus", "Vipera_ursinii"), # c("Vipera_aspis_ssp._aspis", "Vipera_latastei"), # c("Vipera_ammodytes_ssp._ammodytes", # "Vipera_ammodytes_ssp._montandoni"), # c("Macrovipera_lebetina", "Vipera_wagneri") #) #clades <- noi(viperidae, group) #plot.phylo(viperidae, tip.color = 0, edge.color = 0) #box.clades(viperidae, nodes = clades, col = c("#FFFFA5", "#D2A6A7", # "#A7D2A5", "#A5A6D2"), align = "all") #plot.phylo.upon(viperidae)
# molecular phylogeny of Ips bark beetles # --------------------------------------- data(ips.tree) ips.tree <- ladderize(ips.tree) ips.tree <- fixNodes(ips.tree) clade1 <- descendants(ips.tree, 44, labels = TRUE) mrca <- noi(ips.tree, clade1) stem_lineage <- eoi(ips.tree, mrca) ecol <- rep("black", Nedge(ips.tree)) ecol[stem_lineage] <- "red" plot(ips.tree, no.margin = TRUE, edge.color = ecol) nodelabels(node = mrca, pch = 22, col = "blue") #gen <- sapply(viperidae$tip.label, function(x) unlist(strsplit(x, "_"))[1]) #tax <- data.frame(genus = gen, species = viperidae$tip.label, row.names = NULL) # group can be a list # ------------------- #myclades <- split(tax$species, tax$genus) #nds <- noi(viperidae, myclades) #plot(viperidae) #nodeInfo(nds) # group might contain tip numbers # ------------------------------- #group <- list(c(17, 22), c(13, 1)) #plot(viperidae) #append2tips(phy, tips = unlist(group), pch = 19) #nds <- noi(viperidae, myclades) #nodeInfo(nds) # the 'group' argument can also take regular expressions # ------------------------------------------------------ #rex <- "aspis" #node <- noi(viperidae, rex, regex = TRUE) #plot.phylo(viperidae, tip.color = 0, edge.color = 0) #box.clades(viperidae, nodes = node, col = "#D2A6A7", align = "all") #plot.phylo.upon(viperidae) #nodelabels(node = node, pch = 21, cex = 1.2, col = "red", bg = "#D2A6A7") # if the 'group' argument is a list of elements of length 2, # n = length(group) nodes of interest will be returned # ---------------------------------------------------- #group <- list( # c("Vipera_berus", "Vipera_ursinii"), # c("Vipera_aspis_ssp._aspis", "Vipera_latastei"), # c("Vipera_ammodytes_ssp._ammodytes", # "Vipera_ammodytes_ssp._montandoni"), # c("Macrovipera_lebetina", "Vipera_wagneri") #) #clades <- noi(viperidae, group) #plot.phylo(viperidae, tip.color = 0, edge.color = 0) #box.clades(viperidae, nodes = clades, col = c("#FFFFA5", "#D2A6A7", # "#A7D2A5", "#A5A6D2"), align = "all") #plot.phylo.upon(viperidae)
The function (re-)establishes the standard numbering of terminal and internal nodes in phylogenies represented as objects of class phylo
.
fixNodes(phy)
fixNodes(phy)
phy |
An object of class |
When reading phylogenetic trees from a NEXUS file that contains a translate
section, it can happen that the terminal nodes (tips, leaves) of the corresponding phylo
object are not numbered consecutively, which can be a problem in some downstream applications. You can use fixNodes
to get the correct order of terminal node numbers.
fixNodes
is also intended to re-establish the standard numbering of internal nodes and reorder all node value elements (e.g. node.label, posterior, ...) if a phylo
object has been modified by either root
, ladderize
, or rotate
.
An object of class phylo
.
fixNodes
has been completely rewritten for ips version 1.0-0. It should now run absolutely stable and is much quicker. Nevertheless, I recommend checking carefully the results of fixNodes
, until the function has been tested by a number of users. Then this comment will be removed.
Christoph Heibl
read.tree
,
read.nexus
, read.beast
for reading trees in NEWICK and NEXUS format; ladderize
and rotate
for tree manipulation; node.support
for plotting node support values has been moved to package viper.
Modify terminal edge lengths to create "exactly" (see Details) equal tip heights (sum of edge lengths from root to tip)
forceEqualTipHeights(phy, baseline = "mean")
forceEqualTipHeights(phy, baseline = "mean")
phy |
An object of class |
baseline |
A character string giving a function to calculate the
baseline tip height, e.g. |
What is "exactly" equal depends on the precision of the system
(.Machine
); in any case the resulting phylogeny will pass
is.ultrametric
with default arguments.
An object of class phylo
with changed terminal
edge lengths.
forceEqualTipHeights
is only intended to correct small rounding
errors in edge lengths, not to make an additive phylogeny ultrametric. For
the latter, see e.g. chronos
.
Provides a wrapper to Gblocks, a computer program written in ANSI C language that eliminates poorly aligned positions and divergent regions of an alignment of DNA or protein sequences. Gblocks selects conserved blocks from a multiple alignment according to a set of features of the alignment positions.
gblocks( x, b1 = 0.5, b2 = b1, b3 = ncol(x), b4 = 2, b5 = "a", target = "alignment", exec )
gblocks( x, b1 = 0.5, b2 = b1, b3 = ncol(x), b4 = 2, b5 = "a", target = "alignment", exec )
x |
A matrix of DNA sequences of classes |
b1 |
A real number, the minimum number of sequences for a conserved position given as a fraction. Values between 0.5 and 1.0 are allowed. Larger values will decrease the number of selected positions, i.e. are more conservative. Defaults to 0.5 |
b2 |
A real number, the minimum number of sequences for a flank
position given as a fraction. Values must be equal or larger than
|
b3 |
An integer, the maximum number of contiguous nonconserved positions; any integer is allowed. Larger values will increase the number of selected position, i.e. are less conservative. Defaults to the number of positions in the alignment. |
b4 |
An integer, the minimum length of a block, any integer equal to or bigger than 2 is allowed. Larger values will decrease the number of selected positions, i.e. are more conservative. Defaults to 2. |
b5 |
A character string indicating the treatment of gap
positions. Three choices are possible. 1. |
target |
A vector of mode |
exec |
A character string indicating the path to the GBLOCKS executable. |
Explanation of the routine taken from the Online Documentation: First, the degree of conservation of every positions of the multiple alignment is evaluated and classified as nonconserved, conserved, or highly conserved. All stretches of contiguous nonconserved positions bigger than a certain value (b3) are rejected. In such stretches, alignments are normally ambiguous and, even when in some cases a unique alignment could be given, multiple hidden substitutions make them inadequate for phylogenetic analysis. In the remaining blocks, flanks are examined and positions are removed until blocks are surrounded by highly conserved positions at both flanks. This way, selected blocks are anchored by positions that can be aligned with high confidence. Then, all gap positions -that can be defined in three different ways (b5)- are removed. Furthermore, nonconserved positions adjacent to a gap position are also eliminated until a conserved position is reached, because regions adjacent to a gap are the most difficult to align. Finally, small blocks (falling below the limit of b4) remaining after gap cleaning are also removed.
A matrix
of class "DNAbin"
gblocks
was last updated and tested to work with Gblocks 0.91b.
If you have problems getting the function to work with a newer version of
Gblocks, please contact the package maintainer.
Castresana, J. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution 17, 540-552.
Talavera, G., and J. Castresana. 2007. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Systematic Biology 56, 564-577.
Gblocks website: https://www.biologiaevolutiva.org/jcastresana/Gblocks.html
mafft
and prank
for multiple sequence
alignment; aliscore
for another alignment masking algorithm.
data(ips.28S) ## Not run: gblocks(ips.28S)
data(ips.28S) ## Not run: gblocks(ips.28S)
Use indices of non-empty positions to convert a list of DNA sequences into a matrix.
index2DNAbin(DNAbin, index)
index2DNAbin(DNAbin, index)
DNAbin |
A list of class |
index |
A list of integers containing the indices of base positions. |
This DNA alignment contains 376 positions of 42 sequences of 16S ribosomal DNA of the bark beetle genera Ips, Orthotomicus, and Pityogenes (Scolytinae, Curculionidae, Coleoptera).
data(ips.16S)
data(ips.16S)
The sequences are stored in binary format (see DNAbin
).
The sequences were downloaded and assembled from the Nucleotide repository at GenBank on February 8, 2014.
The nucleotide database on the NCBI website: https://www.ncbi.nlm.nih.gov/nuccore
data(ips.16S)
data(ips.16S)
This DNA alignment contains 562 positions of 28 sequences of 28S ribosomal DNA of the bark beetle genus Ips (Scolytinae, Curculionidae, Coleoptera).
data(ips.28S)
data(ips.28S)
The sequences are stored in binary format (see DNAbin
).
The sequences were downloaded and assembled from the Nucleotide repository at GenBank on February 8, 2014.
The nucleotide database on the NCBI website: https://www.ncbi.nlm.nih.gov/nuccore
data(ips.28S)
data(ips.28S)
This DNA alignment contains 770 positions of 26 sequences of cox1 of the bark beetle genera Ips, Orthotomicus, and Pityogenes (Scolytinae, Curculionidae, Coleoptera).
data(ips.cox1)
data(ips.cox1)
The sequences are stored in binary format (see DNAbin
).
The sequences were downloaded and assembled from the Nucleotide repository at GenBank on February 8, 2014.
The nucleotide database on the NCBI website: https://www.ncbi.nlm.nih.gov/nuccore
data(ips.cox1)
data(ips.cox1)
Phylogenetic tree of bark beetles (genus Ips).
data(ips.tree)
data(ips.tree)
The format is: List of 5 $ edge : int [1:72, 1:2] 38 39 39 40 41 42 42 43 44 45 ... $ Nnode : int 36 $ tip.label : chr [1:37] "Ips_acuminatus" "Ips_duplicatus" "Ips_integer" "Ips_plastographus" ... $ edge.length: num [1:72] 0.2806 0.0727 0.0295 0.0097 0.021 ... $ node.label : chr [1:36] "" "100" "21" "12" ... - attr(*, "class")= chr "phylo" - attr(*, "order")= chr "cladewise"
data(ips.tree) plot(ips.tree)
data(ips.tree) plot(ips.tree)
This function is a wrapper for MAFFT and can be used for (profile) aligning of DNA and amino acid sequences.
mafft( x, y, add, method = "auto", maxiterate = 0, op = 1.53, ep = 0, gt = NULL, options, thread = -1, exec, quiet, file )
mafft( x, y, add, method = "auto", maxiterate = 0, op = 1.53, ep = 0, gt = NULL, options, thread = -1, exec, quiet, file )
x |
An object of class |
y |
An object of class |
add |
A character string giving the method used for adding |
method |
A character string giving the alignment method. Available
accuracy-oriented methods for less than 200 sequences are
|
maxiterate |
An integer giving the number of cycles of iterative
refinement to perform. Possible choices are |
op |
A numeric giving the |
ep |
A numeric giving the offset value, which works like |
gt |
An object of class |
options |
A vector of mode character specifying additional arguments to
MAFFT, that are not included in |
thread |
Integer giving the number of physical cores MAFFT should use;
with |
exec |
A character string giving the path to the MAFFT executable
including its name, e.g. something like |
quiet |
Logical, if set to |
file |
A character string indicating the filename of the output FASTA
file; if this is missing the the alignment will be returned as matrix of
class |
"localpair"
selects the L-INS-i algorithm, probably
most accurate; recommended for <200 sequences; iterative refinement method
incorporating local pairwise alignment information.
"globalpair"
selects the G-INS-i algorithm suitable for
sequences of similar lengths; recommended for <200 sequences; iterative
refinement method incorporating global pairwise alignment information.
"genafpair"
selects the E-INS-i algorithm suitable for
sequences containing large unalignable regions; recommended for <200
sequences.
"retree 1"
selects the FFT-NS-1 algorithm, the simplest
progressive option in MAFFT; recommended for >200 sequences.
"retree 2"
selects the FFT-NS-2 algorithm that uses a second
iteration of alignment based on a guide tree computed from an FFT-NS-1
alignment; this is the default in MAFFT; recommended for >200 sequences.
A matrix
of class "DNAbin"
or "AAbin"
.
mafft
was last updated and tested to work with MAFFT 7.205. If
you have problems getting the function to work with a newer version of
MAFFT, please contact the package maintainer.
Katoh, K. and H. Toh. 2008. Recent developments in the MAFFT multiple sequence alignment program. Briefings in Bioinformatics 9: 286-298.
Katoh, K., K.-i. Kuma, H. Toh, and T. Miyata. 2005. Mafft version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Research 33: 511–518.
Katoh, K., K. Misawa, K.-i. Kuma, and T. Miyata. 2002. Mafft: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleid Acids Research 30: 3059–3066.
https://mafft.cbrc.jp/alignment/software/index.html
read.fas
to import DNA sequences; prank
for another alignment algorithm; gblocks
and
aliscore
for alignment cleaning.
Merge two or more DNA or amino acid sequence alignments by profile alignment with MAFFT.
mafft.merge(subMSA, method = "auto", gt, thread = -1, exec, quiet = TRUE)
mafft.merge(subMSA, method = "auto", gt, thread = -1, exec, quiet = TRUE)
subMSA |
|
method |
A character string giving the alignment method. Available
accuracy-oriented methods for less than 200 sequences are
|
gt |
An object of class |
thread |
Integer giving the number of physical cores MAFFT should use;
with |
exec |
A character string giving the path to the MAFFT executable
including its name, e.g. something like |
quiet |
Logical, if set to |
An object of class "DNAbin"
or "AAbin"
.
Provides a wrapper for Bayesian phylogenetic tree search through MrBayes (Huelsenbeck & Ronquist, 2001; Ronquist & Huelsenbeck, 2003).
mrbayes( x, file = "", lset, prset, mcmc, unlink, constraint, burnin = 10, contype = "allcompat", exec, run = FALSE )
mrbayes( x, file = "", lset, prset, mcmc, unlink, constraint, burnin = 10, contype = "allcompat", exec, run = FALSE )
x |
An object of class |
file |
A character string, giving the name of the MrBayes input file. |
lset |
A list as returned by |
prset |
A list as returned by |
mcmc |
A list as returned by |
unlink |
xxx |
constraint |
xxx |
burnin |
An integer; the number of samples from the MCMC to be discarded prior to further analysis. |
contype |
A character string; the type of consensus tree calculated from
the posterior distribution of trees: either |
exec |
A character string giving the full path of the MrBayes program. |
run |
Logical; |
mrbayes
was last updated and tested with MrBayes
v3.2.2 under R 3.1.0 on a x86_64-apple-darwin10.8.0 (64-bit)
platform. It is intended to offer a simply parameterized building block for
larger scripts.
None; a NEXUS file with MrBayes block is written to a file and, if
run = TRUE
, the MCMC runs in MrBayes are started.
J. P. Huelsenbeck & Ronquist F. 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754-755. Ronquist F. & J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Biometrics 19: 1572-1574. MrBayes website: https://mrbayes.sourceforge.net/.
mafft
and prank
for sequence alignment;
raxml
for maximum likelihood tree search.
data(ips.cox1) x <- ips.cox1[, 100:140] # tiny alignment mrbayes(x, file = "", mcmc = mrbayes.mcmc(ngen = 100), run = FALSE) ## Not run: library(phangorn) tree <- rtree(10) Y1 <- simSeq(tree, l = 20) Y2 <- simSeq(tree, l = 20, type = "USER", levels=c("0", "1")) Y <- cbind(as.character(Y1), as.character(Y2)) ## End(Not run)
data(ips.cox1) x <- ips.cox1[, 100:140] # tiny alignment mrbayes(x, file = "", mcmc = mrbayes.mcmc(ngen = 100), run = FALSE) ## Not run: library(phangorn) tree <- rtree(10) Y1 <- simSeq(tree, l = 20) Y2 <- simSeq(tree, l = 20, type = "USER", levels=c("0", "1")) Y <- cbind(as.character(Y1), as.character(Y2)) ## End(Not run)
Set model parameters for mrbayes
.
mrbayes.lset(..., partition)
mrbayes.lset(..., partition)
... |
arguments in |
partition |
a character string giving the labelling for a partion. |
a list containing a subset (including the empty and the full set) of model parameters.
"4by4"
, "doublet"
, "codon"
, or "protein"
.
1
, 2
, 6
, or "mixed"
.
"universal"
, "vertmt"
, "mycoplasma"
, "yeast"
, "ciliates"
, or "metmt"
.
"haploid"
, "diploid"
, or "zlinked"
.
"equal"
, "gamma"
, "propinv"
,"invgamma"
, or "adgamma"
.
1-24
1-24
"equal"
, "ny98"
, or "m3"
.
"no"
or "yes"
.
"all"
, "variable"
, "noabsencesites"
, or "nopresencesites"
.
"no"
or "yes"
.
Christoph Heibl
J.P. Huelsenbeck & Ronquist F. 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754-755.
Ronquist F. & J.P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Biometrics 19: 1572-1574.
MrBayes website: https://mrbayes.sourceforge.net/.
mrbayes.prset
to set prior distributions, mrbayes.mcmc
to set parameters of the Markov chain Monte Carlo (MCMC), and mrbayes
to run MrBayes locally or prepare input files for a computer cluster.
## F81 mrbayes.lset(nst = 2) ## GTR + Gamma mrbayes.lset(nst = 6, rates = "gamma") ## GTR + Gamma + I mrbayes.lset(nst = 6, rates = "invgamma")
## F81 mrbayes.lset(nst = 2) ## GTR + Gamma mrbayes.lset(nst = 6, rates = "gamma") ## GTR + Gamma + I mrbayes.lset(nst = 6, rates = "invgamma")
Set Markov chain Monte Carlo (MCMC) parameters for mrbayes
.
mrbayes.mcmc(...)
mrbayes.mcmc(...)
... |
arguments in |
a list containing a subset (including the empty and the full set) of model parameters.
"NUMERIC"
"NUMERIC"
"NUMERIC"
"NUMERIC"
"NUMERIC"
"NUMERIC"
"NUMERIC"
"NUMERIC"
"yes"
or "no"
"NUMERIC"
"yes"
or "no"
"NUMERIC"
"avgstddev"
or "maxstddev"
"NUMERIC"
"yes"
or "no"
"yes"
or "no"
"yes"
or "no"
"NUMERIC"
"NUMERIC"
"yes"
or "no"
"NUMERIC"
"yes"
or "no"
"yes"
or "no"
"NUMERIC"
"current"
or "reset"
"current"
, "random"
, or "parsimony"
"NUMERIC"
"yes"
or "no"
"yes"
or "no"
"yes"
or "no"
"yes"
or "no"
"NUMERIC"
The parameters reweight and filename cannot be set via mrbayes.mcmc
.
Christoph Heibl
J.P. Huelsenbeck & Ronquist F. 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754-755.
Ronquist F. & J.P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Biometrics 19: 1572-1574.
MrBayes website: https://mrbayes.sourceforge.net/.
mrbayes.lset
to set model parameters, mrbayes.prset
to set prior distributions, and mrbayes
to run MrBayes locally or prepare input files for a computer cluster.
mrbayes.mcmc()
mrbayes.mcmc()
Set prior distributions for mrbayes
.
mrbayes.prset(...)
mrbayes.prset(...)
... |
arguments in |
a list of length zero (see 'Note')
This function currently returns an empty set of prior distribution parameters, i.e., you cannot change the MrBayes default parameters.
Christoph Heibl
J.P. Huelsenbeck & Ronquist F. 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754-755.
Ronquist F. & J.P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Biometrics 19: 1572-1574.
MrBayes website: https://mrbayes.sourceforge.net/.
mrbayes.lset
to set model parameters, mrbayes.mcmc
to set parameters of the Markov chain Monte Carlo (MCMC), and mrbayes
to run MrBayes locally or prepare input files for a computer cluster.
mrbayes.prset()
mrbayes.prset()
These functions provide wrappers to BayesMultiState
in the BayesTraits package written by Mark Pagel and Andrew Meade.
multistateML(phy, traits, model = "ARD", anc.states = TRUE, path = "/Applications/BayesTraits", dir = NULL) multistateMCMC(phy, traits, model = "ARD", anc.states = TRUE, rd = 2, rjhp = NULL, fixNodes = NULL, it = 1e+05, bi = 10000, sa = 1000, path = "/Applications/BayesTraits", dir = NULL)
multistateML(phy, traits, model = "ARD", anc.states = TRUE, path = "/Applications/BayesTraits", dir = NULL) multistateMCMC(phy, traits, model = "ARD", anc.states = TRUE, rd = 2, rjhp = NULL, fixNodes = NULL, it = 1e+05, bi = 10000, sa = 1000, path = "/Applications/BayesTraits", dir = NULL)
phy |
an object of class |
traits |
a |
model |
a character string to select a model of rate evolution. One of |
anc.states |
either |
rd |
a real number, giving the |
rjhp |
a character string giving the details of priors and hyperpriors for the reversible jump MCMC (rjMCMC). If left |
fixNodes |
a list giving fixed character states of certain internal nodes. This argument corresponds to the |
it |
numeric, sets the number of iterations to run the MCMC for. |
bi |
numeric, sets the number of iterations of the MCMC that will be discarded as burn-in. |
sa |
numeric, sets the the sample period in the MCMC. |
path |
a character string giving the path to executables in the BayesTraits package. |
dir |
a character string giving a directory name where the input and output files will be stored. The directory will be created by |
Christoph Heibl
The BayesTraits manual: http://www.evolution.reading.ac.uk/BayesTraitsV4.1.1/Files/BayesTraitsV4.1.1-Manual.pdf.
Pagel, M., A. Meade, and D. Barker. 2004. Bayesian estimation of ancestral character states on phylogenies. Syst. Biol. 53: 673-684.
Pagel, M. and A. Meade. 2006. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. Am. Nat. 167: 808-825.
Finds all pairs of adjacent nodes, i.e. nodes separated by only one edge, in a minimum spanning tree
neighboringPairs(mst)
neighboringPairs(mst)
mst |
An object of class |
Counts the number of tips of a given clade of a phylogenetic tree.
ntip(phy, node)
ntip(phy, node)
phy |
An object of class |
node |
An integer given the number of an internal node. |
An integer giving the number of tips.
set.seed(1234) tr <- rtree(12) plot(tr); nodelabels() ntip(tr, 16)
set.seed(1234) tr <- rtree(12) plot(tr); nodelabels() ntip(tr, 16)
Provides a wrapper to the PartitionFinder software.
partitionfinder( alignment, user.tree, branchlengths = "linked", models = "all", model.selection = "BIC", search = "greedy", exec = "/Applications/PartitionFinderV1.1.1_Mac/PartitionFinder.py" )
partitionfinder( alignment, user.tree, branchlengths = "linked", models = "all", model.selection = "BIC", search = "greedy", exec = "/Applications/PartitionFinderV1.1.1_Mac/PartitionFinder.py" )
alignment |
A |
user.tree |
A |
branchlengths |
A |
models |
A |
model.selection |
A |
search |
A |
exec |
A character string giving the path to the executable (python script). |
Lanfear, R., B. Calcott, S.Y.W. Ho, and S. Guindon. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution 29: 1695-1701.
Lanfear, R., B. Calcott, K. David, C. Mayer, and A. Stamatakis. 2014. Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evolutionary Biology 14: 82.
This function is a wrapper for PATHd8 and can be used for phylogenetic dating, especially of large trees
pathd8(phy, exec = "/Applications/PATHd8/PATHd8", seql, calibration)
pathd8(phy, exec = "/Applications/PATHd8/PATHd8", seql, calibration)
phy |
An object of class |
exec |
A character string giving the path to the PATHd8 program. |
seql |
sequence length of alignment |
calibration |
A data frame with 4 columns and as many rows as calibration points. Columns are: taxon 1; taxon 2; one of c("minage", "maxage", "fixage"); age. |
tree list of ultrametric trees returned from PATHd8 of class
phylo
. First tree is PATHd8 chronogram, which is a calibrated
ultrametric tree. Second is a PATH tree, which is a ultrametric tree
without calibration.
Franz-Sebastian Krah
Britton et al (2006). PATHd8—a new method for estimating divergence times in large phylogenetic trees without a molecular clock. Available from the authors (www.math.su.se/PATHd8)
Britton et al. (2007). Estimating divergence times in large phylogenetic trees. Systematic biology. 56:741–752
## Not run: ## This example is taken from the PATHD8 manual cal <- rbind(cal1 = c("Rat", "Ostrich", "minage", 260), cal2 = c("Human", "Platypus", "fixage", 125), cal3 = c("Alligator", "Ostrich", "minage", 150)) colnames(cal) = c("tax1", "tax2", "age_type", "age") phy <- read.tree(text = paste0("((((Rat:0.007148,Human:0.001808):0.024345,", "Platypus:0.016588):0.012920,(Ostrich:0.018119,", "Alligator:0. 006232):0.004708):0.028037,Frog:0);") seql <- 1823 pathd8(phy, exec = "/Applications/PATHd8/PATHd8", seql = seql, calibration = cal) ## End(Not run)
## Not run: ## This example is taken from the PATHD8 manual cal <- rbind(cal1 = c("Rat", "Ostrich", "minage", 260), cal2 = c("Human", "Platypus", "fixage", 125), cal3 = c("Alligator", "Ostrich", "minage", 150)) colnames(cal) = c("tax1", "tax2", "age_type", "age") phy <- read.tree(text = paste0("((((Rat:0.007148,Human:0.001808):0.024345,", "Platypus:0.016588):0.012920,(Ostrich:0.018119,", "Alligator:0. 006232):0.004708):0.028037,Frog:0);") seql <- 1823 pathd8(phy, exec = "/Applications/PATHd8/PATHd8", seql = seql, calibration = cal) ## End(Not run)
Converts a phylogenetic tree of class "phylo"
to a format
usable as a guide tree by MAFFT. This function is called internally by
mafft
.
phylo2mafft(phy, file)
phylo2mafft(phy, file)
phy |
A phylogenetic tree of class |
file |
A character string giving a filename. May be missing, in which case the results are only printed on the screen. |
A matrix coding the MAFFT-formatted tree, as a side effect the same
matrix is written to file
.
The MAFFT website: https://mafft.cbrc.jp/alignment/software/index.html
mafft
for an interface to MAFFT.
Converts a phylogenetic tree (class phylo
) into a
minimum spanning tree (class mst
).
phylo2mst(phy)
phylo2mst(phy)
phy |
An object of class |
The current version of phylo2mst
does not handle polytomies
and does not incorporate branch length information. Note that topological
information is lost during the conversion.
phy <- rtree(12) plot(phy) mst <- phylo2mst(phy) plot(mst)
phy <- rtree(12) plot(phy) mst <- phylo2mst(phy) plot(mst)
Returns the number or positions of potentially-informative (parsimony-informative, phylogenetically-informative) sites in DNA sequence alignment.
pis(x, what = "fraction", use.ambiguities = FALSE)
pis(x, what = "fraction", use.ambiguities = FALSE)
x |
An object of class |
what |
Either of |
use.ambiguities |
Not yet available. |
Numeric (depending on what
, the number, fraction, or indices
of potentially-informative nucleotide sites).
data(ips.16S) # number of potentially-informative sites: pis(ips.16S, what = "abs") # proportion of potentially-informative sites: pis(ips.16S, what = "frac") # indices of potentially-informative sites: pis(ips.16S, what = "ind")
data(ips.16S) # number of potentially-informative sites: pis(ips.16S, what = "abs") # proportion of potentially-informative sites: pis(ips.16S, what = "frac") # indices of potentially-informative sites: pis(ips.16S, what = "ind")
DNA sequence Alignment Using the program PRANK.
prank(x, outfile, guidetree = NULL, gaprate = 0.025, gapext = 0.75, path)
prank(x, outfile, guidetree = NULL, gaprate = 0.025, gapext = 0.75, path)
x |
an object of class |
outfile |
a character string giving a name for the output file. |
guidetree |
an object of class |
gaprate |
numeric giving the gap opening rate; defaults to 0.025. |
gapext |
numeric giving the gap extension penalty; defaults to 0.75. |
path |
a character string indicating the path to the PRANK executable. |
matrix
of class "DNAbin"
prank
was last updated and tested to work with PRANK v. 120814 on Windows XP. If you have problems getting the function to work with a newer version of PRANK, contact the package maintainer.
http://wasabiapp.org/software/prank/
read.fas
to import DNA sequences;
mafft
for another alignment algorithm;
gblocks
and aliscore
for alignment cleaning.
Provides an interface to the C program RAxML (see Reference section) for maximum likelihood estimation of tree topology and/or branch lengths, rapid and conventional non-parametric bootstrapping, mapping splits onto individual topologies, and a lot more. See the RAxML manual for details, especially if you are a new user of RAxML.
raxml( DNAbin, m = "GTRCAT", f, N, p, b, x, k, weights, partitions, outgroup, backbone = NULL, file = paste0("fromR_", Sys.Date()), exec, threads )
raxml( DNAbin, m = "GTRCAT", f, N, p, b, x, k, weights, partitions, outgroup, backbone = NULL, file = paste0("fromR_", Sys.Date()), exec, threads )
DNAbin |
A matrix of DNA sequences of class |
m |
A vector of mode |
f |
A vector of mode |
N |
Either of mode |
p |
Integer, setting a random seed for the parsimony starting trees. |
b |
Integer, setting a random seed for bootstrapping. |
x |
Integer, setting a random seed for rapid bootstrapping. |
k |
Logical, if |
weights |
A vector of mode |
partitions |
A data frame giving the partitions of the alignment. |
outgroup |
A vector of mode |
backbone |
A |
file |
A vector of mode |
exec |
A vector of mode |
threads |
Integer, giving the number of parallel threads to use (PTHREADS only). |
There are some limitations of this wrapper compared to RAxML run directly from the command line.
Only DNA is allowed as data type.
Option f
can only take a limited number of values
(d
, a
).
RAxML needs the specification of random seeds for parsimony estimation of
starting trees and for bootstrap resampling. The corresponding argument
names in raxml
are identical to the flags used by RAxML (-p
,
-b
, and -x
). If you choose not to give any values,
raxml
will generate a (different) value for each required random
seed every time it is called. Be aware that set.seed
will work
only for p
, but not for b
or x
.
A list with a variable number of elements, depending on the analysis chosen:
"info" |
RAxML log file as character string |
"bestTree" |
MLE of tree |
"bipartitions" |
MLE of tree annotated with bootstrap proportions |
"bootstrap" |
bootstrapped trees |
RAxML is a C program and the source code is not contained in this
package. This means that in order to run this function you will need to
install RAxML yourself. See
https://cme.h-its.org/exelixis/web/software/raxml/ for the most recent
documentation and source code of RAxML. Depending on where you chose to
install RAxML, you need to adjust the exec
argument.
raxml
was last tested and running fine on Mac OS X with RAxML 8.0.29.
Please be aware that calling third-party software from within R is a
platform-specific process and I cannot guarantee that raxml
will
behave properly on any system.
(in chronological order)
Stamatakis, A., T. Ludwig and H. Meier. 2004. RAxML-III: A fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 1: 1–8.
Stamatakis, A. 2006. RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688–2690.
Stamatakis, A., P. Hoover, and J. Rougemont. 2008. A rapid bootstrap algorithm for the RAxML web-servers. Syst. Biol. 75: 758–771.
Pattengale, N. D., M. Alipour, O. R. P. Bininda-Emonds, B. M. E. Moret, and A. Stamatakis. 2010. How many bootstrap replicates are necessary? Journal of Computational Biology 17: 337-354.
Stamatakis, A. 2014. RAxML Version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics Advance Access.
raxml.partitions
to store partitioning information in a
data frame suitable for input as partitions
argument in raxml
.
## bark beetle sequences data(ips.cox1) data(ips.16S) data(ips.28S) ips <- cbind(ips.cox1, ips.16S, ips.28S, fill.with.gaps = TRUE) exec <- "/Applications/RAxML-code/standard-RAxML/raxmlHPC-PTHREADS-AVX" w <- sample(1:5, ncol(ips.cox1), replace = TRUE) ## Not run: # Simple tree search with GTRCAT and GTRGAMMA tr <- raxml(ips.cox1, f = "d", N = 2, p = 1234, exec = exec) # -1743.528461 tr <- raxml(ips.cox1, m = "GTRGAMMA", f = "d", N = 2, p = 1234, exec = exec) # Applying weights to columns tr <- raxml(ips.cox1, f = "d", N = 2, p = 1234, weights = w, exec = exec) # -1743.528461 # Rapid bootstrap tr <- raxml(ips.cox1, m = "GTRGAMMA", f = "a", N = 10, p = 1234, x = 1234, exec = exec) # Rapid bootstrap with automatic halt tr <- raxml(ips.cox1, m = "GTRGAMMA", f = "a", N = "autoMRE", p = 1234, x = 1234, exec = exec) ## End(Not run)
## bark beetle sequences data(ips.cox1) data(ips.16S) data(ips.28S) ips <- cbind(ips.cox1, ips.16S, ips.28S, fill.with.gaps = TRUE) exec <- "/Applications/RAxML-code/standard-RAxML/raxmlHPC-PTHREADS-AVX" w <- sample(1:5, ncol(ips.cox1), replace = TRUE) ## Not run: # Simple tree search with GTRCAT and GTRGAMMA tr <- raxml(ips.cox1, f = "d", N = 2, p = 1234, exec = exec) # -1743.528461 tr <- raxml(ips.cox1, m = "GTRGAMMA", f = "d", N = 2, p = 1234, exec = exec) # Applying weights to columns tr <- raxml(ips.cox1, f = "d", N = 2, p = 1234, weights = w, exec = exec) # -1743.528461 # Rapid bootstrap tr <- raxml(ips.cox1, m = "GTRGAMMA", f = "a", N = 10, p = 1234, x = 1234, exec = exec) # Rapid bootstrap with automatic halt tr <- raxml(ips.cox1, m = "GTRGAMMA", f = "a", N = "autoMRE", p = 1234, x = 1234, exec = exec) ## End(Not run)
Given a set of DNA sequence alignments, raxml.partitions
creates a data frame with partition bounderies that can be input into raxml
.
raxml.partitions(...)
raxml.partitions(...)
... |
Two or more DNA sequence alignments of class |
For raxml.partitions
to make sense, the DNA sequence alignments must be given exactly in the same order in which they are concatenated into a supermatrix (see Examples section). Without any testing, the type of sequences is supposed to be DNA.
A data frame with four columns (type
, locus
, begin
, and end
) and number of rows corresponding to the number of partitions.
cbind.DNAbin
to concatenate multiple alignments; raxml
for an interface to RAxML.
## bark beetle sequences data(ips.cox1) data(ips.16S) data(ips.28S) ## Note the same order of individual ## alignments in both functions: ## ---------------------------- raxml.partitions(cox1 = ips.cox1, r16S = ips.16S, r28S = ips.28S) cbind(ips.cox1, ips.16S, ips.28S, fill.with.gaps = TRUE)
## bark beetle sequences data(ips.cox1) data(ips.16S) data(ips.28S) ## Note the same order of individual ## alignments in both functions: ## ---------------------------- raxml.partitions(cox1 = ips.cox1, r16S = ips.16S, r28S = ips.28S) cbind(ips.cox1, ips.16S, ips.28S, fill.with.gaps = TRUE)
Prepare XML files for BEAST with R. BEAST uses an MCMC approach to estimate rooted phylogenies from molecular data (Drummond & Rambaut, 2007).
rbeauti( ..., file, template = "standard", link.clocks = TRUE, link.trees = TRUE, subst.model, clock, tree, taxonset, chain.length = 1e+07, log.every = 1000 )
rbeauti( ..., file, template = "standard", link.clocks = TRUE, link.trees = TRUE, subst.model, clock, tree, taxonset, chain.length = 1e+07, log.every = 1000 )
... |
One or more object(s) of class |
file |
A connection, or a character string naming the file to write to. If left empty the XML tree will be printed to the screen (see Examples). |
template |
Currently unused. |
link.clocks |
Logical, indicating if clock models should be linked over partitions. |
link.trees |
Logical, indicating if tree models should be linked over partitions. |
subst.model |
A character string defining a substituion model, either
|
clock |
A character string defining a clock model, either |
tree |
A character string defining a tree model. |
taxonset |
A list containing one or more taxon sets. |
chain.length |
Integer, the number of generations to run the MCMC. |
log.every |
Integer, defining how often samples from the posterior will be written to log files and shown on screen. |
rbeauti
has been completely rewritten to work with
BEAST 2. Currently rbeauti
offers few options, because the
idea is not to create ready-to-use XML file. That can be done conveniently
with BEAUti (the BEAST package's genuine XML generator). Instead,
rbeauti
is intended to make the definition of large numbers of taxon
sets easy. The creation of taxon sets can be done via R scripts and the
resulting XML files can be further modified with BEAUti.
The XML reference at the BEAST 2 website: http://beast.community/xml_reference Drummond, A.J. & A. Rambaut. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7: 240.
data(ips.16S) ## define taxon sets spec <- rownames(ips.16S) ingroup <- spec[grep("Ips|Orthomotomicus", spec)] outgroup <- spec[grep("Pityogenes", spec)] ts <- list(list(id = "ingroup", taxon = ingroup), list(id = "outgroup", taxon = outgroup)) ## print XML file to screen # rbeauti(ips.16S, taxonset = ts)
data(ips.16S) ## define taxon sets spec <- rownames(ips.16S) ingroup <- spec[grep("Ips|Orthomotomicus", spec)] outgroup <- spec[grep("Pityogenes", spec)] ts <- list(list(id = "ingroup", taxon = ingroup), list(id = "outgroup", taxon = outgroup)) ## print XML file to screen # rbeauti(ips.16S, taxonset = ts)
Reverse, complement or reverse-complement of DNA sequences.
rc(seqs, i, complement = TRUE, reverse = TRUE, delete.gaps = FALSE)
rc(seqs, i, complement = TRUE, reverse = TRUE, delete.gaps = FALSE)
seqs |
An object of class |
i |
Logical or numeric index to indicate a subset of sequences to manipulate. If not given the entire set of sequences will be manipulated. |
complement |
Logical, indicating if sequences will be turned into their complement. |
reverse |
Logical, indication if sequences will be reversed. |
delete.gaps |
Logical, indicating if gap character will be removed prior to sequence manipulation. |
An object of the same class and dimension as seqs
.
## A minimal sequence alignment: x <- list( seqA = c("a", "a", "c", "c", "g", "t"), seqB = c("n", "-", "r", "y", "g", "t")) x <- as.DNAbin(x) ## Three choices of manipulation: as.character(x) as.character(rc(x)) ## reverse-complement as.character(rc(x, complement = FALSE)) ## only reverse as.character(rc(x, reverse = FALSE)) ## only complement ## You can remove gaps: as.character(rc(x, delete.gaps = TRUE)) ## gaps/indels removed
## A minimal sequence alignment: x <- list( seqA = c("a", "a", "c", "c", "g", "t"), seqB = c("n", "-", "r", "y", "g", "t")) x <- as.DNAbin(x) ## Three choices of manipulation: as.character(x) as.character(rc(x)) ## reverse-complement as.character(rc(x, complement = FALSE)) ## only reverse as.character(rc(x, reverse = FALSE)) ## only complement ## You can remove gaps: as.character(rc(x, delete.gaps = TRUE)) ## gaps/indels removed
Read DNA and amino acid sequences from FASTA, PHILIP, and NEXUS formatted files.
read.fas(x, text) read.nex(x) read.phy(x)
read.fas(x, text) read.nex(x) read.phy(x)
x |
A character string, giving the file name. |
text |
A character string in FASTA format. |
An matrix (aligned sequences) or list (unaligned sequences) of class
DNAbin
or AAbin
.
Maddison, D.R., D.L. Swofford, and W.P. Maddison. 1997. NEXUS: an extensible file format for systematic information. Syst. Biol. 46: 590-621.
mafft
and prank
for sequence alignment,
gblocks
and aliscore
for quality check and
cleaning of sequence alignments, cbind.DNAbin
for
concatenation of sequence alignments.
## bark beetle COX1 sequences data(ips.cox1) ## create temporary file names format <- c(".fas", ".phy", ".nex") fn <- sapply(format, tempfile, pattern = "ips", tmpdir = tempdir()) ## write sequences files write.fas(ips.cox1, fn[".fas"]) write.phy(ips.cox1, fn[".phy"]) write.nex(ips.cox1, fn[".nex"]) ## read sequence files fas <- read.fas(fn[".fas"]) phy <- read.phy(fn[".phy"]) nex <- read.nex(fn[".nex"]) ## remove sequence files unlink(fn)
## bark beetle COX1 sequences data(ips.cox1) ## create temporary file names format <- c(".fas", ".phy", ".nex") fn <- sapply(format, tempfile, pattern = "ips", tmpdir = tempdir()) ## write sequences files write.fas(ips.cox1, fn[".fas"]) write.phy(ips.cox1, fn[".phy"]) write.nex(ips.cox1, fn[".nex"]) ## read sequence files fas <- read.fas(fn[".fas"]) phy <- read.phy(fn[".phy"]) nex <- read.nex(fn[".nex"]) ## remove sequence files unlink(fn)
Thess functions parse chronograms in NEXUS format as produced by TreeAnnotator or output by MrBayes.
read.mrbayes(file, digits = NULL) read.beast(file, digits = NULL) read.starbeast(file)
read.mrbayes(file, digits = NULL) read.beast(file, digits = NULL) read.starbeast(file)
file |
A character string giving the input file, which must be a TreeAnnotator-generated chronogram in NEXUS format. |
digits |
NULL or integer, if |
An object of class phylo
.
read.starbeast
currently parses only skalars and ranges; node statistics with more than two values will be deleted and a warning message will be issued. Future version of read.starbeast
will hopefully be able to append list or data frames to phylo
objects. If you have any opinion or wishes regarding the question of how this exactly should be managed, send me a message.
read.mrbayes
is intended to read single trees with annotated nodes. For reading tree samples from the posterior distribution there is a collection of perl script written by Johan Nylander (https://github.com/nylander/Burntrees).
Christoph Heibl
TreeAnnotator: http://beast.community/treeannotator
Metacomments in NEXUS: https://code.google.com/archive/p/beast-mcmc/wikis/NexusMetacommentFormat.wiki
read.beast.table
to extract internal node data from NEXUS file, rbeauti
to create XML input for BEAST. HPDbars
for plotting highest posterior densities on phylogenies has been moved to package viper.
This function reads a BEAST chronogram such as produced by TreeAnnotator and extracts time, rate, and support values for internal and external nodes. Nodes in the resulting data frame are ordered exactly like in the NEXUS file.
read.beast.table(file, digits = 2)
read.beast.table(file, digits = 2)
file |
character string giving the input file, which must be a TreeAnnotator-generated chronogram in NEXUS format |
digits |
NULL or integer, if |
A matrix; each row corresponds to an internal node, the (ape!)number of which is given in the first column; the remaining columns list the node values extracted from the chronogram.
Christoph Heibl
read.beast
to parse TreeAnnotator output, rbeauti
to create XML input for BEAST. HPDbars
for plotting highest posterior densities on phylogenies has been moved to package viper.
For any given internal node in a phylogeny, this function returns the sister clade.
sister(phy, node, type = "terminal", label = FALSE)
sister(phy, node, type = "terminal", label = FALSE)
phy |
An object of class |
node |
A vector of mode |
type |
A character string, may be |
label |
Logical, determining if tip number or tip labels will be returned. |
A vector of mode "numeric"
or "character"
, containing
either tip numbers or labels, respectively.
# A phylogeny of bark beetles ... data(ips.tree) tcol <- rep("black", Ntip(ips.tree)) tcol[ips.tree$tip.label %in% c("Ips_typographus", "Ips_nitidus")] <- "blue" tcol[ips.tree$tip.label %in% c("Ips_duplicatus")] <- "red" plot(ips.tree, no.margin = TRUE, tip.color = tcol) # What is the sister species of Ips typographus? sister(ips.tree, "Ips_typographus", label = TRUE) # Return the MRCA of the sister clade of Ips duplicatus x <- sister(ips.tree, "Ips_duplicatus", "daughter") nodelabels(node = x, pch = 21, bg = "red")
# A phylogeny of bark beetles ... data(ips.tree) tcol <- rep("black", Ntip(ips.tree)) tcol[ips.tree$tip.label %in% c("Ips_typographus", "Ips_nitidus")] <- "blue" tcol[ips.tree$tip.label %in% c("Ips_duplicatus")] <- "red" plot(ips.tree, no.margin = TRUE, tip.color = tcol) # What is the sister species of Ips typographus? sister(ips.tree, "Ips_typographus", label = TRUE) # Return the MRCA of the sister clade of Ips duplicatus x <- sister(ips.tree, "Ips_duplicatus", "daughter") nodelabels(node = x, pch = 21, bg = "red")
Takes a phylogeny and a subset of its tiplabels and splits the list of tiplabels into monophyletic groups (clades).
splitIntoClades(phy, tips)
splitIntoClades(phy, tips)
phy |
An object of class |
tips |
A vector of mode |
A list.
Finds pairs of sister species in a phylogenetic tree.
terminalSisters(phy, labels = TRUE)
terminalSisters(phy, labels = TRUE)
phy |
An object of class |
labels |
Logical, indicating whether to return tip labels or tip numbers. |
A list of which each element contains the tip labels of a sister species pair.
set.seed(1234) tr <- rtree(12) plot(tr) terminalSisters(tr)
set.seed(1234) tr <- rtree(12) plot(tr) terminalSisters(tr)
For each tip (leave, terminal node) in the phylogenetic tree the edge lengths (branch lengths) from root to tip, be it units of time or divergence, is summed up.
tipHeights(phy)
tipHeights(phy)
phy |
an object of class |
a numeric vector with distances from root to tip for each tip in the phylogenetic tree.
Christoph Heibl
Detection of trait-dependent shifts in the rate of molecular evolution with traitRate (Mayrose & Otto, 2011).
traitRate(phy, seq, x, mainType = "Optimize_Model", n, charModelParam1 = 0.5, charModelParam2 = 1, gammaParam = 0.5, seqModelParam1 = 2, exec = "/Applications/traitRate-1.1/programs/traitRate")
traitRate(phy, seq, x, mainType = "Optimize_Model", n, charModelParam1 = 0.5, charModelParam2 = 1, gammaParam = 0.5, seqModelParam1 = 2, exec = "/Applications/traitRate-1.1/programs/traitRate")
phy |
a ultrametric phylogenetic tree of class |
seq |
a multiple sequence alignment of class |
x |
data frame containing a binary character in the first column. |
mainType |
character string giving the type of analysis; two choices are possible: |
n |
numeric, the number of bootstrap replicates. Will be ignored if |
charModelParam1 |
numeric, giving an initial value for the rate of transitions of character state 0 to 1. |
charModelParam2 |
numeric, giving an initial value for the rate of transitions of character state 1 to 0. |
gammaParam |
numeric, giving an initial value for the alpha parameter of the model of sequence evolution. |
seqModelParam1 |
numeric, giving an initial value for the kappa parameter of the model of sequence evolution. |
exec |
character string giving the path to the program directory. |
Currently none, but look for the output files in the 'RESULTS'
subdirectory in the current working directory.
This function is under development!
Christoph Heibl
Mayrose, I. & S.P. Otto. 2011. A likelihood method for detecting trait-dependent shifts in the rate of molecular evolution. Mol. Biol. Evol. 28: 759-770
read.tree
for reading phylogenetic trees, read.fas
for reading multiple sequence alignments in FASTA format.
Trims both ends of a DNA sequence alignment to the first and
last alignment positions that contain a minimum number of IUPAC base
characters ("a"
, "c"
, "g"
, "t"
, "r"
,
"y"
, "s"
, "w"
, "k"
, "m"
, "b"
,
"d"
, "h"
, "v"
). In addition, all gap characters
("-"
) beyond the first and last base characters of each sequence are
replaced by the character "n"
.
trimEnds(x, min.n.seq = 4)
trimEnds(x, min.n.seq = 4)
x |
An object of class |
min.n.seq |
A |
An object of class DNAbin
.
# simple example alignment: x <- structure(list(nb = 5, seq = c("acaaggtaca", "-caaggtac-", "acaaggtaca", "aca--gtaca", "-ccaggta--"), nam = LETTERS[1:5]), .Names = c("nb", "seq", "nam"), class = "alignment") # convert to DNAbin: x <- as.DNAbin(x) # fill missing nucleotides: x <- trimEnds(x) # show results: as.character(x[2, ])
# simple example alignment: x <- structure(list(nb = 5, seq = c("acaaggtaca", "-caaggtac-", "acaaggtaca", "aca--gtaca", "-ccaggta--"), nam = LETTERS[1:5]), .Names = c("nb", "seq", "nam"), class = "alignment") # convert to DNAbin: x <- as.DNAbin(x) # fill missing nucleotides: x <- trimEnds(x) # show results: as.character(x[2, ])
Does the same as unlist
, but recurses only
one level.
unlistFirstLevel(z, use.names = TRUE)
unlistFirstLevel(z, use.names = TRUE)
z |
A list of lists. |
use.names |
Logical, indicating if element names from the element should be preserved. |
Write DNA sequences and morphological data to FASTA, PHYLIP, or NEXUS formatted files.
write.fas(x, file, block.width = FALSE, truncate = FALSE, append = FALSE) write.phy(x, file, block.width = FALSE, strict = FALSE) write.nex(x, file, block.width = 60, taxblock = FALSE)
write.fas(x, file, block.width = FALSE, truncate = FALSE, append = FALSE) write.phy(x, file, block.width = FALSE, strict = FALSE) write.nex(x, file, block.width = 60, taxblock = FALSE)
x |
an object of class |
file |
a character string giving the filename; a special case is |
block.width |
an integer, giving the number of characters per line. |
truncate |
truncation of taxon names to the number of characters given as a integer, otherwise (default) taxon names will not be changed. |
append |
logical, if |
strict |
logical, if |
taxblock |
logical, if |
write.nex
can handle multiple DNA sequence alignments, which are handed over as a list of objects of class DNAbin
. Correct matching of the rows in the alignments is cared for automatically, hence the individual alignments can contain different numbers of samples and samples need not be in the same order.
None, except when called with file
left unspecified, which causes the file content to be returned as a vector of mode "character"
. This is particularly useful for constructing special types of input files, e.g. for MrBayes (mrbayes
).
Christoph Heibl
Maddison, D.R., D.L. Swofford, and W.P. Maddison. 1997. NEXUS: an extensible file format for systematic information. Syst. Biol. 46: 590-621.
read.fas
, read.phy
, and read.nex
for reading of DNA sequence files.
data(ips.cox1) data(ips.28S) ## Examples for FASTA files ## ------------------------ write.fas(ips.cox1[1:5, 1:120], block.width = 60) ## Examples for PHYLIP files ## ------------------------ write.phy(ips.cox1[1:5, 1:20], block.width = 40) ## Examples for NEXUS files ## ------------------------ x <- list(cox1 = ips.cox1[1:5, 1:10], rna28S = ips.28S[1:5, 1:30]) write.nex(x, block.width = 20) # Truncation of taxonnames: # ------------------------- rownames(ips.cox1)[1] <- "AVeeeeeeeeeeeeeeeeeryLongName" write.fas(ips.cox1, truncate = 10) # If truncation leads to identical taxonnames, # a warning will be issued: # ------------------------- rownames(ips.cox1)[1:2] <- "AVeeeeeeeeeeeeeeeeeryLongName" write.fas(ips.cox1, truncate = 10)
data(ips.cox1) data(ips.28S) ## Examples for FASTA files ## ------------------------ write.fas(ips.cox1[1:5, 1:120], block.width = 60) ## Examples for PHYLIP files ## ------------------------ write.phy(ips.cox1[1:5, 1:20], block.width = 40) ## Examples for NEXUS files ## ------------------------ x <- list(cox1 = ips.cox1[1:5, 1:10], rna28S = ips.28S[1:5, 1:30]) write.nex(x, block.width = 20) # Truncation of taxonnames: # ------------------------- rownames(ips.cox1)[1] <- "AVeeeeeeeeeeeeeeeeeryLongName" write.fas(ips.cox1, truncate = 10) # If truncation leads to identical taxonnames, # a warning will be issued: # ------------------------- rownames(ips.cox1)[1:2] <- "AVeeeeeeeeeeeeeeeeeryLongName" write.fas(ips.cox1, truncate = 10)