Title: | Tools to work with MCMCtree |
---|---|
Description: | Tools to work with MCMCtree, a program for Bayesian inference of species divergence times. |
Authors: | Sandra Alvarez-Carretero [aut, cre] |
Maintainer: | Sandra Alvarez-Carretero <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.5.7 |
Built: | 2024-11-01 12:23:52 UTC |
Source: | https://github.com/dosreislab/mcmc3r |
Matrix which, once multiplied by its transpose, yields to the inverse of the estimate of the shrinkage correlation matrix, R.sh. The latter is used to transform the data set as it corrects the morphological data set for the corresponding character correlation
A
A
A matrix of size n x n, where n = 87 (morphological traits: 29 landmarks x 3D coordinates):
Number of traits for which the correlation values have been calculated, 87
Convert an array with landmark points into an object of class matrix.
array2matrix(X, coords = c(2, 3))
array2matrix(X, coords = c(2, 3))
X |
Array, |
coords |
Integer, 2 or 3, for 2D or 3D landmarks, respectively. |
The object X
, class array, has format k x q x s
, where k
is
the number of landmarks, q
the number of coordinates, and s
the number
of specimens. See C.arr.unal
for an example of the format of a 3D array
and data-raw/C.R
for the details about how to generate this object.
An object of class matrix, with s
rows, one for specimen, and n
columns, one
for each coordinate of the landmarks.
Each landmark can be given in 2D or 3D. For instance,
if the landmarks are 3D, the first 3 columns will be the
coordinates x, y, and z for the first landmark, the next 3
columns for the second landmark, and so on.
specimens | lmk1.x | lmk1.y | lmk1.z | lmk2.x | lmk2.y | lmk2.z | ... |
Sp_1 | 0.143 | -0.028 | -0.044 | 0.129 | 0.028 | -0.043 | ... |
Sp_2 | 0.128 | -0.024 | -0.028 | 0.124 | 0.027 | -0.025 | ... |
... | ... | ... | ... | ... | ... | ... | ... |
See object C
for a more detailed example of the format of the
object that is returned and data-raw/C.R
for the explanation
about how to generate this object.
Note that if the names for the s
matrices, one per specimen, in the array
are not provided, i.e. the names for specimens are not given,
the specimens in the returned matrix, row.names(matrix)
, will be
labelled as '1', '2', and so on.
Sandra Alvarez-Carretero and Mario dos Reis
Calculate Bayes factors and posterior model probabilities
bayes.factors(..., prior = NULL, boot = TRUE, n = 10000, prob = 0.95)
bayes.factors(..., prior = NULL, boot = TRUE, n = 10000, prob = 0.95)
... |
list of marginal likelihood objects, see details |
prior |
numeric, the prior model probabilities |
boot |
logical, whether to perform parametric boostrap of probabilities |
n |
numeric, number of bootstrap samples |
prob |
numeric, the probability used to calculate the boostrap CI |
Input is a list of marginal likelihood objects, with each object generated by
either stepping.stones()
or gauss.quad()
. If boot =
TRUE
, parametric bootstrap is performed by assuming the log-marginal
likelihood estimates are normally distributed with standard deviation equal
to the standard error. The re-sampled n
marginal log-likelihoods are
used to estimate re-sampled posterior probabilities and to calculate an
equal-tail bootstrap confidence interval for these.
Note that the length of prior
should be the same as the number of
models being compared. The prior
is rescaled so that
sum(prior) == 1
.
A list with elements bf
and logbf
, the Bayes factors and
log-Bayes factors; pr
, the posterior model probabilities; prior
the prior model probabilities and, if boot = TRUE
, pr.ci
the
equal-tail bootstrap confidence interval.
Mario dos Reis
# See Table 5 in dos Reis et al. (2018, Syst. Biol., 67: 594-615) # Bayesian selection of relaxed clock models for the 1st and 2nd sites # of mitochondrial protein-coding genes of primates # Models: strick clock, independent-rates, and autocorrelated-rates sc <- list(); sc$logml <- -16519.03; sc$se <- .01 ir <- list(); ir$logml <- -16480.58; ir$se <- .063 ar <- list(); ar$logml <- -16477.82; ar$se <- .035 bayes.factors(sc, ir, ar) bayes.factors(sc, ir, ar, prior=c(.25,.5,.25)) bayes.factors(sc, ir, ar, prior=c(0,1,0))
# See Table 5 in dos Reis et al. (2018, Syst. Biol., 67: 594-615) # Bayesian selection of relaxed clock models for the 1st and 2nd sites # of mitochondrial protein-coding genes of primates # Models: strick clock, independent-rates, and autocorrelated-rates sc <- list(); sc$logml <- -16519.03; sc$se <- .01 ir <- list(); ir$logml <- -16480.58; ir$se <- .063 ar <- list(); ar$logml <- -16477.82; ar$se <- .035 bayes.factors(sc, ir, ar) bayes.factors(sc, ir, ar, prior=c(.25,.5,.25)) bayes.factors(sc, ir, ar, prior=c(0,1,0))
Generate block bootstrap replicates of sampled power likelihoods
block.boot(R, p = 0.1, mcmcf = "mcmc.txt", betaf = "beta.txt", preff = "lnL")
block.boot(R, p = 0.1, mcmcf = "mcmc.txt", betaf = "beta.txt", preff = "lnL")
R |
numeric, number of bootstrap replicates |
p |
numeric, block length, giving as a proportion of the MCMC sample size |
mcmcf |
character, mcmc output file name |
betaf |
character, file with beta values |
preff |
character, prefix for files storing boot replicates |
Block bootstrap replicates are generated using the stationary boostrap method
of Politis and Romano (1994). The replicates are stored in files named using
preff
and the replicate number. For example, if preff = "lnL"
(the default) then the files are lnL0.txt, lnL1.txt, lnL2.txt, ..., etc, with
lnL0.txt corresponding to the original log-likelihood sample. Replicates are
stored within the directories corresponding to the appropriate beta values.
The collection of files can grow large quickly so you may want to use a small
number of replicates (say R = 10 to R = 100).
This function uses code from the boot package by Canty and Ripley.
Álvarez-Carretero et al (2022) A species-level timeline of mammal evolution integrating phylogenomic data. Nature, 602: 263–267.
Politis and Romano (1994) The stationary boostrap. J. Am. Stat. Assoc., 89: 1303–1313.
stepping.stones.boot and tsboot (from the boot package).
A matrix containing the 29 3D landmarks collected from the skulls of 19 carnivoran specimens after carrying out a Procrustes analysis (PA). Please take a look at the description in morpho/data-raw/C.R to understand how this object was generated. This is the morphological alignment.
C
C
A matrix with s = 19 rows and n = 87 columns (87/3 = 29 landmarks):
Rows, specimens from which landmarks were collected, 19
Columns, 87 traits (29 landmarks in 3D) after the PA
A 3D array containing the 29 3D landmarks collected from the skulls of 19 carnivoran specimens before carrying out a Procrustes analysis (PA). Please take a look at the description in morpho/data-raw/C.R to understand how this object was generated.
C.arr.unal
C.arr.unal
An array with k = 29 (landmarks), q = 3 (coordinates) and s = 19 (specimens):
landmark points collected from 19 carnivoran specimens, 29
coordinates in 3D or 2D, 3
number of specimens, 18
A matrix containing the 29 3D landmarks collected from the skulls of 19 carnivoran specimens before carrying out a Procrustes analysis (PA). Please take a look at the description in morpho/data-raw/C.R to understand how this object was generated.
C.mat.unal
C.mat.unal
A matrix with s = 19 rows and n = 87 columns (87/3 = 29 landmarks):
Rows, specimens from which landmarks were collected, 19
Columns, 87 traits (29 landmarks in 3D) after the PA
Object of class procSym output by Morpho, which mean shape is later used to align the foxes ("Vulpes vulpes") specimens to it. Please take a look at the description in morpho/data-raw/C.R to understand how this object was generated.
C.PS
C.PS
Object procSym
Check procSym
for more details
Density, distribution, and quantile functions for calibrations used in MCMCtree.
dL(x, tL, p = 0.1, c = 1, pL = 0.025) pL(q, tL, p = 0.1, c = 1, pL = 0.025) qL(prob, tL, p = 0.1, c = 1, pL = 0.025) dB(x, tL, tU, pL = 0.025, pU = 0.025) dU(x, tU, pU = 0.025)
dL(x, tL, p = 0.1, c = 1, pL = 0.025) pL(q, tL, p = 0.1, c = 1, pL = 0.025) qL(prob, tL, p = 0.1, c = 1, pL = 0.025) dB(x, tL, tU, pL = 0.025, pU = 0.025) dU(x, tU, pU = 0.025)
x |
numeric, vector of quantiles |
tL |
numeric, minimum age |
p |
numeric, mode parameter for truncated Cauchy |
c |
numeric, tail decay parameter for truncated Cauchy |
pL |
numeric, minimum probability bound |
q |
numeric, quantile |
prob |
numeric probability |
tU |
numeric, maximum age |
pU |
numeric, maximum probability bound |
Calculates the density, distribution and quantile functions for the minimum (dL) calibration, and the density function for the joint (dB) and maximum (dU) calibration bounds as implemented in MCMCtree. See Yang and Rannala (2007) and Inoue et al. (2010) for details.
A vector of density, probability, or quantile values as appropriate.
Mario dos Reis
Yang and Rannala. (2006) Bayesian Estimation of Species Divergence Times Under a Molecular Clock Using Multiple Fossil Calibrations with Soft Bounds. Mol. Biol. Evol., 23: 212–226.
Inoue, Donoghue and Yang (2010) The Impact of the Representation of Fossil Calibrations on Bayesian Estimation of Species Divergence Times. Syst. Biol., 59: 74–89.
# Plot a minimum bound calibration density: curve(dL(x, 1), from=0, to=10, n=5e2) # Plot a joint bounds calibration density: curve(dB(x, 1, 6), from=0, to=10, n=5e2) # Plot a maximum bound calibration density: curve(dU(x, 6), from=0, to=10, n=5e2) # Probability and quantile function for minimum bound (or truncated-Cauchy): qv <- 0:20 # calculate probability vector from quantiles: pv <- pL(qv, tL=1) # calculate quantiles back from probability vector: # (note numerical error) qL(pv, tL=1)
# Plot a minimum bound calibration density: curve(dL(x, 1), from=0, to=10, n=5e2) # Plot a joint bounds calibration density: curve(dB(x, 1, 6), from=0, to=10, n=5e2) # Plot a maximum bound calibration density: curve(dU(x, 6), from=0, to=10, n=5e2) # Probability and quantile function for minimum bound (or truncated-Cauchy): qv <- 0:20 # calculate probability vector from quantiles: pv <- pL(qv, tL=1) # calculate quantiles back from probability vector: # (note numerical error) qL(pv, tL=1)
Dataset of carnivores containing alignments of morphlogical continuous characters, a phylogeny, and an MCMC matrix produced by MCMCtree.
carnivores
carnivores
carnivores
is a list with elements:
C.raw
, a matrix of 29 3D-landmarks measurements from 19 species;
V.raw
, a matrix of 29 3D-landmarks for 21 common foxes (Vulpes
vulpes);
C.proc
and V.proc
, the corresponding matrices after Procrustes
alignment;
var.foxes
and R.sh
, with estimates of the variance vector and
correlation matrix for the landmarks in the 21 foxes;
M
, a matrix of transformed landmarks (using the foxes variances
and correlation matrix, with Cholesky metric) for the 19 carnivores;
pairedLM
, a matrix describing the symmetry of the landmarks;
tree
, the phylogeny of the 19 carnivores; and
mcmc
, an MCMC sample of divergence times and morphological and
molecular rates from an MCMCtree analysis.
Alvarez-Carretero S, Goswami A, Yang Z and dos Reis M. (2018) Bayesian estimation of species divergence times using correlated quantitative characters. Syst. Biol., 68: 967–986.
A dataset containing the 29 3D landmarks collected from the skulls of 19 carnivoran specimens This data.frame consists of a first column with the specimen labels used by MCMCtree, a second column with the voucher names of the specimens collected, and then 87 columns with the coordinates for each trait (29 landmarks x 3D coordinates). Please take a look at the description in morpho/data-raw/carnivores19x29.raw.R to understand how this object was generated.
carnivores19x29.raw
carnivores19x29.raw
A data.frame with s = 19 rows and p = 89 columns ( 2 info columns + 87 traits ):
Rows, specimens from which landmarks were collected, 19
Columns, specimens information in 1st and 2nd column + 87 traits (29 landmarks in 3D)
Function that outputs the control file to run MCMCtree. It can be used to run only with morphological data or morphological+molecular data (more than one partition in the alignment file) together.
ctlMCMCtree( filename, mol = FALSE, seed = -1, seqfile, treefile, mcmcfile = "mcmc.txt", outfile = "out.txt", ndata, seqtype = 0, usedata = 1, clock, RootAge, TipDate, alpha, ncatG, cleandata = 0, BDparas, kappa_gamma, alpha_gamma, rgene_gamma, sigma2_gamma, print = 2, burnin, sampfreq, nsample, model )
ctlMCMCtree( filename, mol = FALSE, seed = -1, seqfile, treefile, mcmcfile = "mcmc.txt", outfile = "out.txt", ndata, seqtype = 0, usedata = 1, clock, RootAge, TipDate, alpha, ncatG, cleandata = 0, BDparas, kappa_gamma, alpha_gamma, rgene_gamma, sigma2_gamma, print = 2, burnin, sampfreq, nsample, model )
filename |
Character, name for the output control file. |
mol |
Boolean, TRUE if you include molecular data, FALSE otherwise. Default = FALSE. |
seed |
Numeric, seed value. Default = -1 (assigns random seed using computer's current time). |
seqfile |
Character, path to the alignment file. |
treefile |
Character, path to the tree file. |
mcmcfile |
Character, path to the file with the report of MCMC runs. By default it generates a file called "mcmc.txt" in the directory where MCMCtree is run. |
outfile |
Character, path to the summary results file. By default it generates a file called "out.txt" in the directory where MCMCtree is run. |
ndata |
Numeric, number of partitions in the alignment file. |
seqtype |
Numeric, 0 for nucleotide sequences, 1 for codon sequences, and 2 for amino acid sequences. Default = 0. |
usedata |
Numeric, 1: Calculate the likelihood function in the normal way, 0: Likelihood is not calculated (likelihood = 1), 2 and 3: approximate likelihood calculation and ML estimation of branch lengths (see details). Default = 1. |
clock |
Numeric, 1: strict clock model, 2: independent rates model, 3: autocorrelated rates model (see details). |
RootAge |
Character, calibration for the root. |
TipDate |
Numeric, time unit to scale the estimated divergence times. See details. |
alpha |
Numeric, alpha value for the discrete-gamma model of rate variation. Only used if molecular data is available. |
ncatG |
Numeric, number of categories for the discrete-gamma model of rate variation. Only used if molecular data is available. |
cleandata |
Numeric, 0: alignment gaps and ambiguity characters are treated as missing data, 1: any site where at least one sequences has an alignment gap or ambiguity character is deleted (see details). Default = 0. |
BDparas |
Numeric, vector with the parameters controlling the birth-death-sequential-sampling (BDSS) process (see details). |
kappa_gamma |
Numeric, vector with the parameters for the substitution model parameter kappa (transition/transversion rate ratio). Only used if molecular data is available. |
alpha_gamma |
Numeric, vector with the parameters for the substitution model parameter gamma (gamma shape parameter for variable rates among sites). Only used if molecular data is available. |
rgene_gamma |
Numeric, vector with the parameters for the Dirichlet-gamma prior for the mean substitution rate (see details). |
sigma2_gamma |
Numeric, vector with the parameters for the Dirichlet-gamma prior for the rate drift parameter (see details). |
print |
Numeric, 0: results are printed to screen only, 1: MCMC is written to "mcmcfile" and the summary to the "outfile", 2: same as 1 but rates for branches for each partitions are appended to "outfile". Default = 2. |
burnin |
Numeric, number of iterations to be discarded (burn-in). |
sampfreq |
Numeric, number of iterations after which a sample will be collected. |
nsample |
Numeric, number of samples to be gathered. |
model |
Numeric, substitution model to be used (see details). 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85, 5:T92, 6:TN93, 7:REV, 8:UNREST, 9:REVu; 10:UNRESTu. Only used if molecular data is available. |
For more information, please check the MCMCtree tutorial and the PAML documentation.
# First create objects with the path to alignment and tree files and then # call the function to generate the control file. The parameters not passed # to the function are used as the default values tree <- system.file( "extdata", "19s.trees", package = "morpho") aln <- system.file( "extdata", "seqfile.aln", package = "morpho") # Uncomment the following lines followed by "##" and change the path in the filename so # it is saved where you want ##ctlMCMCtree( filename = "../mcmctree.ctl", mol = FALSE, seqfile = aln, treefile = tree, ##ndata = 1, clock = 2, TipDate = 1, RootAge = c("B(37.3, 66.0, 0.025, 0.025)"), ##BDparas = c( 1, 1, 0, 0.001 ), rgene_gamma = c( 2, 5 ), ##sigma2_gamma = c( 2, 2 ), burnin = 50000, sampfreq = 50, nsample = 20000 )
# First create objects with the path to alignment and tree files and then # call the function to generate the control file. The parameters not passed # to the function are used as the default values tree <- system.file( "extdata", "19s.trees", package = "morpho") aln <- system.file( "extdata", "seqfile.aln", package = "morpho") # Uncomment the following lines followed by "##" and change the path in the filename so # it is saved where you want ##ctlMCMCtree( filename = "../mcmctree.ctl", mol = FALSE, seqfile = aln, treefile = tree, ##ndata = 1, clock = 2, TipDate = 1, RootAge = c("B(37.3, 66.0, 0.025, 0.025)"), ##BDparas = c( 1, 1, 0, 0.001 ), rgene_gamma = c( 2, 5 ), ##sigma2_gamma = c( 2, 2 ), burnin = 50000, sampfreq = 50, nsample = 20000 )
Kernel density function for the birth-death process with species sampling.
dBD(x, lambda, mu, rho, t1 = 1)
dBD(x, lambda, mu, rho, t1 = 1)
x |
numeric, vector of quantiles. |
lambda |
numeric, birth rate. |
mu |
numeric, death rate. |
rho |
numeric, proportion of species sampled. |
t1 |
numeric, age of the phylogeny's root. |
MCMCtree uses the BD kernel to generate the prior on node ages for those nodes without fossil calibrations. You can look at the examples below for some suggestions. Note rho must be between 0 and 1. The special case mu = lambda, rho=0 gives a uniform density. See Yang and Rannala (2006) for full details.
A numeric vector of probability densities.
Mario dos Reis
Yang and Rannala. (2006) Bayesian Estimation of Species Divergence Times Under a Molecular Clock Using Multiple Fossil Calibrations with Soft Bounds. Mol. Biol. Evol., 23: 212–226.
Yang (2014) Molecular Evolution: A Statistical Approach. Oxford University Press
# Reproduce Fig. 10.10 from Yang (2014) # (a) lambda = mu = 1, rho = 0 (uniform density): curve(dBD(x, 1, 1, 0), xlim=c(0, 1), ylim=c(0, 4), xaxs="i", yaxs="i") # (b) lambda = 10, mu = 5, rho = 0.01 (old node ages, useful for diversified # sampling): curve(dBD(x, 10, 5, .01), from=0, to=1, lty=2, add=TRUE) # (c) lambda = 10, mu = 5, rho = 0.001 (old node ages, useful for diversified # sampling): curve(dBD(x, 10, 5, .001), from=0, to=1, lty=3, add=TRUE) # (d) lambda = 10, mu = 5, rho = 0.99 (young node ages, useful for dense # sampling of diverse phylogenies): curve(dBD(x, 10, 5, .99), from=0, to=1, lty=4, add=TRUE)
# Reproduce Fig. 10.10 from Yang (2014) # (a) lambda = mu = 1, rho = 0 (uniform density): curve(dBD(x, 1, 1, 0), xlim=c(0, 1), ylim=c(0, 4), xaxs="i", yaxs="i") # (b) lambda = 10, mu = 5, rho = 0.01 (old node ages, useful for diversified # sampling): curve(dBD(x, 10, 5, .01), from=0, to=1, lty=2, add=TRUE) # (c) lambda = 10, mu = 5, rho = 0.001 (old node ages, useful for diversified # sampling): curve(dBD(x, 10, 5, .001), from=0, to=1, lty=3, add=TRUE) # (d) lambda = 10, mu = 5, rho = 0.99 (young node ages, useful for dense # sampling of diverse phylogenies): curve(dBD(x, 10, 5, .99), from=0, to=1, lty=4, add=TRUE)
Estimate marginal likelihood by thermodynamic integration and Gauss-Legendre
quadrature from a sample of n
power posterior MCMC chains sampled
with mcmctree (or bpp).
gauss.quad(mcmcf = "mcmc.txt", betaf = "beta.txt", se = TRUE)
gauss.quad(mcmcf = "mcmc.txt", betaf = "beta.txt", se = TRUE)
mcmcf |
character, mcmc output file name |
betaf |
character, file with beta values |
se |
logical, whether to calculate the standard error |
The MCMC samples should be stored in a directory structure created
by make.bfctlf
with method = "gauss-quad"
. The function
will read the stored log-likelihood values and calculate the log-marginal
likelihood.
Numerical integration is done using Gauss-Legendre quadrature. See Rannala and Yang (2017) for details (also dos Reis et al. 2017, Appendix 2).
A list with components logml
, the log-marginal likelihood estimate;
se
, the standard error of the estimate; mean.logl
, the mean of
log-likelihood values sampled for each beta; and b
, the beta values
used.
Mario dos Reis
Rannala B and Yang Z. (2017) Efficient Bayesian species tree inference under the multispecies coalescent. Systematic Biology 66: 823-842.
dos Reis et al. (2017) Using phylogenomic data to explore the effects of relaxed clocks and calibration strategies on divergence time estimation: Primates as a test case. bioRxiv
make.bfctlf
to prepare directories and mcmctree or bpp control
files to calculate the power posterior.
This dataset contains the results from the BPP A00 analysis of hominid evolution from Angelis and dos Reis (2015).
hominids
hominids
hominids
is a list with elements mcmc
, a dataframe with
20,000 rows and 8 columns, and tree
, an object of class
phylo
from the ape
package.
mcmc
is a posterior sample from a BPP A00 MCMC analysis containing
the relative divergence times (tau's) and nucleotide diversities (theta's)
for the four species ape (hominid) phylogeny.
tree
contains the phylogeny with node ages given as the posterior
means of the tau's in mcmc
.
K. Angelis and M. dos Reis (2015) The impact of ancestral population size and incomplete lineage sorting on Bayesian estimation of species divergence times. Curr. Zool., 61: 874–885.
Import more than one csv file with landmark points and return an array
object with dimensions p x k x n, being p
the number of
landmarks, k
the dimension of the coordinates (2D or 3D), and
n
the number of specimens (the number of files, as each file
contains the landmarks for one specimen).
lmk_imp(path = NULL, lmk.names = FALSE)
lmk_imp(path = NULL, lmk.names = FALSE)
path |
Character, absolute path to the directory with the csv files with the landmark points are. |
lmk.names |
Logical, TRUE if there is an extra column for landmark names, FALSE otherwise (see details). |
Note that all files need to be comma separated files (csv).
If lmk.names = TRUE
, the format expected for 3D landmarks is
the following:
landmarks | x | y | z |
lmk_1 | 0.143 | -0.028 | -0.030 |
lmk_2 | 0.128 | -0.024 | -0.035 |
... | ... | ... |
Otherwise, if lmk.names = TRUE
, then the format is:
x | y | z |
0.143 | -0.028 | -0.030 |
0.128 | -0.024 | -0.035 |
... | ... | ... |
Note that you can always have 2D landmarks, so the format is the same
but the column with the z
landmarks will not appear in the files.
Sandra Alvarez-Carretero
Estimate marginal likelihood from bootstrap replicates
stepping.stones.boot(R, betaf = "beta.txt", preff = "lnL") gauss.quad.boot(R, betaf = "beta.txt", preff = "lnL")
stepping.stones.boot(R, betaf = "beta.txt", preff = "lnL") gauss.quad.boot(R, betaf = "beta.txt", preff = "lnL")
R |
numeric, number of bootstrap replicates used |
betaf |
character, file with beta values |
preff |
character, prefix for files storing boot replicates |
stepping.stones.boot
and gauss.quad.boot
are used to calculate
the marginal likelihoods on bootstrap replicates using the stepping stones
and gaussian quadrature methods respectively. The replicates must have
been generated using block.boot.
A list with components logml
, the original log-marginal likelihood
estimate, logmlR
, the vector of log-marginal likelihood estimates on
the boostrap replicates, se
and ci
, the standard error and 95%
credibility interval of logml
calculated on the bootstrap replicates,
and b
, the beta values used.
block.boot, stepping.stones and gauss.quad.
Make appropriate beta values
make.beta(n, method = c("step-stones", "gauss-quad"), a = 5)
make.beta(n, method = c("step-stones", "gauss-quad"), a = 5)
n |
numeric, number of beta points |
method |
character, the method to choose the beta points, see details |
a |
numeric, exponent for stepping stones beta generation, see details |
If method = "step-stones"
, the beta values are given by the
formula
Values of a
between 5 to 8 appear appropriate. Large a
values
produce beta values close to zero.
If method = "gauss-quad"
, the beta values are calculated according to
the n
Gauss-Legendre quadrature rule (see Rannala and Yang, 2017).
Numeric vector with n
beta values
Mario dos Reis
Rannala B and Yang Z (2017) Efficient Bayesian species tree inference under the multispecies coalescent. Systematic Biology, 66: 823–842.
The generated beta values are suitable input for make.bfctlf
.
Prepare mcmctree or bpp control files for marginal likelihood calculation
make.bfctlf(beta, ctlf = "mcmctree.ctl", betaf = "beta.txt")
make.bfctlf(beta, ctlf = "mcmctree.ctl", betaf = "beta.txt")
beta |
numeric vector of beta values |
ctlf |
character, mcmctree or bpp control file template |
betaf |
character, file onto which to write selected beta values |
This function generates a set of n
directories each
containing a modified ctlf
control file with the appropriate beta
value to run mcmctree (or bpp) to obtain MCMC samples under the required
power-posterior distribution. For the general theory of marginal likelihood
calculation with power posteriors see Yang (2014).
The beta values are printed to betaf
.
Mario dos Reis
Yang Z (2014) Molecular Evolution: A Statistical Approach. Oxford University Press. Pages 256–260.
Convert a matrix with landmark points into an object of class array.
matrix2array(X, coords = c(2, 3))
matrix2array(X, coords = c(2, 3))
X |
Matrix of size |
coords |
Numeric, 2 or 3 for 2D or 3D landmarks, respectively. |
The matrix has format s x n
, with s
rows, one per specimen, and n
columns, one for each coordinate of the landmarks.
Each landmark can be given in 2D or 3D. For instance,
if the landmarks are 3D, the first 3 columns will be the
coordinates x, y, and z for the first landmark, the next 3
columns for the second landmark; and so on:
specimens | lmk1.x | lmk1.y | lmk1.z | lmk2.x | lmk2.y | lmk2.z | ... |
Sp_1 | 0.143 | -0.028 | -0.044 | 0.129 | 0.028 | -0.043 | ... |
Sp_2 | 0.128 | -0.024 | -0.028 | 0.124 | 0.027 | -0.025 | ... |
... | ... | ... | ... | ... | ... | ... | ... |
See object C
for an example of its format and data-raw/C.R
to
see how this object is generated.
An object of class array with format k x q x s
, where k
is the number of
landmarks, q
the number of coordinates, and s
the number of specimens.
Note that if the matrix provided does not have rownames, the specimens in the
returned array (names for the 's' matrices accessed through the array, i.e.
dimnames( array )[ 3 ]
will be labelled as '1', '2', and so on.
See object C.arr.unal
for an example of the format of the object that is
returned and data-raw/C.R
for the description of how to
obtain this object.
Sandra Alvarez-Carretero and Mario dos Reis
Calculate summaries from an MCMC run
mcmc.sum(mcmc)
mcmc.sum(mcmc)
mcmc |
a data frame with MCMCtree's output |
The data frame should have headers and should contain the output from the mcmc.txt file generated by MCMCtree.
A data frame with the mean, median, and 95
Obtain the ancestral reconstruction of characters from an MCMC sample using Felsenstein (1973) model of continuous character evolution.
mcmc2anc(tree, M, mcmc, time.name, rate.name, tip.ages = NULL)
mcmc2anc(tree, M, mcmc, time.name, rate.name, tip.ages = NULL)
tree |
a rooted, strictly bifurcating phylogeny |
M |
s x k matrix of k continuous morphological measurements for s species |
mcmc |
data frame with MCMC output from MCMCtree |
time.name |
character vector of length one |
rate.name |
character vector of length one |
tip.ages |
numeric, the ages of the terminal taxa in the tree |
The function first calculates the mean of divergence times and morphological rates in the MCMC sample. These are used to reconstruct the branch lengths in units of morphological evolution, and then Eq. (7) in Felsenstein (1973) is used to calculate the ancestral reconstruction on the tree.
Note time.name
is the name format used for the node ages in the MCMC
dataframe, usually of the form time.name = "t_"
. Similarly
rate.name
is the name format used in the MCMC sample for the rates, of
the form rate.name = "r_g1_"
where the subscript in "g" must be the
partition number containing the morphological rates. Note tree
must be
the same used by MCMCtree when generating the MCMC sample, right down to its
Newick representation. Taxon names in tree
and M
must match.
A n x k matrix with the ancestral reconstruction for the k characters at the n internal nodes of the phylogeny.
Mario dos Reis
Felsenstein J (1973) Maximum-likelihood estimation of evolutionary trees from continuous characters. Am J Hum Genet, 25: 471–492.
data(carnivores) # calculate tip ages correctly, as they're needed by the function: tips_info <- strsplit(carnivores$tree$tip.label, "\\^" ) back.ages <- as.numeric(unlist(tips_info)[seq(from=2, to=2*19, by=2)]) back.ages <- max(back.ages) - back.ages C <- carnivores$C.proc rownames(C) <- rownames(carnivores$M) recM = mcmc2anc(carnivores$tree, C, mcmc=carnivores$mcmc, time.name="t_", rate.name="r_g1_", tip.ages=back.ages) x <- seq(1, 87, by=3); y <- seq(2, 87, by=3) # Plot landmarks for the 19 carnivores: plot(carnivores$C.proc[,x], carnivores$C.proc[,y], pch='+', cex=.5) # plot ancestral reconstruction at the root (node 20): points(recM["20",x], recM["20",y], pch=19, col="red") # mean shape mS <- apply(carnivores$C.proc, 2, mean) points(mS[x], mS[y], pch=19, col="blue") # Convert reconstruction to an array, as is the standard in # morphometrics software ## Not run: recA <- matrix2array(recM, 3) options(rgl.printRglwidget = TRUE) rgl::plot3d(recA[,,"20"], ty='s', size=2, col="red", aspect=FALSE) ## End(Not run)
data(carnivores) # calculate tip ages correctly, as they're needed by the function: tips_info <- strsplit(carnivores$tree$tip.label, "\\^" ) back.ages <- as.numeric(unlist(tips_info)[seq(from=2, to=2*19, by=2)]) back.ages <- max(back.ages) - back.ages C <- carnivores$C.proc rownames(C) <- rownames(carnivores$M) recM = mcmc2anc(carnivores$tree, C, mcmc=carnivores$mcmc, time.name="t_", rate.name="r_g1_", tip.ages=back.ages) x <- seq(1, 87, by=3); y <- seq(2, 87, by=3) # Plot landmarks for the 19 carnivores: plot(carnivores$C.proc[,x], carnivores$C.proc[,y], pch='+', cex=.5) # plot ancestral reconstruction at the root (node 20): points(recM["20",x], recM["20",y], pch=19, col="red") # mean shape mS <- apply(carnivores$C.proc, 2, mean) points(mS[x], mS[y], pch=19, col="blue") # Convert reconstruction to an array, as is the standard in # morphometrics software ## Not run: recA <- matrix2array(recM, 3) options(rgl.printRglwidget = TRUE) rgl::plot3d(recA[,,"20"], ty='s', size=2, col="red", aspect=FALSE) ## End(Not run)
Plot a densi-tree from an MCMC sample from a BPP or MCMCTree analysis
mcmc2densitree( tree, mcmc, time.name, thin, col = "blue", alpha = 1, y.offset = 0, pfract = 0.1, plot.labels = TRUE, cex.lab = 1, axis = TRUE, add = FALSE, tip.ages = NULL )
mcmc2densitree( tree, mcmc, time.name, thin, col = "blue", alpha = 1, y.offset = 0, pfract = 0.1, plot.labels = TRUE, cex.lab = 1, axis = TRUE, add = FALSE, tip.ages = NULL )
tree |
an object of class phylo. |
mcmc |
data frame with an MCMC sample from MCMCTree or a BPP A00 analysis. |
time.name |
character vector of length one. |
thin |
numeric, the fraction of MCMC samples to keep. |
col |
character, the color for branches. |
alpha |
numeric, between 0 and 1, the branch color transparency. |
y.offset |
numeric, the vertical offset for plotting the tree. |
pfract |
numeric, how much of the plotting space to used for plotting
the tip labels. If |
plot.labels |
logical, whether to plot the tip labels. Ignored if
|
cex.lab |
numeric, the relative character size for the tip labels. |
axis |
logical, whether to plot the x axis. |
add |
logical, if TRUE add the trees to an existing plot, otherwise create a new plot. |
tip.ages |
numeric, the ages of the tips, with the most recent tip
having age zero, and the oldest tip having the largest age. If |
The function will reduce the MCMC sample to dim(mcmc)[1] *
thin
observations. Then the node ages in each observarion are used to plot
each tree in the sample. For a tree with s
species. The y
coordinates of the tips are given by 0:(s - 1) + y.offset
.
The tree
must be rooted, strictly bifurcating, and be the same tree
used to genarate the BPP (A00) or MCMCTree MCMC samples.
Mario dos Reis
data(microcebus) mcmc2densitree(microcebus$tree, microcebus$mcmc, time.name="tau_", thin=0.05, alpha=0.01, col="blue") title(xlab="Distance (substitutions per site)") # data(hominids) TODO: Fix this example (add msc2time.t function) # Calibrate the hominid phylogeny with a uniform fossil calibration of # between 6.5 to 10 Ma for the human-chimp divergence, and plot the # calibrated sample #calmsc <- msc2time.t(mcmc=hominids$mcmc, node="7humanchimp", calf=runif, # min=6.5, max=10) # mcmc2densitree(hominids$tree, calmsc, "t_", thin=0.05, alpha=0.01) # title(xlab="Divergence time (Ma)")
data(microcebus) mcmc2densitree(microcebus$tree, microcebus$mcmc, time.name="tau_", thin=0.05, alpha=0.01, col="blue") title(xlab="Distance (substitutions per site)") # data(hominids) TODO: Fix this example (add msc2time.t function) # Calibrate the hominid phylogeny with a uniform fossil calibration of # between 6.5 to 10 Ma for the human-chimp divergence, and plot the # calibrated sample #calmsc <- msc2time.t(mcmc=hominids$mcmc, node="7humanchimp", calf=runif, # min=6.5, max=10) # mcmc2densitree(hominids$tree, calmsc, "t_", thin=0.05, alpha=0.01) # title(xlab="Divergence time (Ma)")
Convert an MCMC sample from MCMCTree or BPP to a list of trees
mcmc2multiphylo(tree, mcmc, time.name, thin)
mcmc2multiphylo(tree, mcmc, time.name, thin)
tree |
an object of class phylo |
mcmc |
data frame with an MCMC sample from MCMCTree or a BPP A00 analysis |
time.name |
character vector of length one, this is usually
|
thin |
numeric, the fraction of MCMC samples to keep |
tree
must be rooted and strictly bifurcating, and it must
match the tree used by BPP or MCMCTree to obtain the MCMC sample. The
function uses the node ages in mcmc
to calculate branch lengths and
generate a list of trees (with the same topology as tree
), one tree
per (thinned) MCMC sample. The tips of the phylogeny are assumed to have
age zero.
An object of class multiPhylo (i.e., a list of trees).
Mario dos Reis
# hominid MCMC sample from BPP: data(hominids) hominid.trees <- mcmc2multiphylo(hominids$tree, hominids$mcmc, "tau_", 0.001) ## Not run: # If you have the ape package installed, you can output the trees in # Newick format ape::write.tree(hominid.trees) ## End(Not run)
# hominid MCMC sample from BPP: data(hominids) hominid.trees <- mcmc2multiphylo(hominids$tree, hominids$mcmc, "tau_", 0.001) ## Not run: # If you have the ape package installed, you can output the trees in # Newick format ape::write.tree(hominid.trees) ## End(Not run)
This dataset contains the results from the BPP A00 analysis of mouse lemur evolution in Madagascar from Yoder et al. (2016).
microcebus
microcebus
microcebus
is a list with elements mcmc
, a dataframe
with 20,000 rows and 12 columns, and tree
, an object of class
phylo
from the ape
package.
mcmc
is a posterior sample from a BPP A00 MCMC analysis containing
the relative divergence times (tau's) and nucleotide diversities (theta's)
for the six species mouse lemur (Microcebus spp) phylogeny.
tree
contains the phylogeny with node ages given as the posterior
means of the tau's in mcmc
.
A. D. Yoder, C. R. Campbell, M. B. Blanco, M. dos Reis, J. U. Ganzhorn, S. M. Goodman, K. E. Hunnicutt, P. A. Larsen, P. M. Kappeler, R. M. Rasoloarison, J. M. Ralison, D. L. Swofford, and D. W. Weisrock. (2016) Geogenetic patterns in mouse lemurs (genus Microcebus) reveal the ghosts of Madagascar's forests past. Proc. Nat. Acad. Sci. USA., 113: 8049–8056.
Perform a Procrustes alignment with the procSym
and align2procSym
and generate a morphological
alignment in MCMCtree format.
proc2MCMCtree( data, popdata, sp.data, sp.popdata, filename, coords, method = c("eigen", "chol"), ages, ... )
proc2MCMCtree( data, popdata, sp.data, sp.popdata, filename, coords, method = c("eigen", "chol"), ages, ... )
data |
Matrix or array, object with the data set with one specimen per species. See details. |
popdata |
Matrix or array, containing the specimens for the population of species with a larger variation (more than one specimen for this species). See details. |
sp.data |
Numeric, position of the specimen in the |
sp.popdata |
Numeric, position of the specimen in the |
filename |
Character, name for the output file. |
coords |
Numeric, 2 or 3 for 2D or 3D landmarks, respectively. |
method |
(Optional) character, either |
ages |
(Optional) list or vector, ages of the species included in the morpholical alignment. |
... |
(Optional) Further arguments passed to |
If the objects data
and popdata
are of class "matrix", then they have
s
or ps
rows, respectively, and n
columns as of the amount of characters.
These data sets are supposed to contain landmarks, which can be given in 2D or 3D.
For instance, if the landmarks are 3D, the first 3 columns will be the
coordinates x, y, and z for the first landmark, the next 3
columns for the second landmark; and so on.
specimens | lmk1.x | lmk1.y | lmk1.z | lmk2.x | lmk2.y | lmk2.z | ... |
Sp_1 | 0.143 | -0.028 | -0.044 | 0.129 | 0.028 | -0.043 | ... |
Sp_2 | 0.128 | -0.024 | -0.028 | 0.124 | 0.027 | -0.025 | ... |
... | ... | ... | ... | ... | ... | ... | ... |
You can load the object C.mat.unal
to see an example of this format and also
take a look at the data-raw/C.R
file for details about how to
generate it. Otherwise, if the objects data
and popdata
are of
class "array", they have format k x q x s
or k x q x ps
, respectively,
where k
is the number of landmarks, q
the number of coordinates,
s
the number of specimens for object data
, and ps
the number of
specimens in a sampled population of one species for object popdata
.
You can load the object C.arr.unal
to see an example of this array format and
also take a look at the data-raw/C.R
file for details about how to
generate it.
The names of the specimens will be taken from either the row names of the data
matrices, if data
and popdata
objects are of class "matrix", or from
the names of the third dimension of the array, if these objects are of class
"array". If no names are found, the specimens will be labelled as "Specimen_1",
"Specimen_2", and so on.
Note that you are providing a population matrix with the landmarks collected from
more than on specimen belonging to the same species. Therefore, it is assumed that
population noise is accounted for.
Furthermore, the landmarks of one of these specimens are also present in the
morphological alignment. First, a Procrusts alignment is performed with the data set
with one specimen per species (data
), and then all the specimens in popdata
except for the specimen common in data
, which position is specified in the
argument sp.popdata
, are aligned to the mean shape of the
PA generated with data
. Take a look at data.raw/V.R
for more details.
The logarithm of the determinant of the correlation matrix is going to be printed in the output file to later be used by MCMCtree during the likelihood calculation.
The function procSym
can perform a Procrustes superimposition
alignment given a dataset of landmarks in array format and allows the user to pass
different options to the arguments defined.
The current function runs with the default parameters
in procSym
but allows the user to pass three arguments to
proc2MCMCtree
through ...
: scale, reflect, and pairedLM.
If you are thinking of using these arguments, read the documentation in procSym
,
otherwise do not pass further arguments to proc2MCMCtree
and let it run
procSym
in default mode.
$dataPS |
Object of class |
$popdataPS |
Array with the morphological alignment with the object
passed to |
$M |
Matrix with the morphological alignment with the specimens of all species. Note that this alignment is not corrected for character correlation nor population noise. The corrected alignment is only printed in the output alignment file. |
$Rsh |
Estimated shrinkage correlation matrix. |
$c |
Estimated population variance. |
Sandra Alvarez-Carretero and Mario dos Reis
write.morpho
, matrix2array
, array2matrix
## Not run: # A. Use the unaligned, but processed, carnivoran data (data = C.mat.unal) and # vulpes data (popdata = V.mat.unal) to obtain a morphological alignment. # The fox specimen that is common in V.mat.unal and C.mat.unal is the # one in the first row of V.mat.unal (sp.popdata = 1) and the one in position # 13 in C.mat.unal (sp.data = 13). The method to # decompose the estimate shrinkage correlation matrix (internally calculated # in this function) is the Cholesky decomposition (method = c("chol")). # We do not add ages. right <- c( 11, 22, 13, 19, 15, 20, 24, 5, 7, 2, 9, 26, 29 ) left <- c( 10, 21, 12, 17, 14, 18, 23, 4, 6, 1, 8, 25, 28 ) pairedLM <- cbind( left, right ) obj.aln <- proc2MCMCtree( data = C.mat.unal, popdata = V.mat.unal, sp.data = 13, sp.popdata = 1, filename = "./seqfile.aln", coords = 3, method = c("chol"), pairedLM = pairedLM ) # B. Use the unaligned, but processed, carnivoran data (data = C.mat.unal) and # vulpes data (popdata = V.mat.unal) to obtain a morphological alignment. # The fox specimen that is common in V.mat.unal and C.mat.unal is the # one in the first row of V.mat.unal (sp.popdata = 1) and the one in position # 13 in C.mat.unal (sp.data = 13). The method to # decompose the estimate shrinkage correlation matrix (internally calculated # in this function) is the Cholesky decomposition (method = c("chol")). # We add ages. ages <- list( sp1 = 13.135, sp2 = 0.0285, sp3 = 11.95, sp4 = 35.55, sp5 = 25.615, sp6 = 14.785, sp7 = 28.55, sp8 = 0, sp9 = 0, sp10 = 0, sp11 = 0, sp12 = 0, sp13 = 0, sp14 = 0, sp15 = 0, sp16 = 0, sp17 = 6.65, sp18 = 0.0285, sp19 = 0 ) right <- c( 11, 22, 13, 19, 15, 20, 24, 5, 7, 2, 9, 26, 29 ) left <- c( 10, 21, 12, 17, 14, 18, 23, 4, 6, 1, 8, 25, 28 ) pairedLM <- cbind( left, right ) obj.aln <- proc2MCMCtree( data = C.mat.unal, popdata = V.mat.unal, sp.data = 13, sp.popdata = 1, filename = "./seqfile.aln", coords = 3, method = c("chol"), pairedLM = pairedLM, ages = ages ) ## End(Not run)
## Not run: # A. Use the unaligned, but processed, carnivoran data (data = C.mat.unal) and # vulpes data (popdata = V.mat.unal) to obtain a morphological alignment. # The fox specimen that is common in V.mat.unal and C.mat.unal is the # one in the first row of V.mat.unal (sp.popdata = 1) and the one in position # 13 in C.mat.unal (sp.data = 13). The method to # decompose the estimate shrinkage correlation matrix (internally calculated # in this function) is the Cholesky decomposition (method = c("chol")). # We do not add ages. right <- c( 11, 22, 13, 19, 15, 20, 24, 5, 7, 2, 9, 26, 29 ) left <- c( 10, 21, 12, 17, 14, 18, 23, 4, 6, 1, 8, 25, 28 ) pairedLM <- cbind( left, right ) obj.aln <- proc2MCMCtree( data = C.mat.unal, popdata = V.mat.unal, sp.data = 13, sp.popdata = 1, filename = "./seqfile.aln", coords = 3, method = c("chol"), pairedLM = pairedLM ) # B. Use the unaligned, but processed, carnivoran data (data = C.mat.unal) and # vulpes data (popdata = V.mat.unal) to obtain a morphological alignment. # The fox specimen that is common in V.mat.unal and C.mat.unal is the # one in the first row of V.mat.unal (sp.popdata = 1) and the one in position # 13 in C.mat.unal (sp.data = 13). The method to # decompose the estimate shrinkage correlation matrix (internally calculated # in this function) is the Cholesky decomposition (method = c("chol")). # We add ages. ages <- list( sp1 = 13.135, sp2 = 0.0285, sp3 = 11.95, sp4 = 35.55, sp5 = 25.615, sp6 = 14.785, sp7 = 28.55, sp8 = 0, sp9 = 0, sp10 = 0, sp11 = 0, sp12 = 0, sp13 = 0, sp14 = 0, sp15 = 0, sp16 = 0, sp17 = 6.65, sp18 = 0.0285, sp19 = 0 ) right <- c( 11, 22, 13, 19, 15, 20, 24, 5, 7, 2, 9, 26, 29 ) left <- c( 10, 21, 12, 17, 14, 18, 23, 4, 6, 1, 8, 25, 28 ) pairedLM <- cbind( left, right ) obj.aln <- proc2MCMCtree( data = C.mat.unal, popdata = V.mat.unal, sp.data = 13, sp.popdata = 1, filename = "./seqfile.aln", coords = 3, method = c("chol"), pairedLM = pairedLM, ages = ages ) ## End(Not run)
Estimated shrinkage correlation matrix obtained after using the
corpcor::cor.shrink
package.
R.sh
R.sh
A matrix of size n x n, where n = 87 (morphological traits, 29 landmarks x 3D coordinates):
Number of traits for which the correlation values have been calculated, 87
Simulate a continuous morphological alignment using rTraitCont
and later allowing to account for population noise and correlation among characters.
sim.morpho(tree, n, c = 0, R, ...)
sim.morpho(tree, n, c = 0, R, ...)
tree |
Phylo, object with a phylogenetic tree
(see |
n |
Numeric, number of morphological traits to be simulated. |
c |
(Optional) numeric, vector with variances for the speccies within a population to add as population noise to the simulated morphological traits (see details). |
R |
(Optional) matrix, correlation matrix (see details). |
... |
Further arguments passed to |
The function rTraitCont
simulates continuous traits and
can take different parameters to adjust the simulation
(e.g. the model, the rate drift, etc.).
These parameters are the ones the user can pass to the argument ...
in
sim.morpho
.
The default values that sim.morpho
uses are
model = "BM"
, sigma = 1
, ancestor = F
,
and root.value = 0
. For this kind of simulation,
sim.morpho
allows only ancestor = F
, so please
do not change this parameter.
In the rTraitCont
package, the parameter
model
can be model = BM
, model = OU
, or
a function model = FUN
provided by the user. Currently,
sim.morpho
supports only the first two.
The parameter c
contains the population noise, which is used to simulate
the noise matrix. Each parameter follows a normal distribution, x ~ N(0,c)
.
If the variances are assumed to be the same for all characters within the species,
then the length of c
is 1 and equals to the value of this variance.
If it differs, then a vector of length n
has to
be provided specifying the variance for each of the characters.
The simulated noise is later added to the morphological data previously
generated, so we obtain the noisy matrix.
If a correlation matrix, R
, is provided, then it is added to the
noisy matrix. See object sim.R
for an example of its format and
data-raw/sim.R
to understand how it can be generated.
Note that the correlation matrix needs to be of class "matrix"
and symmetric.
M |
Matrix with the simulated morphological continuous data accounting for noise and, if provided, for population variance and trait correlation too. |
Sandra Alvarez-Carretero and Mario dos Reis
# A) Simulation setup: Simulate a morphological alignment # with n = 100 continuous characters for a phylogeny # defined in object 'tree', with the default parameters in # 'sim.morpho' to run 'rTraitCont'. # Population noise and character correlation are not considered, # i.e. c = 0 and R not provided. sim.morpho( tree = sim.tree, n = 100 ) # B) Simulation setup: Simulate a morphological alignment # with n = 100 continuous characters for a phylogeny # defined in object 'tree', but with different parameters # than the default ones in 'sim.morpho' to run # 'rTraitCont'. Population noise and trait correlation are not # considered, i.e. c = 0 and R not provided. sim.morpho( tree = sim.tree, n = 100, model = "OU", sigma = 0.2, alpha = 2 ) # C) Simulation setup: Simulate a morphological alignment # with n = 100 continuous characters for a phylogeny # defined in object 'tree', with the default parameters in # 'sim.morpho' to run 'rTraitCont'. # Population noise is low, c = 0.25, but trait correlation is not # considered, i.e. R not provided. sim.morpho( tree = sim.tree, n = 100, c = 0.25 ) # D) Simulation setup: Simulate a morphological alignment # with n = 100 continuous characters for a phylogeny # defined in object 'tree', with the default parameters in # 'sim.morpho' to run 'rTraitCont'. # Population noise is low, c = 0.25, and a correlation # matrix to simulate trait correlation (rho = 0.50) is provided. sim.morpho( tree = sim.tree, n = 100, c = 0.25, R = sim.R )
# A) Simulation setup: Simulate a morphological alignment # with n = 100 continuous characters for a phylogeny # defined in object 'tree', with the default parameters in # 'sim.morpho' to run 'rTraitCont'. # Population noise and character correlation are not considered, # i.e. c = 0 and R not provided. sim.morpho( tree = sim.tree, n = 100 ) # B) Simulation setup: Simulate a morphological alignment # with n = 100 continuous characters for a phylogeny # defined in object 'tree', but with different parameters # than the default ones in 'sim.morpho' to run # 'rTraitCont'. Population noise and trait correlation are not # considered, i.e. c = 0 and R not provided. sim.morpho( tree = sim.tree, n = 100, model = "OU", sigma = 0.2, alpha = 2 ) # C) Simulation setup: Simulate a morphological alignment # with n = 100 continuous characters for a phylogeny # defined in object 'tree', with the default parameters in # 'sim.morpho' to run 'rTraitCont'. # Population noise is low, c = 0.25, but trait correlation is not # considered, i.e. R not provided. sim.morpho( tree = sim.tree, n = 100, c = 0.25 ) # D) Simulation setup: Simulate a morphological alignment # with n = 100 continuous characters for a phylogeny # defined in object 'tree', with the default parameters in # 'sim.morpho' to run 'rTraitCont'. # Population noise is low, c = 0.25, and a correlation # matrix to simulate trait correlation (rho = 0.50) is provided. sim.morpho( tree = sim.tree, n = 100, c = 0.25, R = sim.R )
Simulate a population sample and return a list with (i) a matrix of
size s x n
, s
specimens and n
characters, (ii) a
vector with the estimated population variances for each character,
and (iii) the estimated shrinkage correlation matrix if the true
correlation matrix is provided.
sim.pop(psample, n, c, R)
sim.pop(psample, n, c, R)
psample |
Numeric, number of specimens the simulated population sample should include. |
n |
Numeric, number of morphological traits to be simulated. |
c |
Numeric, vector with the variances for the species within a population (see details). |
R |
(Optional) matrix, correlation matrix. (see details). |
The parameter c
is the population noise and it is used to sample
n
characters for each of the psample
specimens from a
normal distribution x ~ N(0,c)
.
If the population noise is assumed to be the same for all the characters
within the species, then the length of c
is 1 and equals to the value
of this variance.
If it differs, then a vector of length n
has to
be provided specifying the variance for each of the characters
If a correlation matrix, R
, is provided, then it is added to the
population matrix. Note that the correlation matrix needs to be of class "matrix"
and symmetric. You can take a look at data-raw/sim.R.R
to follow
the commands used to generate this matrix, object R.sim
, which is used
in the examples.
$P |
Matrix with the simulated population sample |
$var |
Vector with the estimated variances |
$Rsh |
Estimated shrinkage correlation matrix, only returned if |
Sandra Alvarez-Carretero and Mario dos Reis
# A) Simulation setup: Simulate a population with # psample = 20 specimens, and sample n = 100 characters with # a low population noise, c = 0.25. sim.pop( psample = 20, n = 100, c = 0.25 ) # B) Simulation setup: Simulate a population with # psample = 20 specimens, and sample n = 100 characters with # a low population noise, c = 0.25, and a low trait correlation # rho = 0.50 (correlation matrix that follows # the constant correlation model, i.e. all non-diagonal values # equal to rho). sim.pop( psample = 20, n = 100, c = 0.25, R = sim.R )
# A) Simulation setup: Simulate a population with # psample = 20 specimens, and sample n = 100 characters with # a low population noise, c = 0.25. sim.pop( psample = 20, n = 100, c = 0.25 ) # B) Simulation setup: Simulate a population with # psample = 20 specimens, and sample n = 100 characters with # a low population noise, c = 0.25, and a low trait correlation # rho = 0.50 (correlation matrix that follows # the constant correlation model, i.e. all non-diagonal values # equal to rho). sim.pop( psample = 20, n = 100, c = 0.25, R = sim.R )
True correlation matrix simulated to be used in the examples detailed in the sim.pop()
function. The matrix follows the constant correlation model, hence all values
outside the diagonal are rho = 0.50. The size is p x p
, being n = 100
the number of characters.
sim.R
sim.R
A matrix of size n x n, where n = 100 (morphological traits, the 100 simulated continuous traits):
Number of simulated continuous traits for which the correlation values have been calculated, 100
Simulated 8-species tree of class "phylo". The function
read.tree
was used to generate this object.
sim.tree
sim.tree
An object of class "phylo" with a simulated species tree with s = 8 species. It contains the following components:
Two-column matrix with 14 rows, where every row is one edge in the tree, the first column is the ancestor node and the second column its daughter node
A numeric vector with the branch lengths of the tree
Numeric, the number of (internal) nodes
A vector with the names of the tips, class "character"
Estimate the marginal likelihood using the stepping stones
method from a sample of n
power posterior MCMC chains sampled
with mcmctree (or bpp).
stepping.stones(mcmcf = "mcmc.txt", betaf = "beta.txt", se = TRUE)
stepping.stones(mcmcf = "mcmc.txt", betaf = "beta.txt", se = TRUE)
mcmcf |
character, mcmc output file name |
betaf |
character, file with beta values |
se |
logical, whether to calculate the standard error |
The MCMC samples should be stored in a directory structure created
by make.bfctlf
with method = "step-stones"
. The function will
read the stored log-likelihood values and calculate the log-marginal
likelihood.
If se = TRUE
, an approximation based on the Delta method is used to
calculate the standard error (see Xie et al. 2011). Warnings are given if the
approximation appears unreliable.
A list with components logml
, the log-marginal likelihood estimate;
se
, the standard error of the estimate; mean.logl
, the mean of
log-likelihood values sampled for each beta; and b
, the beta values
used.
Mario dos Reis
Xie et al. (2011) Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology, 60: 150–160.
make.bfctlf
to prepare directories and mcmctree or bpp control
files to calculate the power posterior.
Outputs a tree file to be used in MCMCtree. The path to this file needs to be written next to the option "treefile = " in the control file for MCMCtree.
treeMCMCtree(tree, aln, filename)
treeMCMCtree(tree, aln, filename)
tree |
Character, path to the file with the tree topology in Newick format, without branch lengths (see details). |
aln |
Character, path to the alignment file output by
|
filename |
Character, name for the output file. |
If you used write.morpho
or proc2MCMCtree
and passed a vector or a list with the ages of the specimens in your
alignment, you will see that the names in the output file with this
alignment are followed by a "^" and a value. This value is transformed
to the age you input within MCMCtree, as if it was a tip date, and used
to estimate the divergence times of the species in your phylogeny.
This function generates a tree file with the same species names than in
the morphological alignment, i.e. with the ages to be used by MCMCtree
next to the names of each specimen.
Remember to write the path to the tree file next to the "treefile = "
option in the control file to run MCMCtree.
Remember that the names without "^" followed by the ages have to be the
same in the file you pass to parameter "tree" than the ones you have
in the alignment file passed to "aln".
# Use the file with the tree topology (no branch lengths) and the # file with the morphological alignment that are saved in the # inst/extdata directory to generate a tree file with the # ages used by MCMCtree included. # We call the output tree file "treefile.txt". tree <- system.file( "extdata", "19s.trees", package = "mcmc3r") aln <- system.file( "extdata", "seqfile.aln", package = "mcmc3r") treeMCMCtree( tree = tree, aln = aln, filename = "treefile.txt" )
# Use the file with the tree topology (no branch lengths) and the # file with the morphological alignment that are saved in the # inst/extdata directory to generate a tree file with the # ages used by MCMCtree included. # We call the output tree file "treefile.txt". tree <- system.file( "extdata", "19s.trees", package = "mcmc3r") aln <- system.file( "extdata", "seqfile.aln", package = "mcmc3r") treeMCMCtree( tree = tree, aln = aln, filename = "treefile.txt" )
A matrix containing the 29 3D landmarks collected from the skulls of 21 "Vulpes vulpes" specimens after carrying out a Procrustes analysis (PA). Please take a look at the description in morpho/data-raw/V.R to understand how this object was generated.
V
V
A matrix with s = 21 rows and n = 87 columns (87/3 = 29 landmarks):
Rows, specimens from which landmarks were collected, 21
Columns, 87 traits (29 landmarks in 3D) after the PA
A 3D array containing the 29 3D landmarks collected from the skulls of 21 "Vulpes vulpes" specimens before carrying out a Procrustes analysis (PA). Please take a look at the description in morpho/data-raw/V.R to understand how this object was generated.
V.arr.unal
V.arr.unal
An array with k = 29 (landmarks), q = 3 (coordinates) and s = 21 (specimens):
landmark points collected from 21 foxes specimens, 29
coordinates in 3D or 2D, 3
number of specimens, 21
A matrix containing the 29 3D landmarks collected from the skulls of 21 "Vulpes vulpes" specimens before carrying out a Procrustes analysis (PA). Please take a look at the description in morpho/data-raw/V.R to understand how this object was generated.
V.mat.unal
V.mat.unal
A matrix with s = 21 rows and n = 87 columns (87/3 = 29 landmarks):
Rows, specimens from which landmarks were collected, 21
Columns, 87 traits (29 landmarks in 3D) after the PA
Object of class array output by Morpho after being aligned to the mean shape of the 19 carnivoran species previously generated (object "C.PS"). Please take a look at the description in morpho/data-raw/V.R to understand how this object was generated.
V.PS.nov1
V.PS.nov1
Object procSym
Check procSym
for more details
Vector with 87 within-species variances calculated from the object V. Please take a look at the description in morpho/data-raw/var.foxes.R to understand how this object was generated.
var.foxes
var.foxes
A vector with i = 87 variances regarding the "Vulpes vulpes" population:
Number of variances for the foxes population, 87
A dataset containing the 29 3D landmarks collected from the skulls of 19 carnivoran specimens This data.drame consists of a first column with the specimens labels used by MCMCtree and then 87 columns with the landmarks (29 landmarks x 3 coordinates). Please take a look at the description in morpho/data-raw/vulpes21x29.raw.R to understand how this object was generated.
vulpes21x29.raw
vulpes21x29.raw
A data.frame with n = 21 rows and p = 88 columns (info column + 87 coordinates):
Rows, specimens from which landmarks were collected, 21
Columns, information about the taxa (1st column) and 87 coordinates (29 landmarks in 3D)
Generate an alignment file with quantitative characters in MCMCtree format. The option "seqfile" in the control file used by MCMCtree should read the path to the file output by this function.
write.morpho( M, filename, c = 0, R = diag(1, dim(M)[2]), method, A = NULL, names, ages, scale = 1 )
write.morpho( M, filename, c = 0, R = diag(1, dim(M)[2]), method, A = NULL, names, ages, scale = 1 )
M |
Matrix, |
filename |
Character, name for the output file. |
c |
Numeric, vector of variances within the species of a population (see details).
If not provided, |
R |
Matrix, correlation matrix. Requires |
method |
(Optional) character, either |
A |
(Optional) matrix, decomposed matrix. Requires |
names |
(Optional) list or character, species name included in the morphological alignment (see examples B and C). |
ages |
(Optional) list or numeric, ages of the species included in the morphological alignment (see example C). |
scale |
numeric, all characters are multiplied by |
The matrix M
has s
rows, one for each specimen, and n
columns regarding the characters. If the data set contains landmarks,
they can be given in 2D or 3D.
For instance, if the landmarks are 3D, the first 3 columns will be the
coordinates x, y, and z for the first landmark; the next 3
columns for the second landmark; and so on.
specimens | lmk1.x | lmk1.y | lmk1.z | lmk2.x | lmk2.y | lmk2.z | ... |
Sp_1 | 0.143 | -0.028 | -0.044 | 0.129 | 0.028 | -0.043 | ... |
Sp_2 | 0.128 | -0.024 | -0.028 | 0.124 | 0.027 | -0.025 | ... |
... | ... | ... | ... | ... | ... | ... | ... |
See descriptions in data-raw/carnivores19x29.raw.R
,
data-raw/vulpes21x29.raw.R
, data-raw/C.R
, data-raw/V.R
,
to know how to obtain the morphological alignment used in write.morpho
.
Note that the explanation starts with the processing of raw data.
If the data set contains a set of n
morphological continuous characters,
e.g. from a simulated data set, the file should look like
specimens | char.1 | char.2 | char.3 | char.4 | ... |
Sp_1 | 0.143 | -0.028 | -0.044 | 0.129 | ... |
Sp_2 | 0.128 | -0.024 | -0.028 | 0.124 | ... |
... | ... | ... | ... | ... | ... |
Note that if a list with the specimens names is not passed to the parameter names
,
the name for each species will be "Species_1", "Species_2", and so on.
The object c
can be of length 1, if all characters have the same variance, or
a vector of length n
with the variance of each of the characters. For the latter,
you can take a look at object var.foxes
, which has been generated following
the steps explained in data-raw/var.foxes.R
.
The object R
has to be a symmetric and positive definite object of class matrix,
i.e. class( R ) = "matrix"
. See R.sh
for an example of its format.
You can also read the description in data-raw/R.shrinkage.R
for the details
about how to generate this matrix.
The logarithm of the determinant of the correlation matrix is going to be printed in the output file to later be used by MCMCtree during the likelihood calculation.
If a correlation matrix R
is provided, write.morpho
can use either
the method = "chol"
or method = "eigen"
to get a matrix A
such that .
This matrix
is later used to transform
the morphological data to account for the correlation in this data set,
so that the transformed characters in
Z
,
, are independent.
Alternatively, this matrix
A
can also be provided by the user. You can
read the description to generate this matrix in data-raw/A.R
to
understand how to generate this matrix, which is also available as object A
.
If you decide to use a matrix A
, it will be used to transform the data
and no decomposition will be performed, thus saving computational time when
large matrices are to be used.
Sandra Alvarez-Carretero and Mario dos Reis
matrix2array
, array2matrix
, sim.morpho
# A.1) Providing the morphological alignment (M = C) and # the name for the output file. This does not account for # correlation nor population noise write.morpho( M = C, filename = "seqfile.aln" ) # A.2) Providing the morphological alignment (M = C), the population # noise (c = 0.25), and the name for the output file. Note that # c = 0.25 means that the population noise for all the characters # is c = 0.25, i.e. it will be considered as if # length( c ) = p characters, being all of them 0.25. write.morpho( M = C, c = 0.25, filename = "seqfile.aln" ) # A.3) Providing the morphological alignment (M = C), the population # noise (c = 0.25), the (estimate of the) correlation matrix (R), # the method to decompose R ("chol" in this example), # and the name for the output file. Note that the R matrix needs # to be invertible, otherwise the data will not be able to be # transformed accounting for correlation. write.morpho( M = C, c = 0.25, R = R.sh, method = "chol", filename = "seqfile.aln" ) # A.4) Providing the morphological alignment (M = C), a vector with # the population noise for each character (c = var.foxes), # the (shrinkage estimate of the) correlation matrix (R = R.sh), # the method to decompose R ("chol" in this example), # and the name for the output file. Note that the matrix passed # to argument R needs to be invertible, otherwise the data will # not be able to be transformed accounting for correlation. write.morpho( M = C, c = var.foxes, R = R.sh, method = "chol", filename = "seqfile.aln" ) # A.5) Providing the morphological alignment (C), a vector with # the population noise for each character (c = var.foxes), # the (shrinkage estimate of the) correlation matrix (R = R.sh), # the A matrix to transform the data, and the name for the # output file. Note that as the A matrix is provided, the matrix # passed to R will not be decomposed, hence the argument "method" # is not needed. write.morpho( M = C, c = var.foxes, R = R.sh, A = A, filename = "seqfile.aln" ) # B) Scenario A.5 but providing a list with the # names of the species names <- list( sp1 = "Ael_sp.", sp2 = "Can_dir", sp3 = "Epi_hay", sp4 = "Hes_sp.", sp5 = "Par_jos", sp6 = "Tom_sp.", sp7 = "Enh_pah", sp8 = "Cuo_alp", sp9 = "Spe_ven", sp10 = "Can_lup", sp11 = "Cer_tho", sp12 = "Oto_meg", sp13 = "Urs_ame", sp14 = "Ail_ful", sp15 = "Nan_bin", sp16 = "Par_her", sp17 = "Hia_won", sp18 = "Smi_fat", sp19 = "Vul_vul" ) write.morpho( M = C, c = var.foxes, R = R.sh, A = A, filename = "seqfile.aln", names = names ) # C) Scenario A.5 but providing a vector of type character with the names of # the specimens and a list with their corresponding ages. Please # keep the same order in both lists, so the first specimen in the # list name corresponds to the first age in the age list, and so on. names <- c( "Ael_sp.", "Can_dir", "Epi_hay", "Hes_sp.", "Par_jos", "Tom_sp.", "Enh_pah", "Cuo_alp", "Spe_ven", "Can_lup", "Cer_tho", "Oto_meg", "Urs_ame", "Ail_ful", "Nan_bin", "Par_her", "Hia_won", "Smi_fat", "Vul_vul" ) ages <- list( sp1 = 13.135, sp2 = 0.0285, sp3 = 11.95, sp4 = 35.55, sp5 = 25.615, sp6 = 14.785, sp7 = 28.55, sp8 = 0, sp9 = 0, sp10 = 0, sp11 = 0, sp12 = 0, sp13 = 0, sp14 = 0, sp15 = 0, sp16 = 0, sp17 = 6.65, sp18 = 0.0285, sp19 = 0 ) write.morpho( M = C, c = var.foxes, R = R.sh, A = A, filename = "seqfile.aln", names = names, ages = ages )
# A.1) Providing the morphological alignment (M = C) and # the name for the output file. This does not account for # correlation nor population noise write.morpho( M = C, filename = "seqfile.aln" ) # A.2) Providing the morphological alignment (M = C), the population # noise (c = 0.25), and the name for the output file. Note that # c = 0.25 means that the population noise for all the characters # is c = 0.25, i.e. it will be considered as if # length( c ) = p characters, being all of them 0.25. write.morpho( M = C, c = 0.25, filename = "seqfile.aln" ) # A.3) Providing the morphological alignment (M = C), the population # noise (c = 0.25), the (estimate of the) correlation matrix (R), # the method to decompose R ("chol" in this example), # and the name for the output file. Note that the R matrix needs # to be invertible, otherwise the data will not be able to be # transformed accounting for correlation. write.morpho( M = C, c = 0.25, R = R.sh, method = "chol", filename = "seqfile.aln" ) # A.4) Providing the morphological alignment (M = C), a vector with # the population noise for each character (c = var.foxes), # the (shrinkage estimate of the) correlation matrix (R = R.sh), # the method to decompose R ("chol" in this example), # and the name for the output file. Note that the matrix passed # to argument R needs to be invertible, otherwise the data will # not be able to be transformed accounting for correlation. write.morpho( M = C, c = var.foxes, R = R.sh, method = "chol", filename = "seqfile.aln" ) # A.5) Providing the morphological alignment (C), a vector with # the population noise for each character (c = var.foxes), # the (shrinkage estimate of the) correlation matrix (R = R.sh), # the A matrix to transform the data, and the name for the # output file. Note that as the A matrix is provided, the matrix # passed to R will not be decomposed, hence the argument "method" # is not needed. write.morpho( M = C, c = var.foxes, R = R.sh, A = A, filename = "seqfile.aln" ) # B) Scenario A.5 but providing a list with the # names of the species names <- list( sp1 = "Ael_sp.", sp2 = "Can_dir", sp3 = "Epi_hay", sp4 = "Hes_sp.", sp5 = "Par_jos", sp6 = "Tom_sp.", sp7 = "Enh_pah", sp8 = "Cuo_alp", sp9 = "Spe_ven", sp10 = "Can_lup", sp11 = "Cer_tho", sp12 = "Oto_meg", sp13 = "Urs_ame", sp14 = "Ail_ful", sp15 = "Nan_bin", sp16 = "Par_her", sp17 = "Hia_won", sp18 = "Smi_fat", sp19 = "Vul_vul" ) write.morpho( M = C, c = var.foxes, R = R.sh, A = A, filename = "seqfile.aln", names = names ) # C) Scenario A.5 but providing a vector of type character with the names of # the specimens and a list with their corresponding ages. Please # keep the same order in both lists, so the first specimen in the # list name corresponds to the first age in the age list, and so on. names <- c( "Ael_sp.", "Can_dir", "Epi_hay", "Hes_sp.", "Par_jos", "Tom_sp.", "Enh_pah", "Cuo_alp", "Spe_ven", "Can_lup", "Cer_tho", "Oto_meg", "Urs_ame", "Ail_ful", "Nan_bin", "Par_her", "Hia_won", "Smi_fat", "Vul_vul" ) ages <- list( sp1 = 13.135, sp2 = 0.0285, sp3 = 11.95, sp4 = 35.55, sp5 = 25.615, sp6 = 14.785, sp7 = 28.55, sp8 = 0, sp9 = 0, sp10 = 0, sp11 = 0, sp12 = 0, sp13 = 0, sp14 = 0, sp15 = 0, sp16 = 0, sp17 = 6.65, sp18 = 0.0285, sp19 = 0 ) write.morpho( M = C, c = var.foxes, R = R.sh, A = A, filename = "seqfile.aln", names = names, ages = ages )