Total-evidence dating of carnivores integrating molecular and morphometric data

In this tutorial, we will use geometric morphometrics (GMM) and molecular data from extant and fossil carnivores for Bayesian inference of speciation timings. The tutorial reproduces the analysis in:

Bayesian estimation of species divergence times using correlated quantitative characters
Álvarez-Carretero S, Goswami A, Yang Z, and dos Reis M. (2019) Systematic Biology, 68: 967–986.
DOI: 10.1093/sysbio/syz015

The tutorial is divided into the following steps:

Collect and process morphological data
Perform Procrustes alignment of GMM data
Transform a GMM alignment to account for population noise and character correlation
Carry out MCMC divergence dating with MCMCtree using GMM and molecular data
Visualise and validate divergence dating results

Before starting the tutorial, make sure you have the mcmc3r R package and its dependencies installed. Steps 4 and 5 require that you: (1) download and install the command line version of MCMCTree from PAML (make sure MCMCtree is in your system’s PATH, as explained in PAML’s website), and (2) download the tree, alignment and control files necessary to reproduce the analysis with MCMCtree, and save these files to a directory called carnivores/. This tutorial was tested with PAML version 4.9j, and thus we recommend you use this version when doing the tutorial excercises.

1. Collect and process morphological data

First, landmark measurements from the bones of extinct and extant taxa need to be collected. Here we use landmarks measured from the skulls of 19 carnivoran species.

If there is one species for which more than one specimen is available, we can use the various specimens to estimate the covariance matrix of landmarks, which can be used to account for population noise and caracter correlation before Bayesian inference. Here we use 21 specimens of Vulpes vulpes (common fox) for this purpose. Therefore, we have two data sets:

The foxes (Vulpes vulpes) data set: landmarks collected from 21 fox specimens.
The carnivoran data set: landmarks collected from 19 species (one specimen per species, with one fox specimen also present in the foxes data set).

Specimens information

The 19 carnivoran specimens are detailed in the next table. The oldest fossil species is Hesperocyon sp. dating to roughly 35.6 Millions years ago (Ma).

Taxon^a	Voucher specimen	Specimen age (mid-point age^b), Ma	Reference^c
Hesperocyon sp. †	NMNH 459576	35.5500 (37.2000-33.9000)	National Museum of Natural History collection
Enhydrocyon pahinsintewakpa †	AMNH 27579	28.5500 (30.800-26.3000)	Wang 1994, pp. 89-90
Paraenhydrocyon josephi †	YPM 12702	25.6150 (30.8000-20.4300)	Wang 1994, p. 135 & p. 141
Tomarctus hippophaga †	AMNH 61156	14.7850 (15.9700-13.6000)	Wang et al. 1999, pp. 157-158
Aelurodon ferox †	AMNH 61757	13.1350 (15.9700-10.3000)	Wang et al. 1999, pp. 182-183
Epicyon haydeni †	LACM 131855	11.9500 (13.6000-10.3000)	Wang et al. 1999, pp. 252-254
Smilodon fatalis †	LACMHC 1360	0.0285 (0.0440-0.0130)	La Brea Tar Pits collection
Hyaenictitherium wongii †	China G L-49	6.6500 (8.0000-5.3000)	Werdelin 1988, p. 259; Werdelin & Solounias 1991, p. 33; Tseng & Wang 2007, p. 708 (Table 2)
Canis dirus †	LACMHC 2300-4	0.0285 (0.0440-0.0130)	La Brea Tar Pits collection
Ursus americanus americanus (O)	FMNH 106356	0	-
Ailurus fulgens (O)	FMNH 60762	0	-
Nandinia binotata (O)	FMNH 149362	0	-
Paradoxurus hermaphroditus phillipinensis (O)	FMNH 33548	0	-
Cuon alpinus primaevus	FMNH 38515	0	-
Speothos venaticus	FMNH 87861	0	-
Canis lupus lycaon	FMNH 153800	0	-
Cerdocyon thous aquilis	FMNH 68889	0	-
Otocyon megalotis	AMNH 179143	0	-
Vulpes vulpes pusilla	FMNH 112415	0	-

_{^aThe first nine species are extinct
species (indicated by a dagger) and the next ten are extant species.
Those with the label “(O)” are outgroups.}
_{^bMid-point age calculated from the
maximum and minimum ages of the voucher specimen according to the
formation from which it was retrieved. See column with header
“Reference” for the literature where the corresponding specimen and the
formation from where it was collected are described.}
_{^cAge reference corresponding only to
the fossil specimens (extinct species). This can refer to either a
paper, book chapter, or the database for the museum collection.}

Landmark points

The 29 3D-landmarks collected from the specimen skulls are:

Landmark #	Landmark description
1	Basioccipital-Basisphenoid-Bulla suture - left
2	Basioccipital-Basisphenoid-Bulla suture - right
3	Palatine - Maxilla - ventral suture
4	Jugal - Squamosal ventral suture - left
5	Jugal - Squamosal ventral suture - right
6	Bulla - anterior extreme - left
7	Bulla - anterior extreme - right
8	Bulla - posterior lateral extreme - left
9	Bulla - posterior lateral extreme - right
10	Premaxilla - anterior extreme - left
11	Premaxilla - anterior extreme - right
12	Jugal-Maxilla (Orbit crest) suture - left
13	Jugal-Maxilla (Orbit crest) suture - right
14	Jugal-Maxilla (base of zygomatic arch) suture - left
15	Jugal-Maxilla (base of zygomatic arch) suture - right
16	Nasals - Frontal suture
17	Anterior lateral M1 - left
18	Posterior lateral M2 - left
19	Anterior lateral M1 - right
20	Posterior lateral M2 - right
21	Canine - mesial extreme - left
22	Canine - mesial extreme - right
23	Postorbital process tip - left
24	Postorbital process tip - right
25	Paraoccipital process tip - left
26	Paraoccipital process tip - right
27	Parietals - Occipital suture
28	Occipital condyle - extreme - left
29	Occipital condyle - extreme - right

The next figure shows a 2D projection of the landmarks. The crosses in orange are the landmarks from the 21 foxes, while the grey crosses are the landmarks from the rest of the 18 carnivoran specimens. The black dots correspond to the mean of the landmark coordinates across the 19 carnivoran species:

Raw data

The landmarks can be collected in 2D (x and y coordinates per landmark) or 3D (x, y, and z coordinates per landmark). In this tutorial, we are using 29 3D landmarks.

A. Carnivores data set

Load the raw data. The raw landmark data can be found in the inst/extdata directory. The command below will find and load the data into R:

filepath <- system.file( "extdata", "carnivores19x29.tsv", package = "mcmc3r")
carnivores19x29.raw <- read.table( file = filepath, header = F, skip = 1, dec = ".", sep = "\t" )
dim( carnivores19x29.raw )
#> [1] 19 89

For example, the first nine columns of carnivores19x29.raw are:

V1	V2	V3	V4	V5	V6	V7	V8	V9
Ael_sp.	AMNH 61757	309.8820	93.05910	57.41040	289.3455	104.76550	63.93170	309.0355
Can_dir	LACMHC 2300-4	-94.2938	357.87210	74.29750	-111.4080	340.83530	72.33680	-117.6451
Epi_hay	LACM 131855	-319.5504	82.89650	54.26130	-306.3990	60.05450	55.53630	-289.5598
Hes_sp.	NMNH 459576	-147.1955	291.72010	71.74370	-146.3978	284.49670	68.12210	-161.7725
Par_jos	YPM 12702	317.0446	117.29873	61.93539	305.3596	122.77050	58.60500	298.5644
Tom_sp.	AMNH 61156	315.3884	124.39520	68.02090	303.1740	133.42730	66.47120	296.2589
Enh_pah	AMNH 27579	318.6872	157.65050	63.18620	309.0530	164.83810	60.67490	307.8123
Cuo_alp	FMNH38515	349.1540	13.71310	50.71620	340.1889	26.84560	51.42100	351.9942
Spe_ven	FMNH87861	318.6051	110.69920	38.03950	309.8225	119.94330	37.33400	318.6282
Can_lup	FMNH153800	295.1609	144.04080	71.96250	284.2220	158.36160	69.81900	277.9851
Cer_tho	FMNH 68889	327.7413	75.74673	40.68660	321.7587	87.13897	39.61633	341.3116
Oto_meg	AMNH179143	312.0606	-112.93650	38.22430	309.2825	-102.56840	37.99480	312.8086
Vul_vul	FMNH112415	315.2145	63.37430	36.16660	310.0419	70.60290	35.26020	318.9246
Urs_ame	FMNH106356	356.5741	102.59340	63.70490	335.3378	126.49580	62.71650	342.2405
Ail_ful	FMNH 60762	302.7209	95.12060	31.54690	291.9594	103.64560	30.86280	280.6053
Nan_bin	FMNH 149362	248.0835	204.40140	31.39870	239.4642	210.36130	30.69630	235.8965
Par_her	FMNH 33548	252.2580	209.18220	33.05070	241.9314	214.56230	31.25390	242.3726
Hia_won	China G L-49	151.3266	257.70470	53.04390	136.5949	258.34620	55.67750	150.1895
Smi_fat	LACMHC 1360	419.6118	-32.90320	104.66530	408.9502	-14.39180	101.84950	393.0468

The full table has 19 rows, one for each of the 19 specimens, and 89 columns, 2 columns with information about the specimens and 87 columns with the coordinates collected for the 29 3D-landmarks, i.e. 29 × 3 = 87 morphological traits.

Note that this new object carnivores19x29.raw is not a matrix: it is a data.frame. We need an object of class matrix in order to perform a Procrustes analysis. Therefore, we only keep the columns with the morphological characters (columns 3-89). Furthermore, we use the first column, which contains the names of the specimens, as the names for the rows in the new matrix. This step is very important, as this is the only way we will be able to identify the landmarks with the corresponding specimen they were collected from.

# 1. Convert raw data into class matrix and add row names
C.mat              <- as.matrix( carnivores19x29.raw[ , 3:dim( carnivores19x29.raw )[2] ] )
row.names( C.mat ) <- carnivores19x29.raw[ ,1 ]

B. Foxes data set

Load the raw data. As mentioned before, you can find the raw data in the inst/extdata directory.

filepath <- system.file( "extdata", "vulpes21x29.tsv", package = "mcmc3r")
vulpes21x29.raw <- read.table( file = filepath, header = F, skip = 1, dec = ".", sep = "\t" )
dim( vulpes21x29.raw )

The first nine columns of vulpes21x29.raw are:
The full table has 21 rows, one for each of the 21 foxes specimens, and 88 columns, 1 column with the names of the specimens and 87 columns with the coordinates collected for the 29 3D-landmarks, i.e. 29 × 3 = 87.

As before, we need to convert the foxes data frame into a matrix, paying attention to preserving the species names and row names:

V.mat <- as.matrix( vulpes21x29.raw[ , 2:dim( vulpes21x29.raw )[2] ] )
row.names( V.mat ) <- vulpes21x29.raw[ , 1 ]

Now, we have two objects of class matrix with the two processed data sets: C.mat, with 29 landmarks for 19 carnivore specimens, and V.mat, with 29 landmarks for 21 fox specimens. We can now continue with the Procrustes alignment!

2. Procrustes alignment

We will use the R package Morpho to perform the Procrustes alignment. We choose this package because it allows alignment of the foxes landmarks to the mean Procrustes shape of the carnivores. This ensures that the morphological alignment with the foxes, which will be later used to correct for population variance, is oriented and aligned to the morpho space of the rest of the carnivores specimens.

Use the mcmc3r::matrix2array function to convert the matrix objects previously generated into arrays. The array format is required to perform a Procrustes alignment with the package Morpho.

C.arr <- mcmc3r::matrix2array( X = C.mat, coords = 3 )
V.arr <- mcmc3r::matrix2array( X = V.mat, coords = 3 )

Use Morpho::procSym to perform a Procrustes alignment with C.arr. This function aligns all carnivoran specimens, resulting in the morphological alignment saved in object C.PS (19 × 87). Note we have symmetrical landmarks, that is, landmarks collected from the left and right sides of the skull, and thus we indicate these as “paired landmarks” during Procrustes alignment:

# Get right and left vectors with corresponding symmetric landmarks
right <- c( 11, 22, 13, 19, 15, 20, 24, 5, 7, 2, 9, 26, 29 )
left  <- c( 10, 21, 12, 17, 14, 18, 23, 4, 6, 1, 8, 25, 28 )
pairedLM <- cbind( left, right )
# Run Morpho::procSym
C.PS  <- Morpho::procSym( dataarray = C.arr,
                        pairedLM = pairedLM )

We had previously put in an array the 21 foxes specimens and they are in object V.arr. So now we remove from V.arr the rows with the common fox in the objects C.arr and V.arr. This is done to avoid having repeated data, as C.PS contains an alignment with the same fox in both data sets. The common fox is the first specimen, therefore this is the specimen we remove from the array.

V.arr.nov1 <- V.arr[,,-1]

Use the object of class symproc output by Morpho::procSym, C.PS, in the Morpho::align2procSym function. Specifically, this function will use the mean shape of alignment C.PS as the morpho space to which the landmarks of the other 20 foxes will be aligned, resulting in alignment V.PS.nov1 (20 × 87).

V.PS.nov1             <- Morpho::align2procSym( x = C.PS, newdata = V.arr.nov1 )
# Add the species names as row names in V.PS.nov1
dimnames( V.PS.nov1 ) <- dimnames( V.arr.nov1 )

Get the vulpes (Vulpes_1) from C.PS, in position 13, and add it to V.PS. This includes the vulpes in the previous alignment.

V.PS              <- array( dim = c( 29, 3, 21 ) )
dimnames( V.PS )  <- list( paste( "lmk", seq( 1:29 ), sep="" ),
                                  c( "x", "y", "z" ),
                                  c( "Vulpes_1",
                                     dimnames( V.PS.nov1 )[[3]] ) )
V.PS[,,1]    <- C.PS$rotated[,,13] # Vulpes_1
V.PS[,,2:21] <- V.PS.nov1

We have finished the Procrustes analysis! Now we convert the aligned landmark arrays back into matrices, as we need to manipulate these matrices to correct for population noise and landmark correlations:

C <- mcmc3r::array2matrix( X = C.PS$rotated, coords = 3 )
V <- mcmc3r::array2matrix( X = V.PS, coords = 3 )

Matrix C is the morphological alignment with the 19 carnivoran specimens, while matrix V contains the fox-only morphological alignment. We can now correct for the population noise and the correlation among characters. We will see how to do this in the next step!

3. Correct `C` for population noise and trait correlation

We will use the fox-only matrix to estimating the population noise (or population variance) and the correlation matrix among landmarks, which is obtained using the shrinkage estimator from package corpcor.

Find population variance using object V.

var.foxes <- diag( cov( V ) )

Use the function corpcor::cor.shrink to estimate the shrinkage correlation matrix, R.sh, using the object V. This function gurantees to find the optimum value of delta to estimate a correlation matrix that can be inverted: R.sh.

# Generate shrinkage correlation matrix, R.sh
R.sh <- corpcor::cor.shrink( V )
# Convert R.sh into class matrix
R.sh <- cbind( R.sh )

These two objects will be later used in the mcmc3r::write.morpho function to correct for population noise and trait correlation.

4. Bayesian inference of divergence times with MCMCtree

We will proceed with Bayesian MCMC sampling of divergence times, which requires using a program external to R, MCMCtree. First, we use function mcmc3r::write.morpho to write the Procrustes alignment into a phylip file suitable for analysis with MCMCtree. The function will take the estimates of the population noise and character correlations and apply these to the Procustes alignment to generate a new transformed alignment with independent columns and population variances normalised to one (see original paper for details). The parameters we are going to pass to this function are:

M
This is the morphological alignment, object C, we have previously generated. Therefore, we would write M = C when setting this parameter in the function.
filename
This is the name that will be given to the output file with the morphological alignment. We have chosen seqfile.aln, but you can choose any name.
c
This is the argument that takes the population noise, The object passed can be of length 1, if all characters have the same variance, or a vector with the same length than the number of characters (each character has a specific variance). In this tutorial, we are using the latter, as we are passing the object var.foxes with the population variance for each character previously calculated.
R
The object passed to this parameter has to be a symmetric and positive definite matrix, thus allowing to calculate its inverse. Otherwise, the likelihood calculation cannot take place because it needs to use the inverse of the correlation matrix. We are using R.sh, the estimated shrinkage correlation matrix calculated in the previous step.
method
If a correlation matrix is passed to parameter R, this function can use either the method = "chol"(Cholesky decomposition) or method = "eigen" (eigen decomposition) to obtain a matrix A such that R⁻¹ = A^TA. This matrix A^T is later used to transform the morphological data, M, to account for the correlation in this data set, so that the transformed characters in Z, Z = MA^T, are independent.
names
Vector or list with the specimens names. In this tutorial, we use a character vector.
ages
Vector or list with the ages of the specimens. Make sure each age corresponds to the right specimen as it should be in the same order than in vector names. In this tutorial, we use a numeric vector. Note extant species should have age 0 while extinct species should have ages > 0.

Before running the commands below, use the setwd command in R to set your working directory to the carnivores/ directory you created earlier.

# setwd("~/carnivores/") # use the right directory path
# Create a vector (although it can also be a list) with the specimens names
names <- c( "Ael_sp.", "Can_dir", "Epi_hay", "Hes_sp.",
            "Par_jos", "Tom_sp.", "Enh_pah", "Cuo_alp",
            "Spe_ven", "Can_lup", "Cer_tho", "Oto_meg",
            "Urs_ame", "Ail_ful", "Nan_bin", "Par_her",
            "Hia_won", "Smi_fat", "Vul_vul"           )
# Create a vector (although it can also be a list) with the ages of the specimens.
# Make sure each age corresponds to the right specimen as it should be in the same 
# order than in vector `names`
ages  <- list( sp1  = 13.135, sp2  =  0.0285, sp3  = 11.95,  sp4  = 35.55,
               sp5  = 25.615, sp6  = 14.785,  sp7  = 28.55,  sp8  =  0,
               sp9  =  0,     sp10 =  0,      sp11 =  0,     sp12 =  0,
               sp13 =  0,     sp14 =  0,      sp15 =  0,     sp16 =  0,
               sp17 =  6.65,  sp18 =  0.0285, sp19 =  0 )

# Run write.morpho to correct for population noise and trait correlation and write the
# morphological alignment in MCMCtree format
mcmc3r::write.morpho( M = C, filename = "seqfile.aln", c = var.foxes , R = R.sh, method = "chol", names = names, ages = ages )

You can find more information and other examples in ?mcmc3r::write.morpho. A bit of file seqfile.aln, which follows the PHYLIP format, is shown below:

  19  87  M  1  -17.4449361406561

Ael_sp.^22.425      32.1888806478969 13.2166043633248 18.0769785490827  ...
Can_dir^35.5315     28.6903520919794 11.7957141425012 10.6673624273699  ...
Epi_hay^23.61       30.7230881245987 13.8673453218744 3.30535244654374  ...
Hes_sp.^0.01        29.6257424752038 4.39441803218726 12.7003750739771  ...
Par_jos^9.945       26.8972062403083 16.3244194778594 9.66291130048758  ...
Tom_sp.^20.775      27.5960709958697 13.2198776170147 6.19263005366892  ...
Enh_pah^7.01        29.1449154123494 8.81885279314611 13.2404534560068  ...
Cuo_alp^35.56       25.9516979794982 12.9776718002656 11.2662021565935  ...
...

The first line has the number of species (19), the number of characters (87), a string indicating this is a morphological alignment (M), the standardised population noise (1) and the log-determinant of the correlation matrix (-17.44). This is then followed by the species names and transformed landmark coordinates. Note the species ages have been recoded. Do not worry about this.

The next step is to merge the morphological alignment (which contains extant and extinct species) with a molecular alignment (that only contains extant species). We have already prepared such combined file, which you should have already downloaded from GitHub. Go to the carnivores/ directory you created to store the data. There you will see file morph_mol.aln, containing the morphological and molecular alignments as two separate partitions. You can use a text editor to open the file. The top block is the morphological alignment, like the one shown above (although you may see some small numerical differences due to recent updates to the Procrustes algorithm). The bottom block is the molecular alignment of 3rd positions for mitochondrial genes.

File mcmctree.ctl contains the instructions required by MCMCtree to carry out MCMC sampling of divergence times. You can open this file with a text editor:

          seed = -1
       seqfile = morph_mol.aln
      treefile = carnivores.tree
      mcmcfile = mcmc.txt
       outfile = out.txt

         ndata = 2
       seqtype = 0    * 0: nucleotides; 1:codons; 2:AAs
       usedata = 1    * 0: no data; 1:seq like; 2:use in.BV; 3: out.BV
         clock = 2    * 1: global clock; 2: independent rates; 3: correlated rates
       RootAge = 'B(37.3, 66.0, 0.025, 0.025)'
       TipDate = 1 1  * TipDate (1) & time unit

         model = 0    * 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85
         alpha = 0.5  * alpha for gamma rates at sites
         ncatG = 5    * No. categories in discrete gamma

     cleandata = 0    * remove sites with ambiguity data (1:yes, 0:no)?

       BDparas = 1 1 0 0.001 * birth, death, sampling
   alpha_gamma = 1 1         * gamma prior for alpha

   rgene_gamma = 2 10  * gamma prior for overall rates for genes
  sigma2_gamma = 2 2   * gamma prior for sigma^2 (for clock=2 or 3)

         print = 2
        burnin = 500
      sampfreq = 1 
       nsample = 8000

The first line, seed = -1 tells the program to use a random seed to initialise the stochastic sampler. Then we have the names of the data files required by the program: the alignment, morph_mol.aln, and the phylogeny, carnivores.tree. Then the names of the output files: the raw mcmc output mcmc.txt, and the processed output out.txt. The next line, ndata = 2 tells the program our alignment has two partitions. Then seqtype = 0 which indicates DNA alignments for the non-morpho partitions, and usedata = 1 which tells the program to use exact likelihood calculation (you can see a full description of all options in the MCMCtree manual). Then clock = 2 tells the program to use the independent-log normal relaxed-clock model. Line Root age is a fossil calibration on the root (always required with MCMCtree). The next line, TipDate = 1 1 activates tip dating (i.e. our fossil taxa are dated tips), with a time unit of 1 My. Then model = 0 tells MCMCtree to use the JC69 model for the moleculr partition. We choose this model here because it is very fast to compute. In the original paper, we used HKY. We then have the prior for alpha, for the discrete gamma model, alpha_gamma = 1 1. Then we have the priors for the relaxed clock model, rgene_gamma = 2 10, and sigma2_gamma = 2 2. The last set of parameters control the amount of burn-in, burnin = 500, how often to collect samples from the MCMC, sampfreq = 1, and the total number of samples to collect, nsample = 8000.

Using the command line, go to your carnivores/ directory and type the following command:

mcmctree mcmctree.ctl

Then, please press enter to execute MCMCtree. The program will then start MCMC sampling of divergences times using the morphological and molecular data. If the command above fails, either you do not have MCMCtree installed, you have a different alias to execute the program, or your MCMCtree executable is not in your system’s search path. It should take around 8 minutes for the analysis to complete depending on the computer.

5. Visualise and validate divergence dating results

Once the MCMC sampling completes, you will see several files have been created. File out.txt contains dated trees and summary statistics of the MCMC. Open this file with a text editor and examine it. You should find the block with the dated trees, which like this:

Species tree for FigTree.  Branch lengths = posterior mean times; 95% CIs = labels
(((((((((1_Can_lup^35.56, 2_Can_dir^35.5315) 28 , 3_Cuo_alp^35.56) 27 , 4_Cer_tho^35.56) ...

(((((((((Can_lup^35.56: 7.399885, Can_dir^35.5315: 7.371385): 3.383741, Cuo_alp^35.56: ... 

(((((((((Can_lup^35.56 #1.000000: 7.399885, Can_dir^35.5315: 7.371385) [&95%HPD={0.711672, ...

rategram locus 1:
(((((((((Can_lup^35.56: 0.982196, Can_dir^35.5315: 0.893678): 0.254008, Cuo_alp^35.56: ...

rategram locus 2:
(((((((((Can_lup^35.56: 0.009285, Can_dir^35.5315: 0.009505): 0.016383, Cuo_alp^35.56: ...

The first tree has nodes labelled with numbers. Tips are labelled from 1 to 19 (as we have 19 species), and internal nodes are labelled from 20 (the root) to 37. Next, we have the timetree without the 95% CI, and next the same timetree but with the 95% CI of node ages, suitable for plotting with FigTree. The next two trees (rategrams) have branch lengths equal to the evolutionary rate for the branch, the first with the morphological rates for the landmark partition, and the second with the molecular rates for the molecular alignment.

If you scroll towards the bottom of the file, you will see a summary of the node ages (e.g., t_n20), rate parameters (mu1, mu2, sigma1, sigma2), and branch rates (e.g., r_g1_n1). For example, the posterior mean age of the root and its 95% credibility intervals (equal-tail and HPD) are:

t_n20         49.5139 (41.5294, 61.8787) (40.7484, 60.0179)

Note the values in your out.txt file may look different as MCMC uses random sampling to estimate parameters in the posterior distribution.

File mcmc.txt, which contains the raw MCMC output, can be loaded into R (remember you must have your working directory set to carnivores/):

# Modify the path below to the right directory in your file system
# setwd("~/carnivores/")
# Load the MCMC file
mcmc <- read.table( file = "mcmc.txt", header = TRUE )
# Create a density plot for the root's age...
plot( density( mcmc$t_n20, adj = .2 ), xlim = c(30, 80),
      main = "Root age", xlab = "Time (Ma)", las = 1 )
# ... and we can compare the posterior to the prior calibration,
# 'B(37.3, 66.0, 0.025, 0.025)':
curve( mcmc3r::dB( x = x, tL = 37.3, tU = 66, pL = .025, pU = .025),
       add = TRUE, lty = 2 )

File mcmc.txt is suitable for performing diagnostics for convergence of the MCMC. For example, if you have the coda package installed, you can check for the effective sample sizes (ESS) of the parameters estimated. In our case many ESS will be small as we ran a rather short MCMC:

coda::effectiveSize( x = mcmc[,-1] )

Alternatively, you can load mcmc.txt into Tracer (from the Beast MCMC software family) and perform convergence diagnostics there. We can also use the information within mcmc.txt to make a densitree plot in R:

# Read tree used by MCMCtree for sampling:
tt <- ape::read.tree( file = "carnivores.tree" )
# Recode the tip ages for plotting:
tips_info <- strsplit(tt$tip.label, "\\^")
back.ages <- as.numeric(unlist(tips_info)[seq(from=2, to=2*19, by=2)])
back.ages <- max( back.ages ) - back.ages
# Plot the densitree (note this command may be slow):
mcmc3r::mcmc2densitree( tree = tt, mcmc = mcmc, time.name = "t_", 
                        thin = 0.025, alpha = 0.1, tip.ages = back.ages,
                        cex = .8, pfrac = .2 )
title( xlab = "Million years ago" )

MCMCtree also generated a FigTree.tre file suitable for plotting the timetree with FigTree. Note there is currently a bug in MCMCtree which adds a #1.000000 or #2.000000 within the tree in the FigTree.tre file. These tags must be removed, using a text editor, before loading the file into FigTree.

Finally, we can use the MCMC sample from MCMCtree to generate ancestral reconstructions of the landmarks for all nodes in the phylogeny.

# First, we need to make sure the taxa names in object `C` match exactly
# those in the tree (which are slightly different as they contain the taxa ages)
i <- pmatch( rownames( C ), tt$tip.label )
rownames( C ) <- tt$tip.label[i]
recM <- mcmc2anc( tree = tt, M = C, mcmc = mcmc, time.name = "t_",
                  rate.name = "r_g1_", tip.ages = back.ages)

x <- seq( from = 1, to = 87, by = 3 )
y <- seq( from = 2, to = 87, by = 3 )
# Plot landmarks for the 19 carnivores:
plot( x = C[,x], y = C[,y], pch = '+', cex = .5,
      xlab = "X coords", ylab = " Y coords" )

# Plot ancestral reconstruction at the root (node 20):
points( x = recM["20",x], y = recM["20",y], pch = 19, col = "red" )

# Calculate and plot mean shape
mS <- apply( X = C, MARGIN = 2, FUN = mean )
points( x = mS[x], y = mS[y], pch = 19, col = "blue" )

The reconstruction is stored as a matrix object. We can convert it to an array for compatibility with morphometrics software. You can also use array objects to generate 3D plots with the rgl package.

recA <- mcmc3r::matrix2array( X = recM, coords = 3 )
options( rgl.printRglwidget = TRUE )
rgl::plot3d( recA[,,"20"], ty = 's', size = 2, col = "red", aspect = FALSE )

We hope this tutorial has been useful and you have managed to reproduce the results from the paper!

EXERCISES

Now, some additional exercises you can play with:

Exercise 1:
Because the MCMCtree is stochastic, there is no guarantee that the MCMC sample has converged to the posterior distribution. Rename files mcmc.txt and out.txt as mcmc1.txt and out2.txt respectively. Then run mcmctree again. Rename the newly generated files as mcmc2.txt and out2.txt. Now compare the two sets of files. For example, you can look at the posterior means and 95% CIs at the end of the out1.txt and out2.txt files, or you can plot the densities of the node age samples in R. Do the results look similar? If not, then the MCMC samples have not converged.

Exercise 2:
The MCMC above was rather short, and it is unlikely that the values converged. Open file mcmctree.ctl and change the following lines as shown here: burnin = 5000, sampfreq = 50, nsample = 20000. Then run MCMCtree again. This new analysis will take much longer, over 15 hours, which is normal in phylogenetic MCMC sampling. If you create two directories in your computer, you can run two analyses in parallel overnight. The next day look at the results in the out.txt files. The results should now be much more similar, indicating convergence.