Package 'EvoPhylo' reference manual

Title:	Pre- And Postprocessing of Morphological Data from Relaxed Clock Bayesian Phylogenetics
Description:	Performs automated morphological character partitioning for phylogenetic analyses and analyzes macroevolutionary parameter outputs from clock (time-calibrated) Bayesian inference analyses, following concepts introduced by Simões and Pierce (2021) <doi:10.1038/s41559-021-01532-x>.
Authors:	Tiago Simoes [cre, aut], Noah Greifer [aut], Joelle Barido-Sottani [aut], Stephanie Pierce [aut]
Maintainer:	Tiago Simoes <[email protected]>
License:	GPL (>=2)
Version:	0.3.3
Built:	2025-02-06 03:28:26 UTC
Source:	https://github.com/tiago-simoes/EvoPhylo

A morphological phylogenetic data matrix

Description

An example dataset of morphological characters for early tetrapodomorphs from Simões & Pierce (2021). This type of data would be used as input to get_gower_dist.

Usage

data("characters")

Format

A data frame with 178 observations (characters) on 43 columns (taxa).

References

Simões, T. R. and S. E. Pierce (2021). Sustained High Rates of Morphological Evolution During the Rise of Tetrapods. Nature Ecology & Evolution 5: 1403–1414.

Convert clock rate tables from wide to long format

Description

Converts clock rate tables, such as those produced by clockrate_summary and imported back after including clade names, from wide to long format.

Usage

clock_reshape(rate_table)

Arguments

rate_table

A data frame of clock rates, such as from the output of get_clockrate_table_MrBayes with an extra "clade" column.

Details

This function will convert clock rate tables from wide to long format, with a new column "clock" containing the clock partition from where each rate estimate was obtained as a factor. The long format is necessary for downstream analyses of selection strength (mode), as similarly done by FBD_reshape for posterior parameter log files.
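As a rough illustration of the wide-to-long conversion described above, the base R sketch below stacks three rate columns into a single "value" column and records the source clock as a factor. The toy table and its column names (nodes, rates1, rates2, rates3) are assumptions made for this example, and the code is not the implementation used by clock_reshape.

```r
# Hypothetical wide rate table: one row per node, one column per clock
rate_table <- data.frame(
  clade  = c("A", "A", "B"),
  nodes  = 1:3,
  rates1 = c(0.5, 0.7, 1.2),
  rates2 = c(1.1, 0.9, 0.4),
  rates3 = c(0.8, 1.0, 1.5)
)

rate_cols <- c("rates1", "rates2", "rates3")

# Stack the rate columns: identifiers are repeated once per clock,
# and unlist() concatenates the rate columns one after another
long <- data.frame(
  clade = rep(rate_table$clade, times = length(rate_cols)),
  nodes = rep(rate_table$nodes, times = length(rate_cols)),
  clock = factor(rep(seq_along(rate_cols), each = nrow(rate_table))),
  value = unlist(rate_table[rate_cols], use.names = FALSE)
)

long
```

Each node now contributes one row per clock, which is the shape that grouped plots and tests typically expect.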

Value

A data frame containing a single "value" column (for all rate values) and one column for the "clock" variable (indicating the clock partition to which each rate value refers).

Examples

# See vignette("rates-selection") for how to use this
# function as part of an analysis pipeline

## The example dataset rate_table_clades_means3
## has clades and 3 clock rate columns:
data("rate_table_clades_means3")

## Reshape a clock rate table with clade names to long format
## Not run: 
rates_by_clade <- clock_reshape(rate_table_clades_means3)

## End(Not run)

Plot clock rate distributions

Description

Plots the distribution density of clock rates by clock and clade. The input must have a "clade" column.

Usage

clockrate_dens_plot(rate_table, clock = NULL,
                    stack = FALSE, nrow = 1,
                    scales = "fixed")

Arguments

`rate_table`	A data frame of clock rates, such as from the output of `get_clockrate_table_MrBayes` with an extra "clade" column.
`clock`	Which clock rates will be plotted. If unspecified, all clocks are plotted.
`stack`	Whether to display stacked density plots (`TRUE`) or overlapping density plots (`FALSE`).
`nrow`	When plotting rates for more than one clock, how many rows should be filled by the plots. This is passed to `facet_wrap`.
`scales`	When plotting rates for more than one clock, whether the axis scales should be "fixed" (default) across clocks or allowed to vary ("free", "free_x", or "free_y"). This is passed to `facet_wrap`.

Details

The user must manually add clades to the rate table produced by get_clockrate_table_MrBayes before it can be used with this function. This can be done manually within R, such as by using a graphical user interface for editing data like the DataEditR package, or by writing the rate table to a spreadsheet and reading it back in after adding the clades. The example below uses a table that has had the clades added.

Value

A ggplot object, which can be modified using ggplot2 functions.

Examples

# See vignette("rates-selection") for how to use this
# function as part of an analysis pipeline

data("RateTable_Means_3p_Clades")

# Overlapping plots
clockrate_dens_plot(RateTable_Means_3p_Clades, stack = FALSE,
                    nrow = 1, scales = "fixed")

# Stacked density for all three clocks, changing the color
# palette to viridis using ggplot2 functions
clockrate_dens_plot(RateTable_Means_3p_Clades,
                    clock = 1:3, nrow = 1, stack = TRUE,
                    scales = "fixed") +
  ggplot2::scale_color_viridis_d() +
  ggplot2::scale_fill_viridis_d()

Plot regression lines between sets of rates

Description

Displays a scatterplot and fits a regression line of one set of clock rates against another, optionally displaying their Pearson correlation coefficient (r) and R-squared value (R^2).

Usage

clockrate_reg_plot(rate_table, clock_x, clock_y,
                   method = "lm", show_lm = TRUE,
                   ...)

Arguments

`rate_table`	A table of clock rates, such as from the output of `get_clockrate_table_MrBayes`.
`clock_x`, `clock_y`	The clock rates that should go on the x- and y-axes, respectively.
`method`	The method (function) used to fit the regression of one clock on the other. Check the `method` argument of the `geom_smooth` function of ggplot2 for all options. Default is `"lm"` for a linear regression model. `"glm"` and `"loess"` are alternative options.
`show_lm`	Whether to display the Pearson correlation coefficient (r) and R-squared values (R^2) between two sets of clock rates.
`...`	Other arguments passed to `geom_smooth`.

Details

clockrate_reg_plot() can only be used when multiple clocks are present in the clock rate table. Unlike clockrate_summary and clockrate_dens_plot, no "clade" column is required.
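To make the two reported statistics concrete, the base R sketch below computes a Pearson correlation coefficient and an R-squared value for two sets of rates. The toy data and the column names rates1 and rates3 are assumptions for illustration only. It shows how the quantities relate under a simple linear regression and is not the code used inside clockrate_reg_plot().

```r
# Hypothetical rates for two clocks measured on the same five nodes
rates <- data.frame(
  rates1 = c(0.5, 0.7, 1.2, 0.9, 1.4),
  rates3 = c(0.6, 0.9, 1.1, 1.0, 1.6)
)

# Pearson correlation coefficient (r)
r <- cor(rates$rates1, rates$rates3, method = "pearson")

# R-squared from a linear regression of one clock on the other
fit <- lm(rates3 ~ rates1, data = rates)
r2 <- summary(fit)$r.squared

# With a single predictor, R-squared equals the squared Pearson r
all.equal(r^2, r2)
```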

Value

A ggplot object, which can be modified using ggplot2 functions.

Examples

# See vignette("rates-selection") for how to use this
# function as part of an analysis pipeline

data("RateTable_Means_3p_Clades")

#Plot correlations between clocks 1 and 3
clockrate_reg_plot(RateTable_Means_3p_Clades,
                   clock_x = 1, clock_y = 3)

#Use arguments supplied to geom_smooth():
clockrate_reg_plot(RateTable_Means_3p_Clades,
                   clock_x = 1, clock_y = 3,
                   color = "red", se = FALSE)

Compute rate summary statistics across clades and clocks

Description

Computes summary statistics for each clade and/or each clock partition. The input must have a "clade" column.

Usage

clockrate_summary(rate_table, file = NULL, digits = 3)

Arguments

`rate_table`	A data frame of clock rates, such as from the output of `get_clockrate_table_MrBayes` with an extra `"clade"` column.
`file`	An optional file path where the resulting table will be stored using `write.csv`.
`digits`	The number of digits to round the summary results to. Default is 3. See `round`.

Details

The user must manually add clades to the rate table produced by get_clockrate_table_MrBayes before it can be used with this function. This can be done manually within R, such as by using a graphical user interface for editing data like the DataEditR package, or by writing the rate table to a spreadsheet and reading it back in after adding the clades. The example below uses a table that has had the clades added.

Value

A data frame containing a row for each clade and each clock with summary statistics (n, mean, standard deviation, minimum, first quartile, median, third quartile, maximum).
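The statistics listed above can be reproduced by hand for a single clock, which may help when checking results. The base R sketch below does this for a toy table; the data, the column name rates1, and the output column labels are assumptions for illustration, and the layout of the real clockrate_summary output may differ.

```r
# Hypothetical rate table with a clade column and one clock
rates <- data.frame(
  clade  = c("A", "A", "A", "B", "B", "B"),
  rates1 = c(0.5, 0.7, 0.9, 1.0, 1.4, 1.8)
)

# One row of summary statistics per clade
stats <- do.call(rbind, lapply(split(rates$rates1, rates$clade), function(v) {
  data.frame(
    n      = length(v),
    mean   = mean(v),
    sd     = sd(v),
    min    = min(v),
    Q1     = quantile(v, 0.25, names = FALSE),
    median = median(v),
    Q3     = quantile(v, 0.75, names = FALSE),
    max    = max(v)
  )
}))

# Round to three digits, mirroring the default of the digits argument
round(stats, 3)
```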

Examples

# See vignette("rates-selection") for how to use this
# function as part of an analysis pipeline

data("RateTable_Means_3p_Clades")

clockrate_summary(RateTable_Means_3p_Clades)

Export character partitions to a Nexus file

Description

Creates and exports a Nexus file with a list of characters and their respective partitions as inferred by the make_clusters function. The contents can be copied and pasted directly into a Mr. Bayes commands block for a partitioned clock Bayesian inference analysis.

Usage

cluster_to_nexus(cluster_df, file = NULL)

Arguments

`cluster_df`	A `cluster_df` object; the output of a call to `make_clusters`.
`file`	The path of the text file to be created containing the partitioning information in Nexus format. If `NULL` (the default), no file will be written and the output will be returned as a string. If `""`, the text will be printed to the console. Passed directly to the `file` argument of `cat`.

Value

The text as a string, returned invisibly if file is not NULL. Use cat on the resulting output to format it correctly (i.e., to turn "\n" into line breaks).

Examples

# See vignette("char-part") for how to use this
# function as part of an analysis pipeline

# Load example phylogenetic data matrix
data("characters")

# Create distance matrix
Dmatrix <- get_gower_dist(characters)

# Find optimal partitioning scheme using PAM under k=3
# partitions
cluster_df <- make_clusters(Dmatrix, k = 3)

# Write to Nexus file and export to .txt file:
file <- tempfile(fileext = ".txt")

# You would set, e.g.,
# file <- "path/to/file.txt"

cluster_to_nexus(cluster_df, file = file)

Combine and filter (.p) log files from Mr.Bayes

Description

Imports parameter (.p) log files from Mr. Bayes and combines them into a single data frame. Samples can be dropped from the start of each log file (i.e., discarded as burn-in) and/or downsampled to reduce the size of the output object.

Usage

combine_log(path = ".", burnin = 0.25, downsample = 10000)

Arguments

`path`	The path to a folder containing (.p) log files or a character vector of log files to be read.
`burnin`	Either the number or a proportion of generations to drop from the beginning of each log file.
`downsample`	Either the number or the proportion of generations the user wants to keep after downsampling for the final (combined) log file. Generations will be dropped in approximately equally-spaced intervals.

Details

combine_log() imports log files produced by Mr.Bayes, ignoring the first row of the file (which contains an ID number). The files are appended together, optionally after removing burn-in generations from the beginning and/or by further filtering throughout the rest of each file.

When burnin is greater than 0, the number or proportion of generations corresponding to the supplied value will be dropped from the beginning of each file as it is read in. For example, setting burnin = .25 (the default) will drop the first 25% of generations from each file.

When downsample is greater than 0, the file will be downsampled until the number or proportion of generations corresponding to the supplied value is reached. For example, if downsample = 10000 generations (the default) for log files from 4 independent runs (i.e., 4 (.p) files), each log file will be downsampled to 2500 generations, and the final combined data frame will contain 10000 samples, selected in approximately equally spaced intervals from the original data.
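The burn-in and downsampling arithmetic can be traced with a small base R sketch. It assumes, purely for illustration, 4 runs with 20000 sampled generations each, and it uses a simple evenly spaced index. The exact rows retained by combine_log() may differ, since this is not its implementation.

```r
n_runs     <- 4       # number of (.p) files (assumed)
n_samples  <- 20000   # samples per file (assumed)
burnin     <- 0.25    # proportion dropped from the start of each file
downsample <- 10000   # samples to keep in the combined output

# Target number of samples per file after downsampling
per_file <- downsample / n_runs

# Indices of the samples in one file
samples <- seq_len(n_samples)

# Drop the first 25% as burn-in
kept <- samples[-seq_len(floor(burnin * n_samples))]

# Keep approximately equally spaced samples from what remains
idx     <- round(seq(1, length(kept), length.out = per_file))
thinned <- kept[idx]

length(kept)     # samples left after burn-in
length(thinned)  # samples retained per file
```

Repeating this for each of the 4 files and stacking the results gives the 10000 rows described above.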

The output can be supplied to get_pwt_rates_MrBayes and to FBD_reshape. The latter will convert the log data frame from wide to long format, which is necessary to be used as input for downstream analyses using FBD_summary, FBD_dens_plot, FBD_normality_plot, FBD_tests1, or FBD_tests2.

Value

A data frame with columns corresponding to the columns in the supplied log files and rows containing the sampled parameter values. Examples of the kind of output produced can be accessed using data("posterior1p") and data("posterior3p").

Examples

# See vignette("fbd-params") for how to use this
# function as part of an analysis pipeline
## Not run: 
posterior <- combine_log("path/to/folder", burnin = .25,
                         downsample = 10000)

## End(Not run)

Remove dummy tip from BEAST summary trees, accounting for metadata on the tips

Description

This method is designed to remove the dummy tip added on offset trees once postprocessing is complete (for instance once the summary tree has been built using TreeAnnotator).

Usage

drop.dummy.beast(
  tree.file,
  output.file = NULL,
  dummy.name = "dummy",
  convert.heights = TRUE
)

Arguments

`tree.file`	path to file containing the tree with dummy tip
`output.file`	path to file to write converted tree. If `NULL` (default), the tree is simply returned.
`dummy.name`	name of the added dummy tip, default `dummy`.
`convert.heights`	whether height metadata should be converted to height - offset (required to plot e.g. HPD intervals correctly). Default TRUE.

Value

A list with two elements: tree, the converted tree (as treedata); and offset, the age of the youngest tip in the final tree.

Examples

# Analyze the trees with dummy tips - for instance, calculate the MCC summary tree
# Then remove the dummy tip from the MCC tree
final_tree <- drop.dummy.beast(system.file("extdata", "ex_offset.MCC.tre", package = "EvoPhylo"))

Remove dummy tip from Mr. Bayes summary trees, accounting for metadata on the tips

Description

This method is designed to remove the dummy tip added to a dataset before running with Mr. Bayes.

Usage

drop.dummy.mb(
  tree.file,
  output.file = NULL,
  dummy.name = "dummy",
  convert.ages = TRUE
)

Arguments

`tree.file`	path to file containing the tree with dummy tip
`output.file`	path to file to write converted tree. If `NULL` (default), the tree is simply returned.
`dummy.name`	name of the added dummy tip, default `dummy`.
`convert.ages`	whether height metadata should be converted to height - offset (required to plot e.g. HPD intervals correctly). Default TRUE.

Value

A list with two elements: tree, the converted tree (as treedata); and offset, the age of the youngest tip in the final tree.

Examples

# Remove the dummy tip from the summary tree
final_tree <- drop.dummy.mb(system.file("extdata", "tree_mb_dummy.tre", package = "EvoPhylo"))

Density plots for each FBD parameter

Description

Produces a density or violin plot displaying the distribution of FBD parameter samples by time bin.

Usage

FBD_dens_plot(posterior, parameter, type = "density",
              stack = FALSE, color = "red")

Arguments

`posterior`	A data frame of posterior parameter estimates containing a single "Time_bin" column and one column for each FBD parameter value. Such a data frame can be imported using `combine_log` followed by `FBD_reshape`.
`parameter`	A string containing the name of an FBD parameter in the data frame; abbreviations allowed.
`type`	The type of plot; either `"density"` for a density plot or `"violin"` for violin plots. Abbreviations allowed.
`stack`	When `type = "density"`, whether to produce stacked densities (`TRUE`) or overlapping densities (`FALSE`, the default). Ignored otherwise.
`color`	When `type = "violin"`, the color of the plotted densities.

Details

Density plots are produced using ggplot2::stat_density, and violin plots are produced using ggplot2::geom_violin. On violin plots, a horizontal line indicates the median (of the density), and the black dot indicates the mean.

Value

A ggplot object, which can be modified using ggplot2 functions.

Note

When setting type = "violin", a warning may appear saying something like "In regularize.values(x, y, ties, missing(ties), na.rm = na.rm) : collapsing to unique 'x' values". This warning can be ignored.

Examples

# See vignette("fbd-params") for how to use this
# function as part of an analysis pipeline

data("posterior3p")

posterior3p_long <- FBD_reshape(posterior3p)

FBD_dens_plot(posterior3p_long, parameter = "net_speciation",
              type = "density", stack = FALSE)

FBD_dens_plot(posterior3p_long, parameter = "net_speciation",
              type = "density", stack = TRUE)

FBD_dens_plot(posterior3p_long, parameter = "net_speciation",
              type = "violin", color = "red")

Inspect FBD parameter distributions visually

Description

Produces plots of the distributions of fossilized birth–death process (FBD) parameters to facilitate the assessment of the assumptions of normality within time bins and homogeneity of variance across time bins.

Usage

FBD_normality_plot(posterior)

Arguments

posterior

A data frame of posterior parameter estimates containing a single "Time_bin" column and one column for each FBD parameter value. Such a data frame can be imported using combine_log followed by FBD_reshape.

Details

The plots produced include density plots for each parameter within each time bin (residualized to have a mean of zero), scaled so that the top of the density is at a value of one (in black). Superimposed onto these densities are the densities of a normal distribution with the same mean and variance (and scaled by the same amount) (in red). Deviations between the normal density in red and the density of the parameters in black indicate deviations from normality. The standard deviation of each parameter is also displayed for each time bin to facilitate assessing homogeneity of variance.

Value

A ggplot object, which can be modified using ggplot2 functions.

Examples

# See vignette("fbd-params") for how to use this
# function as part of an analysis pipeline

data("posterior3p")

posterior3p_long <- FBD_reshape(posterior3p)

FBD_normality_plot(posterior3p_long)

Convert an FBD posterior parameter table from wide to long format

Description

Converts FBD posterior parameter tables, such as those imported using combine_log, from wide to long format.

Usage

FBD_reshape(posterior, variables = NULL, log.type = c("MrBayes", "BEAST2"))

Arguments

`posterior`	Single posterior parameter sample dataset with skyline FBD parameters produced with `combine_log`.
`variables`	Names of FBD rate variables in the log. If NULL (default), will attempt to auto-detect the names and log type.
`log.type`	Name of the software which produced the log (currently supported: MrBayes or BEAST2). Has to be set if `variables` is not NULL.

Details

The posterior parameter log files produced by Bayesian evolutionary analyses using skyline birth-death tree models, including the skyline FBD model, result in two or more estimates for each FBD parameter, one for each time bin. This function will convert a table of parameters with skyline FBD parameters from wide to long format, with one row per generation per time bin and a new column "Time_bin" containing the respective time bins as a factor. The long format is necessary for downstream analyses using FBD_summary, FBD_dens_plot, FBD_normality_plot, FBD_tests1, or FBD_tests2, as similarly done by clock_reshape for clock rate tables.
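The target layout, one row per generation per time bin, can be sketched in base R for a toy log with two time bins. The column names used here (Gen, net_speciation_1, relative_extinction_2, and so on) are assumptions chosen for the example and need not match the names written by Mr. Bayes or BEAST2. The code is a sketch of the reshaping idea, not the implementation of FBD_reshape.

```r
# Hypothetical wide posterior log: two generations, two time bins
posterior <- data.frame(
  Gen                   = c(1000, 2000),
  net_speciation_1      = c(0.10, 0.12),
  net_speciation_2      = c(0.05, 0.07),
  relative_extinction_1 = c(0.90, 0.85),
  relative_extinction_2 = c(0.95, 0.92)
)

n_bins <- 2

# Build one block of rows per time bin, then stack the blocks
long <- do.call(rbind, lapply(seq_len(n_bins), function(bin) {
  data.frame(
    Gen                 = posterior$Gen,
    Time_bin            = bin,
    net_speciation      = posterior[[paste0("net_speciation_", bin)]],
    relative_extinction = posterior[[paste0("relative_extinction_", bin)]]
  )
}))

# Store the time bin as a factor, as described for the long format
long$Time_bin <- factor(long$Time_bin)

long
```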

The format of the log files can either be specified using the variables and log.type arguments or auto-detected by the function. The "posterior" data frame can be obtained by reading in a log file directly (e.g. using the read.table function) or by combining several output log files from Mr. Bayes using combine_log.

Value

A data frame of posterior parameter estimates containing a single "Time_bin" column and one column for each FBD parameter value.

Examples

# See vignette("fbd-params") for how to use this
# function as part of an analysis pipeline

data("posterior3p")

head(posterior3p)

## Reshape FBD table to long format
posterior3p_long <- FBD_reshape(posterior3p)

head(posterior3p_long)

Summarize FBD posterior parameter estimates

Description

Produces numerical summaries of each fossilized birth–death process (FBD) posterior parameter by time bin.

Usage

FBD_summary(posterior, file = NULL, digits = 3)

Arguments

`posterior`	A data frame of posterior parameter estimates containing a single "Time_bin" column and one column for each FBD parameter value. Such a data frame can be imported using `combine_log` followed by `FBD_reshape`.
`file`	An optional file path where the resulting table will be stored using `write.csv`.
`digits`	The number of digits to round the summary results to. Default is 3. See `round`.

Value

A data frame with a row for each parameter and time bin, and columns for different summary statistics. These include the number of data points (n) and the mean, standard deviation (sd), minimum value (min), first quartile (Q1), median, third quartile (Q3), and maximum value (max). When file is not NULL, a .csv file containing this data frame will be saved to the filepath specified in file and the output will be returned invisibly.

Examples

# See vignette("fbd-params") for how to use this
# function as part of an analysis pipeline

data("posterior3p")

posterior3p_long <- FBD_reshape(posterior3p)

FBD_summary(posterior3p_long)

Test assumptions of normality and homoscedasticity for FBD posterior parameters

Description

Produces tests of normality (within time bin, ignoring time bin, and pooling within-time bin values) and homoscedasticity (homogeneity of variances) for each fossilized birth–death process (FBD) parameter in the posterior parameter log file.

Usage

FBD_tests1(posterior, downsample = TRUE)

Arguments

`posterior`	A data frame of posterior parameter estimates containing a single "Time_bin" column and one column for each FBD parameter value. Such a data frame can be imported using `combine_log` followed by `FBD_reshape`.
`downsample`	Whether to downsample the observations to ensure Shapiro-Wilk normality tests can be run. If `TRUE`, observations will be dropped so that no more than 5000 observations are used for the tests on the full dataset, as required by `shapiro.test`. They will be dropped in evenly spaced intervals. If `FALSE` and there are more than 5000 observations for any test, that test will not be run.

Details

FBD_tests1() performs several tests on the posterior distributions of parameter values within and across time bins. It produces the Shapiro-Wilk test for normality using shapiro.test and the Bartlett and Fligner tests for homogeneity of variance using bartlett.test and fligner.test, respectively. Note that these tests are likely to be significant even if the observations are approximately normally distributed or have approximately equal variance; therefore, they should be supplemented with visual inspection using FBD_normality_plot.

Value

A list containing the results of the three tests with the following elements:

`shapiro`	A list with an element for each parameter. Each element is a data frame with a row for each time bin and the test statistic and p-value for the Shapiro-Wilk test for normality. In addition, there will be a row for an overall test, combining all observations ignoring time bin, and a test of the residuals, which combines the group-mean-centered observations (equivalent to the residuals in a regression of the parameter on time bin).
`bartlett`	A data frame of the Bartlett test for homogeneity of variance across time bins with a row for each parameter and the test statistic and p-value for the test.
`fligner`	A data frame of the Fligner test for homogeneity of variance across time bins with a row for each parameter and the test statistic and p-value for the test.
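The underlying base R tests can also be run directly on a single parameter, which may help when interpreting the output. The sketch below uses simulated values for three time bins; the data and the column name value are assumptions for illustration. It mirrors the tests named in the Details but is not the code of FBD_tests1().

```r
set.seed(1)

# Simulated parameter values in three time bins (the last bin is more variable)
d <- data.frame(
  Time_bin = factor(rep(1:3, each = 200)),
  value    = c(rnorm(200, 0.10, 0.02),
               rnorm(200, 0.20, 0.02),
               rnorm(200, 0.15, 0.05))
)

# Shapiro-Wilk p-value within each time bin
by_bin <- tapply(d$value, d$Time_bin, function(v) shapiro.test(v)$p.value)

# Overall test, ignoring time bin
overall <- shapiro.test(d$value)

# Test of residuals: observations centered on their time-bin mean
resid      <- d$value - ave(d$value, d$Time_bin)
resid_test <- shapiro.test(resid)

# Homogeneity of variance across time bins
bart <- bartlett.test(value ~ Time_bin, data = d)
flig <- fligner.test(value ~ Time_bin, data = d)

by_bin
bart$p.value
flig$p.value
```

Note that shapiro.test accepts between 3 and 5000 observations, which is why larger samples need to be downsampled first.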

Examples

# See vignette("fbd-params") for how to use this
# function as part of an analysis pipeline

data("posterior3p")

posterior3p_long <- FBD_reshape(posterior3p)

FBD_tests1(posterior3p_long)

Test for differences in FBD parameter values

Description

FBD_tests2() performs t-tests and Mann-Whitney U-tests to compare the average value of fossilized birth–death process (FBD) parameters between time bins.

Usage

FBD_tests2(posterior, p.adjust.method = "fdr")

Arguments

`posterior`	A data frame of posterior parameter estimates containing a single "Time_bin" column and one column for each FBD parameter value. Such a data frame can be imported using `combine_log` followed by `FBD_reshape`.
`p.adjust.method`	The method used to adjust the p-values for multiple testing. See `p.adjust` for details and options. Default is `"fdr"` for the Benjamini-Hochberg false discovery rate correction.

Details

pairwise.t.test and pairwise.wilcox.test are used to calculate, respectively, the t-test and Mann-Whitney U-test statistics and p-values. Because the power of these tests depends on the number of posterior samples, it can be helpful to examine the distributions of FBD parameter posteriors using FBD_dens_plot instead of relying heavily on the tests.
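For reference, the two base R functions named above can be called directly on a long-format table. The sketch below uses simulated data for three time bins; the values and the column name net_speciation are assumptions for illustration. Any settings beyond p.adjust.method are left at the base R defaults, which may not match those used inside FBD_tests2().

```r
set.seed(42)

# Simulated long-format posterior: bins 1 and 2 are similar, bin 3 is higher
d <- data.frame(
  Time_bin       = factor(rep(1:3, each = 100)),
  net_speciation = c(rnorm(100, 0.10, 0.02),
                     rnorm(100, 0.10, 0.02),
                     rnorm(100, 0.20, 0.02))
)

# Pairwise t-tests between time bins with FDR-adjusted p-values
t_res <- pairwise.t.test(d$net_speciation, d$Time_bin,
                         p.adjust.method = "fdr")

# Pairwise Mann-Whitney U-tests (Wilcoxon rank-sum) with the same adjustment
w_res <- pairwise.wilcox.test(d$net_speciation, d$Time_bin,
                              p.adjust.method = "fdr")

# Lower-triangular matrices of adjusted p-values
t_res$p.value
w_res$p.value
```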

Value

A list with an element for each test, each of which contains a list of test results for each parameter. The results are in the form of a data frame containing the sample sizes and unadjusted and adjusted p-values for each comparison.

Examples

# See vignette("fbd-params") for how to use this
# function as part of an analysis pipeline

data("posterior3p")

posterior3p_long <- FBD_reshape(posterior3p)

FBD_tests2(posterior3p_long)

Extract evolutionary rates from Bayesian clock trees produced by BEAST2

Description

BEAST2 stores the rates for each clock in a separate file. All trees need to be loaded using treeio::read.beast.

Usage

get_clockrate_table_BEAST2(..., summary = "median", drop_dummy = NULL)

Arguments

`...`	`treedata` objects containing the summary trees with associated data on the rates for each separate clock.
`summary`	summary metric used for the rates. Currently supported: `"mean"` or `"median"`, default `"median"`.
`drop_dummy`	if not `NULL`, will drop the dummy extant tip with the given label from the BEAST2 summary trees prior to extracting the clock rates (when present). Default is `NULL`.

Value

A data frame with a column containing the node identifier (node) and one column containing the clock rates for each tree provided, in the same order as the trees.

Examples

#Import all clock summary trees produced by BEAST2 from your local directory
## Not run: 
tree_clock1 <- treeio::read.beast("tree_file_clock1.tre")
tree_clock2 <- treeio::read.beast("tree_file_clock2.tre")

## End(Not run)

#Or use the example BEAST2 multiple clock trees that accompany EvoPhylo.
data(tree_clock1)
data(tree_clock2)

# obtain the rate table from BEAST2 trees
rate_table <- get_clockrate_table_BEAST2(tree_clock1, tree_clock2, summary = "mean")

Extract evolutionary rates from a Bayesian clock tree produced by Mr. Bayes

Description

Extract evolutionary rate summary statistics for each node from a Bayesian clock summary tree produced by Mr. Bayes and stores them in a data frame.

Usage

get_clockrate_table_MrBayes(tree, summary = "median",
                    drop_dummy = NULL)

Arguments

`tree`	An S4 class object of type `treedata`; a Bayesian clock tree imported using `treeio::read.mrbayes` for Mr. Bayes summary trees.
`summary`	The name of the rate summary. Should be one of `"mean"` or `"median"`.
`drop_dummy`	if not `NULL`, will drop the dummy extant tip with the given label from the Mr. Bayes summary tree prior to extracting the clock rates (when present). Default is `NULL`.

Value

A data frame with a column containing the node identifier (node) and one column for each relaxed clock partition in the tree object containing clock rates.

Examples

# See vignette("rates-selection") for how to use this
# function as part of an analysis pipeline

## Import summary tree with three clock partitions produced by
## Mr. Bayes (.t or .tre files) from your local directory
## Not run: 
tree3p <- treeio::read.mrbayes("Tree3p.t")

## End(Not run)

#Or use the example Mr.Bayes multi-clock tree file (tree3p)
data("tree3p")

# obtain the rate table from MrBayes tree
rate_table <- get_clockrate_table_MrBayes(tree3p)

head(rate_table)

Compute Gower distances between characters

Description

Computes Gower distance between characters from a phylogenetic data matrix.

Usage

get_gower_dist(x, numeric = FALSE)

Arguments

`x`	A phylogenetic data matrix in Nexus (.nex) format, which will be read using `ape::read.nexus.data`, or any other data frame or matrix with a column for each character and terminal taxa as rows. The data cannot include polymorphisms.
`numeric`	Whether to treat the values contained in `x` as numeric or categorical. If `FALSE` (default), features will be considered categorical; if `TRUE`, they will be considered numeric.

Value

The Gower distance matrix.
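
For purely categorical data, the Gower distance between two characters reduces to the proportion of mismatching states among the taxa scored for both. The base R sketch below illustrates that idea only; it is not the package's implementation (which is adapted from StatMatch::gower.dist()). The helper name `gower_categorical` and the toy matrix are hypothetical, and the sketch assumes rows are the characters being compared, columns are taxa, and missing data are coded as `NA`.

```r
# Hypothetical helper: Gower distance between rows for purely categorical data.
gower_categorical <- function(m) {
  n <- nrow(m)
  d <- matrix(NA_real_, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Only taxa scored for both characters contribute to the distance
      ok <- !is.na(m[i, ]) & !is.na(m[j, ])
      d[i, j] <- if (any(ok)) mean(m[i, ok] != m[j, ok]) else NA_real_
    }
  }
  d
}

# Toy matrix (assumption): 3 characters scored for 4 taxa, one missing entry
toy <- rbind(
  char1 = c("0", "0", "1", "1"),
  char2 = c("0", "0", "1", NA),
  char3 = c("1", "1", "0", "0")
)

D <- gower_categorical(toy)
D
```

Characters that always share the same state across comparable taxa get a distance of 0, and characters that never do get a distance of 1.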

Author(s)

This function uses code adapted from StatMatch::gower.dist() written by Marcello D'Orazio.

Examples

# See vignette("char-part") for how to use this
# function as part of an analysis pipeline

# Load example phylogenetic data matrix
data("characters")

# Create distance matrix
Dmatrix <- get_gower_dist(characters)

# Reading data matrix as numeric data
Dmatrix <- get_gower_dist(characters, numeric = TRUE)

Conduct pairwise t-tests between node rates and clock base rates from a BEAST2 output.

Description

Produces a data frame containing the results of 1-sample t-tests for the mean of posterior clock rates against each node's absolute clock rate.

Usage

get_pwt_rates_BEAST2(rate_table, posterior)

Arguments

`rate_table`	A data frame containing a single "value" column (for all rate values) and one column for the "clock" variable (indicating which clock partition each rate value refers to), such as from the output of `get_clockrate_table_MrBayes` with an extra `clade` column added, and followed by `clock_reshape`.
`posterior`	A data frame of posterior parameter estimates including a "clockrate" column indicating the base clock rate estimate for each generation that will be used for pairwise t-tests. Such a data frame can be imported using `combine_log` (no need to reshape from wide to long). See the `posterior1p` or `posterior3p` datasets for examples of how the input file should look.

Details

get_pwt_rates_BEAST2() first transforms relative clock rates to absolute rate values for each node and each clock, by multiplying these by the mean posterior clock rate base value. Then, for each node and clock, a one-sample t-test is performed with the null hypothesis that the mean of the posterior clockrates is equal to that node and clock's absolute clock rate.

Value

A long data frame with one row per node per clock and the following columns:

`clade`	The name of the clade, taken from the "clade" column of `rate_table`
`nodes`	The node number, taken from the "node" column of `rate_table`
`clock`	The clock partition number
`background.rate(mean)`	The absolute background clock rate (mean clock rate for the whole tree) sampled from the posterior log file
`relative.rate(mean)`	The relative mean clock rate per branch, taken from the "rates" columns of `rate_table`
`absolute.rate(mean)`	The absolute mean clock rate per branch; the relative clock rate multiplied by the mean of the posterior clock rates
`p.value`	The p-value of the test comparing the mean of the posterior clockrates to each absolute clockrate

Examples

## Not run: 
# See vignette("rates-selection") for how to use this
# function as part of an analysis pipeline

# Load example rate table and posterior data sets
RateTable_Means_Clades <- system.file("extdata", "RateTable_Means_Clades.csv", package = "EvoPhylo")
RateTable_Means_Clades <- read.csv(RateTable_Means_Clades, header = TRUE)

posterior <- system.file("extdata", "Penguins_log.log", package = "EvoPhylo")
posterior <- read.table(posterior, header = TRUE)

get_pwt_rates_BEAST2(RateTable_Means_Clades, posterior)

## End(Not run)

Conduct pairwise t-tests between node rates and clock base rate from a Mr.Bayes output.

Description

Produces a data frame containing the results of 1-sample t-tests for the mean of posterior clock rates against each node's absolute clock rate.

Usage

get_pwt_rates_MrBayes(rate_table, posterior)

Arguments

`rate_table`	A data frame containing a single "value" column (for all rate values) and one column for the "clock" variable (indicating which clock partition each rate value refers to), such as from the output of `get_clockrate_table_MrBayes` with an extra `clade` column added, and followed by `clock_reshape`.
`posterior`	A data frame of posterior parameter estimates including a "clockrate" column indicating the base clock rate estimate for each generation that will be used for pairwise t-tests. Such a data frame can be imported using `combine_log` (no need to reshape from wide to long). See the `posterior1p` or `posterior3p` datasets for examples of how the input file should look.

Details

get_pwt_rates_MrBayes() first transforms relative clock rates to absolute rate values for each node and each clock, by multiplying these by the mean posterior clock rate base value. Then, for each node and clock, a one-sample t-test is performed with the null hypothesis that the mean of the posterior clockrates is equal to that node and clock's absolute clock rate.
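
The two steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the function's actual code: the simulated `posterior` and `rate_table` objects and their values are hypothetical, and only `stats::t.test()` is used for the test itself.

```r
set.seed(1)

# Hypothetical posterior sample of the base clock rate (one value per generation)
posterior <- data.frame(clockrate = rnorm(1000, mean = 0.02, sd = 0.002))

# Hypothetical relative rates for three nodes of a single clock partition
rate_table <- data.frame(
  nodes = 1:3,
  clock = factor(1),
  relative.rate = c(0.5, 1.0, 2.0)
)

# Step 1: relative -> absolute rates, scaling by the mean posterior clock rate
background <- mean(posterior$clockrate)
rate_table$absolute.rate <- rate_table$relative.rate * background

# Step 2: one-sample t-test per node
# H0: the mean of the posterior clock rates equals the node's absolute rate
rate_table$p.value <- vapply(
  rate_table$absolute.rate,
  function(mu) t.test(posterior$clockrate, mu = mu)$p.value,
  numeric(1)
)

rate_table
```

A node whose relative rate is exactly 1 has an absolute rate equal to the background mean, so the test cannot reject the null for it; nodes with relative rates far from 1 yield very small p-values.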

Value

A long data frame with one row per node per clock and the following columns:

`clade`	The name of the clade, taken from the "clade" column of `rate_table`
`nodes`	The node number, taken from the "node" column of `rate_table`
`clock`	The clock partition number
`relative.rate`	The relative mean clock rate per node, taken from the "rates" columns of `rate_table`
`absolute.rate(mean)`	The absolute mean clock rate per node; the relative clock rate multiplied by the mean of the posterior clock rates
`null`	The absolute clock rate used as the null value in the t-test
`p.value`	The p-value of the test comparing the mean of the posterior clockrates to each absolute clockrate

Examples

# See vignette("rates-selection") for how to use this
# function as part of an analysis pipeline

# Load example rate table and posterior data sets
data("RateTable_Means_3p_Clades")
data("posterior3p")

get_pwt_rates_MrBayes(RateTable_Means_3p_Clades, posterior3p)

Calculate silhouette widths index for various numbers of partitions

Description

Computes the silhouette width index for several possible numbers of clusters (partitions) k. The index measures how well an object fits within its own cluster compared to other clusters. The best number of clusters k is the one with the highest silhouette width.

Usage

get_sil_widths(dist_mat, max.k = 10)

## S3 method for class 'sil_width_df'
plot(x, ...)

Arguments

`dist_mat`	A Gower distance matrix, the output of a call to `get_gower_dist`.
`max.k`	The maximum number of clusters (partitions) to search across.
`x`	A `sil_width_df` object; the output of a call to `get_sil_widths()`.
`...`	Further arguments passed to `ggplot2::geom_line` to control the appearance of the plot.

Details

get_sil_widths calls cluster::pam on the supplied Gower distance matrix with each number of clusters (partitions) up to max.k and stores the average silhouette widths across the clustered characters. When plot = TRUE, a plot of the silhouette widths against the number of clusters is produced, though this can also be produced separately on the resulting data frame using plot.sil_width_df(). The number of clusters with the greatest silhouette width should be selected for use in the final clustering specification.
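
The search over k can be sketched directly with cluster::pam. This is an illustrative sketch rather than the package's code: the toy data, the Euclidean `dist()` stand-in for a Gower distance matrix, and the object names are assumptions. The loop starts at k = 2 because the silhouette width is undefined for a single cluster.

```r
library(cluster)  # provides pam()

set.seed(42)

# Toy data (assumption): three well-separated groups of 10 points each
x <- rbind(
  matrix(rnorm(20, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 10, sd = 0.3), ncol = 2)
)
dist_mat <- dist(x)  # stand-in for a Gower distance matrix

max.k <- 6

# Average silhouette width of the PAM solution for each candidate k
sil <- data.frame(
  k = 2:max.k,
  sil_width = vapply(2:max.k, function(k) {
    pam(dist_mat, k = k, diss = TRUE)$silinfo$avg.width
  }, numeric(1))
)

sil

# Candidate number of partitions: the k with the greatest average width
best_k <- sil$k[which.max(sil$sil_width)]
best_k
```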

Value

For get_sil_widths(), it produces a data frame, inheriting from class "sil_width_df", with two columns: k is the number of clusters, and sil_width is the silhouette widths for each number of clusters. If plot = TRUE, the output is returned invisibly.

For plot() on a get_sil_widths() object, it produces a ggplot object that can be manipulated using ggplot2 syntax (e.g., to change the theme or labels).

Examples

# See vignette("char-part") for how to use this
# function as part of an analysis pipeline

data("characters")

#Reading example file as categorical data
Dmatrix <- get_gower_dist(characters)

#Get silhouette widths for up to k = 7 partitions
sw <- get_sil_widths(Dmatrix, max.k = 7)

sw

plot(sw, color = "red", size = 2)

Estimate and plot character partitions

Description

Determines cluster (partition) membership for phylogenetic morphological characters from the supplied Gower distance matrix and requested number of clusters using partitioning around medoids (PAM, or K-medoids). To further and independently test the quality of the chosen partitioning scheme, users may also produce graphic clustering (tSNEs), coloring data points according to PAM clusters, to verify PAM clustering results.

Usage

make_clusters(dist_mat, k, tsne = FALSE,
              tsne_dim = 2, tsne_theta = 0,
              ...)

## S3 method for class 'cluster_df'
plot(x, seed = NA, nrow = 1,
              ...)

Arguments

`dist_mat`	A Gower distance matrix, the output of a call to `get_gower_dist`.
`k`	The desired number of clusters (or character partitions), the output from `get_sil_widths`.
`tsne`	Whether to perform Barnes-Hut t-distributed stochastic neighbor embedding (tSNE) to produce a multi-dimensional representation of the distance matrix using `Rtsne::Rtsne`. The number of dimensions is controlled by the `tsne_dim` argument. See Details. Default is `FALSE`.
`tsne_dim`	When `tsne = TRUE`, the number of dimensions for the tSNE multidimensional scaling plots. This is passed to the `dims` argument of `Rtsne::Rtsne`. Default is 2.
`tsne_theta`	When `tsne = TRUE`, a parameter controlling the speed/accuracy trade-off (increase for faster but less accurate results). This is passed to the `theta` argument of `Rtsne::Rtsne`. Default is 0 for exact tSNE.
`...`	For `make_clusters()`, other arguments passed to `Rtsne::Rtsne` when `tsne = TRUE`. For `plot()`, when plotting a `cluster_df` object, other arguments passed to `ggrepel::geom_text_repel` to control display of the observation labels.
`x`	For `plot()`, a `cluster_df` object; the output of a call to `make_clusters()`.
`seed`	For `plot()`, the seed used to control the placement of the labels and the jittering of the points. Jittering only occurs when `tsne = FALSE` in the call to `make_clusters()`. Using a non-`NA` seed ensures replicability across uses.
`nrow`	For `plot()`, when `tsne = TRUE` in the call to `make_clusters()` and `tsne_dim` is greater than 2, the number of rows used to display the resulting 2-dimensional plots. Default is 1 for side-by-side plots.

Details

make_clusters calls cluster::pam on the supplied Gower distance matrix with the specified number of clusters to determine cluster membership for each character. PAM is analogous to K-means, but it has its clusters centered around medoids instead of centered around centroids, which are less prone to the impact from outliers and heterogeneous cluster sizes. PAM also has the advantage over k-means of utilizing Gower distance matrices instead of Euclidean distance matrices only.
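
As a rough illustration of the PAM step, the sketch below clusters a small hand-made dissimilarity matrix and assembles a membership table with the column names described under Value. The toy dissimilarities and object names are hypothetical, and the way the fit is attached as an attribute only mirrors the documented output; it is not taken from the package source.

```r
library(cluster)  # provides pam()

# Toy dissimilarities among six "characters" (hypothetical values):
# characters 1-3 are similar to each other, as are characters 4-6
m <- matrix(0.9, 6, 6)
m[1:3, 1:3] <- 0.1
m[4:6, 4:6] <- 0.1
diag(m) <- 0
dist_mat <- as.dist(m)

# PAM works directly on a dissimilarity object; no Euclidean coordinates needed
fit <- pam(dist_mat, k = 2, diss = TRUE)

cluster_df <- data.frame(
  character_number = seq_along(fit$clustering),
  cluster = factor(fit$clustering)
)
attr(cluster_df, "pam.fit") <- fit  # keep the full fit alongside the memberships

cluster_df
fit$medoids  # medoids are actual observations, not averaged centroids
```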

When tsne = TRUE, a Barnes-Hut t-distributed stochastic neighbor embedding is used to compute a multi-dimensional embedding of the distance matrix, coloring data points according to the PAM-defined clusters, as estimated by the function make_clusters. This graphic clustering allows users to independently test the quality of the chosen partitioning scheme from PAM, and can help in visualizing the resulting clusters. Rtsne::Rtsne is used to do this. The resulting dimensions will be included in the output; see Value below.

plot() plots all morphological characters in a scatterplot with points colored based on cluster membership. When tsne = TRUE in the call to make_clusters(), the x- and y-axes will correspond to requested tSNE dimensions. With more than 2 dimensions, several plots will be produced, one for each pair of tSNE dimensions. These are displayed together using patchwork::plot_layout. When tsne = FALSE, the points will be arranged horizontally by cluster membership and randomly placed vertically.

Value

A data frame, inheriting from class "cluster_df", with a row for each character with its number (character_number) and cluster membership (cluster). When tsne = TRUE, additional columns will be included, one for each requested tSNE dimension, labeled tSNE_Dim1, tSNE_Dim2, etc., containing the values on the dimensions computed using Rtsne().

The pam fit resulting from cluster::pam is returned in the "pam.fit" attribute of the output object.

Note

When using plot() on a cluster_df object, warnings may appear from ggrepel saying something along the lines of "unlabeled data points (too many overlaps). Consider increasing max.overlaps". See ggrepel::geom_text_repel for details; the max.overlaps argument can be supplied to plot() to increase the maximum number of overlapping elements in the plot. Alternatively, users can increase the size of the plot when exporting it, as this will increase the plot area and reduce the number of overlapping elements. This warning can generally be ignored, though.

Examples

# See vignette("char-part") for how to use this
# function as part of an analysis pipeline

data("characters")

# Reading example file as categorical data
Dmatrix <- get_gower_dist(characters)

sil_widths <- get_sil_widths(Dmatrix, max.k = 7)

sil_widths
# 3 clusters yields the highest silhouette width

# Create clusters with PAM under k=3 partitions
cluster_df <- make_clusters(Dmatrix, k = 3)

# Simple plot of clusters
plot(cluster_df, seed = 12345)

# Create clusters with PAM under k=3 partitions and perform
# tSNE (2 dimensions, the default)
cluster_df_tsne <- make_clusters(Dmatrix, k = 3, tsne = TRUE,
                                 tsne_dim = 2)

# Plot clusters, plots divided into 2 rows, and increasing
# overlap of text labels (default = 10)
plot(cluster_df_tsne, nrow = 2, max.overlaps = 20)

Convert trees produced by a BEAST2 FBD analysis with offset to trees with correct ages.

Description

This method adds a dummy tip at the present (t = 0) to fully extinct trees with offsets, in order to have correct ages (otherwise the most recent tip is assumed to be at 0). This is a workaround to get the proper ages of the trees into other tools such as TreeAnnotator.

Usage

offset.to.dummy(trees.file, log.file, output.file = NULL, dummy.name = "dummy")

Arguments

`trees.file`	path to BEAST2 output file containing posterior trees
`log.file`	path to BEAST2 trace log file containing offset values
`output.file`	path to file to write converted trees. If `NULL` (default), trees are simply returned.
`dummy.name`	name of the added dummy tip, default `dummy`.

Details

NB: Any metadata present on the tips will be discarded. If you want to keep metadata (such as clock rate values), use offset.to.dummy.metadata instead.

Value

list of converted trees (as treedata)

Examples

# Convert trees with offset to trees with dummy tip
trees_file <- system.file("extdata", "ex_offset.trees", package = "EvoPhylo")
log_file <- system.file("extdata", "ex_offset.log", package = "EvoPhylo")
converted_trees <- offset.to.dummy(trees_file, log_file)

# Do something with the converted trees - for instance, calculate the MCC summary tree
# Then remove the dummy tip from the MCC tree
final_tree <- drop.dummy.beast(system.file("extdata", "ex_offset.MCC.tre", package = "EvoPhylo"))

Convert trees produced by a BEAST2 FBD analysis with offset to trees with correct ages, accounting for possible metadata on the tips.

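Description

This method adds a dummy tip at the present (t = 0) to fully extinct trees with offsets, in order to have correct ages, while keeping any metadata present on the tips (such as clock rate values).

Usage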

offset.to.dummy.metadata(
  trees.file,
  log.file,
  output.file = NULL,
  dummy.name = "dummy"
)

Arguments

`trees.file`	path to BEAST2 output file containing posterior trees
`log.file`	path to BEAST2 trace log file containing offset values
`output.file`	path to file to write converted trees. If `NULL` (default), trees are simply returned.
`dummy.name`	name of the added dummy tip, default `dummy`.

Value

list of converted trees (as treedata)

Examples

# Convert trees with offset to trees with dummy tip
trees_file <- system.file("extdata", "ex_offset.trees", package = "EvoPhylo")
log_file <- system.file("extdata", "ex_offset.log", package = "EvoPhylo")
converted_trees <- offset.to.dummy.metadata(trees_file, log_file)

# Do something with the converted trees - for instance, calculate the MCC summary tree
# Then remove the dummy tip from the MCC tree
final_tree <- drop.dummy.beast(system.file("extdata", "ex_offset.MCC.tre", package = "EvoPhylo"))

Plots distribution of background rates extracted from posterior log files.

Description

Plots the distribution and mean of background rates extracted from the posterior log files from Mr. Bayes or BEAST2, as well as the distribution of background rates if log transformed to test for normality of data distribution.

Usage

plot_back_rates(type = c("MrBayes", "BEAST2"),
                           posterior,
                           clock = 1,
                           trans = c("none", "log", "log10"),
                           size = 12, quantile = 0.95)

Arguments

`type`	Whether to use data output from "MrBayes" or "BEAST2".
`posterior`	A data frame of posterior parameter estimates (log file). From Mr.Bayes, it includes a "clockrate" column indicating the mean (background) clock rate estimate for each generation that will be used for pairwise t-tests. Such a data frame can be imported using `combine_log` (no need to reshape from wide to long). See the `posterior1p` or `posterior3p` datasets for examples of how the input file should look. From BEAST2, it will include at least one "rate<filename>.mean" column indicating the mean (background) clock rate estimate for each generation. If there are "P" unlinked clock partitions in BEAST2, there will be P x "rate<filename>.mean" columns (one for each partition) in the posterior log file.
`clock`	The clock partition number to calculate selection mode. Ignored if only one clock is available.
`trans`	Type of data transformation to perform on background rates extracted from the posterior log file from Mr. Bayes or BEAST2. Options include "none" (if rates are normally distributed), natural log transformation "log", and log of base 10 transformation "log10". The necessity of using data transformation can be tested using the function `plot_back_rates`.
`size`	Font size for title of plot
`quantile`	Upper limit for X axis (passed on to 'xlim') to remove outliers from histogram. The quantile can be any value between "0" and "1", but values equal to or above "0.95" provide good results in most cases in which the data distribution is right skewed.

Details

Plots the distribution and mean (red dotted line) of background rates extracted from the posterior log files from Mr. Bayes or BEAST2, as well as the distribution of background rates if log transformed. Background rates should be normally distributed for meeting the assumptions of t-tests and other tests passed on by downstream functions, including get_pwt_rates_MrBayes, get_pwt_rates_BEAST2, and plot_treerates_sgn.
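
A numerical complement to the visual check can be sketched in base R. The example below swaps in a Shapiro-Wilk test (`stats::shapiro.test()`), which is not part of this function, and uses simulated, hypothetical background rates; it only illustrates why a right-skewed rate distribution may need a log transformation before downstream t-tests.

```r
set.seed(123)

# Hypothetical right-skewed background rates, one per sampled generation
clockrate <- rlnorm(500, meanlog = log(0.02), sdlog = 0.5)

# Shapiro-Wilk normality test before and after a log10 transformation
# (a small p-value indicates departure from normality)
p_raw   <- shapiro.test(clockrate)$p.value
p_log10 <- shapiro.test(log10(clockrate))$p.value

c(raw = p_raw, log10 = p_log10)
```

If the transformed rates look closer to normal than the raw ones, the same transformation would be the natural choice for the `trans` argument.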

Value

It produces a ggplot object that can be manipulated using ggplot2 syntax (e.g., to change the theme or labels).

Examples

# See vignette("rates-selection") for how to use this
# function as part of an analysis pipeline

## MrBayes example
# Load example tree and posterior

data("posterior3p")

P <- plot_back_rates (type = "MrBayes", posterior3p, clock = 1,
                      trans = "log10", size = 10, quantile = 0.95)
P

Plot Bayesian evolutionary tree with rate thresholds for selection mode

Description

Plots the summary Bayesian evolutionary tree with branches colored according to user-defined thresholds (in units of standard deviations) used to infer the strength and mode of selection.

Usage

plot_treerates_sgn(type = c("MrBayes", "BEAST2"),
                  tree, posterior,
                  trans = c("none", "log", "log10"),
                  summary = "mean", drop.dummyextant = TRUE,
                  clock = 1, threshold = c("1 SD", "2 SD"),
                  low = "blue", mid = "gray90", high = "red",
                  branch_size = 2, tip_size = 2,
                  xlim = NULL, nbreaks = 10, geo_size = list(2, 3),
                  geo_skip = c("Quaternary", "Holocene", "Late Pleistocene"))

Arguments

`type`	Whether to use data output from "MrBayes" or "BEAST2".
`tree`	A `tidytree` object; the output of a call to `treeio::read.beast`. Summary trees from Mr. Bayes will include branch specific rates for all clock partitions, and the partition to be plotted will be specified using the "clock" argument. On the other hand, BEAST2 will output one separate summary tree file for each clock partition. For the latter, the tree file for the partition of interest should be provided for plotting.
`posterior`	A data frame of posterior parameter estimates (log file). From Mr.Bayes, it includes a "clockrate" column indicating the mean (background) clock rate estimate for each generation that will be used for pairwise t-tests. Such a data frame can be imported using `combine_log` (no need to reshape from wide to long). See the `posterior1p` or `posterior3p` datasets for examples of how the input file should look. From BEAST2, it will include at least one "rate<filename>.mean" column indicating the mean (background) clock rate estimate for each generation. If there are "P" unlinked clock partitions in BEAST2, there will be P x "rate<filename>.mean" columns (one for each partition) in the posterior log file.
`trans`	Type of data transformation to perform on background rates extracted from the posterior log file from Mr. Bayes or BEAST2. Options include "none" (if rates are normally distributed), natural log transformation "log", and log of base 10 transformation "log10". The necessity of using data transformation can be tested using the function `plot_back_rates`.
`summary`	Only when using Mr. Bayes trees. The rate summary stats chosen to calculate selection mode. Only rates "mean" and "median" are allowed. Default is "mean".
`drop.dummyextant`	`logical`; Only when using Mr. Bayes trees. Whether to drop the "Dummyextant" tip (if present) from the tree before plotting the tree. Default is `TRUE`.
`clock`	The clock partition number to calculate selection mode. Ignored if only one clock is available.
`threshold`	A vector of threshold values. Default is to display thresholds of ±1 relative standard deviation (SD) of the relative posterior clock rates. Should be specified as a number of standard deviations (e.g., `"1 SD"`) or the confidence level for a confidence interval around the mean relative posterior clockrate (e.g., `"95%"`). Multiple values are allowed to produce a plot with multiple thresholds. Set to `NULL` to omit thresholds.
`low`, `mid`, `high`	Colors passed to `scale_color_steps2` to control the colors of the branches based on which thresholds are exceeded. When no thresholds are supplied, use `mid` to control the color of the tree.
`branch_size`	The thickness of the lines that form the tree.
`tip_size`	The font size for the tips of the tree.
`xlim`	The x-axis limits. Should be two negative numbers (though the axis labels will be in absolute value, i.e., Ma).
`nbreaks`	The number of interval breaks in the geological timescale.
`geo_size`	The font size for the labels in the geological scale. The first value in `list()` is the font size for geological epochs and the second value is for geological periods. Passed directly to the `size` argument of `deeptime::coord_geo`.
`geo_skip`	A vector of interval names indicating which intervals should not be labeled. Passed directly to the `skip` argument of `deeptime::coord_geo`.

Details

Plots the phylogenetic tree contained in tree using ggtree::ggtree. Branches undergoing accelerating evolutionary rates (e.g., >"1 SD", "3 SD", or "5 SD" relative to the background rate) for each morphological clock partition suggest directional (or positive) selection for that morphological partition in that branch of the tree. Branches undergoing decelerating evolutionary rates (e.g., <"1 SD", "3 SD", or "5 SD" relative to the background rate) for each morphological clock partition suggest stabilizing selection for that morphological partition in that branch of the tree. For details on rationale, see Simões & Pierce (2021).

Please double check that the distribution of background rates (mean rates for the tree) sampled from the posterior follows the assumptions of a normal distribution (e.g., check for normality of distribution in Tracer). Otherwise, displayed results may not have a valid interpretation.
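
The threshold logic can be pictured with a small base R sketch. It assumes thresholds are placed at multiples of the standard deviation of the posterior background rates around their mean; how the function derives them internally is not shown here, and the simulated rates, branch names, and class labels are all hypothetical.

```r
set.seed(7)

# Hypothetical posterior background rates and per-branch absolute rates
background  <- rnorm(2000, mean = 1, sd = 0.1)
branch_rate <- c(b1 = 0.65, b2 = 0.95, b3 = 1.05, b4 = 1.25, b5 = 1.45)

mu <- mean(background)
s  <- sd(background)

# Cut points at -3, -1, +1 and +3 SD around the background mean
breaks <- c(-Inf, mu + c(-3, -1, 1, 3) * s, Inf)
labels <- c("< -3 SD", "-3 to -1 SD", "within 1 SD", "+1 to +3 SD", "> +3 SD")

res <- data.frame(
  branch = names(branch_rate),
  rate   = branch_rate,
  class  = cut(branch_rate, breaks = breaks, labels = labels)
)

res
```

Under the rationale above, branches falling in the upper classes would be read as candidates for directional selection and those in the lower classes as candidates for stabilizing selection.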

Value

A ggtree object, which inherits from ggplot.

References

Simões, T. R. and S. E. Pierce (2021). Sustained High Rates of Morphological Evolution During the Rise of Tetrapods. Nature Ecology & Evolution 5: 1403–1414.

Examples

# See vignette("rates-selection") for how to use this
# function as part of an analysis pipeline

## MrBayes example
# Load example tree and posterior
data("tree3p")
data("posterior3p")

plot_treerates_sgn(
  type = "MrBayes",
  tree3p, posterior3p,          #MrBayes tree file with data for all partitions
  trans = "none",
  summary = "mean",             #MrBayes specific argument
  drop.dummyextant = TRUE,      #MrBayes specific argument
  clock = 1,                           #Show rates for clock partition 1
  threshold = c("1 SD", "3 SD"),       #sets background rate threshold for selection mode
  branch_size = 1.5, tip_size = 3,                          #sets size for tree elements
  xlim = c(-450, -260), nbreaks = 8, geo_size = list(3, 3)) #sets limits and breaks for geoscale

## Not run: 
## BEAST2 example
tree_clock1 <- system.file("extdata", "Penguins_MCC_morpho_part1", package = "EvoPhylo")
tree_clock1 <- treeio::read.beast(tree_clock1)
posterior <- system.file("extdata", "Penguins_log.log", package = "EvoPhylo")
posterior <- read.table(posterior, header = TRUE)

plot_treerates_sgn(
  type = "BEAST2",
  tree_clock1, posterior,                 #BEAST2 tree file with data for partition 1
  trans = "log10",
  clock = 1,                              #Show rates for clock partition 1
  threshold = c("1 SD", "3 SD"),          #sets background rate threshold for selection mode
  branch_size = 1.5, tip_size = 3,                        #sets size for tree elements
  xlim = c(-70, 30), nbreaks = 8, geo_size = list(3, 3))  #sets limits and breaks for geoscale

## End(Not run)

Multiple phylogenetic clock trees

Description

Multiple clock Bayesian phylogenetic trees, imported as an S4 class object using treeio::read.beast().

Usage

data("post_trees")

Format

A tidytree object.

Details

Example tree file for function write.beast.treedata.

Posterior parameter samples (single clock)

Description

An example dataset of posterior parameter samples resulting from a clock-based Bayesian inference analysis using the skyline fossilized birth–death process (FBD) tree model with Mr. Bayes after combining all parameter (.p) files into a single data frame with the combine_log function. This particular example was produced by analyzing the data set with a single morphological partition from Simões & Pierce (2021).

Usage

data("posterior1p")

Format

A data frame with 4000 observations on several variables estimated for each generation during analysis:

Gen: A numeric vector for the generation number
LnL: A numeric vector for the natural log likelihood of the cold chain
LnPr: A numeric vector for the natural log of the prior probability
TH: A numeric vector for the total tree height (sum of all branch durations, as chronological units)
TL: A numeric vector for total tree length (sum of all branch lengths, as accumulated substitutions/changes)
prop_ancfossil: A numeric vector indicating the proportion of fossils recovered as ancestors
sigma: A numeric vector for the standard deviation of the lognormal distribution governing how much rates vary across characters.
net_speciation_1, net_speciation_2, net_speciation_3, net_speciation_4: A numeric vector for net speciation estimates for each time bin
relative_extinction_1, relative_extinction_2, relative_extinction_3, relative_extinction_4: A numeric vector for relative extinction estimates for each time bin
relative_fossilization_1, relative_fossilization_2, relative_fossilization_3, relative_fossilization_4: A numeric vector for relative fossilization estimates for each time bin
tk02var: A numeric vector for the variance on the base of the clock rate
clockrate: A numeric vector for the base of the clock rate

Details

Datasets like this one can be produced from parameter log (.p) files using combine_log. The number of variables depends on parameter set up, but for clock analyses with Mr. Bayes, will typically include the ones above, possibly also including alpha, which contains the shape of the gamma distribution governing how much rates vary across characters. When using the traditional FBD model rather than the skyline FBD model used to produce this dataset, there will be only one column for each of net_speciation, relative_extinction and relative_fossilization. When using more than one morphological partition, different columns may be present; see posterior3p for an example with 3 partitions.
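To make the column layout concrete, the sketch below builds a small mock data frame that reuses the column names listed under Format and pulls out the per-time-bin net speciation columns by name. The values are random placeholders, not the contents of posterior1p.

```r
# Mock posterior with the same naming scheme as posterior1p (values are random)
set.seed(1)
n <- 100
posterior <- data.frame(
  Gen = seq_len(n) * 1000,
  net_speciation_1 = runif(n),
  net_speciation_2 = runif(n),
  net_speciation_3 = runif(n),
  net_speciation_4 = runif(n),
  clockrate = rlnorm(n)
)

# Under the skyline FBD model there is one column per time bin
spec_cols <- grep("^net_speciation_", names(posterior), value = TRUE)

# Posterior median of net speciation for each time bin
bin_medians <- vapply(posterior[spec_cols], median, numeric(1))
bin_medians
```

The same pattern should apply to the relative_extinction and relative_fossilization columns by changing the prefix in the regular expression.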

References

Simões, T. R. and S. E. Pierce (2021). Sustained High Rates of Morphological Evolution During the Rise of Tetrapods. Nature Ecology & Evolution 5: 1403–1414.

Posterior parameter samples (3 clock partitions)

Description

An example dataset of posterior parameter samples resulting from a clock-based Bayesian inference analysis using the skyline fossilized birth–death process (FBD) tree model with Mr. Bayes after combining all parameter (.p) files into a single data frame with the combine_log function. This particular example was produced by analyzing the data set with three morphological partitions from Simões & Pierce (2021).

Usage

data("posterior3p")

Format

A data frame with 4000 observations on several variables estimated for each generation during analysis. The number of variables depends on parameter set up, but for clock analyses with Mr. Bayes, will typically include the following:

Gen: A numeric vector for the generation number
LnL: A numeric vector for the natural log likelihood of the cold chain
LnPr: A numeric vector for the natural log of the prior probability
TH.all.: A numeric vector for the total tree height (sum of all branch durations, as chronological units)
TL.all.: A numeric vector for total tree length (sum of all branch lengths, as accumulated substitutions/changes)
prop_ancfossil.all.: A numeric vector indicating the proportion of fossils recovered as ancestors
sigma.1., sigma.2., sigma.3.: A numeric vector for the standard deviation of the lognormal distribution governing how much rates vary across characters for each data partition
m.1., m.2., m.3.: A numeric vector for the rate multiplier parameter for each data partition
net_speciation_1.all., net_speciation_2.all., net_speciation_3.all., net_speciation_4.all.: A numeric vector for net speciation estimates for each time bin
relative_extinction_1.all., relative_extinction_2.all., relative_extinction_3.all., relative_extinction_4.all.: A numeric vector for relative extinction estimates for each time bin
relative_fossilization_1.all., relative_fossilization_2.all., relative_fossilization_3.all., relative_fossilization_4.all.: A numeric vector for relative fossilization estimates for each time bin
tk02var.1., tk02var.2., tk02var.3.: A numeric vector for the variance on the base of the clock rate for each clock partition
clockrate.all.: A numeric vector for the base of the clock rate

Details

Datasets like this one can be produced from parameter log (.p) files using combine_log. The number of variables depends on parameter set up, but for clock analyses with Mr. Bayes, will typically include the ones above, possibly also including an alpha for each partition, which contains the shape of the gamma distribution governing how much rates vary across characters (when shape of the distribution is unlinked across partitions). When using the traditional FBD model rather than the skyline FBD model used to produce this dataset, there will be only one column for each of net_speciation, relative_extinction and relative_fossilization. When using a single morphological partition, different columns may be present; see posterior1p for an example with just one partition.
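The trailing dots in column names such as TH.all. or sigma.1. are consistent with how base R sanitizes headers on import. Assuming the raw .p file labels parameters with braces (e.g., TH{all}), which is an assumption about the file rather than something stated here, make.names() reproduces the names listed under Format:

```r
# Assumed raw header labels from a multi-partition .p file (hypothetical input)
raw_headers <- c("TH{all}", "sigma{1}", "net_speciation_1{all}", "tk02var{3}")

# make.names() replaces characters that are invalid in R names with "."
clean_headers <- make.names(raw_headers)
clean_headers
```

This is useful to keep in mind when selecting columns by name, since the dots are part of the name (e.g., posterior3p[["clockrate.all."]]).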

References

Simões, T. R. and S. E. Pierce (2021). Sustained High Rates of Morphological Evolution During the Rise of Tetrapods. Nature Ecology & Evolution 5: 1403–1414.

Mean clock rates by node and clade (single clock)

Description

A data set containing the mean clock rates for a tree with 1 clock partition, such as the output of get_clockrate_table_MrBayes but with an additional "clade" column added, which is required for use in clockrate_summary and clockrate_dens_plot.

Usage

data("RateTable_Means_1p_Clades")

Format

A data frame with 79 observations on the following 3 variables.

clade: A character vector containing the clade names for each corresponding node
nodes: A numeric vector for the node numbers in the summary tree
rates: A numeric vector containing the mean posterior clock rate for each node

Details

RateTable_Means_1p_Clades was created by running get_clockrate_table_MrBayes(tree1p) and then adding a "clade" column. It can be produced by using the following procedure:

1) Import tree file:

data("tree1p")

2) Produce clock rate table with, for instance, mean rate values from each branch in the tree:

rate_table <- get_clockrate_table_MrBayes(tree1p, summary = "mean")

write.csv(rate_table, file = "rate_table.csv", row.names = FALSE)

3) Now, manually add clades using, e.g., Excel:

3.1) Manually edit rate_table.csv, adding a "clade" column. This introduces customized clade names to individual nodes in the tree.

3.2) Save the edited rate table with a different name to differentiate from the original output (e.g., rate_table_clades_means.csv).

4) Read the file back in:

RateTable_Means_1p_Clades <- read.csv("rate_table_clades_means.csv")

head(RateTable_Means_1p_Clades)
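As an alternative to editing the file by hand in step 3, the "clade" column can also be added in R from a lookup table. The sketch below uses a small mock rate table; the node numbers, rates, and clade names are all hypothetical.

```r
# Mock rate table with the same columns as the output described above
rate_table <- data.frame(
  nodes = 1:6,
  rates = c(0.8, 1.1, 0.9, 1.4, 1.2, 0.7)
)

# Hypothetical node-to-clade assignments
clade_key <- data.frame(
  nodes = 1:6,
  clade = c("Clade_A", "Clade_A", "Clade_B", "Clade_B", "Other", "Other")
)

# Attach clade names by matching on node number, keeping the original row order
rate_table$clade <- clade_key$clade[match(rate_table$nodes, clade_key$nodes)]

# Reorder columns to clade, nodes, rates
rate_table <- rate_table[c("clade", "nodes", "rates")]
head(rate_table)
```

Using match() rather than merge() keeps the rows in their original node order.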

Mean clock rates by node and clade (3 clock partitions)

Description

A data set containing the mean clock rates for a tree with 3 clock partitions, such as the output of get_clockrate_table_MrBayes but with an additional "clade" column added, which is required for use in clockrate_summary and clockrate_dens_plot.

Usage

data("RateTable_Means_3p_Clades")

Format

A data frame with 79 observations on the following 5 variables.

clade: A character vector containing the clade names for each corresponding node
nodes: A numeric vector for the node numbers in the summary tree
rates1: A numeric vector containing the mean posterior clock rate for each node for the first partition
rates2: A numeric vector containing the mean posterior clock rate for each node for the second partition
rates3: A numeric vector containing the mean posterior clock rate for each node for the third partition

Details

RateTable_Means_3p_Clades was created by running get_clockrate_table_MrBayes(tree3p) and then adding a "clade" column. It can be produced by using the following procedure:

1) Import tree file:

data("tree3p")

2) Produce clock rate table with, for instance, mean rate values from each branch in the tree:

rate_table <- get_clockrate_table_MrBayes(tree3p, summary = "mean")

write.csv(rate_table, file = "rate_table.csv", row.names = FALSE)

3) Now, manually add clades using, e.g., Excel:

3.1) Manually edit rate_table.csv, adding a "clade" column. This introduces customized clade names to individual nodes in the tree.

3.2) Save the edited rate table with a different name to differentiate from the original output (e.g., rate_table_clades_means.csv).

4) Read the file back in:

RateTable_Means_3p_Clades <- read.csv("rate_table_clades_means.csv")

head(RateTable_Means_3p_Clades)
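A table with one rates column per clock partition is in wide format. The base R sketch below shows what stacking such a table into long format looks like, which is the kind of conversion clock_reshape is described as performing; it is not that function's implementation, and the mock values and the output column name "rates" are assumptions.

```r
# Mock wide table with three clock partitions (values are hypothetical)
rate_table <- data.frame(
  clade  = c("Clade_A", "Clade_A", "Clade_B"),
  nodes  = 1:3,
  rates1 = c(0.8, 1.1, 0.9),
  rates2 = c(1.4, 1.2, 0.7),
  rates3 = c(1.0, 0.6, 1.3)
)

rate_cols <- grep("^rates", names(rate_table), value = TRUE)

# Stack the rate columns and record which clock each value came from
long <- data.frame(
  clade = rep(rate_table$clade, times = length(rate_cols)),
  nodes = rep(rate_table$nodes, times = length(rate_cols)),
  clock = factor(rep(seq_along(rate_cols), each = nrow(rate_table))),
  rates = unlist(rate_table[rate_cols], use.names = FALSE)
)
long
```

Each node now appears once per clock partition, so the long table has three times as many rows as the wide one.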

BEAST2 phylogenetic tree with clock rates from partition 1

Description

A clock Bayesian phylogenetic tree with clock rates from a single clock partition (partition 1 here), imported as an S4 class object using treeio::read.beast().

Usage

data("tree_clock1")

Format

A tidytree object.

Details

This example tree file was produced by analyzing the data set with a single morphological partition from

BEAST2 phylogenetic tree with clock rates from partition 2

Description

A clock Bayesian phylogenetic tree with clock rates from a single clock partition (partition 2 here), imported as an S4 class object using treeio::read.beast().

Usage

data("tree_clock2")

Format

A tidytree object.

Details

This example tree file was produced by analyzing the data set with a single morphological partition from

Phylogenetic tree with a single clock partition

Description

A clock Bayesian phylogenetic tree, imported as an S4 class object using treeio::read.mrbayes().

Usage

data("tree1p")

Format

A tidytree object.

Details

This example tree file was produced by analyzing the data set with a single morphological partition from Simões & Pierce (2021).

References

Simões, T. R. and S. E. Pierce (2021). Sustained High Rates of Morphological Evolution During the Rise of Tetrapods. Nature Ecology & Evolution 5: 1403–1414.

Phylogenetic tree with 3 clock partitions

Description

A clock Bayesian phylogenetic tree, imported as an S4 class object using treeio::read.mrbayes().

Usage

data("tree3p")

Format

A tidytree object.

Details

This example tree file was produced by analyzing the data set with 3 morphological clock partitions from Simões & Pierce (2021).

References

Simões, T. R. and S. E. Pierce (2021). Sustained High Rates of Morphological Evolution During the Rise of Tetrapods. Nature Ecology & Evolution 5: 1403–1414.

Write character partitions as separate Nexus files (for use in BEAUti)

Description

Write character partitions as separate Nexus files (for use in BEAUti)

Usage

write_partitioned_alignments(x, cluster_df, file)

Arguments

`x`	character data matrix as Nexus file (.nex) or data frame (with taxa as rows and characters as columns) read directly from local directory
`cluster_df`	cluster partitions as output by `make_clusters`
`file`	path to save the alignments. If `file = "example.nex"`, alignments will be saved to files `"example_part1.nex"`, `"example_part2.nex"`, etc.
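The naming pattern described for `file` can be reproduced in base R as follows. This is only a sketch of the pattern for a hypothetical three-partition case, not the function's own code.

```r
# Hypothetical output path and number of partitions
file <- "example.nex"
k <- 3

# Insert "_part<i>" between the base name and the extension
part_files <- paste0(
  tools::file_path_sans_ext(file),
  "_part", seq_len(k),
  ".", tools::file_ext(file)
)
part_files
```

Checking for these names with file.exists() after a run is one way to confirm that every partition was written.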

Value

no return value

Examples

# Load example phylogenetic data matrix
data("characters")

# Create distance matrix
Dmatrix <- get_gower_dist(characters)

# Find optimal partitioning scheme using PAM under k=3 partitions
cluster_df <- make_clusters(Dmatrix, k = 3)

# Write to Nexus files
## Not run: write_partitioned_alignments(characters, cluster_df, "example.nex")

Export multiple treedata objects (S4 class tree files) to BEAST NEXUS file

Description

This function was adopted and modified from treeio::write.beast to export a list of trees instead of a single tree.

Usage

write.beast.treedata(treedata, file = "",
                    translate = TRUE, tree.name = "STATE")

Arguments

`treedata`	An S4 class object of type `treedata` containing multiple trees; e.g. a Bayesian clock tree distribution imported using `treeio::read.beast` or `treeio::read.mrbayes`.
`file`	Output file. If `file = ""`, prints the output content on screen.
`translate`	Whether to translate taxa labels.
`tree.name`	Name of the trees, default `"STATE"`.

Value

Writes a treedata object containing multiple trees to a file, or prints the file content on screen if file = "".

Examples

#Load file with multiple trees
## Not run: 
trees_file = system.file("extdata", "ex_offset.trees", package = "EvoPhylo")
posterior_trees_offset = treeio::read.beast(trees_file)

#Write multiple trees to screen
write.beast.treedata(posterior_trees_offset)

## End(Not run)
