Title: | Runs various sister group comparisons to test hypotheses about diversification |
---|---|
Description: | Sister groups are pairs of clades that differ in a key character. If that character leads to higher diversification rate, then clades with that trait should have more species than their sisters. There are several tests to see if this is the case, ranging from a basic sign test to more complex ones (many implemented in the ape package). This package can identify sister groups that differ in a binary trait and perform all the relevant tests. It can also discretize a continuous trait into a binary "high" vs "low" state. |
Authors: | Brian O'Meara [aut, cre] |
Maintainer: | Brian O'Meara <[email protected]> |
License: | GPL-3 |
Version: | 0.0.0.9000 |
Built: | 2024-12-22 03:18:05 UTC |
Source: | https://github.com/bomeara/sisters |
Performs basic formatting and cleanup: makes sure the taxa are in the same order in both the tree and the data, makes sure the row names of the data are taxon names, etc. Relies on geiger's treedata function. The first_col_names argument is for software like hisse, where the first column often holds taxon names.
sis_clean(phy, traits, first_col_names = FALSE)
phy |
A phylo object |
traits |
A data.frame of traits |
first_col_names |
Boolean on whether the first column has names. |
a list with phy and traits elements
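The alignment step described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation (which relies on geiger's treedata); `align_traits_sketch` and the toy data are hypothetical, and keeping the name column in place after copying it to the row names is an assumption.

```r
# Hypothetical helper: align a trait data.frame to a vector of tip labels.
align_traits_sketch <- function(tip_labels, traits, first_col_names = FALSE) {
  if (first_col_names) {
    # Copy taxon names from the first column into the row names
    # (assumption: the column itself is kept, as hisse-style input expects).
    rownames(traits) <- as.character(traits[[1]])
  }
  # Keep only taxa present in both inputs, in tip-label order.
  shared <- tip_labels[tip_labels %in% rownames(traits)]
  list(tip_labels = shared, traits = traits[shared, , drop = FALSE])
}

tips <- c("sp_a", "sp_b", "sp_c")
raw <- data.frame(taxon = c("sp_c", "sp_a", "sp_x"), size = c(3.1, 1.2, 9.9))
aligned <- align_traits_sketch(tips, raw, first_col_names = TRUE)
print(aligned$traits)
```

Taxa found in only one of the two inputs (here sp_b and sp_x) are dropped, and the remaining rows follow the order of the tip labels.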
Converts a vector of numbers into a vector of 0s and 1s based on whether they are below or above some value. There are two ways to do this: by percentile or by a numeric cutoff. By default, values are split at the 50th percentile (cutoff of 0.5), but you can change the cutoff value and whether it is interpreted as a percentile or as a trait value.
sis_discretize(x, cutoff = 0.5, use_percentile = TRUE)
x |
Vector of continuous trait values |
cutoff |
Value to use as cutoff. If percentile, 0.3 = 30th percentile, etc. |
use_percentile |
If TRUE, use cutoff as percentile |
a vector of 0 and 1 (and NAs)
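To make the two cutoff modes concrete, here is a minimal base R sketch. `discretize_sketch` is a hypothetical stand-in, not the package function, and it assumes that values strictly above the threshold are coded 1 (how sis_discretize treats values exactly at the threshold is not stated here).

```r
# Hypothetical stand-in for the discretization idea.
discretize_sketch <- function(x, cutoff = 0.5, use_percentile = TRUE) {
  threshold <- if (use_percentile) {
    # Interpret cutoff as a quantile of the observed values.
    stats::quantile(x, probs = cutoff, na.rm = TRUE, names = FALSE)
  } else {
    # Interpret cutoff directly as a trait value.
    cutoff
  }
  # Assumption: strictly greater than the threshold means state 1.
  # NA inputs stay NA.
  as.integer(x > threshold)
}

x <- c(1, 2, 3, 4, NA)
by_percentile <- discretize_sketch(x)
by_value <- discretize_sketch(x, cutoff = 3.5, use_percentile = FALSE)
print(by_percentile)
print(by_value)
```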
Utility function for tossing out taxa already used
sis_find_taxon(taxon, sisters)
taxon |
Node number of taxon |
sisters |
Data.frame from sis_get_sisters() |
A data.frame indicating whether the taxon is in the left sister group, the right sister group, or either of them (any)
Convert a data.frame of all sister groups (from sis_get_sisters) and a vector of 0 and 1 (with names equal to taxon names) to a data.frame with the sister groups that differ in traits.
sis_format_comparison(sisters, trait, phy)
sisters |
Data.frame from sis_get_sisters() |
trait |
vector of 0/1 data |
phy |
A phylo object |
data.frame where each row is a sister group comparison.
data(geospiza, package="geiger")
cleaned <- sis_clean(geospiza$phy, geospiza$dat)
phy <- cleaned$phy
traits <- cleaned$traits
trait <- sis_discretize(traits[,1])
sisters <- sis_get_sisters(phy, ncores=2)
sisters_comparison <- sis_format_comparison(sisters, trait, phy)
print(sisters_comparison)
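The core filtering idea, keeping only sister pairs that differ in the trait, can be sketched without the package. Everything below is hypothetical toy data for the tree ((A,B),(C,D)); it assumes a comparison is usable only when each side is uniform in state and the two sides differ, which may be stricter or looser than what sis_format_comparison actually does.

```r
# Toy sister groups: tip labels on each side of nodes 5, 6 and 7.
node  <- c(5, 6, 7)
left  <- list(c("A", "B"), "A", "C")
right <- list(c("C", "D"), "B", "D")
trait <- c(A = 0, B = 0, C = 1, D = 1)  # named 0/1 vector

# Return the shared state of a set of taxa, or NA if they disagree.
uniform_state <- function(taxa, trait) {
  states <- unique(trait[taxa])
  if (length(states) == 1) unname(states) else NA
}

left_state  <- sapply(left, uniform_state, trait = trait)
right_state <- sapply(right, uniform_state, trait = trait)

# Assumption: a usable comparison has two uniform sides in different states.
informative <- !is.na(left_state) & !is.na(right_state) &
  left_state != right_state
print(node[informative])
```

Here only the split at node 5 contrasts a state 0 clade with a state 1 clade; the two cherries each share a single state and so carry no contrast.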
Get simplified comparison format suitable for passing into other functions
sis_format_simpified(sisters_comparison)
sisters_comparison |
Data.frame from sis_format_comparison |
Data.frame of two columns: diversity with state 0 and state 1, where each row is a sister group comparison
Get monomorphic trait
sis_get_monomorphic(trait)
trait |
Vector of trait values |
The state all taxa have if monomorphic; NA otherwise
For a node, gives the taxa on each side. Note that the output is a data.frame with lists
sis_get_sister_pair(node, phy)
node |
Node number |
phy |
A phylo object |
a data.frame with the node numbers and columns with the tip labels of the two descendant clades
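As a rough illustration of what "the taxa on each side" of a node means, the sketch below walks a hand-built edge matrix laid out the way ape stores a phylo object (tips numbered first, then internal nodes). The helper names are hypothetical and this is not the package's code; it returns a plain list rather than a data.frame with list columns.

```r
# Edge matrix (parent, child) for the tree ((A,B),C):
# tips are 1..3, the root is node 4, and node 5 is the (A,B) clade.
edge <- rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3))
tip_labels <- c("A", "B", "C")

# Collect the tip numbers below a node by following parent -> child edges.
tips_below <- function(node, edge, n_tips) {
  if (node <= n_tips) return(node)  # a tip is its own descendant set
  children <- edge[edge[, 1] == node, 2]
  unlist(lapply(children, tips_below, edge = edge, n_tips = n_tips))
}

# For a bifurcating node, return the tip labels on each side.
sister_pair_sketch <- function(node, edge, tip_labels) {
  children <- edge[edge[, 1] == node, 2]
  stopifnot(length(children) == 2)
  lapply(children, function(child) {
    tip_labels[tips_below(child, edge, length(tip_labels))]
  })
}

pair <- sister_pair_sketch(4, edge, tip_labels)
print(pair)
```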
For each node, returns the vector of tip numbers for the taxa on each side. The output is sorted so that sister groups with fewer taxa come first.
sis_get_sisters(phy, ncores = 2)
phy |
A phylo object |
ncores |
How many cores to use to run this in parallel. I suggest parallel::detectCores(), but set it at 2 for a default (otherwise CRAN checks fail) |
a data.frame with the node numbers and columns with the tip labels of the two descendant clades, plus additional info on the sister groups
Get trait values for tip numbers
sis_get_trait_values(nodes, phy, trait)
nodes |
vector of node numbers (tip numbers, actually) |
phy |
A phylo object |
trait |
A trait vector with names equal to taxon names |
This is a way of looking at the effect of using different cutoff values on the sister group comparisons. Do clades with a higher value have more species than their sister, and is this robust to what cutoff value is used? At the extremes (the min and max value) this is almost certainly not the case, unless you have many taxa with the same maximum or minimum values.
sis_iterate(x, nsteps = 11, phy, sisters = sis_get_sisters(phy), drop_matches = TRUE)
x |
Vector of continuous trait values |
nsteps |
Number of thresholds to try |
phy |
A phylo object |
sisters |
Data.frame from sis_get_sisters() |
drop_matches |
Drop sister group comparisons with equal numbers of taxa |
This is a very dangerous function to use. Someone could use this to find the perfect cutoff value to find a significant result. This is one of the many forms of p-hacking. So, if you use this function and then report on significance using some cutoff, you MUST mention somewhere in your manuscript that you've tried a variety of cutoff values, and include a discussion of why you used a particular cutoff. Ideally, you should have some biological intuition about what cutoff value is reasonable before using this function, as well.
A data.frame where each column corresponds to a different cutoff percentile and each row is one of the values returned by sis_test()
data(geospiza, package="geiger")
cleaned <- sis_clean(geospiza$phy, geospiza$dat)
phy <- cleaned$phy
trait <- cleaned$traits[,1]
sis_iterate(trait, phy=phy)
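To see why the extreme cutoffs are uninformative, the following base R sketch sweeps a grid of percentiles over a made-up trait vector and counts how many taxa land in each state. It assumes evenly spaced percentiles from 0 to 1 and a strict "greater than" rule; neither is confirmed to match what sis_iterate does internally.

```r
# Made-up continuous trait values for seven taxa.
x <- c(2.1, 3.4, 1.8, 5.0, 4.2, 2.9, 3.7)

# Assumption: nsteps evenly spaced percentiles between 0 and 1.
nsteps <- 11
cutoffs <- seq(0, 1, length.out = nsteps)

# For each percentile, discretize and count taxa in each state.
counts <- sapply(cutoffs, function(p) {
  threshold <- stats::quantile(x, probs = p, names = FALSE)
  state <- as.integer(x > threshold)
  c(n_state0 = sum(state == 0), n_state1 = sum(state == 1))
})
colnames(counts) <- cutoffs
print(counts)
```

At the 100th percentile no taxon exceeds the threshold, so every taxon is in state 0 and no sister pair can differ in the trait; at the 0th percentile only the taxon with the minimum value is in state 0.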
Do a test with a single cutoff value
sis_iterate_single_run(cutoff, x, use_percentile = TRUE, phy, sisters = sis_get_sisters(phy), drop_matches = TRUE, warn = FALSE)
cutoff |
Value to use as cutoff. If percentile, 0.3 = 30th percentile, etc. |
x |
Vector of continuous trait values |
use_percentile |
If TRUE, use cutoff as percentile |
phy |
A phylo object |
sisters |
Data.frame from sis_get_sisters() |
drop_matches |
Drop sister group comparisons with equal numbers of taxa |
warn |
Some tests will fail with warnings (too few sister groups or other reasons). Setting this to FALSE will suppress those |
vector of output from sis_test()
Compute multiple tests based on sister group comparisons
sis_test(pairs, drop_matches = TRUE, warn = TRUE)
pairs |
Data.frame with one row per sister group comparison, with one column for number of taxa in state 0, and one column for the number of taxa in state 1. |
drop_matches |
Drop sister group comparisons with equal numbers of taxa |
warn |
Some tests will fail with warnings (too few sister groups or other reasons). Setting this to FALSE will suppress those |
A vector with the results of many tests, as well as summary data for the comparisons
data(geospiza, package="geiger")
cleaned <- sis_clean(geospiza$phy, geospiza$dat)
phy <- cleaned$phy
traits <- cleaned$traits
trait <- sis_discretize(traits[,1])
sisters <- sis_get_sisters(phy)
sisters_comparison <- sis_format_comparison(sisters, trait, phy)
pairs <- sis_format_simpified(sisters_comparison)
sis_test(pairs)
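The simplest of these tests, the sign test mentioned in the package description, can be sketched with base R alone. The data and the column names `n0` and `n1` are invented for illustration; sis_test may name its inputs and compute its statistics differently.

```r
# Invented species counts: one row per sister group comparison.
pairs_sketch <- data.frame(
  n0 = c(1, 2, 3, 1, 5),  # taxa in the state 0 clade
  n1 = c(4, 5, 3, 6, 2)   # taxa in the state 1 clade
)

# Drop ties, mirroring the idea behind drop_matches = TRUE.
informative <- pairs_sketch[pairs_sketch$n0 != pairs_sketch$n1, ]

# Sign test: is the state 1 clade the richer one more often than chance?
n_wins <- sum(informative$n1 > informative$n0)
sign_test <- stats::binom.test(n_wins, nrow(informative), p = 0.5)
print(sign_test$p.value)
```

With so few comparisons the test has little power, which is one reason some tests can fail or warn when there are too few sister groups.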
Like try(), but returns NA rather than an error
tryNA(code, silent = FALSE)
code |
Code to run |
silent |
Print error if TRUE |
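The general pattern of trading an error for NA can be sketched with tryCatch. `try_na_sketch` and its `verbose` argument are hypothetical and are not the signature of tryNA, whose handling of `silent` is described above.

```r
# Hypothetical wrapper: evaluate code, returning NA if it throws an error.
try_na_sketch <- function(code, verbose = FALSE) {
  tryCatch(code, error = function(e) {
    # Optionally report what went wrong before falling back to NA.
    if (verbose) message("Error caught: ", conditionMessage(e))
    NA
  })
}

failed <- try_na_sketch(stop("too few sister groups"))
worked <- try_na_sketch(sqrt(16))
print(failed)
print(worked)
```

A wrapper like this lets a batch of tests run to completion even when some individual tests cannot be computed for the data at hand.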