Title: | Extending 'dendrogram' Functionality in R |
---|---|
Description: | Offers a set of functions for extending 'dendrogram' objects in R, letting you visualize and compare trees of 'hierarchical clusterings'. You can (1) Adjust a tree's graphical parameters - the color, size, type, etc of its branches, nodes and labels. (2) Visually and statistically compare different 'dendrograms' to one another. |
Authors: | Tal Galili [aut, cre, cph] (https://www.r-statistics.com), Yoav Benjamini [ths], Gavin Simpson [ctb], Gregory Jefferis [aut, ctb] (imported code from his dendroextras package), Marco Gallotta [ctb] (a.k.a: marcog), Johan Renaudie [ctb] (https://github.com/plannapus), The R Core Team [ctb] (Thanks for the Infrastructure, and code in the examples), Kurt Hornik [ctb], Uwe Ligges [ctb], Andrej-Nikolai Spiess [ctb], Steve Horvath [ctb], Peter Langfelder [ctb], skullkey [ctb], Mark Van Der Loo [ctb] (https://github.com/markvanderloo d3dendrogram), Andrie de Vries [ctb] (ggdendro author), Zuguang Gu [ctb] (circlize author), Cath [ctb] (https://github.com/CathG), John Ma [ctb] (https://github.com/JohnMCMa), Krzysiek G [ctb] (https://github.com/storaged), Manuela Hummel [ctb] (https://github.com/hummelma), Chase Clark [ctb] (https://github.com/chasemc), Lucas Graybuck [ctb] (https://github.com/hypercompetent), jdetribol [ctb] (https://github.com/jdetribol), Ben Ho [ctb] (https://github.com/SplitInf), Samuel Perreault [ctb] (https://github.com/samperochkin), Christian Hennig [ctb] (http://www.homepages.ucl.ac.uk/~ucakche/), David Bradley [ctb] (https://github.com/DBradley27), Houyun Huang [ctb] (https://github.com/houyunhuang), Patrick Schupp [ctb] (https://github.com/pschupp), Alec Buetow [ctb] (https://github.com/alecbuetow) |
Maintainer: | Tal Galili <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 1.19.0 |
Built: | 2024-11-15 06:35:32 UTC |
Source: | https://github.com/talgalili/dendextend |
Maintainer: Tal Galili [email protected] (https://www.r-statistics.com) [copyright holder]
Authors:
Gregory Jefferis [email protected] (imported code from his dendroextras package) [contributor]
Other contributors:
Yoav Benjamini [email protected] [thesis advisor]
Gavin Simpson [contributor]
Marco Gallotta (a.k.a: marcog) [contributor]
Johan Renaudie (https://github.com/plannapus) [contributor]
The R Core Team (Thanks for the Infrastructure, and code in the examples) [contributor]
Kurt Hornik [contributor]
Uwe Ligges [contributor]
Andrej-Nikolai Spiess [contributor]
Steve Horvath [email protected] [contributor]
Peter Langfelder [email protected] [contributor]
skullkey [contributor]
Mark Van Der Loo [email protected] (https://github.com/markvanderloo d3dendrogram) [contributor]
Andrie de Vries [email protected] (ggdendro author) [contributor]
Zuguang Gu [email protected] (circlize author) [contributor]
Cath (https://github.com/CathG) [contributor]
John Ma (https://github.com/JohnMCMa) [contributor]
Krzysiek G (https://github.com/storaged) [contributor]
Manuela Hummel [email protected] (https://github.com/hummelma) [contributor]
Chase Clark (https://github.com/chasemc) [contributor]
Lucas Graybuck (https://github.com/hypercompetent) [contributor]
jdetribol (https://github.com/jdetribol) [contributor]
Ben Ho [email protected] (https://github.com/SplitInf) [contributor]
Samuel Perreault [email protected] (https://github.com/samperochkin) [contributor]
Christian Hennig [email protected] (http://www.homepages.ucl.ac.uk/~ucakche/) [contributor]
David Bradley (https://github.com/DBradley27) [contributor]
Houyun Huang [email protected] (https://github.com/houyunhuang) [contributor]
Patrick Schupp [email protected] (https://github.com/pschupp) [contributor]
Alec Buetow [email protected] (https://github.com/alecbuetow) [contributor]
dendrogram, hclust in stats package.
Given a tree and a k number of clusters, the tree is rotated so that the extra clusters added from k-1 to k clusters are flipped.
This is useful for finding good trees for a tanglegram.
all_couple_rotations_at_k(dend, k, dend_heights_per_k, ...)
dend |
a dendrogram object |
k |
integer scalar with the number of clusters the tree should be cut into. |
dend_heights_per_k |
a named vector that resulted from running heights_per_k.dendrogram. When running the function many times, supplying this object improves the running time if the cutree.dendrogram method is used. |
... |
not used |
A list with dendrogram objects with all the possible rotations for k clusters (beyond the k-1 clusters!).
tanglegram, match_order_by_labels, entanglement, flip_leaves.
## Not run:
dend1 <- USArrests[1:5, ] %>% dist() %>% hclust() %>% as.dendrogram()
dend2 <- all_couple_rotations_at_k(dend1, k = 2)[[2]]
tanglegram(dend1, dend2)
entanglement(dend1, dend2, L = 2) # 0.5
dend2 <- all_couple_rotations_at_k(dend1, k = 3)[[2]]
tanglegram(dend1, dend2)
entanglement(dend1, dend2, L = 2) # 0.4
dend2 <- all_couple_rotations_at_k(dend1, k = 4)[[2]]
tanglegram(dend1, dend2)
entanglement(dend1, dend2, L = 2) # 0.05
## End(Not run)
Checks if all the elements in a vector are unique
all_unique(x, ...)
x |
a vector |
... |
ignored. |
logical (are all the elements in the vector unique)
https://www.mail-archive.com/[email protected]/msg77592.html OLD (no longer working): https://r.789695.n4.nabble.com/Is-there-a-function-to-test-if-all-the-elements-in-a-vector-are-unique-td931833.html
all_unique(c(1:5, 1, 1))
all_unique(c(1, 1, 2))
all_unique(c(1, 1, 2, 3, 3, 3, 3))
all_unique(c(1, 3, 2))
all_unique(c(1:10))
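As a rough illustration of the check described above, the same question can be answered in base R with anyDuplicated(). The name all_unique_sketch is hypothetical, and this is only one plausible way to write such a test, not necessarily how the package implements all_unique.

```r
# Hypothetical helper (not part of dendextend): TRUE when no element repeats.
# anyDuplicated() returns 0 when there are no duplicates, otherwise the
# index of the first duplicated element.
all_unique_sketch <- function(x) {
  anyDuplicated(x) == 0
}

all_unique_sketch(c(1, 3, 2))
all_unique_sketch(c(1, 1, 2))
```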
This function makes a global comparison of two or more dendrogram trees.
The function can take two dendlist objects and compare them using all.equal.list. If only "target" is supplied as a dendlist (and "current" is missing), the function goes through the dendlist and compares all of the dendrograms within it to one another.
## S3 method for class 'dendrogram'
all.equal(
  target,
  current,
  use.edge.length = TRUE,
  use.tip.label.order = FALSE,
  use.tip.label = TRUE,
  use.topology = TRUE,
  tolerance = .Machine$double.eps^0.5,
  scale = NULL,
  ...
)
## S3 method for class 'dendlist'
all.equal(target, current, ...)
target |
an object of type dendrogram or dendlist |
current |
an object of type dendrogram |
use.edge.length |
logical (TRUE). Whether to compare the branches' heights. |
use.tip.label.order |
logical (FALSE). Whether to check that the labels are the same and in identical order. |
use.tip.label |
logical (TRUE). Whether to check that the labels are the same (regardless of order). |
use.topology |
logical (TRUE). Whether to check the existence of distinct edges. |
tolerance |
the numeric tolerance used to compare the branch lengths. |
scale |
a positive number (NULL as default), comparison of branch height is made after scaling (i.e., dividing) them by this number. |
... |
Ignored. |
Either TRUE (NULL for attr.all.equal) or a vector of mode "character" describing the differences between target and current.
all.equal, all.equal.phylo, identical
## Not run:
set.seed(23235)
ss <- sample(1:150, 10)
dend1 <- iris[ss, -5] %>% dist() %>% hclust("com") %>% as.dendrogram()
dend2 <- iris[ss, -5] %>% dist() %>% hclust("single") %>% as.dendrogram()
dend3 <- iris[ss, -5] %>% dist() %>% hclust("ave") %>% as.dendrogram()
dend4 <- iris[ss, -5] %>% dist() %>% hclust("centroid") %>% as.dendrogram()
# cutree(dend1)
all.equal(dend1, dend1)
all.equal(dend1, dend2)
all.equal(dend1, dend2, use.edge.length = FALSE)
all.equal(dend1, dend2, use.edge.length = FALSE, use.topology = FALSE)
all.equal(dend2, dend4, use.edge.length = TRUE)
all.equal(dend2, dend4, use.edge.length = FALSE)
all.equal(dendlist(dend1, dend2, dend3, dend4))
all.equal(dendlist(dend1, dend2, dend3, dend4), use.edge.length = FALSE)
all.equal(dendlist(dend1, dend1, dend1))
## End(Not run)
Convert dendrogram Objects to Class hclust while preserving the call/method/dist.method values of the original hclust object (hc)
as_hclust_fixed(x, hc, ...)
x |
any object which has an as.hclust method. (mostly used for dendrogram) |
hc |
an old hclust object from which to re-use the call/method/dist.method values |
... |
passed to as.hclust |
An hclust object (from a dendrogram) with the original hclust call/method/dist.method values
hc <- hclust(dist(USArrests[1:3, ]), "ave")
dend <- as.dendrogram(hc)
as.hclust(dend)
as_hclust_fixed(dend, hc)
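To make the idea of re-using the call/method/dist.method values concrete, here is a minimal base R sketch. The helper restore_hclust_meta is a hypothetical name introduced for illustration; it simply copies the three fields from the old hclust object onto the converted one and is not claimed to be the package's implementation.

```r
# Hypothetical helper (not part of dendextend): copy the descriptive fields
# of an existing hclust object onto a freshly converted one.
restore_hclust_meta <- function(new_hc, old_hc) {
  new_hc$call <- old_hc$call
  new_hc$method <- old_hc$method
  new_hc$dist.method <- old_hc$dist.method
  new_hc
}

hc <- hclust(dist(USArrests[1:3, ]), "average")
dend <- as.dendrogram(hc)

# Round trip through the dendrogram class, then restore the metadata.
converted <- as.hclust(dend)
restored <- restore_hclust_meta(converted, hc)

restored$method
restored$dist.method
```

Only the descriptive fields are touched; the merge, height and order components come from the conversion itself.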
Removes items that are not dendrogram/dendlist objects and turns what is left into a dendlist.
as.dendlist(x, ...)
x |
a list with several dendrogram/hclust/phylo or dendlist objects and other junk that should be omitted. |
... |
NOT USED |
A list of class dendlist where each item is a dendrogram
## Not run:
dend <- iris[, -5] %>% dist() %>% hclust() %>% as.dendrogram()
dend2 <- iris[, -5] %>% dist() %>% hclust(method = "single") %>% as.dendrogram()
x <- list(dend, 1, dend2)
as.dendlist(x)
## End(Not run)
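The filtering step described above can be approximated in base R. The sketch below is an assumption about the general idea (keep tree-like items, convert them to dendrograms, drop everything else); the name keep_trees_sketch is hypothetical, it only handles dendrogram and hclust inputs, and it returns a plain list rather than a dendlist.

```r
# Hypothetical helper (not part of dendextend): keep only dendrogram/hclust
# items of a list and convert each of them to a dendrogram.
keep_trees_sketch <- function(x) {
  is_tree <- function(obj) inherits(obj, c("dendrogram", "hclust"))
  lapply(Filter(is_tree, x), as.dendrogram)
}

hc <- hclust(dist(iris[1:10, -5]))
dend <- as.dendrogram(hc)

mixed <- list(dend, 1, hc, "junk")
trees <- keep_trees_sketch(mixed)
length(trees)
```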
Based on as.hclust.dendrogram with as.phylo.hclust
In the future I hope a more direct link will be made.
as.phylo.dendrogram(x, ...)
x |
a dendrogram |
... |
ignored. |
A phylo class object
as.dendrogram, as.hclust, as.phylo
## Not run:
library(dendextend)
library(ape)
dend <- iris[1:30, -5] %>% dist() %>% hclust() %>% as.dendrogram()
dend2 <- as.phylo(dend)
plot(dend2, type = "fan")
library(dendextend)
library(ggplot2)
# no longer needed: library(ggdendro)
dend <- iris[1:30, -5] %>% dist() %>% hclust() %>% as.dendrogram()
# there is a bug in the location of the labels
# If you want to solve it - please send a Pull Request to:
# https://github.com/talgalili/dendextend/
ggplot(dend) + scale_y_reverse(expand = c(0.2, 0)) + coord_polar(start = 1, theta = "x")
## End(Not run)
# see: https://github.com/klutometis/roxygen/issues/796
Populates dendextend functions into dendextend_options
assign_dendextend_options()
Goes through the dendrogram branches and updates the values inside their edgePar.
If the value is Inf, the corresponding value in edgePar is not changed.
assign_values_to_branches_edgePar(
  dend,
  value,
  edgePar,
  skip_leaves = FALSE,
  warn = dendextend_options("warn"),
  ...
)
dend |
a dendrogram object |
value |
a new value scalar for the edgePar attribute. |
edgePar |
a character indicating the value inside edgePar to adjust. Can be either "col", "lty", or "lwd". |
skip_leaves |
logical (FALSE) - should the leaves be skipped/ignored? |
warn |
logical (default from dendextend_options("warn") is FALSE). Whether warnings should be issued. Setting this to TRUE is safer, but the default is FALSE to keep the noise down. |
... |
not used |
A dendrogram, after adjusting the edgePar attribute in all of its branches.
# This failed before - now it works fine. (thanks to Martin Maechler)
dend <- 1:2 %>% dist() %>% hclust() %>% as.dendrogram()
dend %>%
  set("branches_lty", 1:2) %>%
  set("branches_col", c("topbranch_never_plots", "black", "orange")) %>%
  plot()
## Not run:
dend <- USArrests[1:5, ] %>% dist() %>% hclust() %>% as.dendrogram()
plot(dend)
dend <- assign_values_to_branches_edgePar(dend = dend, value = 2, edgePar = "lwd")
plot(dend)
dend <- assign_values_to_branches_edgePar(dend = dend, value = 2, edgePar = "col")
plot(dend)
dend <- assign_values_to_branches_edgePar(dend = dend, value = "orange", edgePar = "col")
plot(dend)
dend2 <- assign_values_to_branches_edgePar(dend = dend, value = 2, edgePar = "lty")
plot(dend2)
dend2 %>% unclass() %>% str()
## End(Not run)
Goes through the dendrogram leaves and updates the values inside their edgePar.
If the value is Inf, the corresponding value in edgePar is not changed.
assign_values_to_leaves_edgePar(
  dend,
  value,
  edgePar,
  warn = dendextend_options("warn"),
  ...
)
dend |
a dendrogram object |
value |
a new value vector for the edgePar attribute. It should be the same length as the number of leaves in the tree. If not, it will recycle the value and issue a warning. |
edgePar |
the value inside edgePar to adjust. |
warn |
logical (default from dendextend_options("warn") is FALSE). Whether warnings should be issued. Setting this to TRUE is safer, but the default is FALSE to keep the noise down. |
... |
not used |
A dendrogram, after adjusting the edgePar attribute in all of its leaves.
get_leaves_attr, assign_values_to_leaves_nodePar
## Not run:
dend <- USArrests[1:5, ] %>% dist() %>% hclust("ave") %>% as.dendrogram()
plot(dend)
dend <- assign_values_to_leaves_edgePar(dend = dend, value = c(3, 2), edgePar = "col")
plot(dend)
dend <- assign_values_to_leaves_edgePar(dend = dend, value = c(3, 2), edgePar = "lwd")
plot(dend)
dend <- assign_values_to_leaves_edgePar(dend = dend, value = c(3, 2), edgePar = "lty")
plot(dend)
get_leaves_attr(dend, "edgePar", simplify = FALSE)
## End(Not run)
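For readers curious how a per-leaf update of edgePar could work, the following base R sketch walks a dendrogram recursively, recycles the supplied values over the leaves, and skips any leaf whose value is Inf. Both set_leaves_edge_col and leaf_cols are hypothetical helpers written for this illustration (restricted to the "col" entry); they are not the package's code.

```r
# Hypothetical helper (not part of dendextend): set edgePar$col on each leaf,
# left to right, recycling `values`; Inf means "leave this leaf unchanged".
set_leaves_edge_col <- function(dend, values) {
  values <- rep_len(values, attr(dend, "members"))
  i <- 0
  walk <- function(node) {
    if (is.leaf(node)) {
      i <<- i + 1
      if (!is.infinite(values[i])) {
        ep <- attr(node, "edgePar")
        if (is.null(ep)) ep <- list()
        ep$col <- values[i]
        attr(node, "edgePar") <- ep
      }
    } else {
      for (j in seq_along(node)) node[[j]] <- walk(node[[j]])
    }
    node
  }
  walk(dend)
}

# Hypothetical helper: collect edgePar$col from the leaves, left to right.
leaf_cols <- function(node) {
  if (is.leaf(node)) return(attr(node, "edgePar")$col)
  unlist(lapply(seq_along(node), function(j) leaf_cols(node[[j]])))
}

dend <- as.dendrogram(hclust(dist(USArrests[1:5, ]), "average"))
colored <- set_leaves_edge_col(dend, c(3, 2))
leaf_cols(colored)
```

Leaves skipped because of Inf keep whatever edgePar they already had, which is why leaf_cols returns fewer entries in that case.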
Goes through the dendrogram leaves and updates the values inside their nodePar.
If the value is Inf, the corresponding value in nodePar is not changed.
assign_values_to_leaves_nodePar(
  dend,
  value,
  nodePar,
  warn = dendextend_options("warn"),
  ...
)
dend |
a dendrogram object |
value |
a new value vector for the nodePar attribute. It should be the same length as the number of leaves in the tree. If not, it will recycle the value and issue a warning. |
nodePar |
the value inside nodePar to adjust. |
warn |
logical (default from dendextend_options("warn") is FALSE). Whether warnings should be issued. Setting this to TRUE is safer, but the default is FALSE to keep the noise down. |
... |
not used |
A dendrogram, after adjusting the nodePar attribute in all of its leaves.
## Not run:
dend <- USArrests[1:5, ] %>% dist() %>% hclust("ave") %>% as.dendrogram()
# reproduces "labels_colors<-"
# although it does force us to run through the tree twice,
# hence "labels_colors<-" is better...
plot(dend)
dend <- assign_values_to_leaves_nodePar(dend = dend, value = c(3, 2), nodePar = "lab.col")
plot(dend)
dend <- assign_values_to_leaves_nodePar(dend, 1, "pch")
plot(dend)
# fix the annoying pch=1:
dend <- assign_values_to_leaves_nodePar(dend, NA, "pch")
plot(dend)
# adjust the cex:
dend <- assign_values_to_leaves_nodePar(dend, 19, "pch")
dend <- assign_values_to_leaves_nodePar(dend, 2, "lab.cex")
plot(dend)
str(unclass(dend))
get_leaves_attr(dend, "nodePar", simplify = FALSE)
## End(Not run)
Goes through the dendrogram nodes and updates the values inside their nodePar.
If the value is Inf, the corresponding value in nodePar is not changed.
assign_values_to_nodes_nodePar(
  dend,
  value,
  nodePar = c("pch", "cex", "col", "xpd", "bg"),
  warn = dendextend_options("warn"),
  ...
)
dend |
a dendrogram object |
value |
a new value vector for the nodePar attribute. It should be the same length as the number of nodes in the tree. If not, it will recycle the value and issue a warning. |
nodePar |
the value inside nodePar to adjust. This may contain components named pch, cex, col, xpd, and/or bg. |
warn |
logical (default from dendextend_options("warn") is FALSE). Whether warnings should be issued. Setting this to TRUE is safer, but the default is FALSE to keep the noise down. |
... |
not used |
A dendrogram, after adjusting the nodePar attribute in all of its nodes.
get_leaves_attr, assign_values_to_leaves_nodePar
## Not run:
dend <- USArrests[1:5, ] %>% dist() %>% hclust("ave") %>% as.dendrogram()
# reproduces "labels_colors<-"
# although it does force us to run through the tree twice,
# hence "labels_colors<-" is better...
plot(dend)
dend2 <- dend %>%
  assign_values_to_nodes_nodePar(value = 19, nodePar = "pch") %>%
  assign_values_to_nodes_nodePar(value = c(1, 2), nodePar = "cex") %>%
  assign_values_to_nodes_nodePar(value = c(2, 1), nodePar = "col")
plot(dend2)
### Making sure this works for NA with character.
dend %>%
  assign_values_to_nodes_nodePar(value = 19, nodePar = "pch") %>%
  assign_values_to_nodes_nodePar(value = c("red", NA), nodePar = "col") -> dend2
plot(dend2)
## End(Not run)
Baker's Gamma for two k matrices
bakers_gamma_for_2_k_matrix(
  k_matrix_dend1,
  k_matrix_dend2,
  to_plot = FALSE,
  ...
)
k_matrix_dend1 |
a matrix of k cluster groupings from a dendrogram |
k_matrix_dend2 |
a (second) matrix of k cluster groupings from a dendrogram |
to_plot |
logical (FALSE). Should a scatterplot be plotted, showing the correlation between the lowest shared branch between two items in the two compared trees. |
... |
not used |
Baker's Gamma coefficient.
Bk is the calculation of Fowlkes-Mallows index for a series of k cuts for two dendrograms.
Bk(tree1, tree2, k, warn = dendextend_options("warn"), ...)
tree1 |
a dendrogram/hclust/phylo object. |
tree2 |
a dendrogram/hclust/phylo object. |
k |
an integer scalar or vector with the desired number of cluster groups. If missing - the Bk will be calculated for a default k range of 2:(nleaves-1). No point in checking k=1/k=n, since both will give Bk=1. |
warn |
logical (default from dendextend_options("warn") is FALSE). Whether warnings should be issued. Setting this to TRUE is safer, but the default is FALSE to keep the noise down. |
... |
Ignored (passed to FM_index_R). |
From Wikipedia:
Fowlkes-Mallows index (see references) is an external evaluation method that is used to determine the similarity between two clusterings (clusters obtained after a clustering algorithm). This measure of similarity could be either between two hierarchical clusterings or a clustering and a benchmark classification. A higher value of the Fowlkes-Mallows index indicates a greater similarity between the clusters and the benchmark classifications.
A list (of the same length as k) of Fowlkes-Mallows index values between the two dendrograms, one for each value of k. Each list item is named after the k for which it was calculated.
Fowlkes, E. B.; Mallows, C. L. (1 September 1983). "A Method for Comparing Two Hierarchical Clusterings". Journal of the American Statistical Association 78 (383): 553.
https://en.wikipedia.org/wiki/Fowlkes-Mallows_index
FM_index, cor_bakers_gamma, Bk_plot
## Not run:
set.seed(23235)
ss <- TRUE # sample(1:150, 10 )
hc1 <- hclust(dist(iris[ss, -5]), "com")
hc2 <- hclust(dist(iris[ss, -5]), "single")
tree1 <- as.dendrogram(hc1)
tree2 <- as.dendrogram(hc2)
# cutree(tree1)
Bk(hc1, hc2, k = 3)
Bk(hc1, hc2, k = 2:10)
Bk(hc1, hc2)
Bk(tree1, tree2, k = 3)
Bk(tree1, tree2, k = 2:5)
system.time(Bk(hc1, hc2, k = 2:5)) # 0.01
system.time(Bk(hc1, hc2)) # 1.28
system.time(Bk(tree1, tree2, k = 2:5)) # 0.24 # after fixes.
system.time(Bk(tree1, tree2, k = 2:10)) # 0.31 # after fixes.
system.time(Bk(tree1, tree2)) # 7.85
Bk(tree1, tree2, k = 99:101)
y <- Bk(hc1, hc2, k = 2:10)
plot(unlist(y) ~ c(2:10), type = "b", ylim = c(0, 1))
# can take a few seconds
y <- Bk(hc1, hc2)
plot(unlist(y) ~ as.numeric(names(y)),
  main = "Bk plot", pch = 20,
  xlab = "k", ylab = "FM Index",
  type = "b", ylim = c(0, 1)
)
# we are still missing some hypothesis testing here.
# for this we'll have the Bk_plot function.
## End(Not run)
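To show what a single Bk value measures, here is a base R sketch that computes the Fowlkes-Mallows index from two cluster assignments via their contingency table, using the T_k, P_k and Q_k quantities from Fowlkes and Mallows (1983). The name fm_index_sketch is hypothetical; the package's own FM_index function may differ in details such as input checks and the handling of k = 1 or k = n, which this sketch does not cover.

```r
# Hypothetical helper (not part of dendextend): Fowlkes-Mallows index of two
# cluster-membership vectors of the same length and item order.
fm_index_sketch <- function(c1, c2) {
  m <- table(c1, c2) # contingency table of the two clusterings
  n <- length(c1)
  Tk <- sum(m^2) - n # pairs co-clustered in both clusterings (times 2)
  Pk <- sum(rowSums(m)^2) - n # pairs co-clustered in the first
  Qk <- sum(colSums(m)^2) - n # pairs co-clustered in the second
  Tk / sqrt(Pk * Qk)
}

hc1 <- hclust(dist(iris[, -5]), "complete")
hc2 <- hclust(dist(iris[, -5]), "single")

# One Bk value for a cut into k = 3 clusters.
bk_3 <- fm_index_sketch(cutree(hc1, k = 3), cutree(hc2, k = 3))
bk_3
```

Repeating the last step over a range of k values gives the series that a Bk plot displays.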
Bk is the calculation of Fowlkes-Mallows index for a series of k cuts for two dendrograms.
Bk permutation calculates the Bk under the null hypothesis of no similarity between the two trees by randomly shuffling the labels of the two trees and calculating their Bk.
Bk_permutations(
  tree1,
  tree2,
  k,
  R = 1000,
  warn = dendextend_options("warn"),
  ...
)
tree1 |
a dendrogram/hclust/phylo object. |
tree2 |
a dendrogram/hclust/phylo object. |
k |
an integer scalar or vector with the desired number of cluster groups. If missing - the Bk will be calculated for a default k range of 2:(nleaves-1). No point in checking k=1/k=n, since both will give Bk=1. |
R |
integer (Default is 1000). The number of Bk permutation to perform for each k. |
warn |
logical (default from dendextend_options("warn") is FALSE). Whether warnings should be issued. Setting this to TRUE is safer, but the default is FALSE to keep the noise down. If set to TRUE, extra checks are made to verify that the two clusters have the same size and the same labels. |
... |
Ignored (passed to FM_index_R). |
From Wikipedia:
Fowlkes-Mallows index (see references) is an external evaluation method that is used to determine the similarity between two clusterings (clusters obtained after a clustering algorithm). This measure of similarity could be either between two hierarchical clusterings or a clustering and a benchmark classification. A higher value of the Fowlkes-Mallows index indicates a greater similarity between the clusters and the benchmark classifications.
A list (of the same length as k), where each element holds R (the number of permutations) values of the Fowlkes-Mallows index between the two dendrograms after their labels were shuffled.
Each list item is named after the k for which it was calculated.
Fowlkes, E. B.; Mallows, C. L. (1 September 1983). "A Method for Comparing Two Hierarchical Clusterings". Journal of the American Statistical Association 78 (383): 553.
https://en.wikipedia.org/wiki/Fowlkes-Mallows_index
## Not run:
set.seed(23235)
ss <- TRUE # sample(1:150, 10 )
hc1 <- hclust(dist(iris[ss, -5]), "com")
hc2 <- hclust(dist(iris[ss, -5]), "single")
# tree1 <- as.dendrogram(hc1)
# tree2 <- as.dendrogram(hc2)
# cutree(tree1)
some_Bk <- Bk(hc1, hc2, k = 20)
some_Bk_permu <- Bk_permutations(hc1, hc2, k = 20)
# we can see that the Bk is much higher than the permutation Bks:
plot(
  x = rep(1, 1000), y = some_Bk_permu[[1]],
  main = "Bk distribution under H0",
  ylim = c(0, 1)
)
points(1, y = some_Bk, pch = 19, col = 2)
## End(Not run)
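The permutation idea can be sketched in base R by shuffling the cluster memberships and recomputing the index each time. This is an illustration under assumptions: fm_index_sketch is a hypothetical helper (repeated here so the block is self-contained), 200 permutations are used instead of the default R = 1000 to keep it quick, and the package's Bk_permutations may shuffle and validate its inputs differently.

```r
# Hypothetical helper (not part of dendextend): Fowlkes-Mallows index of two
# cluster-membership vectors, computed from their contingency table.
fm_index_sketch <- function(c1, c2) {
  m <- table(c1, c2)
  n <- length(c1)
  (sum(m^2) - n) / sqrt((sum(rowSums(m)^2) - n) * (sum(colSums(m)^2) - n))
}

set.seed(23235)
hc1 <- hclust(dist(iris[, -5]), "complete")
hc2 <- hclust(dist(iris[, -5]), "single")
k <- 3
c1 <- cutree(hc1, k = k)
c2 <- cutree(hc2, k = k)

observed_bk <- fm_index_sketch(c1, c2)

# Null distribution: shuffle which item carries which cluster membership,
# which breaks any relation between the two clusterings.
n_perm <- 200
null_bk <- replicate(n_perm, fm_index_sketch(sample(c1), sample(c2)))

observed_bk
summary(null_bk)
```

Comparing observed_bk with the upper quantiles of null_bk gives an informal sense of how unusual the observed similarity is under the null hypothesis.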
Bk is the calculation of Fowlkes-Mallows index for a series of k cuts for two dendrograms. A Bk plot is simply a scatter plot of Bk versus k. This plot helps in identifying the similarity between two dendrograms at different levels of k (number of clusters).
Bk_plot(
  tree1,
  tree2,
  k,
  add_E = TRUE,
  rejection_line_asymptotic = TRUE,
  rejection_line_permutation = FALSE,
  R = 1000,
  k_permutation,
  conf.level = 0.95,
  p.adjust.methods = c("none", "bonferroni"),
  col_line_Bk = 1,
  col_line_asymptotic = 2,
  col_line_permutation = 4,
  warn = dendextend_options("warn"),
  main = "Bk plot",
  xlab = "k (number of clusters)",
  ylab = "Bk (Fowlkes-Mallows Index)",
  xlim,
  ylim = c(0, 1),
  try_cutree_hclust = TRUE,
  ...
)
tree1 |
a dendrogram/hclust/phylo object. |
tree2 |
a dendrogram/hclust/phylo object. |
k |
an integer scalar or vector with the desired number of cluster groups. If missing - the Bk will be calculated for a default k range of 2:(nleaves-1). No point in checking k=1/k=n, since both will give Bk=1. |
add_E |
logical (TRUE). Should we add a line of the Expected Bk value for each k, under the null hypothesis of no relation between the clusterings? |
rejection_line_asymptotic |
logical (TRUE). Should we add a line of the one sided rejection region based on the asymptotic distribution of Bk values, for each k, under the null hypothesis of no relation between the clusterings? |
rejection_line_permutation |
logical (FALSE). Should we add a line of the one sided rejection region based on the permutation distribution of Bk values, for each k, under the null hypothesis of no relation between the clusterings? |
R |
integer (Default is 1000). The number of Bk permutation to perform for each k. Applicable only if rejection_line_permutation is TRUE. |
k_permutation |
the k's to be used for permutation (sometimes we might be only interested in some k's and it is not important to run the simulation for all possible ks). If missing - k itself will be used. |
conf.level |
the level of one sided confidence interval used for creation of the rejection lines. |
p.adjust.methods |
a character scalar of either "none" (default), or "bonferroni". This controls the multiple correction method to use for the critical rejection values. Currently only the Bonferroni method is implemented (based on the number of different k values). |
col_line_Bk |
the color of the Bk line. |
col_line_asymptotic |
the color of the asymptotic rejection line. |
col_line_permutation |
the color of the permutation rejection line. |
warn |
logical (default from dendextend_options("warn") is FALSE). Sets whether warnings are to be issued. It is safer to keep this at TRUE, but to keep the noise down, the default is FALSE. If set to TRUE, extra checks are made to verify that the two clusters have the same size and the same labels. |
main |
passed to plot. |
xlab |
passed to plot. |
ylab |
passed to plot. |
xlim |
passed to plot. If missing, xlim is from 2 to nleaves-1. |
ylim |
passed to plot. |
try_cutree_hclust |
logical (TRUE). Since cutree for hclust is MUCH faster than for dendrogram, Bk_plot will first try to change the dendrogram into an hclust object. If that fails (for example, with unbranched trees), it will continue using the cutree.dendrogram functions. If try_cutree_hclust=FALSE, it is forced to use cutree.dendrogram and not cutree.hclust. |
... |
Ignored. |
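As a rough sketch of how conf.level and p.adjust.methods = "bonferroni" interact, the base R snippet below divides the one sided level by the number of k values and shows the effect on a standard-normal cutoff. The variable names are hypothetical, and the cutoff form E(Bk) + z * sd(Bk) is an assumption made for illustration; this is not dendextend's internal code.

```r
# Hypothetical illustration; not dendextend's internal code.
conf_level <- 0.95
k <- 2:20                        # the k values being tested

alpha <- 1 - conf_level          # one sided level for a single k
alpha_bonf <- alpha / length(k)  # Bonferroni: split alpha across all k values

# Standard-normal quantiles that these levels would imply for a one sided
# cutoff of the assumed form E(Bk) + z * sd(Bk).
z_none <- qnorm(1 - alpha)
z_bonf <- qnorm(1 - alpha_bonf)
c(none = z_none, bonferroni = z_bonf)
```

The Bonferroni quantile is the larger of the two, which is consistent with the "higher rejection lines" noted in the examples.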
From Wikipedia:
Fowlkes-Mallows index (see references) is an external evaluation method that is used to determine the similarity between two clusterings (clusters obtained after a clustering algorithm). This measure of similarity could be either between two hierarchical clusterings or a clustering and a benchmark classification. A higher value for the Fowlkes-Mallows index indicates a greater similarity between the clusters and the benchmark classifications.
The default Bk plot comes with a line with dots (type "b") of the Bk values, a dashed (lty=2) line of the same color showing the expected Bk under H0, and a solid red line of the upper critical Bk values for rejection.
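To make the index concrete, here is a minimal base R sketch that computes Bk by hand from the contingency table of two cluster assignments, following the formula in Fowlkes and Mallows (1983): Bk = Tk / sqrt(Pk * Qk). The helper name fm_index is hypothetical and is not part of dendextend; it only illustrates the quantity that is drawn for each k.

```r
# Hypothetical helper (not a dendextend function): Fowlkes-Mallows index
# for two vectors of cluster memberships over the same items.
fm_index <- function(a, b) {
  m <- table(a, b)                  # contingency table of the two clusterings
  n <- sum(m)
  Tk <- sum(m^2) - n                # pairs placed together in both clusterings
  Pk <- sum(rowSums(m)^2) - n       # pairs placed together in clustering a
  Qk <- sum(colSums(m)^2) - n       # pairs placed together in clustering b
  Tk / sqrt(Pk * Qk)
}

hc1 <- hclust(dist(iris[, -5]), "complete")
hc2 <- hclust(dist(iris[, -5]), "single")

# One Bk value per k, comparing the two trees cut into k clusters
k_range <- 2:10
bk <- sapply(k_range, function(k) fm_index(cutree(hc1, k), cutree(hc2, k)))
names(bk) <- k_range
bk
```

Identical clusterings give a value of 1 regardless of how the clusters are numbered, since only co-membership of pairs matters.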
After plotting the Bk plot, the function invisibly returns the elements used for constructing the plot: the Bk values, Bk permutations (if used), Bk theoretical values, etc.
Fowlkes, E. B.; Mallows, C. L. (1 September 1983). "A Method for Comparing Two Hierarchical Clusterings". Journal of the American Statistical Association 78 (383): 553.
https://en.wikipedia.org/wiki/Fowlkes-Mallows_index
## Not run:
set.seed(23235)
ss <- TRUE # sample(1:150, 10)
hc1 <- hclust(dist(iris[ss, -5]), "com")
hc2 <- hclust(dist(iris[ss, -5]), "single")
# tree1 <- as.dendrogram(hc1)
# tree2 <- as.dendrogram(hc2)
# cutree(tree1)

Bk_plot(hc1, hc2, k = 2:20, xlim = c(2, 149))
Bk_plot(hc1, hc2)
Bk_plot(hc1, hc2, k = 3)
Bk_plot(hc1, hc2, k = 3:10)
Bk_plot(hc1, hc2)
Bk_plot(hc1, hc2, p.adjust.methods = "bonferroni") # higher rejection lines

# this one can take a bit of time:
Bk_plot(hc1, hc2,
  rejection_line_permutation = TRUE,
  k_permutation = c(2, 4, 6, 8, 10, 20, 30, 40, 50),
  R = 100
)
# we can see that the permutation line is VERY close to the asymptotic line.
# This is great since it means one can often use the asymptotic results
# without having to do many simulations.

# works just as well for dendrograms:
dend1 <- as.dendrogram(hc1)
dend2 <- as.dendrogram(hc2)
Bk_plot(dend1, dend2, k = 2:3, try_cutree_hclust = FALSE) # slower than hclust, but works...
Bk_plot(hc1, dend2, k = 2:3, try_cutree_hclust = FALSE) # slower than hclust, but works...
Bk_plot(dend1, dend1, k = 2:3, try_cutree_hclust = TRUE)
Bk_plot(hc1, hc1, k = 2:3)
# for some reason it can't turn dend2 back to hclust :(
a <- Bk_plot(hc1, hc2, k = 2:3, try_cutree_hclust = TRUE)

hc1_mixed <- as.hclust(sample(as.dendrogram(hc1)))
Bk_plot(
  tree1 = hc1, tree2 = hc1_mixed,
  add_E = FALSE,
  rejection_line_permutation = TRUE,
  k_permutation = c(2, 4, 6, 8, 10, 20, 30, 40, 50),
  R = 100
)
## End(Not run)
The user supplies a dend, a vector of clusters, and what to modify (and how).
The function returns a dendrogram whose branch col/lwd/lty attributes are modified accordingly. (The function assumes unique labels.)
branches_attr_by_clusters( dend, clusters, values, attr = c("col", "lwd", "lty"), branches_changed_have_which_labels = c("any", "all"), ... )
dend |
a dendrogram dend |
clusters |
an integer vector of clusters. This HAS to be of the same length as the number of leaves. Items that belong to no cluster should get the value 0. The vector should be of the same order as that of the labels in the dendrogram. If you create the clusters from something like cutree you would first need to use order.dendrogram on it, before using it in the function. |
values |
the attributes to use for non-0 values. This should be of the same length as the number of unique non-0 clusters; if it is shorter, it is recycled. Alternatively, it can be of the same length as the number of leaves in the tree, in which case the values will be aggregated (i.e., via tapply) to match the number of clusters, and the first value of each cluster will be used as the main value. TODO: So far, the function doesn't deal well with NA values (this might be changed in the future). |
attr |
a character with one of the following values: col/lwd/lty |
branches_changed_have_which_labels |
character with either "any" (default) or "all". Indicates how the branches should be updated. |
... |
ignored. |
This is probably NOT a very fast implementation of the function, but it works.
This function was designed to enable the manipulation (mainly coloring) of branches, based on the results from the cutreeDynamic function.
A dendrogram with modified branches (col/lwd/lty).
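The clusters argument must follow the order of the labels in the dendrogram, not the order of the rows in the data. The base R sketch below shows the reordering step with order.dendrogram. The variable names are illustrative, and USArrests is used only because it has row names that make the check easy to read.

```r
hc <- hclust(dist(USArrests[1:10, ]))
dend <- as.dendrogram(hc)

# cutree() on an hclust object returns clusters in the order of the data rows
clusters_data_order <- cutree(hc, k = 3)

# Reorder so that element i refers to the i-th leaf of the dendrogram
clusters_dend_order <- clusters_data_order[order.dendrogram(dend)]

# The names now line up with the dendrogram's labels, left to right
data.frame(label = labels(dend), cluster = unname(clusters_dend_order))
```

A vector prepared this way has the same length as the number of leaves and can be passed as the clusters argument.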
branches_attr_by_labels, get_leaves_attr, nnodes, nleaves, cutreeDynamic, plotDendroAndColors
## Not run:
### Getting the hc object
iris_dist <- iris[, -5] %>% dist()
hc <- iris_dist %>% hclust()
# This is how it looks without any colors:
dend <- as.dendrogram(hc)
plot(dend)

# Both functions give the same outcome
# option 1:
dend %>%
  set("branches_k_color", k = 4) %>%
  plot()
# option 2:
clusters <- cutree(dend, 4)[order.dendrogram(dend)]
dend %>%
  branches_attr_by_clusters(clusters) %>%
  plot()

# and the second option is much slower:
system.time(set(dend, "branches_k_color", k = 4)) # 0.26 sec
system.time(branches_attr_by_clusters(dend, clusters)) # 1.61 sec

# BUT, it also allows us to do more flexible things!

#--------------------------
# Plotting dynamicTreeCut
#--------------------------

# let's get the clusters
library(dynamicTreeCut)
clusters <- cutreeDynamic(hc, distM = as.matrix(iris_dist))
# we need to sort them to the order of the dendrogram:
clusters <- clusters[order.dendrogram(dend)]

# get some functions:
library(colorspace)
no0_unique <- function(x) {
  u_x <- unique(x)
  u_x[u_x != 0]
}

clusters_numbers <- no0_unique(clusters)
n_clusters <- length(clusters_numbers)
cols <- rainbow_hcl(n_clusters)
dend2 <- branches_attr_by_clusters(dend, clusters, values = cols)
# dend2 <- branches_attr_by_clusters(dend, clusters)
plot(dend2)

# add colored bars:
ord_cols <- rainbow_hcl(n_clusters)[order(clusters_numbers)]
tmp_cols <- rep(1, length(clusters))
tmp_cols[clusters != 0] <- ord_cols[clusters != 0][clusters]
colored_bars(tmp_cols, y_shift = -1.1, rowLabels = "")
# all of the ordering is to handle the fact that the cluster numbers are not ascending...

# How is this compared with the usual cutree?
dend3 <- color_branches(dend, k = n_clusters)
labels(dend2) <- as.character(labels(dend2))
# this needs fixing, since the labels are not character!
# Well, both cluster solutions are not perfect, but at least they are interesting...
tanglegram(dend2, dend3,
  main_left = "cutreeDynamic", main_right = "cutree",
  columns_width = c(5, .5, 5),
  color_lines = cols[iris[order.dendrogram(dend2), 5]]
)
# (Notice how the color_lines is of the true Species of each Iris)
# The main difference is at the bottom,
## End(Not run)
The user supplies a dend, labels, the type of condition (all/any), and TF_values, and the function returns a dendrogram whose branch col/lwd/lty attributes are modified accordingly.
branches_attr_by_labels( dend, labels, TF_values = c(2, Inf), attr = c("col", "lwd", "lty"), type = c("all", "any"), ... )
dend |
a dendrogram dend |
labels |
a character vector of labels from the tree |
TF_values |
a vector of length two with the TF_values to use in case a branch fulfills the condition (TRUE) and in the case that it does not (FALSE). Defaults are 2/Inf for col, lwd and lty. (so it will insert the first value, and will not change all the FALSE cases) |
attr |
a character with one of the following values: col/lwd/lty |
type |
a character vector of either "all" or "any", indicating which of the branches should be painted: ones that all of their labels belong to the supplied labels, or also ones that even some of their labels are included in the labels vector. |
... |
ignored. |
A dendrogram with modified branches (col/lwd/lty).
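The difference between type = "all" and type = "any" comes down to a test on the labels under each node. The base R sketch below evaluates both conditions for the root and for its direct children. The helper node_status is hypothetical and only mirrors the description of the type argument; it is not how dendextend implements the function.

```r
dend <- as.dendrogram(hclust(dist(USArrests[1:6, ])))
selected <- labels(dend)[1:2]   # pretend the user supplied these labels

# Hypothetical helper: does a node satisfy the "all" / "any" condition?
node_status <- function(node, selected) {
  labs <- labels(node)
  c(all = all(labs %in% selected), any = any(labs %in% selected))
}

# The root holds every label, so "any" is TRUE but "all" is FALSE
root_status <- node_status(dend, selected)
root_status

# One column per direct child of the root
child_status <- sapply(seq_along(dend), function(i) node_status(dend[[i]], selected))
child_status
```

Under "all", only branches whose labels are all in the supplied vector would be modified; under "any", branches containing at least one of them would be as well.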
noded_with_condition, get_leaves_attr, nnodes, nleaves
## Not run:
library(dendextend)
set.seed(23235)
ss <- sample(1:150, 10)
# Getting the dend object
dend <- iris[ss, -5] %>%
  dist() %>%
  hclust() %>%
  as.dendrogram()
dend %>% plot()

dend %>%
  branches_attr_by_labels(c("123", "126", "23", "29")) %>%
  plot()
# arguments are named so that they match the current signature
dend %>%
  branches_attr_by_labels(c("123", "126", "23", "29"), type = "all") %>%
  plot() # the same as above
dend %>%
  branches_attr_by_labels(c("123", "126", "23", "29"), type = "any") %>%
  plot()

dend %>%
  branches_attr_by_labels(
    c("123", "126", "23", "29"),
    type = "any", attr = "col", TF_values = c("blue", "red")
  ) %>%
  plot()
dend %>%
  branches_attr_by_labels(
    c("123", "126", "23", "29"),
    type = "any", attr = "lwd", TF_values = c(4, 1)
  ) %>%
  plot()
dend %>%
  branches_attr_by_labels(
    c("123", "126", "23", "29"),
    type = "any", attr = "lty", TF_values = c(2, 1)
  ) %>%
  plot()
## End(Not run)
The user supplies a dend, lists, and TF_values, and the function returns a dendrogram whose branch col/lwd/lty attributes are modified accordingly.
branches_attr_by_lists( dend, lists, TF_values = c(2, 1), attr = c("col", "lwd", "lty"), ... )
dend |
a dendrogram dend |
lists |
a list where each element contains the labels of members in selected nodes down to which the branches shall be adapted |
TF_values |
a vector of length two with the TF_values to use in case a branch fulfills the condition (TRUE) and in the case that it does not (FALSE). Defaults are 2/1 for col, lwd and lty. (so it will insert the first value, and will not change all the FALSE cases) |
attr |
a character with one of the following values: col/lwd/lty |
... |
ignored. |
A dendrogram with modified branches (col/lwd/lty).
## Not run:
library(dendextend)
set.seed(23235)
ss <- sample(1:150, 10)
# Getting the dend object
dend <- iris[ss, -5] %>%
  dist() %>%
  hclust() %>%
  as.dendrogram()
dend %>% plot()

# define a list of nodes
L <- list(c("109", "123", "126", "145"), "29", c("59", "67", "97"))
dend %>%
  branches_attr_by_lists(L) %>%
  plot()

# choose different color, and also change lwd and lty
dend %>%
  branches_attr_by_lists(L, TF_values = "blue") %>%
  branches_attr_by_lists(L, attr = "lwd", TF_values = 4) %>%
  branches_attr_by_lists(L, attr = "lty", TF_values = 3) %>%
  plot()
## End(Not run)
Plots a circular dendrogram using the circlize package (must be installed for the function to work).
This type of plot is also sometimes called fan tree plot (although the name fan-plot is also used for a different plot in time series analysis), radial tree plot, polar tree plot, circular tree plot, and probably other names as well.
An advantage of using the circlize package directly for plotting a circular dendrogram is that you can add more graphics for the elements in the tree just by adding more tracks using circos.track.
circlize_dendrogram( dend, facing = c("outside", "inside"), labels = TRUE, labels_track_height = 0.1, dend_track_height = 0.5, ... )
dend |
a dendrogram object |
facing |
Is the dendrogram facing the inside of the circle or the outside? |
labels |
logical (TRUE) - should the labels be plotted as well. |
labels_track_height |
a value for adjusting the room for the labels. It is 0.1 by default, but if NULL or NA, it will adjust automatically based on the max width of the labels. However, if this is too large, the plot will give an error: Error in check.track.position(track.index, track.start, track.height) : not enough space for cells at track index '2'. |
dend_track_height |
a value for adjusting the room for the dendrogram. |
... |
Ignored. |
The dend that was used for plotting.
Zuguang Gu, Tal Galili
This code is based on the work of Zuguang Gu. If you use the function, please cite both dendextend (see: citation("dendextend")), as well as the circlize package (see: citation("circlize")).
## Not run:
dend <- iris[1:40, -5] %>%
  dist() %>%
  hclust() %>%
  as.dendrogram() %>%
  set("branches_k_color", k = 3) %>%
  set("branches_lwd", c(5, 2, 1.5)) %>%
  set("branches_lty", c(1, 1, 3, 1, 1, 2)) %>%
  set("labels_colors") %>%
  set("labels_cex", c(.9, 1.2)) %>%
  set("nodes_pch", 19) %>%
  set("nodes_col", c("orange", "black", "plum", NA))

circlize_dendrogram(dend)
circlize_dendrogram(dend, labels = FALSE)
circlize_dendrogram(dend, facing = "inside", labels = FALSE)

# In the following we get the dendrogram but can also get extra information on top of it
library(circlize) # the circos.* functions and rand_color come from circlize
circos.initialize("foo", xlim = c(0, 40))
circos.track(ylim = c(0, 1), panel.fun = function(x, y) {
  circos.rect(1:40 - 0.8, rep(0, 40), 1:40 - 0.2, runif(40),
    col = rand_color(40), border = NA
  )
}, bg.border = NA)
circos.track(ylim = c(0, 1), panel.fun = function(x, y) {
  circos.text(1:40 - 0.5, rep(0, 40), labels(dend),
    col = labels_colors(dend),
    facing = "clockwise", niceFacing = TRUE, adj = c(0, 0.5)
  )
}, bg.border = NA, track.height = 0.1)
max_height <- attr(dend, "height")
circos.track(ylim = c(0, max_height), panel.fun = function(x, y) {
  circos.dendrogram(dend, max_height = max_height)
}, track.height = 0.5, bg.border = NA)
circos.clear()
## End(Not run)
Lets the user click a plot of a dendrogram and rotates the tree based on the location of the click.
Code for mouse selection of (sub-)cluster to be rotated
click_rotate(x, ...) ## Default S3 method: click_rotate(x, ...) ## S3 method for class 'dendrogram' click_rotate( x, plot = TRUE, plot_after = plot, horiz = FALSE, continue = FALSE, ... )
x |
a tree object (either a |
... |
parameters passed to the plot |
plot |
(logical) should the dendrogram first be plotted. |
plot_after |
(logical) should the dendrogram be plotted after the rotation? |
horiz |
logical. Should the plot be normal or horizontal? |
continue |
logical. If TRUE, allows the user to keep clicking the plot until a click is made on the labels. |
A rotated tree object
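Rotation changes only the left-to-right order of the leaves, not the clustering itself. The base R sketch below illustrates this with rev(), which flips every node, as a non-interactive stand-in for a click; it is an illustration of the idea and not the code click_rotate runs.

```r
dend <- as.dendrogram(hclust(dist(USArrests[1:8, ]), "average"))

# rev() flips the children of every node: a simple, non-interactive rotation
rotated <- rev(dend)

labels(dend)
labels(rotated)   # same labels, reversed order

# Cophenetic distances (the heights at which pairs merge) are unchanged
labs <- sort(labels(dend))
d_before <- as.matrix(cophenetic(dend))[labs, labs]
d_after <- as.matrix(cophenetic(rotated))[labs, labs]
all.equal(d_before, d_after)
```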
Andrej-Nikolai Spiess, Tal Galili
# create the dend:
dend <- USArrests %>%
  dist() %>%
  hclust("ave") %>%
  as.dendrogram() %>%
  color_labels()
## Not run:
# play with the rotation once
dend <- click_rotate(dend)
dend <- click_rotate(dend, horiz = TRUE)
# keep playing with the rotation:
while (TRUE) dend <- click_rotate(dend)
# the same as
dend <- click_rotate(dend, continue = TRUE)
## End(Not run)
Collapse branches under a tolerance level
collapse_branch(dend, tol = 1e-08, lower = TRUE, ...)
dend |
dendrogram object |
tol |
a numeric value giving the tolerance to consider a branch length significantly greater than zero |
lower |
logical (TRUE). collapse branches which are lower than tol? |
... |
passed on (not used) |
A dendrogram with both of the root's branches of the same height
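To get a feel for which branches a given tol would affect, the base R sketch below lists the lengths of the internal branches of a small tree. It assumes that "branch length" means the height of a node minus the height of its non-leaf child (the convention used by ape::di2multi in the example); whether collapse_branch uses exactly this definition is an assumption. The helper internal_branch_lengths is hypothetical.

```r
# Hypothetical helper: length of every branch that leads to an internal node,
# measured as parent height minus child height.
internal_branch_lengths <- function(node) {
  if (is.leaf(node)) return(numeric(0))
  out <- numeric(0)
  for (i in seq_along(node)) {
    child <- node[[i]]
    if (!is.leaf(child)) {
      out <- c(out, attr(node, "height") - attr(child, "height"))
    }
    out <- c(out, internal_branch_lengths(child))
  }
  out
}

dend <- as.dendrogram(hclust(dist(iris[1:5, -5])))
lens <- internal_branch_lengths(dend)
lens

tol <- 0.2
lens[lens < tol]   # candidates for collapsing under this assumed definition
```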
# # ladderize is like sort(..., type = "node")
dend <- iris[1:5, -5] %>%
  dist() %>%
  hclust() %>%
  as.dendrogram()
par(mfrow = c(1, 3))
dend %>%
  ladderize() %>%
  plot(horiz = TRUE)
abline(v = .2, col = 2, lty = 2)
dend %>%
  collapse_branch(tol = 0.2) %>%
  ladderize() %>%
  plot(horiz = TRUE)
dend %>%
  collapse_branch(tol = 0.2) %>%
  ladderize() %>%
  hang.dendrogram(hang = 0) %>%
  plot(horiz = TRUE)

par(mfrow = c(1, 2))
dend %>%
  collapse_branch(tol = 0.2, lower = FALSE) %>%
  plot(horiz = TRUE, main = "dendrogram")
library(ape)
dend %>%
  as.phylo() %>%
  di2multi(tol = 0.2) %>%
  plot(main = "phylo")
Given a dendrogram object, and a set of labels that are in the same sub-dendrogram, the function performs a recursive DFS algorithm to determine the sub-dendrogram which is composed of (exactly) all 'selected_labels'. It then squashes this sub-dendrogram, and returns the original dendrogram with the squashed sub-dendrogram within it.
collapse_labels(dend, selected_labels, ...)
dend |
a dendrogram object |
selected_labels |
A character vector with the labels we expect to have in the sub-dendrogram. This doesn't have to be in the same order as in the dendrogram. |
... |
ellipsis (passed to squash_dendrogram) |
Either the original dend or, if the selected labels make up exactly one sub-dendrogram of the dend, a dend with that sub-dendrogram squashed inside it.
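The search step described above can be sketched in base R as a depth-first walk that stops at the node whose labels are exactly the selected ones. The helper find_subtree is hypothetical and covers only the search, not the squashing; it is a sketch of the idea rather than the package's implementation.

```r
# Hypothetical helper: depth-first search for the node whose leaves are
# exactly `selected`; returns NULL when no such sub-dendrogram exists.
find_subtree <- function(node, selected) {
  labs <- labels(node)
  if (setequal(labs, selected)) return(node)
  # Stop descending at leaves, or when this node cannot contain all labels
  if (is.leaf(node) || !all(selected %in% labs)) return(NULL)
  for (i in seq_along(node)) {
    res <- find_subtree(node[[i]], selected)
    if (!is.null(res)) return(res)
  }
  NULL
}

# Five points on a line: (a, b) and (c, d) merge first, e joins last
x <- c(a = 1, b = 2, c = 6, d = 7, e = 15)
dend <- as.dendrogram(hclust(dist(x)))

sub_cd <- find_subtree(dend, c("d", "c"))          # order does not matter
sub_abcd <- find_subtree(dend, c("a", "b", "c", "d"))
sub_bc <- find_subtree(dend, c("b", "c"))          # not a sub-dendrogram
```

When the search returns NULL, as for c("b", "c") here, there is nothing to squash, which corresponds to the case where the original dend is returned.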
library("dendextend")
set.seed(23235)
ss <- sample(1:150, 5)
# Getting the dend object
dend25 <- iris[ss, -5] %>%
  dist() %>%
  hclust() %>%
  as.dendrogram() %>%
  set("labels", letters[1:5])
par(mfrow = c(1, 4))
plot(dend25)
plot(collapse_labels(dend25, c("d", "e")))
plot(collapse_labels(dend25, c("c", "d", "e")))
plot(collapse_labels(dend25, c("c", "d", "e"), squashed_original_height = TRUE))
This function is for dendrogram and hclust objects.
This function colors both the terminal leaves of a dend's cluster and the edges leading to those leaves. The edgePar attribute of nodes will be augmented by a new list item col. The groups will be defined by a call to cutree using the k or h parameters.
If col is a color vector with a different length than the number of clusters (k) - then a recycled color vector will be used.
color_branches( dend, k = NULL, h = NULL, col, groupLabels = NULL, clusters, warn = dendextend_options("warn"), ... )
dend |
A |
k |
number of groups (passed to |
h |
height at which to cut tree (passed to |
col |
Function or vector of Colors. By default it tries to use
rainbow_hcl from the |
groupLabels |
If TRUE add numeric group label - see Details for options |
clusters |
an integer vector of clusters. This is passed to branches_attr_by_clusters. This HAS to be of the same length as the number of leaves. Items that belong to no cluster should get the value 0. The vector should be of the same order as that of the labels in the dendrogram. If you create the clusters from something like cutree you would first need to use order.dendrogram on it, before using it in the function. |
warn |
logical (default from dendextend_options("warn") is FALSE). Sets whether warnings are to be issued. It is safer to keep this at TRUE, but to keep the noise down, the default is FALSE. |
... |
ignored. |
If groupLabels=TRUE then numeric group labels will be added to each cluster. If a vector is supplied then these entries will be used as the group labels. If a function is supplied then it will be passed a numeric vector of groups (e.g. 1:5) and must return the formatted group labels.
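A function given as groupLabels only needs to map a numeric vector of groups to a vector of labels of the same length. The base R sketch below shows two such functions applied to 1:5: as.roman (used in the examples) and a hypothetical custom formatter named cluster_prefix.

```r
groups <- 1:5   # what the function would receive for k = 5

# Built-in option used in the examples
roman_labels <- as.character(as.roman(groups))
roman_labels

# Hypothetical custom formatter: any function of this shape should do
cluster_prefix <- function(g) paste0("Cluster ", g)
prefix_labels <- cluster_prefix(groups)
prefix_labels
```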
If the labels of the dendrogram are NOT character (but, for example integers) - they are coerced into character. This step is essential for the proper operation of the function. A dendrogram labels might happen to be integers if they are based on an hclust performed on a dist of an object without rownames.
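The situation described above can be reproduced in base R: clustering a matrix without row names yields a dendrogram with non-character labels, while setting row names first avoids the issue. This is a small sketch with made-up data; the variable names are illustrative.

```r
m <- matrix(c(1, 2, 6, 7, 15), ncol = 1)   # no row names

dend_int <- as.dendrogram(hclust(dist(m)))
labels(dend_int)                 # leaf indices, not character strings
is.character(labels(dend_int))

# Giving the data row names before clustering produces character labels
rownames(m) <- as.character(seq_len(nrow(m)))
dend_chr <- as.dendrogram(hclust(dist(m)))
is.character(labels(dend_chr))
```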
a tree object of class dendrogram.
Tal Galili, extensively based on code by Gregory Jefferis
This function is a derived work from the color_clusters function, with some ideas from the slice function; both are from the dendroextras package by jefferis.
It extends it by using cutree.dendrogram, allowing the function to work for trees that hclust can not handle (unbranched and non-ultrametric trees). Also, it allows REPEATED cluster color assignments to branches on the same tree, something which the original function was not able to handle.
cutree, dendrogram, hclust, labels_colors, branches_attr_by_clusters, get_leaves_branches_col, color_labels
## Not run:
par(mfrow = c(1, 2))
dend <- USArrests %>%
  dist() %>%
  hclust(method = "ave") %>%
  as.dendrogram()
d1 <- color_branches(dend, k = 5, col = c(3, 1, 1, 4, 1))
plot(d1) # selective coloring of branches :)
d2 <- color_branches(dend, 5)
plot(d2)

par(mfrow = c(1, 2))
d1 <- color_branches(dend, 5, col = c(3, 1, 1, 4, 1), groupLabels = TRUE)
plot(d1) # selective coloring of branches :)
d2 <- color_branches(dend, 5, groupLabels = TRUE)
plot(d2)

par(mfrow = c(1, 3))
d5 <- color_branches(dend, 5)
plot(d5)
d5g <- color_branches(dend, 5, groupLabels = TRUE)
plot(d5g)
d5gr <- color_branches(dend, 5, groupLabels = as.roman)
plot(d5gr)

par(mfrow = c(1, 1))
# messy - but interesting:
dend_override <- color_branches(dend, 2, groupLabels = as.roman)
dend_override <- color_branches(dend_override, 4, groupLabels = as.roman)
dend_override <- color_branches(dend_override, 7, groupLabels = as.roman)
plot(dend_override)

d5 <- color_branches(dend = dend[[1]], k = 5)

library(dendextend)
data(iris, envir = environment())
d_iris <- dist(iris[, -5])
hc_iris <- hclust(d_iris)
dend_iris <- as.dendrogram(hc_iris)
dend_iris <- color_branches(dend_iris, k = 3)

library(colorspace)
labels_colors(dend_iris) <- rainbow_hcl(3)[sort_levels_values(
  as.numeric(iris[, 5])[order.dendrogram(dend_iris)]
)]

plot(dend_iris,
  main = "Clustered Iris dataset",
  sub = "labels are colored based on the true cluster"
)

# cutree(dend_iris,k=3, order_clusters_as_data=FALSE,
# try_cutree_hclust=FALSE)
# cutree(dend_iris,k=3, order_clusters_as_data=FALSE)

library(colorspace)

data(iris, envir = environment())
d_iris <- dist(iris[, -5])
hc_iris <- hclust(d_iris)
labels(hc_iris) # no labels, because "iris" has no row names
dend_iris <- as.dendrogram(hc_iris)
is.integer(labels(dend_iris)) # this could cause problems...

iris_species <- rev(levels(iris[, 5]))
dend_iris <- color_branches(dend_iris, k = 3, groupLabels = iris_species)
is.character(labels(dend_iris)) # labels are no longer "integer"

# have the labels match the real classification of the flowers:
labels_colors(dend_iris) <- rainbow_hcl(3)[sort_levels_values(
  as.numeric(iris[, 5])[order.dendrogram(dend_iris)]
)]

# We'll add the flower type
labels(dend_iris) <- paste(as.character(iris[, 5])[order.dendrogram(dend_iris)],
  "(", labels(dend_iris), ")",
  sep = ""
)

dend_iris <- hang.dendrogram(dend_iris, hang_height = 0.1)

# reduce the size of the labels:
dend_iris <- assign_values_to_leaves_nodePar(dend_iris, 0.5, "lab.cex")

par(mar = c(3, 3, 3, 7))
plot(dend_iris,
  main = "Clustered Iris dataset (the labels give the true flower species)",
  horiz = TRUE, nodePar = list(cex = .007)
)
legend("topleft", legend = iris_species, fill = rainbow_hcl(3))
a <- dend_iris[[1]]
dend_iris1 <- color_branches(a, k = 3)
plot(dend_iris1)

# str(dendrapply(d2, unclass))
# unclass(d1)

c(1:5) %>% # take some data
  dist() %>% # calculate a distance matrix,
  # on it compute hierarchical clustering using the "single" method,
  hclust(method = "single") %>%
  as.dendrogram() %>%
  color_branches(k = 3) %>%
  plot() # nice, returns the tree as is...

# Example of the "clusters" parameter
par(mfrow = c(1, 2))
dend <- c(1:5) %>%
  dist() %>%
  hclust() %>%
  as.dendrogram()
dend %>%
  color_branches(k = 3) %>%
  plot()
dend %>%
  color_branches(clusters = c(1, 1, 2, 2, 3)) %>%
  plot()

# another example, based on the question here:
# https://stackoverflow.com/q/45432271/256662
library(cluster)
set.seed(999)
iris2 <- iris[sample(x = 1:150, size = 50, replace = FALSE), ]
clust <- diana(iris2)
dend <- as.dendrogram(clust)

temp_col <- c("red", "blue", "green")[as.numeric(iris2$Species)]
temp_col <- temp_col[order.dendrogram(dend)]
temp_col <- factor(temp_col, unique(temp_col))

library(dendextend)
dend %>%
  color_branches(clusters = as.numeric(temp_col), col = levels(temp_col)) %>%
  set("labels_colors", as.character(temp_col)) %>%
  plot()
## End(Not run)
This function is for dendrogram and hclust objects. It colors the labels of a tree. The groups are defined by a call to cutree using the k or h parameters. If col is a color vector whose length differs from the number of clusters (k), a recycled color vector is used.
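As a rough illustration of the grouping and recycling described above, the following base R sketch cuts a tree into k groups with cutree and recycles a too-short color vector to one color per group. It is not the package's internal code: the variable names are made up for the example, and color_labels may number its clusters differently from cutree on an hclust object (for instance, by their left-to-right position in the tree).

```r
# Base R only; a sketch of the idea, not the dendextend implementation.
hc <- hclust(dist(USArrests), method = "average")

k <- 5
groups <- cutree(hc, k = k)      # cluster id per observation, in data order

col <- c("red", "blue")          # fewer colors than clusters
col_k <- rep_len(col, k)         # recycled to one color per cluster

# One color per leaf, arranged in the left-to-right order of the plotted tree
label_cols <- col_k[groups][hc$order]
```

With the package loaded, a vector like label_cols could be assigned through labels_colors(dend) <- label_cols on as.dendrogram(hc), which is the building block this function is described as being based on.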
color_labels( dend, k = NULL, h = NULL, labels, col, warn = dendextend_options("warn"), ... )
dend | A dendrogram or hclust tree object. |
k | number of groups (passed to cutree). |
h | height at which to cut the tree (passed to cutree). |
labels | character vector. If not missing, it overrides k and h, and simply colors these labels in the tree based on the col parameter. |
col | Function or vector of colors. By default it tries to use rainbow_hcl from the colorspace package. |
warn | logical (default from dendextend_options("warn") is FALSE). Should warnings be issued (when h/k/labels are not supplied, or when col is too short)? It is safer to set this to TRUE, but the default is FALSE to keep the noise down. |
... | ignored. |
a tree object of class dendrogram.
This function is in the style of color_branches, and based on labels_colors.
cutree, dendrogram, hclust, labels_colors, color_branches, assign_values_to_leaves_edgePar
    ## Not run:
    hc <- hclust(dist(USArrests), "ave")
    dend <- as.dendrogram(hc)
    dend <- color_labels(dend, 5, col = c(3, 1, 1, 4, 1))
    dend <- color_branches(dend, 5, col = c(3, 1, 1, 4, 1))
    plot(dend) # selective coloring of branches AND labels :)

    # coloring some labels, based on label names:
    dend <- color_labels(dend, col = "red", labels = labels(dend)[c(4, 16)])
    plot(dend) # selective coloring of branches AND labels :)

    d5 <- color_branches(dend, 5)
    plot(d5)
    d5g <- color_branches(dend, 5, groupLabels = TRUE)
    plot(d5g)
    d5gr <- color_branches(dend, 5, groupLabels = as.roman)
    plot(d5gr)
    ## End(Not run)
Color unique labels in a dendrogram
color_unique_labels(dend, ...)
dend | a dend object |
... | NOT USED |
A dendrogram after the colors of its labels have been updated (a different color for each unique label).
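The idea of "a different color for each unique label" can be sketched in base R as follows. This is only an illustration under assumptions: the palette (rainbow) and the variable names are chosen for the example and are not necessarily what color_unique_labels uses internally.

```r
# Base R only; a sketch of the idea, not the dendextend implementation.
x <- c(2011, 2011, 2012, 2012, 2015, 2015, 2015)
names(x) <- x
hc <- hclust(dist(x))

leaf_labels <- hc$labels[hc$order]            # labels in plotting order
unique_labels <- unique(leaf_labels)
palette_cols <- rainbow(length(unique_labels))

# match() gives each leaf the index of its label among the unique labels,
# so leaves that share a label also share a color
label_cols <- palette_cols[match(leaf_labels, unique_labels)]
```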
    x <- c(2011, 2011, 2012, 2012, 2015, 2015, 2015)
    names(x) <- x
    dend <- as.dendrogram(hclust(dist(x)))
    par(mfrow = c(1, 2))
    plot(dend)
    dend2 <- color_unique_labels(dend)
    plot(dend2)
Add colored bars to a dendrogram, usually corresponding to either clusters or some outside categorization.
colored_bars( colors, dend, rowLabels = NULL, cex.rowLabels = 0.9, add = TRUE, y_scale, y_shift, text_shift = 1, sort_by_labels_order = TRUE, horiz = FALSE, ... )
colors | Coloring of objects on the dendrogram. Either a vector (one color per object) or a matrix (can also be an array or a data frame) with each column giving one group with a color per object. Each column will be plotted as a horizontal row of colors (when horiz = FALSE) under the dendrogram. As long as the sort_by_labels_order parameter is TRUE (default), the colors vector/matrix should be provided in the order of the original data (it will be re-ordered automatically to the order of the dendrogram). |
dend | a dendrogram object. If missing, the colors are plotted without any re-ordering (this assumes that the colors are already ordered based on the dend's labels). The dend is also needed in order to get the correct height/location of the colored bars (i.e., for adjusting y_scale and y_shift). |
rowLabels | Labels for the colorings given in colors. |
cex.rowLabels | Font size scale factor for the row labels. See par. |
add | logical (TRUE by default). Should the colored bars be added to an existing dendrogram plot? |
y_scale | how much should the bars be stretched on the y axis? If no dend is supplied, the default is 1. |
y_shift | where should the bars be plotted underneath the x axis? By default it will try to locate the bars underneath the labels (it may miss, in which case you would need to enter a number manually). If no dend is supplied, the default is 0. |
text_shift | numeric (1 by default). Adjusts the position of the row label text. |
sort_by_labels_order | logical (TRUE by default). If TRUE, the colored bars are sorted using the order needed to change the original order of the observations to the current order of the labels in the dendrogram. If FALSE, the colored bars are plotted as-is, based on the order of the colors vector. |
horiz | logical (FALSE by default). Set to TRUE when using plot(dend, horiz = TRUE). |
... | ignored at this point. |
You will often need to adjust the y_scale, y_shift and text_shift parameters in order to get the bars in the location you want.
(This can probably be done automatically, but will require more work, since it depends on the current mar settings, the number of groups, and each computer's specific graphics device. Patches for smarter defaults will be appreciated.)
An invisible vector/matrix with the ordered colors.
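The re-ordering implied by sort_by_labels_order can be pictured with base R alone: colors supplied in the order of the original data are indexed by the leaf order of the tree. The sketch below only illustrates that idea (the name cols_in_dend_order is made up for the example); it does not reproduce the function's internals.

```r
# Base R only; a sketch of the re-ordering idea, not the dendextend implementation.
rows_picking <- c(1:5, 25:30)
dend <- as.dendrogram(hclust(dist(iris[rows_picking, -5] * 10)))

# Colors in the order of the original data (one per row of the data)
cols <- c("gold", "grey")[rows_picking %% 2 + 1]

# order.dendrogram() gives, for each leaf from left to right,
# the position of that observation in the original data
ord <- order.dendrogram(dend)
cols_in_dend_order <- cols[ord]
```

Going by the argument descriptions above, passing cols with the default sort_by_labels_order = TRUE and passing cols_in_dend_order with sort_by_labels_order = FALSE should describe the same bars.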
Steve Horvath, Peter Langfelder, Tal Galili
This function is based on plotHclustColors from the moduleColor R package. It was modified so that it works with dendrograms (and not just hclust objects), and so that the colored bars can be added on top of an existing plot (and not only as a separate plot).
For more details, see: https://cran.r-project.org/package=moduleColor
branches_attr_by_clusters, plotDendroAndColors
    rows_picking <- c(1:5, 25:30)
    dend <- (iris[rows_picking, -5] * 10) %>%
      dist() %>%
      hclust() %>%
      as.dendrogram()
    odd_numbers <- rows_picking %% 2
    cols <- c("gold", "grey")[odd_numbers + 1]
    # scale is off
    plot(dend)
    colored_bars(cols, dend)
    # move and scale a bit
    plot(dend)
    colored_bars(cols, dend, y_shift = -1, rowLabels = "Odd\n numbers")
    # Now let's cut the tree and add that info to the plot:
    k2 <- cutree(dend, k = 2)
    cols2 <- c("#0082CE", "#CC476B")[k2]
    plot(dend)
    colored_bars(cbind(cols2, cols), dend, rowLabels = c("2 clusters", "Odd numbers"))

    # The same, but with a horizontal plot!
    par(mar = c(6, 2, 2, 4))
    plot(dend, horiz = TRUE)
    colored_bars(cbind(cols2, cols), dend,
      rowLabels = c("2 clusters", "Odd numbers"),
      horiz = TRUE
    )

    # let's add clusters color
    # notice how we need to play with the colors a bit
    # this is because color_branches places colors from
    # left to right. Which means we need to give colored_bars
    # the colors of the items so that after sorting they would be
    # from left to right. Here is how it can be done:
    the_k <- 3
    library(colorspace)
    cols3 <- rainbow_hcl(the_k, c = 90, l = 50)
    dend %>%
      set("branches_k_color", k = the_k, value = cols3) %>%
      plot()
    kx <- cutree(dend, k = the_k)
    ord <- order.dendrogram(dend)
    kx <- sort_levels_values(kx[ord])
    kx <- kx[match(seq_along(ord), ord)]
    par(mar = c(5, 5, 2, 2))
    plot(dend)
    colored_bars(cbind(cols3[kx], cols2, cols), dend,
      rowLabels = c("3 clusters", "2 clusters", "Odd numbers")
    )

    ## mtcars example
    # Create the dend:
    dend <- as.dendrogram(hclust(dist(mtcars)))
    # Create a vector giving a color for each car, based on the company it belongs to
    car_type <- rep("Other", length(rownames(mtcars)))
    is_x <- grepl("Merc", rownames(mtcars))
    car_type[is_x] <- "Mercedes"
    is_x <- grepl("Mazda", rownames(mtcars))
    car_type[is_x] <- "Mazda"
    is_x <- grepl("Toyota", rownames(mtcars))
    car_type[is_x] <- "Toyota"
    car_type <- factor(car_type)
    n_car_types <- length(unique(car_type))
    col_car_type <- colorspace::rainbow_hcl(n_car_types, c = 70, l = 50)[car_type]
    # extra: showing the various clusters cuts
    k234 <- cutree(dend, k = 2:4)
    # color labels by car company:
    labels_colors(dend) <- col_car_type[order.dendrogram(dend)]
    # color branches based on cutting the tree into 4 clusters:
    dend <- color_branches(dend, k = 4)

    ### plots
    par(mar = c(12, 4, 1, 1))
    plot(dend)
    colored_bars(cbind(k234[, 3:1], col_car_type), dend,
      rowLabels = c(paste0("k = ", 4:2), "Car Type")
    )
    # horiz version:
    par(mar = c(4, 1, 1, 12))
    plot(dend, horiz = TRUE)
    colored_bars(cbind(k234[, 3:1], col_car_type), dend,
      rowLabels = c(paste0("k = ", 4:2), "Car Type"),
      horiz = TRUE
    )
Add colored dots next to a dendrogram, usually corresponding to either clusters or some outside categorization.
colored_dots( colors, dend, rowLabels = NULL, cex.rowLabels = 0.9, add = TRUE, y_scale, y_shift, text_shift = 1, sort_by_labels_order = TRUE, horiz = FALSE, dot_size = 1, ... )
colors | Coloring of the dots beside the dendrogram. Either a vector (one color per object) or a matrix (can also be an array or a data frame) with each column giving one group with a color per object. Each column will be plotted as a row of colored points (when horiz = FALSE) under the dendrogram. As long as the sort_by_labels_order parameter is TRUE (default), the colors vector/matrix should be provided in the order of the original data (it will be re-ordered automatically to the order of the dendrogram). |
dend | a dendrogram object. If missing, the colors are plotted without any re-ordering (this assumes that the colors are already ordered based on the dend's labels). The dend is also needed in order to get the correct height/location of the colored dots (i.e., for adjusting y_scale and y_shift). |
rowLabels | Labels for the colorings given in colors. |
cex.rowLabels | Font size scale factor for the row labels. See par. |
add | logical (TRUE by default). Should the colored dots be added to an existing dendrogram plot? |
y_scale | how much should the dots be stretched on the y axis? If no dend is supplied, the default is 1. |
y_shift | where should the dots be plotted underneath the x axis? By default it will try to locate the dots underneath the labels (it may miss, in which case you would need to enter a number manually). If no dend is supplied, the default is 0. |
text_shift | numeric (1 by default). Adjusts the position of the row label text. |
sort_by_labels_order | logical (TRUE by default). If TRUE, the colored dots are sorted using the order needed to change the original order of the observations to the current order of the labels in the dendrogram. If FALSE, the colored dots are plotted as-is, based on the order of the colors vector. |
horiz | logical (FALSE by default). Set to TRUE when using plot(dend, horiz = TRUE). |
dot_size | numeric (1 by default). Passed to the cex argument of points. |
... | ignored at this point. |
The reason you might choose colored_dots over colored_bars is when you have a lot of group types and/or a really large dendrogram. Hint: Make a group for each categorical factor and color it one color when true, and assign a fully transparent color when false.
You will often need to adjust the y_scale, y_shift and the text_shift parameters, in order to get the dots in the location you would want.
(This can probably be done automatically, but will require more work, since it depends on the current mar settings, the number of groups, and each computer's specific graphics device. Patches for smarter defaults will be appreciated.)
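The hint above (one group per categorical level, colored when true and fully transparent when false) can be sketched with base R as follows. The names car_make, top_makes, group_cols and dot_cols are made up for this illustration, and taking the first word of each mtcars row name as the "make" is an assumption of the sketch.

```r
# Base R only; builds a color matrix of the kind described in the hint.
# Car make = first word of each row name in mtcars
car_make <- sub(" .*", "", rownames(mtcars))

# Three most frequent makes (the order among tied makes is not important here)
top_makes <- names(sort(table(car_make), decreasing = TRUE))[1:3]
group_cols <- c("#1b9e77", "#d95f02", "#7570b3")
transparent <- "#00000000"   # black with zero alpha: invisible on the plot

# One column per make: its color where the car belongs to it, transparent elsewhere
dot_cols <- sapply(seq_along(top_makes), function(i) {
  ifelse(car_make == top_makes[i], group_cols[i], transparent)
})
colnames(dot_cols) <- top_makes
```

Because dot_cols is in the order of the original data, it would be passed with the default sort_by_labels_order = TRUE, for example colored_dots(dot_cols, dend, rowLabels = colnames(dot_cols)) after plotting a dendrogram built from mtcars.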
An invisible vector/matrix with the ordered colors.
Steve Horvath, Tal Galili, Peter Langfelder, Chase Clark
This function is based on plotHclustColors from the moduleColor R package. It was modified so that it works with dendrograms (and not just hclust objects), and so that the colored dots can be added on top of an existing plot (and not only as a separate plot).
For more details, see: https://cran.r-project.org/package=moduleColor
branches_attr_by_clusters, plotDendroAndColors
    rows_picking <- c(1:5, 25:30)
    dend <- (iris[rows_picking, -5] * 10) %>%
      dist() %>%
      hclust() %>%
      as.dendrogram()
    odd_numbers <- rows_picking %% 2
    cols <- c("red", "white")[odd_numbers + 1]
    plot(dend)
    colored_dots(cols, dend)
    # Example of adjusting position of dots
    plot(dend)
    colored_dots(cols, dend, y_shift = -1, rowLabels = "Odd\n numbers")

    rows_picking <- c(1:5, 25:30)
    dend <- (iris[rows_picking, -5] * 10) %>%
      dist() %>%
      hclust() %>%
      as.dendrogram()
    odd_numbers <- rows_picking %% 2
    # For leaves that shouldn't have dots, make them the same color as the background,
    # or set the alpha value to fully transparent
    cols <- c("black", "white")[odd_numbers + 1]
    # scale is off
    plot(dend)
    colored_dots(cols, dend)
    # move and scale a bit
    plot(dend)
    colored_dots(cols, dend, y_shift = -1, rowLabels = "Odd\n numbers")
    # Now let's cut the tree and add that info to the plot:
    k2 <- cutree(dend, k = 2)
    cols2 <- c("#1b9e77", "#d95f02")[k2]
    par(mar = c(5, 6, 1, 1))
    plot(dend)
    colored_dots(cbind(cols2, cols), dend, rowLabels = c("2 clusters", "Even numbers"))
    # The same, but with a horizontal plot!
    par(mar = c(6, 2, 2, 4))
    plot(dend, horiz = TRUE)
    colored_dots(cbind(cols2, cols), dend,
      rowLabels = c("2 clusters", "Even numbers"),
      horiz = TRUE
    )

    # ==============================
    ## mtcars example
    # Create the dend:
    dend <- as.dendrogram(hclust(dist(mtcars)))
    # Get all company names
    comp_names <- unlist(lapply(rownames(mtcars), function(x) strsplit(x, " ")[[1]][[1]]))
    # Get the top three occurring companies
    top_three <- sort(table(comp_names), decreasing = TRUE)[1:3]
    # Match the top three companies to where they are found in the dendrogram labels
    top_three <- sapply(names(top_three), function(x) grepl(x, labels(dend)))
    top_three <- as.data.frame(top_three)
    # "top_three" is now a data frame of the top three companies as columns.
    # Each column represents a vector (rows) which is the length of labels(dend).
    # The vector has values TRUE and FALSE, for whether the company name matched
    # labels(dend)[i]

    # Colorblind friendly vector of HEX colors
    colorblind_friendly <- c("#1b9e77", "#d95f02", "#7570b3")
    # If we run the for-loop on "top_three" we will turn the vectors into a
    # character-type too early, so make a copy to "colored_dataframe" which we will work on
    colored_dataframe <- top_three
    for (i in 1:3) {
      # This replaces TRUE values with a color from our vector of colors
      colored_dataframe[top_three[, i], i] <- colorblind_friendly[[i]]
      # This replaces FALSE values with black HEX, but fully transparent (invisible on plot)
      colored_dataframe[!top_three[, i], i] <- "#00000000"
    }
    # Color branches and labels by "cutting" the dendrogram at an arbitrary height
    dend <- color_branches(dend, h = 170)
    dend <- color_labels(dend, h = 170)

    ### plots
    par(mar = c(12, 4, 1, 1))
    plot(dend)
    colored_dots(colored_dataframe, dend,
      rowLabels = colnames(colored_dataframe),
      horiz = FALSE,
      sort_by_labels_order = FALSE
    )
    # Show a dotted line where the tree was "cut"
    abline(h = 170, lty = 3)
    # horiz version:
    par(mar = c(4, 1, 1, 12))
    plot(dend, horiz = TRUE)
    colored_dots(colored_dataframe, dend,
      rowLabels = colnames(colored_dataframe),
      horiz = TRUE,
      sort_by_labels_order = FALSE
    )
    # Show a dotted line where the tree was "cut"
    abline(v = 170, lty = 3)
Gets a dend and the output from "nodes_with_shared_labels" and returns a vector (length of labels), indicating the clusters forming shared subtrees
common_subtrees_clusters(dend1, dend2, leaves_get_0_cluster = TRUE, ...)
dend1 | a dendrogram. |
dend2 | a dendrogram. |
leaves_get_0_cluster | logical (TRUE by default). Should the leaves which are not part of a larger common subtree get the value 0 (TRUE) or a unique cluster number (FALSE)? |
... | not used. |
An integer vector, with values indicating which leaves in dend1 form a common subtree cluster, with ones available in dend2
    library(dendextend)
    dend1 <- 1:6 %>%
      dist() %>%
      hclust() %>%
      as.dendrogram()
    dend2 <- dend1 %>% set("labels", c(1:4, 6:5))
    tanglegram(dend1, dend2)
    clusters1 <- common_subtrees_clusters(dend1, dend2)
    dend1_2 <- color_branches(dend1, clusters = clusters1)
    plot(dend1_2)
    plot(dend1_2, horiz = TRUE)
    tanglegram(dend1_2, dend2, highlight_distinct_edges = FALSE)
    tanglegram(dend1_2, dend2)
Calculate Baker's Gamma correlation coefficient for two trees (also known as Goodman-Kruskal-gamma index).
Assumes the labels in the two trees fully match. If they do not, please first use intersect_trees to have them matched.
WARNING: this can be quite slow for medium/large trees.
    cor_bakers_gamma(dend1, ...)

    ## Default S3 method:
    cor_bakers_gamma(dend1, dend2, ...)

    ## S3 method for class 'dendrogram'
    cor_bakers_gamma(
      dend1,
      dend2,
      use_labels_not_values = TRUE,
      to_plot = FALSE,
      warn = dendextend_options("warn"),
      ...
    )

    ## S3 method for class 'hclust'
    cor_bakers_gamma(
      dend1,
      dend2,
      use_labels_not_values = TRUE,
      to_plot = FALSE,
      warn = dendextend_options("warn"),
      ...
    )

    ## S3 method for class 'dendlist'
    cor_bakers_gamma(dend1, which = c(1L, 2L), ...)
dend1 | a tree (dendrogram/hclust/phylo) |
... | Passed to cutree. |
dend2 | a tree (dendrogram/hclust/phylo) |
use_labels_not_values | logical (TRUE). Should labels be used in the k matrix when using cutree? Setting this to FALSE makes the function a bit faster, BUT it assumes the two trees have exactly the same leaf order values for each label. This can be ensured by using match_order_by_labels. |
to_plot | logical (FALSE). Passed to bakers_gamma_for_2_k_matrix. |
warn | logical (default from dendextend_options("warn") is FALSE). Should a warning be issued when using cutree? It is safer to set this to TRUE, but the default is FALSE to keep the noise down. |
which | an integer vector of length 2, indicating which of the trees in the dendlist object should have their cor_bakers_gamma calculated (relevant for dendlist). |
Baker's Gamma (see reference) is a measure of association (similarity) between two trees of hierarchical clustering (dendrograms).
It is calculated by taking two items and finding the highest possible level of k (the number of cluster groups created when cutting the tree) for which the two items still belong to the same cluster. That k is returned, and the same is done for these two items in the second tree. There are n choose 2 such pairs of items in a tree, and these numbers are calculated for every pair in each of the two trees. Then the two sets of numbers (one set per tree) are paired according to the pairs of items compared, and a Spearman correlation is calculated.
The value can range between -1 and 1, with values near 0 meaning that the two trees are not statistically similar. For an exact p-value one should resort to a permutation test. One such option is to permute the labels of one tree many times and calculate the distribution under the null hypothesis (keeping the tree topologies constant).
Notice that this measure is not affected by the height of a branch, but only by its relative position compared with other branches.
Baker's Gamma association index between two trees (a number between -1 and 1).
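The procedure described above can be sketched in base R for two hclust objects built from the same data. This is a minimal illustration under assumptions, not the package's implementation: pair_max_k is a hypothetical helper, and the sketch relies on cutree returning memberships in the order of the original data so that the pairs line up across the two trees.

```r
# Base R only; a sketch of the idea, not the dendextend implementation.
# For every pair of items, find the largest k at which cutting the tree
# into k clusters still leaves the two items in the same cluster.
pair_max_k <- function(hc) {
  n <- length(hc$order)
  cuts <- cutree(hc, k = 1:n)      # n x n matrix; column k = memberships for k clusters
  pairs <- combn(n, 2)             # all n-choose-2 pairs of items
  apply(pairs, 2, function(p) max(which(cuts[p[1], ] == cuts[p[2], ])))
}

set.seed(23235)
ss <- sample(1:150, 10)
hc1 <- hclust(dist(iris[ss, -5]), "complete")
hc2 <- hclust(dist(iris[ss, -5]), "single")

k1 <- pair_max_k(hc1)
k2 <- pair_max_k(hc2)

# Spearman correlation between the two sets of per-pair k values
gamma_sketch <- cor(k1, k2, method = "spearman")
```

Only the rank of the per-pair k values enters the correlation, which is why branch heights do not affect the result.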
Baker, F. B., Stability of Two Hierarchical Grouping Techniques Case 1: Sensitivity to Data Errors. Journal of the American Statistical Association, 69(346), 440 (1974).
    ## Not run:
    set.seed(23235)
    ss <- sample(1:150, 10)
    hc1 <- hclust(dist(iris[ss, -5]), "com")
    hc2 <- hclust(dist(iris[ss, -5]), "single")
    dend1 <- as.dendrogram(hc1)
    dend2 <- as.dendrogram(hc2)
    # cutree(dend1)
    cor_bakers_gamma(hc1, hc2)
    cor_bakers_gamma(dend1, dend2)
    dend1 <- match_order_by_labels(dend1, dend2) # if you are not sure
    cor_bakers_gamma(dend1, dend2, use_labels_not_values = FALSE)

    library(microbenchmark)
    microbenchmark(
      with_labels = cor_bakers_gamma(dend1, dend2, try_cutree_hclust = FALSE),
      with_values = cor_bakers_gamma(dend1, dend2,
        use_labels_not_values = FALSE,
        try_cutree_hclust = FALSE
      ),
      times = 10
    )

    cor_bakers_gamma(dend1, dend1, use_labels_not_values = FALSE)
    cor_bakers_gamma(dend1, dend1, use_labels_not_values = TRUE)
    ## End(Not run)
Calculates the number of nodes, in each tree, that are common (i.e., that have the exact same list of labels). The correlation ranges from 0 to 1 (the lower bound is actually 2*(nnodes-1)/(2*nnodes) for two trees with the same list of labels, since the top node will always be identical for them), where 1 means that every node in one tree has a node in the other tree with the exact same list of labels. Notice that this measure is non-parametric (it ignores the heights and relative positions of the nodes).
cor_common_nodes(dend1, dend2, ...)
dend1 | a dendrogram. |
dend2 | a dendrogram. |
... | not used. |
A correlation value between 0 and 1 (values near 1 indicate almost identical trees).
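One way to picture the measure is to collect, for every node, the set of labels below it, and then count how many of these sets appear in the other tree. The base R sketch below does this under assumptions: node_label_sets and common_nodes_share are hypothetical helpers, label sets are encoded as strings joined by "|" (so labels must not contain that character), and the exact normalization used by cor_common_nodes may differ.

```r
# Base R only; a sketch of the idea, not the dendextend implementation.
# Collect, for every node of a dendrogram, the sorted set of labels below it.
node_label_sets <- function(node) {
  own <- paste(sort(as.character(labels(node))), collapse = "|")
  if (is.leaf(node)) {
    return(own)
  }
  children <- lapply(seq_along(node), function(i) node_label_sets(node[[i]]))
  c(own, unlist(children))
}

# Share of nodes (counted over both trees) whose label set also occurs in the other tree
common_nodes_share <- function(dend1, dend2) {
  s1 <- node_label_sets(dend1)
  s2 <- node_label_sets(dend2)
  (sum(s1 %in% s2) + sum(s2 %in% s1)) / (length(s1) + length(s2))
}

set.seed(23235)
ss <- sample(1:150, 10)
dend1 <- as.dendrogram(hclust(dist(iris[ss, -5]), "complete"))
dend2 <- as.dendrogram(hclust(dist(iris[ss, -5]), "single"))
share <- common_nodes_share(dend1, dend2)
```

Since all leaves and the root are shared whenever the two trees have the same labels, the share in this sketch can never reach 0 for such trees.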
    set.seed(23235)
    ss <- sample(1:150, 10)
    hc1 <- iris[ss, -5] %>%
      dist() %>%
      hclust("com")
    hc2 <- iris[ss, -5] %>%
      dist() %>%
      hclust("single")
    dend1 <- as.dendrogram(hc1)
    dend2 <- as.dendrogram(hc2)

    cor_cophenetic(dend1, dend2)
    cor_common_nodes(dend1, dend2)
    tanglegram(dend1, dend2)
    # we can see we have only two nodes which are different...
Cophenetic correlation coefficient for two trees.
Assumes the labels in the two trees fully match. If they do not, please first use intersect_trees to have them matched.
    cor_cophenetic(dend1, ...)

    ## Default S3 method:
    cor_cophenetic(
      dend1,
      dend2,
      method_coef = c("pearson", "kendall", "spearman"),
      ...
    )

    ## S3 method for class 'dendlist'
    cor_cophenetic(
      dend1,
      which = c(1L, 2L),
      method_coef = c("pearson", "kendall", "spearman"),
      ...
    )
dend1 |
a tree (dendrogram/hclust/phylo, or dendlist) |
... |
Ignored. |
dend2 |
Either a tree (dendrogram/hclust/phylo), or a dist object (for example, from the original data matrix). |
method_coef |
a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman", can be abbreviated. Passed to cor. |
which |
an integer vector of length 2, indicating which of the trees in a dendlist object should have their cor_cophenetic calculated. |
From cophenetic: The cophenetic distance between two observations that have been clustered is defined to be the intergroup dissimilarity at which the two observations are first combined into a single cluster. Note that this distance has many ties and restrictions.
cor_cophenetic calculates the correlation between two cophenetic distance matrices of the two trees.
The value can range between -1 and 1, with values near 0 meaning that the two trees are not statistically similar. For an exact p-value one should resort to a permutation test. One such option is to permute the labels of one tree many times and calculate the distribution under the null hypothesis (keeping the trees' topologies constant).
Notice that this measure IS affected by the height of a branch.
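The computation and the permutation test described above can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's code: the helper aligned_lower and the choice of 999 permutations are introduced here for illustration.

```r
set.seed(23235)
ss <- sample(1:150, 10)
d <- dist(iris[ss, -5])
hc1 <- hclust(d, "complete")
hc2 <- hclust(d, "single")

# Hypothetical helper: cophenetic distances as a vector, after sorting the
# rows/columns by label so that both trees use the same order.
aligned_lower <- function(tree, perm = NULL) {
  m <- as.matrix(cophenetic(tree))
  labs <- sort(rownames(m))
  m <- m[labs, labs]
  if (!is.null(perm)) m <- m[perm, perm] # relabel the items
  m[lower.tri(m)]
}

v1 <- aligned_lower(hc1)
observed <- cor(v1, aligned_lower(hc2))

# Null distribution: shuffle the labels of the second tree, keeping both
# topologies fixed.
n_items <- attr(d, "Size")
n_perm <- 999
perm_cor <- replicate(n_perm, cor(v1, aligned_lower(hc2, sample(n_items))))
p_value <- (1 + sum(perm_cor >= observed)) / (1 + n_perm)

observed
p_value
```

Sorting by label before correlating is what makes the result independent of the order in which each tree stores its leaves.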
The correlation between the cophenetic distance matrices of the two trees.
Sokal, R. R. and F. J. Rohlf. 1962. The comparison of dendrograms by objective methods. Taxon, 11:33-40
Sneath, P.H.A. and Sokal, R.R. (1973) Numerical Taxonomy: The Principles and Practice of Numerical Classification, p. 278 ff; Freeman, San Francisco.
https://en.wikipedia.org/wiki/Cophenetic_correlation
## Not run:
set.seed(23235)
ss <- sample(1:150, 10)
hc1 <- iris[ss, -5] %>% dist() %>% hclust("com")
hc2 <- iris[ss, -5] %>% dist() %>% hclust("single")
dend1 <- as.dendrogram(hc1)
dend2 <- as.dendrogram(hc2)
# cutree(dend1)
cophenetic(hc1)
cophenetic(hc2)
# notice how the dist matrices for the dendrograms have different orders:
cophenetic(dend1)
cophenetic(dend2)
cor(cophenetic(hc1), cophenetic(hc2)) # 0.874
cor(cophenetic(dend1), cophenetic(dend2)) # 0.16
# the difference is because the order of the distance table in the case of
# stats:::cophenetic.dendrogram will change between dendrograms!
# however, this is consistent (since I force-sort the rows/columns):
cor_cophenetic(hc1, hc2)
cor_cophenetic(dend1, dend2)
cor_cophenetic(dendlist(dend1, dend2))
# we can also use different cor methods (almost the same result though):
cor_cophenetic(hc1, hc2, method_coef = "spearman") # 0.8456014
cor_cophenetic(dend1, dend2, method_coef = "spearman")
# cophenetic correlation is about 10 times (!) faster than bakers_gamma cor:
library(microbenchmark)
microbenchmark(
  cor_bakers_gamma = cor_bakers_gamma(dend1, dend2, try_cutree_hclust = FALSE),
  cor_cophenetic = cor_cophenetic(dend1, dend2),
  times = 10
)
# but only because of the cutree for dendrogram. When allowing hclust cutree
# it is only about twice as fast:
microbenchmark(
  cor_bakers_gamma = cor_bakers_gamma(dend1, dend2, try_cutree_hclust = TRUE),
  cor_cophenetic = cor_cophenetic(dend1, dend2),
  times = 10
)
## End(Not run)
Calculates the FM_index Correlation for some k.
cor_FM_index(dend1, dend2, k, ...)
dend1 |
a dendrogram. |
dend2 |
a dendrogram. |
k |
an integer (number of clusters to cut the tree) |
... |
not used. |
A correlation value between 0 and 1 (1 meaning almost identical clusters for some k)
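For orientation, the Fowlkes-Mallows index for a given k can be computed by hand from the two cluster assignments. The sketch below uses base R only and the standard contingency-table form of the index; fm_index is a hypothetical helper, and whether the package computes it in exactly this way is an assumption.

```r
# Hypothetical helper: Fowlkes-Mallows index of two cluster assignments
# (integer vectors of equal length), via their contingency table.
fm_index <- function(cl1, cl2) {
  tab <- table(cl1, cl2)
  n <- length(cl1)
  tk <- sum(tab^2) - n          # pairs placed together in both clusterings
  pk <- sum(rowSums(tab)^2) - n # pairs placed together in clustering 1
  qk <- sum(colSums(tab)^2) - n # pairs placed together in clustering 2
  tk / sqrt(pk * qk)
}

set.seed(23235)
ss <- sample(1:150, 10)
d <- dist(iris[ss, -5])
hc1 <- hclust(d, "complete")
hc2 <- hclust(d, "single")

# cut both trees into k clusters and compare the resulting partitions
fm_by_k <- sapply(2:4, function(k) fm_index(cutree(hc1, k), cutree(hc2, k)))
names(fm_by_k) <- 2:4
fm_by_k
```

The index is undefined when one of the clusterings consists only of singletons (the denominator is then 0), so k should stay below the number of items.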
set.seed(23235)
ss <- sample(1:150, 10)
hc1 <- iris[ss, -5] %>% dist() %>% hclust("com")
hc2 <- iris[ss, -5] %>% dist() %>% hclust("single")
dend1 <- as.dendrogram(hc1)
dend2 <- as.dendrogram(hc2)
cor_FM_index(dend1, dend2, k = 2)
cor_FM_index(dend1, dend2, k = 3)
cor_FM_index(dend1, dend2, k = 4)
A correlation matrix between a list of trees.
Assumes the labels in the trees fully match. If they do not, first use intersect_trees to match them.
cor.dendlist(
  dend,
  method = c("cophenetic", "baker", "common_nodes", "FM_index"),
  ...
)
dend |
a dendlist of trees |
method |
a character string indicating which correlation coefficient is to be computed. One of "cophenetic" (default), "baker", "common_nodes", or "FM_index". It can be abbreviated. |
... |
passed to cor functions. |
A correlation matrix between the different trees
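As a rough picture of what such a matrix contains, the base R sketch below builds a pairwise cophenetic correlation matrix for a named list of hclust objects. It is an illustration only; the list trees and the loop are assumptions made here, not the package's implementation.

```r
set.seed(23235)
ss <- sample(1:150, 10)
d <- dist(iris[ss, -5])

# a named list of trees built from the same distance object
trees <- list(
  complete = hclust(d, "complete"),
  single = hclust(d, "single"),
  average = hclust(d, "average")
)

n <- length(trees)
cors <- matrix(NA_real_, n, n, dimnames = list(names(trees), names(trees)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    # cophenetic() on hclust objects keeps the order of the original data,
    # so the two vectors are already aligned
    cors[i, j] <- cor(
      as.vector(cophenetic(trees[[i]])),
      as.vector(cophenetic(trees[[j]]))
    )
  }
}
cors
```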
cophenetic, cor_cophenetic, cor_bakers_gamma, cor_common_nodes, cor_FM_index
## Not run:
set.seed(23235)
ss <- sample(1:150, 10)
dend1 <- iris[ss, -5] %>% dist() %>% hclust("com") %>% as.dendrogram()
dend2 <- iris[ss, -5] %>% dist() %>% hclust("single") %>% as.dendrogram()
dend3 <- iris[ss, -5] %>% dist() %>% hclust("ave") %>% as.dendrogram()
dend4 <- iris[ss, -5] %>% dist() %>% hclust("centroid") %>% as.dendrogram()
# cutree(dend1)
dend1234 <- dendlist(d1 = dend1, d2 = dend2, d3 = dend3, d4 = dend4)
cors <- cor.dendlist(dend1234)
cors
# a nice plot for them:
library(corrplot)
corrplot(cor.dendlist(dend1234), "pie", "lower")
## End(Not run)
This function counts the number of "practical" terminal nodes (nodes which are not leaves, but which have a height of 0, are also considered "terminal" nodes). If the tree is standard, that would simply be the number of leaves (only the leaves will have height 0). However, in cases where the tree has several nodes (before the leaves) with 0 height, count_terminal_nodes counts such nodes as terminal nodes.
The function is recursive in that it either returns 1 if it reached a terminal node (either a leaf or a 0 height node), else: it will count the number of terminal nodes in each of its sub-nodes, sum them up, and return them.
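The recursion described above can be written down in a few lines of base R. This is a sketch of the stated rule, not the package's source; count_terminal is a hypothetical name.

```r
# Hypothetical re-implementation of the rule: a node counts as 1 if it is a
# leaf or has height 0; otherwise sum the counts of its sub-nodes.
count_terminal <- function(node) {
  if (is.leaf(node) || attr(node, "height") == 0) {
    return(1L)
  }
  total <- 0L
  for (i in seq_along(node)) {
    total <- total + count_terminal(node[[i]])
  }
  total
}

hc <- hclust(dist(USArrests[1:3, ]), "ave")
dend <- as.dendrogram(hc)
n_standard <- count_terminal(dend) # a standard tree: one per leaf

# collapse the internal (non-leaf) sub-node to height 0
inner <- which(!vapply(seq_along(dend), function(i) is.leaf(dend[[i]]), logical(1)))
attr(dend[[inner]], "height") <- 0
n_collapsed <- count_terminal(dend) # the collapsed node now counts once
c(n_standard, n_collapsed)
```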
count_terminal_nodes(dend_node, ...)
dend_node |
a dendrogram object for which to count its number of terminal nodes (leaves or 0 height nodes). |
... |
not used |
The number of terminal nodes (excluding the leaves of nodes of height 0)
# define dendrogram object to play with:
hc <- hclust(dist(USArrests[1:3, ]), "ave")
dend <- as.dendrogram(hc)
###
# Trivial case
count_terminal_nodes(dend) # 3 terminal nodes
length(labels(dend)) # 3 - the same number
plot(dend,
  main = "This is considered a tree \n with THREE terminal nodes (leaves)"
)
###
# NON-Trivial case
str(dend)
attr(dend[[2]], "height") <- 0
count_terminal_nodes(dend) # 2 terminal nodes, why? see this plot:
plot(dend,
  main = "This is considered a tree \n with TWO terminal nodes only"
)
# while we have 3 leaves, in practice we have only 2 terminal nodes
# (this is a feature, not a bug.)
Cuts the dend at height h and returns a list with the FUN function implemented on all the sub trees created by cut at height h. This is used for creating a cutree.dendrogram function, by using the labels function as FUN.
This is the Rcpp version of the function, offering a 10-60 times improvement in speed (depending on the tree size it is used on).
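How a list of labels per sub-tree turns into cluster memberships can be sketched with base R's cut() for dendrograms. This only illustrates the idea; the variables groups and membership are names chosen here, and the package's own cutree.dendrogram may number or order clusters differently.

```r
hc <- hclust(dist(iris[1:4, -5]))
dend <- as.dendrogram(hc)
h <- 0.4

# the base R equivalent of applying FUN = labels to every sub-tree below h
groups <- lapply(cut(dend, h = h)$lower, labels)

# one cluster id per sub-tree, named by the labels inside that sub-tree
membership <- rep(seq_along(groups), lengths(groups))
names(membership) <- unlist(groups)
membership

# the same partition as stats::cutree on the hclust object
# (cluster ids may be numbered differently)
reference <- cutree(hc, h = h)[names(membership)]
table(membership, reference)
```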
cut_lower_fun(dend, h, FUN = labels, warn = dendextend_options("warn"), ...)
dend |
a dendrogram object. |
h |
a scalar of height to cut the dend by. |
FUN |
a function to run. (default is "labels") |
warn |
logical (default from dendextend_options("warn") is FALSE). Sets whether warnings are to be issued. It is safer to keep this at TRUE, but to keep the noise down, the default is FALSE. Should the user be warned if reverting to default? |
... |
passed to FUN. |
A list with the output of running FUN on each of the sub dends derived from cutting "dend"
Tal Galili
labels, dendrogram, cutree.dendrogram
dend <- as.dendrogram(hclust(dist(iris[1:4, -5])))
# this is really cool!
cut_lower_fun(dend, .4, labels)
lapply(cut(dend, h = .4)$lower, labels)
cut_lower_fun(dend, .4, order.dendrogram)
Cuts a dendrogram tree into several groups by specifying the desired number of clusters k(s), or cut height(s).
For cutree.dendrogram: if there is no k for which a relevant split of the dendrogram exists, a warning is issued to the user, and NA is returned (or 0 when NA_to_0L = TRUE, which is the default).
cutree(tree, k = NULL, h = NULL, ...)
## Default S3 method:
cutree(tree, k = NULL, h = NULL, ...)
## S3 method for class 'hclust'
cutree(
  tree,
  k = NULL,
  h = NULL,
  use_labels_not_values = TRUE,
  order_clusters_as_data = TRUE,
  warn = dendextend_options("warn"),
  NA_to_0L = TRUE,
  ...
)
## S3 method for class 'phylo'
cutree(tree, k = NULL, h = NULL, ...)
## S3 method for class 'agnes'
cutree(tree, k = NULL, h = NULL, ...)
## S3 method for class 'diana'
cutree(tree, k = NULL, h = NULL, ...)
## S3 method for class 'dendrogram'
cutree(
  tree,
  k = NULL,
  h = NULL,
  dend_heights_per_k = NULL,
  use_labels_not_values = TRUE,
  order_clusters_as_data = TRUE,
  warn = dendextend_options("warn"),
  try_cutree_hclust = TRUE,
  NA_to_0L = TRUE,
  ...
)
tree |
a dendrogram object |
k |
numeric scalar (OR a vector) with the number of clusters the tree should be cut into. |
h |
numeric scalar (OR a vector) with a height where the tree should be cut. |
... |
(not currently in use) |
use_labels_not_values |
logical, defaults to TRUE. If the actual labels of the clusters do not matter - and we want to gain speed (say, 10 times faster) - then use FALSE (gives the "leaves order" instead of their labels). This is passed to |
order_clusters_as_data |
logical, defaults to TRUE. There are two ways by which to order the clusters: 1) By the order of the original data. 2) By the order of the labels in the dendrogram. In order to be consistent with cutree, this is set to TRUE. This is passed to |
warn |
logical (default from dendextend_options("warn") is FALSE). Sets whether warnings are to be issued. It is safer to keep this at TRUE, but to keep the noise down, the default is FALSE. Should the function send a warning in case the desired k is not available? |
NA_to_0L |
logical. Default is TRUE. When no clusters are possible, should the function return 0 (TRUE, default), or NA (when set to FALSE)? |
dend_heights_per_k |
a named vector that resulted from running heights_per_k.dendrogram. |
try_cutree_hclust |
logical. default is TRUE. Since cutree for hclust is MUCH faster than for dendrogram - cutree.dendrogram will first try to change the dendrogram into an hclust object. If it will fail (for example, with unbranched trees), it will continue using the cutree.dendrogram function. If try_cutree_hclust=FALSE, it will force to use cutree.dendrogram and not cutree.hclust. |
At least one of k or h must be specified; k overrides h if both are given.
As opposed to cutree for hclust, cutree.dendrogram allows the cutting of trees at a given height also for non-ultrametric trees (ultrametric tree == a tree with monotone clustering heights).
If k or h are scalar - cutree.dendrogram returns an integer vector with group memberships. Otherwise a matrix with group memberships is returned where each column corresponds to the elements of k or h, respectively (which are also used as column names).
If there is no k for which a relevant split of the dendrogram exists, a warning is issued to the user, and NA is returned (or 0 when NA_to_0L = TRUE, the default).
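The shape of the returned value can be checked quickly with stats::cutree on an hclust object, which the hclust method redirects to. The sketch below uses base R only; the variable names are chosen here for illustration.

```r
hc <- hclust(dist(USArrests[c(1, 6, 13, 20, 23), ]), "ave")

# scalar k: a named integer vector of group memberships
one_k <- cutree(hc, k = 2)

# vector k: a matrix with one column per requested k
many_k <- cutree(hc, k = 2:4)

# k overrides h when both are given
both <- cutree(hc, k = 2, h = 50)

one_k
many_k
identical(both, one_k)
```

The columns of many_k are named after the requested values of k, so many_k[, "3"] gives the memberships for three clusters.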
cutree.dendrogram was written by Tal Galili. cutree.hclust redirects the function to cutree from base R.
hclust, cutree, cutree_1h.dendrogram, cutree_1k.dendrogram
## Not run:
hc <- hclust(dist(USArrests[c(1, 6, 13, 20, 23), ]), "ave")
dend <- as.dendrogram(hc)
unbranch_dend <- unbranch(dend, 2)
cutree(hc, k = 2:4) # on hclust
cutree(dend, k = 2:4) # on dendrogram
cutree(hc, k = 2) # on hclust
cutree(dend, k = 2) # on dendrogram
cutree(dend, h = c(20, 25.5, 50, 170))
cutree(hc, h = c(20, 25.5, 50, 170))
# the default (ordered by original data's order)
cutree(dend, k = 2:3, order_clusters_as_data = FALSE)
labels(dend)
# as.hclust(unbranch_dend) # ERROR - can not do this...
cutree(unbranch_dend, k = 2) # all NA's
cutree(unbranch_dend, k = 1:4)
cutree(unbranch_dend, h = c(20, 25.5, 50, 170))
cutree(dend, h = c(20, 25.5, 50, 170))
library(microbenchmark)
## this shows how as.hclust is expensive - but still worth it if possible
microbenchmark(
  cutree(hc, k = 2:4),
  cutree(as.hclust(dend), k = 2:4),
  cutree(dend, k = 2:4),
  cutree(dend, k = 2:4, try_cutree_hclust = FALSE)
)
# the dendrogram is MUCH slower...
# Unit: microseconds
##                                              expr      min       lq    median        uq       max neval
##                               cutree(hc, k = 2:4)   91.270   96.589   99.3885  107.5075   338.758   100
##                  cutree(as.hclust(dend), k = 2:4) 1701.629 1767.700 1854.4895 2029.1875  8736.591   100
##                             cutree(dend, k = 2:4) 1807.456 1869.887 1963.3960 2125.2155  5579.705   100
##  cutree(dend, k = 2:4, try_cutree_hclust = FALSE) 8393.914 8570.852 8755.3490 9686.7930 14194.790   100
# and trying to "hclust" is not expensive (which is nice...)
microbenchmark(
  cutree_unbranch_dend = cutree(unbranch_dend, k = 2:4),
  cutree_unbranch_dend_not_trying_to_hclust =
    cutree(unbranch_dend, k = 2:4, try_cutree_hclust = FALSE)
)
## Unit: milliseconds
##                                       expr      min       lq   median       uq      max neval
##                       cutree_unbranch_dend 7.309329 7.428314 7.494107 7.752234 17.59581   100
##  cutree_unbranch_dend_not_trying_to_hclust 6.945375 7.079198 7.148629 7.577536 16.99780   100
## There were 50 or more warnings (use warnings() to see the first 50)
# notice that if cutree can't find clusters for the desired k/h, it will produce 0's instead!
# (It will produce a warning though...)
# This is a different behaviour than stats::cutree
# For example:
cutree(as.dendrogram(hclust(dist(c(1, 1, 1, 2, 2)))),
  k = 5
)
## End(Not run)
Cuts a dendrogram tree into several groups by specifying the desired cut height (only a single height!).
cutree_1h.dendrogram(
  dend,
  h,
  order_clusters_as_data = TRUE,
  use_labels_not_values = TRUE,
  warn = dendextend_options("warn"),
  ...
)
dend |
a dendrogram object |
h |
numeric scalar (NOT a vector) with a height where the dend should be cut. |
order_clusters_as_data |
logical, defaults to TRUE. There are two ways by which to order the clusters: 1) By the order of the original data. 2) by the order of the labels in the dendrogram. In order to be consistent with cutree, this is set to TRUE. |
use_labels_not_values |
logical, defaults to TRUE. If the actual labels of the clusters do not matter - and we want to gain speed (say, 10 times faster) - then use FALSE (gives the "leaves order" instead of their labels.). |
warn |
logical (default from dendextend_options("warn") is FALSE). Sets whether warnings are to be issued. It is safer to keep this at TRUE, but to keep the noise down, the default is FALSE. |
... |
(not currently in use) |
cutree_1h.dendrogram returns an integer vector with group memberships.
Tal Galili
hc <- hclust(dist(USArrests[c(1, 6, 13, 20, 23), ]), "ave")
dend <- as.dendrogram(hc)
cutree(hc, h = 50) # on hclust
cutree_1h.dendrogram(dend, h = 50) # on a dendrogram
labels(dend)
# the default (ordered by original data's order)
cutree_1h.dendrogram(dend, h = 50, order_clusters_as_data = TRUE)
# A different order of labels - order by their order in the tree
cutree_1h.dendrogram(dend, h = 50, order_clusters_as_data = FALSE)
# make it faster
## Not run:
library(microbenchmark)
microbenchmark(
  cutree_1h.dendrogram(dend, h = 50),
  cutree_1h.dendrogram(dend, h = 50, use_labels_not_values = FALSE)
)
# 0.8 vs 0.6 sec - for 100 runs
## End(Not run)
Cuts a dendrogram tree into several groups by specifying the desired number of clusters k (only a single k value!).
In case there exists no such k for which exists a relevant split of the dendrogram, a warning is issued to the user, and NA is returned.
cutree_1k.dendrogram(
  dend,
  k,
  dend_heights_per_k = NULL,
  use_labels_not_values = TRUE,
  order_clusters_as_data = TRUE,
  warn = dendextend_options("warn"),
  ...
)
dend |
a dendrogram object |
k |
numeric scalar (not a vector!) with the number of clusters the tree should be cut into. |
dend_heights_per_k |
a named vector that resulted from running heights_per_k.dendrogram. |
use_labels_not_values |
logical, defaults to TRUE. If the actual labels of the clusters do not matter - and we want to gain speed (say, 10 times faster) - then use FALSE (gives the "leaves order" instead of their labels). This is passed to |
order_clusters_as_data |
logical, defaults to TRUE. There are two ways by which to order the clusters: 1) By the order of the original data. 2) By the order of the labels in the dendrogram. In order to be consistent with cutree, this is set to TRUE. This is passed to |
warn |
logical (default from dendextend_options("warn") is FALSE). Sets whether warnings are to be issued. It is safer to keep this at TRUE, but to keep the noise down, the default is FALSE. Should the function send a warning in case the desired k is not available? |
... |
(not currently in use) |
cutree_1k.dendrogram returns an integer vector with group memberships.
In case there exists no such k for which exists a relevant split of the dendrogram, a warning is issued to the user, and NA is returned.
Tal Galili
hclust, cutree, cutree_1h.dendrogram
hc <- hclust(dist(USArrests[c(1, 6, 13, 20, 23), ]), "ave")
dend <- as.dendrogram(hc)
cutree(hc, k = 3) # on hclust
cutree_1k.dendrogram(dend, k = 3) # on a dendrogram
labels(dend)
# the default (ordered by original data's order)
cutree_1k.dendrogram(dend, k = 3, order_clusters_as_data = TRUE)
# A different order of labels - order by their order in the tree
cutree_1k.dendrogram(dend, k = 3, order_clusters_as_data = FALSE)
# make it faster
## Not run:
library(microbenchmark)
# pre-compute the heights per k once, so they can be reused across calls
dend_ks <- heights_per_k.dendrogram(dend)
microbenchmark(
  cutree_1k.dendrogram = cutree_1k.dendrogram(dend, k = 4),
  cutree_1k.dendrogram_no_labels = cutree_1k.dendrogram(dend,
    k = 4, use_labels_not_values = FALSE
  ),
  cutree_1k.dendrogram_no_labels_per_k = cutree_1k.dendrogram(dend,
    k = 4, use_labels_not_values = FALSE,
    dend_heights_per_k = dend_ks
  )
)
# the last one is the fastest...
## End(Not run)
Plots two trees side by side, highlighting edges unique to each tree in red.
dend_diff(dend, ...)
## S3 method for class 'dendrogram'
dend_diff(dend, dend2, horiz = TRUE, ...)
## S3 method for class 'dendlist'
dend_diff(dend, ..., which = c(1L, 2L))
dend |
a dendrogram or dendlist to compare with |
... |
passed to plot.dendrogram |
dend2 |
a dendrogram to compare with |
horiz |
logical (TRUE) indicating if the dendrogram should be drawn horizontally or not. |
which |
an integer vector indicating, in the case "dend" is a dendlist, on which of the trees the modification should be performed. If missing, the change will be performed on all of the objects in the dendlist. |
Invisible dendlist of both trees.
A dendrogram implementation for phylo.diff from the distory package
distinct_edges, highlight_distinct_edges, dist.dendlist, tanglegram, assign_values_to_branches_edgePar, distinct.edges
x <- 1:5 %>% dist() %>% hclust() %>% as.dendrogram()
y <- set(x, "labels", 5:1)
dend_diff(x, y)
dend_diff(dendlist(x, y))
dend_diff(dendlist(y, x))
dend1 <- 1:10 %>% dist() %>% hclust() %>% as.dendrogram()
dend2 <- dend1 %>% set("labels", c(1, 3, 2, 4, 5:10))
dend_diff(dend1, dend2)
There are many options for choosing distance and linkage functions for hclust. This function goes through various combinations of the two and helps find the one that is most "similar" to the original distance matrix.
dend_expend(
  x,
  dist_methods = c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"),
  hclust_methods = c("ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"),
  hclust_fun = hclust,
  optim_fun = cor_cophenetic,
  ...
)
find_dend(x, ...)
x |
A matrix or a data.frame. Can also be a dist object. |
dist_methods |
A vector of possible dist methods. |
hclust_methods |
A vector of possible hclust methods. |
hclust_fun |
By default hclust. |
optim_fun |
A function that accepts a dend and a dist and returns how the two are in agreement. Default is cor_cophenetic. |
... |
options passed from find_dend to dend_expend. |
dend_expend: A list with three items. The first item is called "dends" and includes a dendlist with all the possible dendrogram combinations. The second is "dists" and includes a list with all the possible distance matrix combinations. The third, "performance", is a data.frame with three columns: dist_methods, hclust_methods, and optim. optim is calculated (by default) as the cophenetic correlation (see: cor_cophenetic) between the distance matrix and the cophenetic distance of the hclust object.
find_dend: A dendrogram which is "optimal" based on the output from dend_expend.
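The search that produces the "performance" table can be approximated in base R: loop over combinations of distance and linkage methods and score each by the cophenetic correlation with its distance matrix. The sketch below is an illustration under that assumption, using a reduced set of methods; it is not the package's code.

```r
x <- datasets::mtcars
dist_methods <- c("euclidean", "manhattan")
hclust_methods <- c("single", "complete", "average")

# one row per combination of distance and linkage method
performance <- expand.grid(
  dist_methods = dist_methods,
  hclust_methods = hclust_methods,
  stringsAsFactors = FALSE
)
performance$optim <- NA_real_

for (i in seq_len(nrow(performance))) {
  d <- dist(x, method = performance$dist_methods[i])
  hc <- hclust(d, method = performance$hclust_methods[i])
  # agreement between the original distances and the tree's cophenetic distances
  performance$optim[i] <- cor(as.vector(d), as.vector(cophenetic(hc)))
}

performance
best <- performance[which.max(performance$optim), ]
best
```

Picking the row with the largest optim value corresponds to what find_dend is described as returning, namely the tree that agrees best with its distance matrix.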
x <- datasets::mtcars
out <- dend_expend(x, dist_methods = c("euclidean", "manhattan"))
out$performance
dend_expend(dist(x))$performance
best_dend <- find_dend(x, dist_methods = c("euclidean", "manhattan"))
plot(best_dend)
This is a function inside its own environment. This enables a bunch of functions to be manipulated outside the package, even when they are called from functions within the dendextend package.
TODO: describe options.
A new "warn" dendextend_options parameter. logical (FALSE). Should warning be issued?
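The "function inside its own environment" pattern can be illustrated with a plain closure. The sketch below is a generic base R version written for this purpose; make_options and my_options are hypothetical names, and the package's actual implementation may differ.

```r
# Hypothetical constructor: the returned function keeps its options in the
# enclosing environment, so they persist between calls.
make_options <- function(...) {
  opts <- list(...)
  function(option, value) {
    if (missing(option)) {
      return(opts) # no arguments: return all options
    }
    if (missing(value)) {
      return(opts[[option]]) # one argument: read a single option
    }
    opts[[option]] <<- value # two arguments: update (NULL removes it)
    invisible(value)
  }
}

my_options <- make_options(warn = FALSE)
my_options("warn")    # read an existing option
my_options("a", 1)    # create a new option
a_value <- my_options("a")
my_options("a", NULL) # remove it again
my_options()          # list everything that is left
```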
dendextend_options(option, value)
option |
a character scalar of the value of the options we would like to access or update. |
value |
any value that we would like to update into the "option" element in dendextend_options |
a list with functions
Kurt Hornik
dendextend_options("a")
dendextend_options("a", 1)
dendextend_options("a")
dendextend_options("a", NULL)
dendextend_options("a")
dendextend_options()
Accepts several dendrogram and/or dendlist objects and chains them all together. This function aims to make it easier to compare two or more dendrograms.
dendlist(..., which)
## S3 method for class 'dendlist'
plot(x, which = c(1L, 2L), ...)
... |
several dendrogram/hclust/phylo or dendlist objects. If an object is hclust or phylo, it will be converted into a dendrogram. |
which |
an integer vector of length 2, indicating which of the trees in the dendlist object should be plotted (relevant for plotting a dendlist). When used inside dendlist, which is still an integer vector, but it can be of any length, and it can be used to create a smaller dendlist. |
x |
a dendlist object |
If there are list() objects in the ..., they are omitted. If ... is missing, it returns an empty dendlist.
A list of class dendlist where each item is a dendrogram
## Not run:
dend <- iris[, -5] %>% dist() %>% hclust() %>% as.dendrogram()
dend2 <- iris[, -5] %>% dist() %>% hclust(method = "single") %>% as.dendrogram()
dendlist(1:4, 5, a = dend) # Error
# dendlist <- function (...) list(...)
dendlist(dend)
dendlist(dend, dend)
dendlist(dend, dend, dendlist(dend))
# notice how the order of
dendlist(dend, dend2)
dendlist(dend) %>% dendlist(dend2)
dendlist(dend) %>% dendlist(dend2) %>% dendlist(dend)
dendlist(dend, dend2) %>% tanglegram()
tanglegram(tree1 = dendlist(dend, dend2))
dend <- iris[1:20, -5] %>% dist() %>% hclust() %>% as.dendrogram()
dend2 <- iris[1:20, -5] %>% dist() %>% hclust(method = "single") %>% as.dendrogram()
x <- dendlist(dend, dend2)
plot(x)
## End(Not run)
Implements dendrogram seriation. The function tries to turn the dend into hclust, on which it runs DendSer.
Also, if a distance matrix is missing, it will try to use the cophenetic distance.
DendSer.dendrogram(dend, ser_weight, ...)
dend |
An object of class dendrogram |
ser_weight |
Used by cost function to evaluate ordering. For cost=costLS, this is a vector of object weights. Otherwise it is a dist or symmetric matrix. Passed to DendSer. If it is missing, the cophenetic distance is used instead. |
... |
parameters passed to DendSer |
Numeric vector giving an optimal dendrogram order
DendSer, DendSer.dendrogram, untangle_DendSer, rotate_DendSer
## Not run:
library(DendSer) # already used from within the function
hc <- hclust(dist(USArrests[1:4, ]), "ave")
dend <- as.dendrogram(hc)
DendSer.dendrogram(dend)
## End(Not run)
Turns a dist object from a "wide" to a "long" table
dist_long(d, ...)
d |
a distance object |
... |
not used |
A data.frame with two columns holding the row and column names of the dist object, and a third column (distance) with the distance between the two.
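The wide-to-long conversion can be illustrated in base R. This is a minimal sketch and not the package's implementation: the toy data and the column names rows and cols are assumptions made for illustration, and each unordered pair is kept once by reading only the lower triangle of the distance matrix.

```r
# Toy distances between three named points (values 1, 4 and 9).
d <- dist(c(a = 1, b = 4, c = 9))
m <- as.matrix(d)

# Keep each unordered pair once by taking the lower triangle only.
idx <- which(lower.tri(m), arr.ind = TRUE)

# One row per pair; the column names are illustrative assumptions.
long <- data.frame(
  rows = rownames(m)[idx[, "row"]],
  cols = colnames(m)[idx[, "col"]],
  distance = m[idx]
)
long
```

A table in this shape can be passed directly to tools that expect an edge list, which is what makes it convenient for network plots.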
data(iris)
iris[2:6, -5] %>% dist() %>% data.matrix()
iris[2:6, -5] %>% dist() %>% as.vector()
iris[2:6, -5] %>% dist() %>% dist_long()
# This can later be used to make a network plot based on the distances.
This function seems to bring different results than ape - checking this out is still an open issue: github issue
This function computes the Robinson-Foulds distance (also known as symmetric difference) between two dendrograms. This is the number of edges (branches) in tree1 with a combination of labels that exists in it but not in any subtree of tree2, plus the same calculation for tree2 when compared to tree1. That is, it is the sum of the lengths of distinct_edges(x,y) and distinct_edges(y,x).
This function might implement other topological distances in the future.
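The symmetric-difference idea can be sketched in base R by describing each internal node by the set of leaf labels below it and counting the sets that appear in only one tree. This is an illustrative sketch under stated assumptions, not the package's implementation: clades_of and sym_diff are hypothetical helper names, the trees are hclust objects with labels, and the count follows a rooted-clade convention that need not agree with dist.dendlist or with ape.

```r
# Collect every clade (set of leaf labels under an internal node) of an hclust
# tree, each encoded as a sorted, comma-separated string.
clades_of <- function(hc) {
  members <- vector("list", nrow(hc$merge))
  for (i in seq_len(nrow(hc$merge))) {
    parts <- lapply(hc$merge[i, ], function(j) {
      # Negative entries are single observations, positive ones earlier merges.
      if (j < 0) hc$labels[-j] else members[[j]]
    })
    members[[i]] <- sort(unlist(parts))
  }
  vapply(members, paste, character(1), collapse = ",")
}

# Number of clades found in exactly one of the two trees.
sym_diff <- function(hc1, hc2) {
  c1 <- clades_of(hc1)
  c2 <- clades_of(hc2)
  length(setdiff(c1, c2)) + length(setdiff(c2, c1))
}

hc_x <- hclust(dist(c(a = 1, b = 2, c = 4, d = 8, e = 16)))
hc_y <- hclust(dist(c(a = 1, c = 2, b = 4, d = 8, e = 16)))
sym_diff(hc_x, hc_x) # identical trees share all clades
sym_diff(hc_x, hc_y) # {a,b} and {a,c} are each unique to one tree
```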
dist.dendlist(dend, method = c("edgeset"), ...)
dend |
a dendlist |
method |
currently only 'edgeset' is implemented. |
... |
Ignored. |
A dist object with topological distances between all trees
distinct_edges, dist.topo, dist.multiPhylo, treedist
x <- 1:5 %>% dist() %>% hclust() %>% as.dendrogram()
y <- set(x, "labels", 5:1)
dist.dendlist(dendlist(x1 = x, x2 = x, y1 = y))
dend_diff(x, y)

# Larger trees
x <- 1:6 %>% dist() %>% hclust() %>% as.dendrogram()
y <- set(x, "labels", c(1:3, 6, 4, 5))
dend_diff(x, y)
dist.dendlist(dendlist(x, y))
distinct_edges(x, y)
distinct_edges(y, x)
length(distinct_edges(x, y)) + length(distinct_edges(y, x)) # dist.dendlist
Finds the edges present in the first tree but not in the second
distinct_edges(dend, dend2, ...)
dend |
a dendrogram to find unique edges in |
dend2 |
a dendrogram to compare with |
... |
Ignored. |
A numeric vector of edge ids for the first tree (dend) that are not present in the second tree (dend2).
A dendrogram implementation for distinct.edges from the distory package
distinct_edges, highlight_distinct_edges, dist.dendlist, tanglegram, distinct.edges
x <- 1:5 %>% dist() %>% hclust() %>% as.dendrogram()
y <- set(x, "labels", 5:1)
distinct_edges(x, y)
distinct_edges(y, x)
dend_diff(x, y)
# tanglegram(x, y)
Duplicates a leaf in a tree. Useful for non-parametric bootstrapping of trees, since it emulates what would have happened if the tree had been constructed from a row-sample with replacement from the original data matrix.
duplicate_leaf(
  dend,
  leaf_label,
  times,
  fix_members = TRUE,
  fix_order = TRUE,
  fix_midpoint = TRUE,
  ...
)
dend |
a dendrogram object |
leaf_label |
the label of the leaf to replicate. |
times |
the number of times we will have this leaf after replication |
fix_members |
logical (TRUE). Fix the number of members in attr using fix_members_attr.dendrogram |
fix_order |
logical (TRUE). Fix the leaves order |
fix_midpoint |
logical (TRUE). Fix the midpoint value. If TRUE, it overrides "fix_members" and turns it into TRUE (since it must have a correct number of members in order to work). values using rank_order.dendrogram |
... |
not used |
A dendrogram, after duplicating one of its leaves.
## Not run:
# define dendrogram object to play with:
dend <- USArrests[1:3, ] %>% dist() %>% hclust(method = "ave") %>% as.dendrogram()
plot(dend)
duplicate_leaf(dend, "Alaska", 3)
duplicate_leaf(dend, "Arizona", 2, fix_members = FALSE, fix_order = FALSE)
plot(duplicate_leaf(dend, "Alaska", 2))
plot(duplicate_leaf(dend, "Alaska", 4))
plot(duplicate_leaf(dend, "Arizona", 2))
plot(duplicate_leaf(dend, "Arizona", 4))
## End(Not run)
Measures the entanglement between two trees. Entanglement is a measure between 1 (full entanglement) and 0 (no entanglement). The exact behavior of the number depends on the L norm which is chosen.
entanglement(dend1, ...)

## S3 method for class 'hclust'
entanglement(dend1, dend2, ...)

## S3 method for class 'phylo'
entanglement(dend1, dend2, ...)

## S3 method for class 'dendlist'
entanglement(dend1, which = c(1L, 2L), ...)

## S3 method for class 'dendrogram'
entanglement(
  dend1,
  dend2,
  L = 1.5,
  leaves_matching_method = c("labels", "order"),
  ...
)
dend1 |
a tree object (of class dendrogram/hclust/phylo). |
... |
not used |
dend2 |
a tree object (of class dendrogram/hclust/phylo). |
which |
an integer vector of length 2, indicating which of the trees in a dendlist object should have their entanglement calculated |
L |
the distance norm to use for measuring the distance between the two trees. It can be any non-negative number; common choices are 0, 1, 1.5 and 2 (see 'details' for more). |
leaves_matching_method |
a character scalar, either "order" or "labels" (default). With "labels", the labels are used for matching the leaves order values (safer). With "order", the existing leaves order values are used for the matching. Using "order" is faster, but it assumes that the two original trees had their labels and order values MATCHED. Hence, it is best to make sure that the trees used here have the same labels and the SAME order values matched to these labels, and only then use "order" (for fastest results). |
Entanglement is measured by giving the left tree's labels the values 1 to the tree size, and then matching these numbers with the right tree. Entanglement is then the L norm distance between these two vectors. That is, we take the sum of the absolute differences, each raised to the power of L, e.g. sum(abs(x-y)^L). This is divided by the "worst case" entanglement level (e.g. when the right tree is the complete reverse of the left tree).
L tells us which penalty level we are at (L0, L1, L2, partial L's etc.). L>1 means that we give a big penalty for sharp angles, while L->0 means that any line that is not a straight horizontal line gets a large penalty. For example, L=0.1 means that we much prefer straight lines over non-straight lines.
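The calculation described above can be written out in a few lines of base R. This is a sketch of the description only, not the package's code: entanglement_sketch is a hypothetical helper that takes the leaf labels of the two trees in plotting order rather than dendrogram objects.

```r
# Entanglement as described: number the left tree's leaves 1..n, look up where
# each of them sits in the right tree, and normalise by the fully reversed case.
entanglement_sketch <- function(left_labels, right_labels, L = 1.5) {
  x <- seq_along(left_labels)           # left tree: 1, 2, ..., n
  y <- match(left_labels, right_labels) # position of each left leaf on the right
  worst <- sum(abs(x - rev(x))^L)       # right tree is the reverse of the left
  sum(abs(x - y)^L) / worst
}

left <- c("a", "b", "c", "d", "e")
entanglement_sketch(left, left)                             # identical order
entanglement_sketch(left, rev(left))                        # complete reverse
entanglement_sketch(left, c("b", "a", "c", "d", "e"), L = 1) # one swapped pair
```

Note that in this sketch L = 0 always returns 1, because R evaluates 0^0 as 1 and every leaf then contributes equally to both sums; this is in line with the value shown for L = 0 in the examples.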
A number between 0 and 1: the entanglement level of the two trees.
tanglegram, match_order_by_labels.
## Not run:
dend1 <- iris[, -5] %>% dist() %>% hclust("com") %>% as.dendrogram()
dend2 <- iris[, -5] %>% dist() %>% hclust("sin") %>% as.dendrogram()
dend12 <- dendlist(dend1, dend2)
tanglegram(dend12)

entanglement(dend12)
entanglement(dend12, L = 0)
entanglement(dend12, L = 0.25)
entanglement(dend1, dend2, L = 0) # 1
entanglement(dend1, dend2, L = 0.25) # 0.97
entanglement(dend1, dend2, L = 1) # 0.93
entanglement(dend1, dend2, L = 2) # 0.88

# a somewhat better tanglegram
tanglegram(sort(dend1), sort(dend2))
# and also a MUCH better entanglement
entanglement(sort(dend1), sort(dend2), L = 1.5) # 0.0811
# but not that much, for L=0.25
entanglement(sort(dend1), sort(dend2), L = .25) # 0.579

##################
# messing up the order of leaves is dangerous:
entanglement(dend1, dend2, 1.5, "order") # 0.91
order.dendrogram(dend2) <- seq_len(nleaves(dend2))
# this 0.95 number is NO LONGER correct!!
entanglement(dend1, dend2, 1.5, "order") # 0.95
# but if we use the "labels" method - we still get the correct number:
entanglement(dend1, dend2, 1.5, "labels") # 0.91
# however, we can fix our dend2, as follows:
dend2 <- match_order_by_labels(dend2, dend1)
# Now that labels and order are matched, entanglement works fine again:
entanglement(dend1, dend2, 1.5, "order") # 0.91
## End(Not run)
Turning a factor into a number is not trivial. Using as.numeric would only return the internal indicator numbers, and NOT the factor levels turned into numbers.
fac2num simply turns a factor into a number, as we often need.
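The pitfall can be reproduced in base R. The sketch below uses only base functions and made-up values; it shows two standard idioms for recovering the numeric values behind a factor, and is not a description of how fac2num is implemented.

```r
x <- factor(c(10, 20, 20, 5))

# as.numeric() returns the internal integer codes (positions in levels(x)).
codes <- as.numeric(x)

# Going through the character labels recovers the original values.
values <- as.numeric(as.character(x))

# An equivalent idiom that converts each level only once.
values_by_level <- as.numeric(levels(x))[x]

codes  # 2 3 3 1
values # 10 20 20 5
```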
fac2num(x, force_integer = FALSE, keep_names = TRUE, ...)
x |
an object. |
force_integer |
logical (FALSE). Should the values returned be integers? |
keep_names |
logical (TRUE). Should the values returned keep the names of the original vector? |
... |
ignored. |
A numeric vector with the factor levels of x turned into numbers (integers if force_integer = TRUE).
x <- factor(3:5)
as.numeric(x) # 1 2 3
fac2num(x) # 3 4 5
Given a dendrogram object, the function performs a recursive DFS algorithm to determine the sub-dendrogram which is composed of (exactly) all 'selected_labels'.
find_dendrogram(dend, selected_labels)
dend |
a dendrogram object |
selected_labels |
A character vector with the labels we expect to have in the sub-dendrogram. This doesn't have to be in the same order as in the dendrogram. |
A sub-dendrogram composed of only members of selected_labels. If such a sub-dendrogram doesn't exist, the function returns NULL.
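A recursive depth-first search of this kind can be sketched with the dendrogram tools in the stats package. The helper name find_sub and the toy tree are assumptions for illustration; the sketch mirrors the behavior described here (return the matching sub-tree or NULL) and is not the package's source code.

```r
# Depth-first search for the sub-tree whose leaves are exactly `wanted`.
find_sub <- function(dend, wanted) {
  labs <- as.character(labels(dend))
  # Exact match: this node is the sub-dendrogram we are looking for.
  if (setequal(labs, wanted)) return(dend)
  # Dead end: a leaf, or a branch that cannot contain all wanted labels.
  if (is.leaf(dend) || !all(wanted %in% labs)) return(NULL)
  # Otherwise descend into each child in turn.
  for (i in seq_along(dend)) {
    hit <- find_sub(dend[[i]], wanted)
    if (!is.null(hit)) return(hit)
  }
  NULL
}

dend <- as.dendrogram(hclust(dist(c(a = 1, b = 2, c = 4, d = 8, e = 16))))
sub <- find_sub(dend, c("c", "a", "b")) # order of the labels does not matter
labels(sub)
is.null(find_sub(dend, c("a", "c")))    # no node holds exactly these leaves
```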
## Not run:
# define dendrogram object to play with:
dend <- iris[, -5] %>%
  dist() %>%
  hclust() %>%
  as.dendrogram() %>%
  set("labels_to_character") %>%
  color_branches(k = 5)
first.subdend.only <- names(cutree(dend, 4)[cutree(dend, 4) == 1])
sub.dend <- find_dendrogram(dend, first.subdend.only)

# Plotting the result
par(mfrow = c(1, 2))
plot(dend, main = "Original dendrogram")
plot(sub.dend, main = "First subdendrogram")

dend <- 1:10 %>%
  dist() %>%
  hclust() %>%
  as.dendrogram() %>%
  set("labels_to_character") %>%
  color_branches(k = 5)
selected_labels <- as.character(1:4)
sub_dend <- find_dendrogram(dend, selected_labels)
plot(dend, main = "Original dendrogram")
plot(sub_dend, main = "First subdendrogram")
## End(Not run)
This function estimates the number of clusters based on the maximal average silhouette width derived from running pam on the cophenetic distance matrix of the dendrogram. The output is based on the pamk output.
find_k(dend, krange = 2:min(10, (nleaves(dend) - 1)), ...)

## S3 method for class 'find_k'
plot(
  x,
  xlab = "Number of clusters (k)",
  ylab = "Average silhouette width",
  main = "Estimating the number of clusters using\n average silhouette width",
  ...
)
dend |
A dendrogram (or hclust) tree object |
krange |
integer vector. Numbers of clusters which are to be compared by the average silhouette width criterion. Note: average silhouette width and Calinski-Harabasz can't estimate number of clusters nc=1. If 1 is included, a Duda-Hart test is applied and 1 is estimated if this is not significant. |
... |
passed to pamk (the current defaults criterion="asw" and usepam=TRUE cannot be changed). |
x |
An object of class "find_k" (has its own S3 plot method). |
xlab, ylab, main |
parameters passed to plot. |
A pamk output. This is a list with the following components: 1) pamobject - The output of the optimal run of the pam-function. 2) nc - the optimal number of clusters. 3) crit - vector of criterion values for numbers of clusters. crit[1] is the p-value of the Duda-Hart test if 1 is in krange and diss=FALSE. 4) k - a copy of nc (just to make it easier to extract - since k is often used in other functions)
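The selection rule (maximal average silhouette width of pam run on the cophenetic distances) can be sketched directly with the cluster package. This is an illustration under stated assumptions, not the find_k implementation: it calls pam for each k rather than pamk, and the variable names and the choice of USArrests are arbitrary.

```r
library(cluster)

hc <- hclust(dist(USArrests), "ave")
d <- cophenetic(hc) # cophenetic distance matrix of the tree

# Average silhouette width of a pam clustering for each candidate k.
krange <- 2:10
asw <- sapply(krange, function(k) pam(d, k)$silinfo$avg.width)

# The estimated number of clusters is the k with the largest width.
best_k <- krange[which.max(asw)]
best_k
```

Plotting asw against krange gives the same kind of picture as the plot method for find_k objects.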
pamk, pam, silhouette.
dend <- iris[, -5] %>% dist() %>% hclust() %>% as.dendrogram()
dend_k <- find_k(dend)
plot(dend_k)
plot(color_branches(dend, k = dend_k$nc))

library(cluster)
sil <- silhouette(dend_k$pamobject)
plot(sil)

dend <- USArrests %>% dist() %>% hclust(method = "ave") %>% as.dendrogram()
dend_k <- find_k(dend)
plot(dend_k)
plot(color_branches(dend, k = dend_k$nc))
Fixes the members attr in a dendrogram after, for example, the tree was pruned or manipulated.
fix_members_attr.dendrogram(dend, ...)
dend |
a dendrogram object |
... |
not used |
A dendrogram, after adjusting the members attr in all of its nodes.
# define dendrogram object to play with:
hc <- hclust(dist(USArrests[1:3, ]), "ave")
dend <- as.dendrogram(hc)
# plot(dend)
# prune one leaf
dend[[2]] <- dend[[2]][[1]]
# plot(dend)
dend # but it is NO LONGER true that it has 3 members total!
fix_members_attr.dendrogram(dend) # it now knows it has only 2 members :)

hc <- hclust(dist(USArrests[1:3, ]), "ave")
dend <- as.dendrogram(hc)
identical(prune_leaf(dend, "Alaska"), fix_members_attr.dendrogram(prune_leaf(dend, "Alaska")))
str(unclass(prune_leaf(dend, "Alaska")))
str(unclass(fix_members_attr.dendrogram(prune_leaf(dend, "Alaska"))))
The function makes sure the two branches of the root of a dendrogram will have the same height. The user can choose how to decide which height to use.
flatten.dendrogram(dend, FUN = max, new_height, ...)
dend |
dendrogram object |
FUN |
how to choose the new height of both branches (defaults to taking the max between the two) |
new_height |
overrides FUN, and sets the new height of the two branches manually |
... |
passed on (not used) |
A dendrogram with both of the root's branches of the same height
hc <- hclust(dist(USArrests[2:9, ]), "com")
dend <- as.dendrogram(hc)
attr(dend[[1]], "height") <- 150 # make the height un-equal
par(mfrow = c(1, 2))
plot(dend, main = "original tree")
plot(flatten.dendrogram(dend), main = "Raised tree")
Rotate a branch in a tree so that the locations of two bundles of leaves are flipped.
flip_leaves(dend, leaves1, leaves2, ...)
dend |
a dendrogram object |
leaves1 |
a vector of leaves order value to flip. |
leaves2 |
a (second) vector of leaves order value to flip. |
... |
not used |
This function is based on a bunch of string manipulation functions. There may be a smarter/better way for doing it...
A dendrogram object with flipped leaves.
tanglegram, match_order_by_labels, entanglement.
## Not run:
dend1 <- USArrests[1:5, ] %>% dist() %>% hclust() %>% as.dendrogram()
dend2 <- flip_leaves(dend1, c(3, 5), c(1, 2))
tanglegram(dend1, dend2)
entanglement(dend1, dend2, L = 2) # 0.4
## End(Not run)
Calculates the Fowlkes-Mallows index.
The FM_index_R function calculates the expectancy and variance of the FM Index under the null hypothesis of no relation.
FM_index(
  A1_clusters,
  A2_clusters,
  assume_sorted_vectors = FALSE,
  warn = dendextend_options("warn"),
  ...
)
A1_clusters |
a numeric vector of cluster grouping (numeric) of items, with a name attribute of item name for each element from group A1. These are often obtained by using some k cut on a dendrogram. |
A2_clusters |
a numeric vector of cluster grouping (numeric) of items, with a name attribute of item name for each element from group A2. These are often obtained by using some k cut on a dendrogram. |
assume_sorted_vectors |
logical (FALSE). Can we assume the two group vectors are sorted so that they have the same order of items? If FALSE (default), then the vectors will be sorted based on their name attribute. |
warn |
logical (default from dendextend_options("warn") is FALSE). Sets whether warnings are to be issued. It is safer to keep this at TRUE, but to keep the noise down, the default is FALSE. |
... |
Ignored |
From Wikipedia:
Fowlkes-Mallows index (see references) is an external evaluation method that is used to determine the similarity between two clusterings (clusters obtained after a clustering algorithm). This measure of similarity could be either between two hierarchical clusterings or a clustering and a benchmark classification. A higher value for the Fowlkes-Mallows index indicates a greater similarity between the clusters and the benchmark classifications.
The Fowlkes-Mallows index between two vectors of clustering groups.
Includes the attributes E_FM and V_FM for the relevant expectancy and variance under the null hypothesis of no-relation.
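For intuition, the index itself can be computed from the contingency table of the two cluster vectors, following the formula in Fowlkes and Mallows (1983). The sketch below is a base-R illustration with a hypothetical helper name, fm_sketch; it assumes both vectors list the items in the same order and it does not compute the E_FM and V_FM attributes.

```r
# Fowlkes-Mallows index from the contingency table of two clusterings.
fm_sketch <- function(a, b) {
  n <- length(a)
  m <- table(a, b)               # m[i, j]: items in cluster i of a and j of b
  tk <- sum(m^2) - n             # twice the pairs grouped together in both
  pk <- sum(rowSums(m)^2) - n    # twice the pairs grouped together in a
  qk <- sum(colSums(m)^2) - n    # twice the pairs grouped together in b
  tk / sqrt(pk * qk)
}

fm_sketch(c(1, 1, 2, 2), c(1, 1, 2, 2)) # identical clusterings
fm_sketch(c(1, 1, 2, 2), c(1, 1, 1, 2)) # one shared pair out of 2 and 3
```

The second call equals 1 / sqrt(2 * 3): one pair of items is grouped together in both clusterings, while the clusterings contain two and three within-cluster pairs respectively.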
Fowlkes, E. B.; Mallows, C. L. (1 September 1983). "A Method for Comparing Two Hierarchical Clusterings". Journal of the American Statistical Association 78 (383): 553.
https://en.wikipedia.org/wiki/Fowlkes-Mallows_index
## Not run:
set.seed(23235)
ss <- TRUE # sample(1:150, 10 )
hc1 <- hclust(dist(iris[ss, -5]), "com")
hc2 <- hclust(dist(iris[ss, -5]), "single")
# dend1 <- as.dendrogram(hc1)
# dend2 <- as.dendrogram(hc2)
# cutree(dend1)

FM_index(cutree(hc1, k = 3), cutree(hc1, k = 3)) # 1 with EV

# checking speed gains
library(microbenchmark)
microbenchmark(
  FM_index(cutree(hc1, k = 3), cutree(hc1, k = 3)),
  FM_index(cutree(hc1, k = 3), cutree(hc1, k = 3),
    assume_sorted_vectors = TRUE
  ),
  FM_index(cutree(hc1, k = 3), cutree(hc1, k = 3),
    assume_sorted_vectors = TRUE
  )
)
# C code is 1.2-1.3 times faster.

set.seed(1341)
FM_index(cutree(hc1, k = 3), sample(cutree(hc1, k = 3)),
  assume_sorted_vectors = TRUE
) # 0.38037
FM_index(cutree(hc1, k = 3), sample(cutree(hc1, k = 3)),
  assume_sorted_vectors = FALSE
) # 1 again :)
FM_index(cutree(hc1, k = 3), cutree(hc2, k = 3)) # 0.8059
FM_index(cutree(hc1, k = 30), cutree(hc2, k = 30)) # 0.4529

fo <- function(k) FM_index(cutree(hc1, k), cutree(hc2, k))
lapply(1:4, fo)
ks <- 1:150
plot(sapply(ks, fo) ~ ks, type = "b", main = "Bk plot for the iris dataset")
## End(Not run)
Calculating Fowlkes-Mallows index under the null hypothesis of no relation between the clusterings (random order of the items labels).
FM_index_permutation(
  A1_clusters,
  A2_clusters,
  warn = dendextend_options("warn"),
  ...
)
A1_clusters |
a numeric vector of cluster grouping (numeric) of items, with a name attribute of item name for each element from group A1. These are often obtained by using some k cut on a dendrogram. |
A2_clusters |
a numeric vector of cluster grouping (numeric) of items, with a name attribute of item name for each element from group A2. These are often obtained by using some k cut on a dendrogram. |
warn |
logical (default from dendextend_options("warn") is FALSE). Sets whether warnings are to be issued. It is safer to keep this at TRUE, but to keep the noise down, the default is FALSE. |
... |
Ignored |
The Fowlkes-Mallows index between two vectors of clustering groups, under H0 (a double without attr).
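The idea of a permutation null distribution can be sketched in base R: shuffle the item order of one clustering, recompute the index, and repeat. Everything in this sketch is an assumption made for illustration (the helper fm_sketch, the toy clusterings, the seed and the number of replications); it is not the package's implementation.

```r
# Fowlkes-Mallows index from a contingency table (items in matching order).
fm_sketch <- function(a, b) {
  n <- length(a)
  m <- table(a, b)
  (sum(m^2) - n) / sqrt((sum(rowSums(m)^2) - n) * (sum(colSums(m)^2) - n))
}

set.seed(1)
a <- rep(1:3, each = 10) # toy clustering: three clusters of ten items
b <- a                   # second clustering, identical to the first

# Under H0 the item labels are exchangeable, so shuffling b breaks any relation.
null_fm <- replicate(1000, fm_sketch(a, sample(b)))
observed <- fm_sketch(a, b)

# Share of permutations at least as extreme as the observed index.
mean(null_fm >= observed)
```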
Fowlkes, E. B.; Mallows, C. L. (1 September 1983). "A Method for Comparing Two Hierarchical Clusterings". Journal of the American Statistical Association 78 (383): 553.
https://en.wikipedia.org/wiki/Fowlkes-Mallows_index
cor_bakers_gamma, FM_index_R, FM_index
## Not run:
set.seed(23235)
ss <- TRUE # sample(1:150, 10 )
hc1 <- hclust(dist(iris[ss, -5]), "com")
hc2 <- hclust(dist(iris[ss, -5]), "single")
# dend1 <- as.dendrogram(hc1)
# dend2 <- as.dendrogram(hc2)
# cutree(dend1)

# small k
A1_clusters <- cutree(hc1, k = 3) # will give a right tailed distribution
# large k
A1_clusters <- cutree(hc1, k = 50) # will give a discrete distribution
# "medium" k
A1_clusters <- cutree(hc1, k = 25) # gives almost the normal distribution!
A2_clusters <- A1_clusters

R <- 10000
set.seed(414130)
FM_index_H0 <- replicate(R, FM_index_permutation(A1_clusters, A2_clusters)) # can take 10 sec
plot(density(FM_index_H0), main = "FM Index distribution under H0\n (10000 permutation)")
abline(v = mean(FM_index_H0), col = 1, lty = 2)
# The permutation distribution has a heavy right tail:
# Source of the skew functions is based on: library(psych)
skew <- function(x, na.rm = TRUE) {
  x <- na.omit(x)
  sum((x - mean(x))^3) / (length(x) * sd(x)^3)
}
skew(FM_index_H0) # 1.254

mean(FM_index_H0)
var(FM_index_H0)
the_FM_index <- FM_index(A1_clusters, A2_clusters)
the_FM_index
our_dnorm <- function(x) {
  dnorm(x,
    mean = attr(the_FM_index, "E_FM"),
    sd = sqrt(attr(the_FM_index, "V_FM"))
  )
}
# our_dnorm(0.35)
curve(our_dnorm, col = 4, from = -1, to = 1, n = R, add = TRUE)
abline(v = attr(the_FM_index, "E_FM"), col = 4, lty = 2)
legend("topright", legend = c("asymptotic", "permutation"), fill = c(4, 1))
## End(Not run)
Calculates the Fowlkes-Mallows index.
The FM_index_R function also calculates the expectancy and variance of the FM Index under the null hypothesis of no relation.
FM_index_R(
  A1_clusters,
  A2_clusters,
  assume_sorted_vectors = FALSE,
  warn = dendextend_options("warn"),
  ...
)
A1_clusters |
a numeric vector of cluster grouping (numeric) of items, with a name attribute of item name for each element from group A1. These are often obtained by using some k cut on a dendrogram. |
A2_clusters |
a numeric vector of cluster grouping (numeric) of items, with a name attribute of item name for each element from group A2. These are often obtained by using some k cut on a dendrogram. |
assume_sorted_vectors |
logical (FALSE). Can we assume the two group vectors are sorted so that they have the same order of items? If FALSE (default), then the vectors will be sorted based on their name attribute. |
warn |
logical (default from dendextend_options("warn") is FALSE). Sets whether warnings are to be issued. It is safer to keep this at TRUE, but to keep the noise down, the default is FALSE. |
... |
Ignored. |
From Wikipedia:
Fowlkes-Mallows index (see references) is an external evaluation method that is used to determine the similarity between two clusterings (clusters obtained after a clustering algorithm). This measure of similarity could be either between two hierarchical clusterings or a clustering and a benchmark classification. A higher value for the Fowlkes-Mallows index indicates a greater similarity between the clusters and the benchmark classifications.
The Fowlkes-Mallows index between two vectors of clustering groups.
Includes the attributes E_FM and V_FM for the relevant expectancy and variance under the null hypothesis of no-relation.
Fowlkes, E. B.; Mallows, C. L. (1 September 1983). "A Method for Comparing Two Hierarchical Clusterings". Journal of the American Statistical Association 78 (383): 553.
https://en.wikipedia.org/wiki/Fowlkes-Mallows_index
## Not run:
set.seed(23235)
ss <- TRUE # sample(1:150, 10 )
hc1 <- hclust(dist(iris[ss, -5]), "com")
hc2 <- hclust(dist(iris[ss, -5]), "single")
# dend1 <- as.dendrogram(hc1)
# dend2 <- as.dendrogram(hc2)
# cutree(dend1)

FM_index_R(cutree(hc1, k = 3), cutree(hc1, k = 3)) # 1
set.seed(1341)
FM_index_R(cutree(hc1, k = 3), sample(cutree(hc1, k = 3)), assume_sorted_vectors = TRUE) # 0.38037
FM_index_R(cutree(hc1, k = 3), sample(cutree(hc1, k = 3)), assume_sorted_vectors = FALSE) # 1 again :)
FM_index_R(cutree(hc1, k = 3), cutree(hc2, k = 3)) # 0.8059
FM_index_R(cutree(hc1, k = 30), cutree(hc2, k = 30)) # 0.4529

fo <- function(k) FM_index_R(cutree(hc1, k), cutree(hc2, k))
lapply(1:4, fo)
ks <- 1:150
plot(sapply(ks, fo) ~ ks, type = "b", main = "Bk plot for the iris dataset")

clu_1 <- cutree(hc2, k = 100) # this is a lie - since this one is NOT well defined!
clu_2 <- cutree(as.dendrogram(hc2), k = 100) # We see that we get a vector of NAs for this...
FM_index_R(clu_1, clu_2) # NA
## End(Not run)
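As a rough illustration of the index described above, the following sketch computes it in base R from the contingency table of two label vectors, following the formula in Fowlkes and Mallows (1983). The helper name fm_sketch is hypothetical and is not part of dendextend; unlike FM_index_R it does not return the E_FM and V_FM attributes.

```r
# Hypothetical helper (not part of dendextend): Fowlkes-Mallows index
# computed from the contingency table of two cluster-label vectors.
fm_sketch <- function(a, b) {
  stopifnot(length(a) == length(b))
  m <- table(a, b)                 # m[i, j]: items in cluster i of a and cluster j of b
  n <- length(a)
  t_k <- sum(m^2) - n              # pairs placed together in both clusterings
  p_k <- sum(rowSums(m)^2) - n     # pairs placed together in a
  q_k <- sum(colSums(m)^2) - n     # pairs placed together in b
  t_k / sqrt(p_k * q_k)            # NaN if either clustering has only singletons
}

hc1 <- hclust(dist(iris[, -5]), "complete")
hc2 <- hclust(dist(iris[, -5]), "single")

fm_same <- fm_sketch(cutree(hc1, k = 3), cutree(hc1, k = 3)) # identical clusterings
fm_diff <- fm_sketch(cutree(hc1, k = 3), cutree(hc2, k = 3)) # different linkage methods
```

Comparing a clustering with itself gives 1, while two different clusterings give a value between 0 and 1.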
Get height attributes of a dendrogram's branches
get_branches_heights(dend, sort = TRUE, decreasing = FALSE, include_leaves = FALSE, ...)
dend |
a dendrogram. |
sort |
logical. Should the heights be sorted? |
decreasing |
logical. Should the sort be increasing or decreasing? Not available for partial sorting. |
include_leaves |
logical (FALSE). Should the output include the leaves value (0's). |
... |
not used. |
a vector of the dendrogram's nodes heights (excluding leaves, unless include_leaves = TRUE).
hc <- hclust(dist(USArrests[1:4, ]), "ave")
dend <- as.dendrogram(hc)
get_branches_heights(dend)
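The sort, decreasing and include_leaves arguments can be illustrated with the same tree. This is only a sketch; the variable names are chosen here for illustration.

```r
library(dendextend)

hc <- hclust(dist(USArrests[1:4, ]), "ave")
dend <- as.dendrogram(hc)

h_inc <- get_branches_heights(dend)                        # internal nodes only, increasing
h_dec <- get_branches_heights(dend, decreasing = TRUE)     # same values, largest first
h_all <- get_branches_heights(dend, include_leaves = TRUE) # leaves contribute 0's

length(h_inc) # a binary tree with 4 leaves has 3 internal nodes
length(h_all) # 3 internal nodes + 4 leaves
```

The heights of the internal nodes are the merge heights stored in hc$height.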
Get height attributes from a dendrogram's children nodes
get_childrens_heights(dend, ...)
dend |
a dendrogram. |
... |
not used. |
a vector of the heights of a dendrogram's current node's (first level) children.
hc <- hclust(dist(USArrests[1:4, ]), "ave")
dend <- as.dendrogram(hc)
get_childrens_heights(dend)
Get/set attributes of dendrogram's leaves
get_leaves_attr(dend, attribute, simplify = TRUE, ...)
dend |
a dendrogram object |
attribute |
character scalar of the attribute (attr) we wish to get/set from the leaves |
simplify |
logical. If TRUE (default), then the return vector is after using unlist on it. |
... |
not used |
A vector (or a list) with the dendrogram's leaves attribute
Heavily inspired by the code in the function labels.dendrogram, so credit should go to Martin Maechler.
get_nodes_attr, nnodes, nleaves, assign_values_to_leaves_nodePar
# define dendrogram object to play with:
hc <- hclust(dist(USArrests[1:3, ]), "ave")
dend <- as.dendrogram(hc)

# get_leaves_attr(dend) # error :)
get_leaves_attr(dend, "label")
labels(dend, "label")
get_leaves_attr(dend, "height") # should be 0's
get_nodes_attr(dend, "height")
get_leaves_attr(dend, "nodePar")

get_leaves_attr(dend, "leaf") # should be TRUE's
get_nodes_attr(dend, "leaf") # contains NA's

get_leaves_attr(dend, "members") # should be 1's
get_nodes_attr(dend, "members")
get_leaves_attr(dend, "members", simplify = FALSE) # should be 1's
This is helpful to get the attributes of branches of the leaves. For example, after we use color_branches, to get the colors of the labels to match (since getting the colors of branches to match those of the labels can be tricky). This is based on get_leaves_edgePar.
get_leaves_branches_attr(dend, attr = c("col", "lwd", "lty"), ...)
dend |
a dendrogram object |
attr |
character, the attr to get. Can be either "col", "lwd", or "lty". |
... |
not used |
A vector with the requested edgePar attribute ("col", "lwd", or "lty") of the branches leading to the dendrogram's leaves
get_nodes_attr, assign_values_to_leaves_nodePar, labels_colors, get_leaves_nodePar, get_leaves_edgePar
# define dendrogram object to play with:
hc <- hclust(dist(USArrests[1:5, ]), "ave")
dend <- as.dendrogram(hc)

dend <- dend %>%
  color_branches(k = 3) %>%
  set("branches_lwd", c(2, 1, 2)) %>%
  set("branches_lty", c(1, 2, 1))
plot(dend)

get_leaves_branches_attr(dend, "col")
get_leaves_branches_attr(dend, "lwd")
get_leaves_branches_attr(dend, "lty")

labels_colors(dend) <- get_leaves_branches_attr(dend, "col")
plot(dend)
It is useful to get the colors of the branches of the leaves after using color_branches, so that the colors of the labels can then be matched to those of the branches (since getting the colors of branches to match those of the labels can be tricky). This is based on get_leaves_branches_attr, which is based on get_leaves_edgePar.
TODO: The function get_leaves_branches_col may behave oddly when extracting colors with missing col attributes when the lwd attribute is available. This may result in a vector with the wrong length (with omitted NA values). This might need to be fixed in the future, and attention should be given to this case.
get_leaves_branches_col(dend, ...)
dend |
a dendrogram object |
... |
not used |
A vector with the dendrogram's leaves' branches' colors
get_nodes_attr, assign_values_to_leaves_nodePar, labels_colors, get_leaves_nodePar, get_leaves_edgePar, get_leaves_branches_attr
# define dendrogram object to play with:
hc <- hclust(dist(USArrests[1:5, ]), "ave")
dend <- as.dendrogram(hc)

par(mfrow = c(1, 2), mar = c(5, 2, 1, 0))
dend <- dend %>%
  color_branches(k = 3) %>%
  set("branches_lwd", c(2, 1, 2)) %>%
  set("branches_lty", c(1, 2, 1))
plot(dend)

labels_colors(dend) <- get_leaves_branches_col(dend)
plot(dend)
This is helpful to get the attributes of branches of the leaves. For example, after we use color_branches, to get the colors of the labels to match (since getting the colors of branches to match those of the labels can be tricky).
get_leaves_edgePar(dend, simplify = FALSE, ...)
dend |
a dendrogram object |
simplify |
logical (default is FALSE). If TRUE, then the return vector is after using unlist on it. |
... |
not used |
A list (or a vector) with the dendrogram's leaves edgePar attribute
get_nodes_attr, assign_values_to_leaves_nodePar, labels_colors, get_leaves_nodePar
# define dendrogram object to play with:
hc <- hclust(dist(USArrests[1:5, ]), "ave")
dend <- as.dendrogram(hc)

# get_leaves_edgePar(dend) # error :)
get_leaves_edgePar(dend)
dend <- color_branches(dend, k = 3)
get_leaves_edgePar(dend)
get_leaves_edgePar(dend, TRUE)

dend <- dend %>% set("branches_lwd", c(2, 1, 2))
get_leaves_edgePar(dend)
plot(dend)
Get the nodePar attributes of dendrogram's leaves (includes pch, color, and cex)
get_leaves_nodePar(dend, simplify = FALSE, ...)
dend |
a dendrogram object |
simplify |
logical (default is FALSE). If TRUE, then the return vector is after using unlist on it. |
... |
not used |
A list (or a vector) with the dendrogram's leaves nodePar attribute
get_nodes_attr, assign_values_to_leaves_nodePar, labels_colors, get_leaves_edgePar
# define dendrogram object to play with:
hc <- hclust(dist(USArrests[1:3, ]), "ave")
dend <- as.dendrogram(hc)

# get_leaves_attr(dend) # error :)
get_leaves_nodePar(dend)
labels_colors(dend) <- 1:3
get_leaves_nodePar(dend)

dend <- assign_values_to_leaves_nodePar(dend, 2, "lab.cex")
get_leaves_nodePar(dend)

plot(dend)
Allows easy access to attributes of branches and/or leaves, with the option of returning a vector with or without NA's (which mark missing attribute values).
get_nodes_attr(dend, attribute, id, include_leaves = TRUE, include_branches = TRUE, simplify = TRUE, na.rm = FALSE, ...)
dend |
a dendrogram object |
attribute |
character scalar of the attribute (attr) we wish to get from the nodes |
id |
integer vector. If given - only the attr of these nodes id will be returned (via depth first search) |
include_leaves |
logical. Should leaves attributes be included as well? |
include_branches |
logical. Should non-leaf (branch node) attributes be included as well? |
simplify |
logical (default is TRUE). Should the result be simplified to a vector (using simplify2array) if possible? If that is not possible, a matrix is returned. When FALSE, a list is returned. |
na.rm |
logical. Should NA attributes be REMOVED from the resulting vector? |
... |
not used |
A vector with the dendrogram's nodes attribute. If an attribute is missing from some nodes, it will return NA in that vector.
Heavily inspired by the code in the function labels.dendrogram, so credit should go to Martin Maechler.
get_leaves_attr, nnodes, nleaves
# define dendrogram object to play with:
hc <- hclust(dist(USArrests[1:3, ]), "ave")
dend <- as.dendrogram(hc)

# get_leaves_attr(dend) # error :)
get_leaves_attr(dend, "label")
labels(dend, "label")
get_leaves_attr(dend, "height") # should be 0's
get_nodes_attr(dend, "height")

get_leaves_attr(dend, "leaf") # should be TRUE's
get_nodes_attr(dend, "leaf") # contains NA's

get_leaves_attr(dend, "members") # should be 1's
get_nodes_attr(dend, "members", include_branches = FALSE, na.rm = TRUE)
get_nodes_attr(dend, "members")
get_nodes_attr(dend, "members", simplify = FALSE)
get_nodes_attr(dend, "members", include_leaves = FALSE, na.rm = TRUE)

get_nodes_attr(dend, "members", id = c(1, 3), simplify = FALSE)
get_nodes_attr(dend, "members", id = c(1, 3))

hang_dend <- hang.dendrogram(dend)
get_leaves_attr(hang_dend, "height") # no longer 0!
get_nodes_attr(hang_dend, "height") # does not include any 0s!

# does not include leaves values:
get_nodes_attr(hang_dend, "height", include_leaves = FALSE)
# remove leaves values altogether:
get_nodes_attr(hang_dend, "height", include_leaves = FALSE, na.rm = TRUE)

## Not run:
library(microbenchmark)
# get_leaves_attr is twice as fast as get_nodes_attr
microbenchmark(
  get_leaves_attr(dend, "members"), # should be 1's
  get_nodes_attr(dend, "members", include_branches = FALSE, na.rm = TRUE)
)
## End(Not run)
Get the x-y coordinates of a dendrogram's nodes. Can be used to add text or images on the tree.
get_nodes_xy(dend, type = c("rectangle", "triangle"), center = FALSE, horiz = FALSE, ...)
dend |
a dendrogram object |
type |
type of plot. |
center |
logical; if TRUE, nodes are plotted centered with respect to the leaves in the branch. Otherwise (default), plot them in the middle of all direct child nodes. |
horiz |
logical indicating if the dendrogram should be drawn horizontally or not. |
... |
not used |
A 2-dimensional matrix with one row per node, where the first column is the x location and the second is the y location.
This is a stripped down version of the function plot.dendrogram. It performs (almost) the same task, only it does not do any plotting but it does save the x-y coordinates of the nodes.
get_nodes_attr, nnodes, nleaves
## Not run:
# If we would like to see the numbers from plot:
# ?getOption("verbose")
# options(verbose=TRUE)
# options(verbose=FALSE)

# -----
# Draw a depth first search illustration
# -----

dend <- 1:5 %>%
  dist() %>%
  hclust() %>%
  as.dendrogram()
get_nodes_xy(dend)

# polygon(get_nodes_xy(dend), col = 2)
plot(dend,
  leaflab = "none",
  main = "Depth-first search in a dendrogram"
)
xy <- get_nodes_xy(dend)
for (i in 1:(nrow(xy) - 1)) {
  arrows(xy[i, 1], xy[i, 2],
    angle = 17,
    length = .5,
    xy[i + 1, 1], xy[i + 1, 2],
    lty = 1, col = 3, lwd = 1.5
  )
}
points(xy, pch = 19, cex = 4)
text(xy, labels = 1:nnodes(dend), cex = 1.2, col = "white", adj = c(0.4, 0.4))
## End(Not run)
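As a small sketch of adding text to the tree, the coordinates can be used to print each merge height next to its internal node. It assumes a tree built by hclust with no hanging and no zero-height merges, so internal nodes are exactly the rows with y > 0; the variable names are illustrative only.

```r
library(dendextend)

dend <- as.dendrogram(hclust(dist(USArrests[1:5, ]), "ave"))
xy <- get_nodes_xy(dend)   # one row per node: column 1 is x, column 2 is y

# Assumption: leaves sit at height 0, so rows with y > 0 are internal nodes.
internal <- xy[, 2] > 0

plot(dend)
text(xy[internal, 1], xy[internal, 2],
  labels = round(xy[internal, 2], 1),
  pos = 3, col = "blue", xpd = NA
)
```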
Get attributes from the dendrogram's root(!) branches
get_root_branches_attr(dend, the_attr, warn = dendextend_options("warn"), ...)
dend |
dendrogram object |
the_attr |
the attribute to get from the branches (for example "height") |
warn |
logical (default from dendextend_options("warn") is FALSE). Should a warning be issued when the function is used on an object which is NOT a dendrogram? It is safer to keep this at TRUE, but to keep the noise down the default is FALSE. |
... |
passed on to attr |
The attributes of the branches (often two) of the dendrogram's root
hc <- hclust(dist(USArrests[2:9, ]), "com")
dend <- as.dendrogram(hc)

get_root_branches_attr(dend, "height") # 0.00000 71.96247
# plot(dend)
str(dend, 2)
Extracts a list (dendlist) of subdendrogram structures based on the cutree (cutree.dendrogram) function from a given dendrogram object. It can be useful in case we're interested in a visual investigation of specific clustering results.
get_subdendrograms(dend, k, order_clusters_as_data = FALSE, ...)
dend |
a dendrogram object |
k |
the number of subdendrograms that should be extracted |
order_clusters_as_data |
passed to cutree, default is FALSE (while the cutree default is TRUE). The reason is that it is easier to look at the dendrogram plot and then get subtrees that are in the same order as in the plot/dendrogram object. This is in contrast to the more traditional use of cutree, where it is used with the original order of rows from the data. |
... |
parameters that should be passed to cutree |
A list of k subdendrograms, one per cluster of the cutree (cutree.dendrogram) clustering.
# needed packages:
# install.packages("gplots")
# install.packages("viridis")
# install.packages("devtools")
# devtools::install_github('talgalili/dendextend') #' dendextend from github

# define dendrogram object to play with:
dend <- iris[1:20, -5] %>%
  dist() %>%
  hclust() %>%
  as.dendrogram() %>%
  # set("labels_to_character") %>%
  color_branches(k = 5)
labels(dend) <- letters[1:20]
plot(dend)
dend_list <- get_subdendrograms(dend, 5)
lapply(dend_list, labels)
# [[1]]
# [1] "a" "b"
#
# [[2]]
# [1] "c" "d" "e" "f" "g"
#
# [[3]]
# [1] "h" "i"
#
# [[4]]
# [1] "j" "k" "l" "m"
#
# [[5]]
# [1] "n" "o" "p" "q" "r" "s" "t"

# define dendrogram object to play with:
dend <- iris[, -5] %>%
  dist() %>%
  hclust() %>%
  as.dendrogram() %>%
  set("labels_to_character") %>%
  color_branches(k = 5)
dend_list <- get_subdendrograms(dend, 5)

# Plotting the result
par(mfrow = c(2, 3))
plot(dend, main = "Original dendrogram")
sapply(dend_list, plot)

# plot a heatmap of only one of the sub dendrograms
par(mfrow = c(1, 1))
library(gplots)
sub_dend <- dend_list[[1]] #' get the sub dendrogram
# make sure of the size of the dend
nleaves(sub_dend)
length(order.dendrogram(sub_dend))
# get the subset of the data
subset_iris <- as.matrix(iris[order.dendrogram(sub_dend), -5])
# update the dendrogram's internal order so to not cause an error in heatmap.2
order.dendrogram(sub_dend) <- as.integer(rank(order.dendrogram(sub_dend)))
heatmap.2(subset_iris, Rowv = sub_dend, trace = "none", col = viridis::viridis(100))
Several functions for creating a dendrogram plot using ggplot2. The core process is to transform a dendrogram into a ggdend object using as.ggdend, and then plot it using ggplot. These two steps can be done in one command with either the function ggplot or ggdend.
The reason we want to have as.ggdend (and not only ggplot.dendrogram), is (1) so that you could create your own mapping of ggdend and, (2) since as.ggdend might be slow for large trees, it is probably better to be able to run it only once for such cases.
A ggdend class object is a list with 3 components: segments, labels, nodes. Each one contains the graphical parameters from the original dendrogram, but in a tabular form that can be used by ggplot2+geom_segment+geom_text to create a dendrogram plot.
ggdend(...)

as.ggdend(dend, ...)

## S3 method for class 'dendrogram'
as.ggdend(dend, type = c("rectangle", "triangle"), edge.root = FALSE, ...)

prepare.ggdend(data, ...)

## S3 method for class 'ggdend'
ggplot(
  data = NULL,
  mapping = aes(),
  ...,
  segments = TRUE,
  labels = TRUE,
  nodes = TRUE,
  horiz = FALSE,
  theme = theme_dendro(),
  offset_labels = 0,
  na.rm = TRUE,
  environment = parent.frame()
)

## S3 method for class 'dendrogram'
ggplot(data, ...)

## S3 method for class 'ggdend'
print(x, ...)
... |
mostly ignored. |
dend |
a dendrogram tree (to be turned into a ggdend object) |
type |
The type of plot, indicating the shape of the dendrogram. "rectangle" will draw rectangular lines, while "triangle" will draw triangular lines. |
edge.root |
currently ignored. One day it might do the following: logical; if true, draw an edge to the root node. |
data, x |
a ggdend class object (passed to ggplot.dendrogram or print.ggdend). |
mapping |
(passed in ggplot.ggdend) Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
segments |
a logical (TRUE) if to plot the segments (branches). |
labels |
a logical (TRUE) if to plot the labels. |
nodes |
a logical (TRUE) if to plot the nodes (points). |
horiz |
a logical (FALSE) indicating if the dendrogram should be drawn horizontally or not. |
theme |
the ggplot2 theme to use (default is theme_dendro, can also be NULL for the default ggplot2 theme) |
offset_labels |
a numeric value to offset the labels from the leaves |
na.rm |
A logical (TRUE) to control removal of missing values. Passed to geom_line and geom_point |
environment |
(passed in ggplot.ggdend) deprecated / ignored. |
prepare.ggdend is used by plot.ggdend to take the ggdend object and prepare it for plotting. This is because the defaults of various parameters in dendrograms are not always stored in the object itself, but are built into the plot.dendrogram function. For example, the color of the labels is not (by default) specified in the dendrogram (only if we change it from black to something else). Hence, when taking the object into a different plotting engine (say ggplot2), we want to prepare the object by filling in various defaults. This function is automatically invoked within the plot.ggdend function. You would probably use it only if you wish to build your own ggplot2 mapping.
as.ggdend - returns an object of class ggdend which is a list with 3 components: segments, labels, nodes. Each one contains the graphical parameters from the original dendrogram, but in a tabular form that can be used by ggplot2+geom_segment+geom_text to create a dendrogram plot.
prepare.ggdend - a ggdend object (after filling it with various default values)
ggplot.ggdend - a ggplot object
Tal Galili, using code modified from Andrie de Vries
These are extended versions of the functions ggdendrogram, dendro_data (and the hidden dendrogram_data) from Andrie de Vries's ggdendro package. The motivation for this fork is the need to add more graphical parameters to the plotted tree. This required a strong mixture of functions from ggdendro and dendextend (to the point that it seemed better to just fork the code into its current form).
dendrogram, get_nodes_attr, get_leaves_nodePar, ggplot, ggdendrogram, dendro_data
## Not run:
library(dendextend)
# library(ggdendro)
# Create a complex dend:
dend <- iris[1:30, -5] %>%
  dist() %>%
  hclust() %>%
  as.dendrogram() %>%
  set("branches_k_color", k = 3) %>%
  set("branches_lwd", c(1.5, 1, 1.5)) %>%
  set("branches_lty", c(1, 1, 3, 1, 1, 2)) %>%
  set("labels_colors") %>%
  set("labels_cex", c(.9, 1.2))
# plot the dend in usual "base" plotting engine:
plot(dend)
# Now let's do it in ggplot2 :)
ggd1 <- as.ggdend(dend)
library(ggplot2)
ggplot(ggd1) # reproducing the above plot in ggplot2 :)

# Triangle version:
plot(dend, type = "triangle")
ggd2 <- as.ggdend(dend, type = "triangle")
ggplot(ggd2)

# More modifications:
labels(dend) <- paste0(labels(dend), "00000")
ggd1 <- as.ggdend(dend)
# Use ylim to deal with long labels in ggplot2
ggplot(ggd1) + ylim(-.4, max(get_branches_heights(dend)))
ggplot(ggd1, horiz = TRUE) # horiz plot in ggplot2

# Adding some extra spice to it...
# creating a radial plot:
ggplot(ggd1) + scale_y_reverse(expand = c(0.2, 0)) + coord_polar(theta = "x")
# The text doesn't look so great, so let's remove it:
ggplot(ggd1, labels = FALSE) + scale_y_reverse(expand = c(0.2, 0)) + coord_polar(theta = "x")

# This can now be sent to plot.ly - which adds zoom-in abilities, and more.
# Here is how it might look like: https://plot.ly/~talgalili/6/y-vs-x/
## Quick guide:
# install.packages("devtools")
# library("devtools")
# devtools::install_github("ropensci/plotly")
# library(plotly)
# set_credentials_file(...)
# you'll need to get it from here: https://plot.ly/ggplot2/getting-started/
# ggplot(ggd1)
# py <- plotly()
# py$ggplotly()
# And you'll get something like this: https://plot.ly/~talgalili/6/y-vs-x/
# Another example: https://plot.ly/ggplot2/
## End(Not run)
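To see the tabular form that as.ggdend produces before building a custom ggplot2 mapping, the three components can be inspected directly. This is a minimal sketch; the column layout of each table is not documented here, so head() is used only to look at it rather than to rely on specific column names.

```r
library(dendextend)

dend <- as.dendrogram(hclust(dist(iris[1:10, -5])))
ggd <- as.ggdend(dend)

class(ggd)          # a "ggdend" object
names(ggd)          # the components described above
head(ggd$segments)  # one row per line segment of the tree
head(ggd$labels)    # one row per leaf label
```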
Adjust the height attr in all of the dendrogram leaves so that the tree will hang. This is similar to as.dendrogram(hclust, hang=0.1), except that it also works on dendrograms that were not created from an hclust object. For example, this allows us to hang non-binary trees.
hang.dendrogram(dend, hang = 0.1, hang_height, ...)
dend |
a dendrogram object |
hang |
The fraction of the plot height by which labels should hang below the rest of the plot. A negative value will cause the labels to hang down from 0. |
hang_height |
If missing, "hang" is used. If a number is given, it overrides "hang" (except if "hang" is negative). |
... |
not used |
A dendrogram, after adjusting the height attr in all of its leaves, so that the tree will hang.
Noticing that as.dendrogram has a "hang" parameter was thanks to Enrique Ramos's answer here: https://stackoverflow.com/questions/17088136/plot-horizontal-dendrogram-with-hanging-leaves-r
# define dendrogram object to play with:
hc <- hclust(dist(USArrests[1:5, ]), "ave")
dend <- as.dendrogram(hc)

par(mfrow = c(1, 2))
plot(hang.dendrogram(dend))
plot(hc)
# identical(as.dendrogram(hc, hang = 0.1), hang.dendrogram(dend, hang = 0.1))
# TRUE!!

par(mfrow = c(1, 4))
plot(dend)
plot(hang.dendrogram(dend, hang = 0.1))
plot(hang.dendrogram(dend, hang = 0))
plot(hang.dendrogram(dend, hang = -0.1))

par(mfrow = c(1, 1))
plot(hang.dendrogram(dend), horiz = TRUE)
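The effect on the height attribute can also be checked numerically rather than visually. The following is a sketch on the same data; the name hung is chosen here for illustration.

```r
library(dendextend)

dend <- as.dendrogram(hclust(dist(USArrests[1:5, ]), "ave"))
get_leaves_attr(dend, "height")   # all 0 before hanging

hung <- hang.dendrogram(dend, hang = 0.1)
get_leaves_attr(hung, "height")   # leaves are moved up towards their parent node

# Only the leaves are adjusted; the internal node heights stay the same.
get_branches_heights(dend)
get_branches_heights(hung)
```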
Does a dendrogram have an edgePar/nodePar component?
has_component_in_attribute(dend, component, the_attrib = "edgePar", ...)
dend |
a dendrogram object. |
component |
a character value to be checked if exists in the tree. For edgePar the list: "col", "lty" and "lwd" (for the segments), "p.col", "p.lwd", and "p.lty" (for the polygon around the text) and "t.col" for the text color. For nodePar "pch", "cex", "col", "xpd", and/or "bg". |
the_attrib |
A character of the attribute for which to check the existence of the component. Often either "edgePar" or "nodePar". |
... |
ignored |
Logical. TRUE if such a component is defined somewhere in the tree, FALSE otherwise. If dend is not a dendrogram, the function will return FALSE.
dat <- iris[1:20, -5]
hca <- hclust(dist(dat))
hca2 <- hclust(dist(dat), method = "single")
dend <- as.dendrogram(hca)
dend2 <- as.dendrogram(hca2)

dend %>%
  set("branches_lwd", 2) %>%
  set("branches_lty", 2) %>%
  plot()
dend %>%
  set("branches_lwd", 2) %>%
  set("branches_lty", 2) %>%
  has_edgePar("lty")
dend %>%
  set("branches_lwd", 2) %>%
  has_edgePar("lty")
dend %>%
  set("branches_lwd", 2) %>%
  has_edgePar("lwd")
dend %>%
  set("branches_lwd", 2) %>%
  set("clear_branches") %>%
  has_edgePar("lwd")
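The example above goes through the has_edgePar wrapper. A sketch of calling has_component_in_attribute directly, including the documented behavior for a non-dendrogram input, could look as follows (the name thick is illustrative):

```r
library(dendextend)

dend <- as.dendrogram(hclust(dist(iris[1:20, -5])))

# edgePar is the default attribute that is searched
has_component_in_attribute(dend, "lwd")

thick <- set(dend, "branches_lwd", 2)
has_component_in_attribute(thick, "lwd")
has_component_in_attribute(thick, "lty")

# a different attribute can be searched through the_attrib
has_component_in_attribute(thick, "pch", the_attrib = "nodePar")

# an object that is not a dendrogram gives FALSE
has_component_in_attribute(1:3, "col")
```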
Finds, for each number of clusters k, a height at which cutting the dendrogram yields k clusters. This helps with speeding up the cutree.dendrogram function.
heights_per_k.dendrogram(dend, ...)
dend |
a dendrogram. |
... |
not used. |
a vector of heights, named by the number of clusters (k) that results from cutting the dendrogram at each height.
## Not run: 
hc <- hclust(dist(USArrests[1:4, ]), "ave")
dend <- as.dendrogram(hc)
heights_per_k.dendrogram(dend)
##        1        2        3        4
## 86.47086 68.84745 45.98871 28.36531
cutree(hc, h = 68.8) # and indeed we get 2 clusters
unbranch_dend <- unbranch(dend, 2)
plot(unbranch_dend)
heights_per_k.dendrogram(unbranch_dend)
#        1        3        4
# 97.90023 57.41808 16.93594
# we do NOT have a height for k=2 because of the tree's structure.
## End(Not run)
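The relation between cut heights and k can be sketched with base R on an hclust object. The offset eps is an assumption made for this sketch (half of the smallest gap between merge heights) and requires all merge heights to be distinct; it is not presented as the package's rule.

```r
hc <- hclust(dist(USArrests[1:4, ]), "ave")

# Merge heights, from the root downwards.
merge_heights <- sort(hc$height, decreasing = TRUE)

# Assumed offset: half of the smallest gap between consecutive merge heights,
# so that every cut falls strictly between two merges.
eps <- min(-diff(merge_heights)) / 2

# One cut above the root (k = 1), then one cut just below each merge.
cut_heights <- c(max(merge_heights) + eps, merge_heights - eps)

# Ask cutree() how many clusters each cut produces and use that as the name.
k_per_cut <- sapply(cut_heights, function(h) length(unique(cutree(hc, h = h))))
names(cut_heights) <- k_per_cut
print(cut_heights)
```

With such a named vector, finding a height for a requested k becomes a lookup by name rather than a search over the tree.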
Highlights (updates) the color (col) and/or line width (lwd) of each branch in a dendrogram based on its node's height. This is a powerful pre-processing step for a tanglegram plot of two dendrograms, as it emphasizes the topological structure of each tree (and hence, their similarities and differences).
The colors are based on the viridis palette, and the line width is in the range of 1 to 10. These can be manually changed when using highlight_branches_col and highlight_branches_lwd respectively.
highlight_branches_col(dend, values = rev(viridis(1000, end = 0.9)), ...)
highlight_branches_lwd(dend, values = seq(1, 10, length.out = 1000), ...)
highlight_branches(dend, type = c("col", "lwd"), ...)
dend |
a dendrogram tree (to be turned into a ggdend object) |
values |
the gradient of values to be used for each branch. The colors are based on the viridis palette, and the line width is in the range of 1 to 10. These can be manually changed when using highlight_branches_col and highlight_branches_lwd respectively. |
... |
Currently ignored. |
type |
a character vector. Either "col", "lwd", or both. Based on whichever is chosen the dendrogram's branches will be updated. |
A modified dendrogram, with colors/line-width in the branches that are proportional to each branch's height (measured by its lower tip).
set, color_branches, get_branches_heights, viridis
dat <- iris[1:20, -5]
hca <- hclust(dist(dat))
hca2 <- hclust(dist(dat), method = "single")
dend <- as.dendrogram(hca)
dend2 <- as.dendrogram(hca2)
par(mfrow = c(1, 3))
dend %>% highlight_branches_col() %>% plot(main = "Coloring branches")
dend %>% highlight_branches_lwd() %>% plot(main = "Emphasizing line-width")
dend %>% highlight_branches() %>% plot(main = "Emphasizing color\n and line-width")
library(viridis)
par(mfrow = c(1, 3))
dend %>% highlight_branches_col() %>% plot(main = "Coloring branches \n(default is reversed viridis)")
dend %>% highlight_branches_col(viridis(100)) %>% plot(main = "It is better to use\nlighter colors in the leaves")
dend %>% highlight_branches_col(rev(magma(1000))) %>% plot(main = "The magma color palette\n is also good")
dl <- dendlist(dend, dend2)
tanglegram(dl,
  sort = TRUE, common_subtrees_color_lines = FALSE,
  highlight_distinct_edges = FALSE, highlight_branches_lwd = FALSE
)
tanglegram(dl)
tanglegram(dl, fast = TRUE)
dl <- dendlist(highlight_branches(dend), highlight_branches(dend2))
tanglegram(dl, sort = TRUE, common_subtrees_color_lines = FALSE, highlight_distinct_edges = FALSE)
dend %>% set("highlight_branches_col") %>% plot()
dl <- dendlist(dend, dend2) %>% set("highlight_branches_col")
tanglegram(dl, sort = TRUE, common_subtrees_color_lines = FALSE, highlight_distinct_edges = FALSE)
# This is also useful for heatmaps
# --------------------------
# library(dendextend)
x <- as.matrix(datasets::mtcars)
Rowv <- x %>% dist() %>% hclust() %>% as.dendrogram() %>%
  set("branches_k_color", k = 3) %>% set("highlight_branches_lwd") %>%
  ladderize()
# rotate_DendSer(ser_weight = dist(x))
Colv <- x %>% t() %>% dist() %>% hclust() %>% as.dendrogram() %>%
  set("branches_k_color", k = 2) %>% set("highlight_branches_lwd") %>%
  ladderize()
# rotate_DendSer(ser_weight = dist(t(x)))
library(gplots)
heatmap.2(x, Rowv = Rowv, Colv = Colv)
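The mapping from node height to a gradient value can be sketched with base R. The helper height_to_lwd and the linear index rule are assumptions made for illustration; the package's own mapping may differ.

```r
dend <- as.dendrogram(hclust(dist(iris[1:20, -5])))

# Gradient of line widths, as in the documented default for the lwd variant.
values <- seq(1, 10, length.out = 1000)
root_height <- attr(dend, "height")

# Hypothetical helper: pick the gradient entry matching a node's height
# relative to the root and store it as the node's edgePar line width
# (any existing edgePar is overwritten in this sketch).
height_to_lwd <- function(node) {
  idx <- 1 + round((length(values) - 1) * attr(node, "height") / root_height)
  attr(node, "edgePar") <- list(lwd = values[idx])
  node
}

dend_lwd <- dendrapply(dend, height_to_lwd)
plot(dend_lwd, main = "Line width follows node height")
```

Under this rule the leaves (height 0) get the thinnest line and the root gets the thickest, which is what makes the upper structure of the tree stand out.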
Highlight distinct edges in a tree (compared to another one) by changing the branches' color, line width, or line type.
This function enables this feature in dend_diff and tanglegram.
highlight_distinct_edges(dend, ...)
## S3 method for class 'dendrogram'
highlight_distinct_edges(
  dend,
  dend2,
  value = 2,
  edgePar = c("col", "lty", "lwd"),
  ...
)
## S3 method for class 'dendlist'
highlight_distinct_edges(dend, ..., which = c(1L, 2L))
dend |
a dendrogram or dendlist to find unique edges in (to highlight) |
... |
Ignored. |
dend2 |
a dendrogram to compare with |
value |
a new value scalar for the edgePar attribute. |
edgePar |
a character indicating the value inside edgePar to adjust. Can be either "col", "lty", or "lwd". |
which |
an integer vector indicating, in the case "dend" is a dendlist, on which of the trees the modification should be performed. If missing, the change will be performed on all of the objects in the dendlist. |
A dendrogram with modified edges - the distinct ones are changed (color, line width, or line type)
distinct_edges, highlight_distinct_edges, dist.dendlist, tanglegram, assign_values_to_branches_edgePar, distinct.edges
x <- 1:5 %>% dist() %>% hclust() %>% as.dendrogram()
y <- set(x, "labels", 5:1)
distinct_edges(x, y)
distinct_edges(y, x)
par(mfrow = c(1, 2))
plot(highlight_distinct_edges(x, y))
plot(y)
# tanglegram(highlight_distinct_edges(x, y),y)
# dend_diff(x, y)
## Not run: 
# using highlight_distinct_edges combined with dendlist and set
# to clearly highlight "stable" branches.
data(iris)
ss <- c(1:5, 51:55, 101:105)
iris1 <- iris[ss, -5] %>% dist() %>% hclust(method = "single") %>% as.dendrogram()
iris2 <- iris[ss, -5] %>% dist() %>% hclust(method = "complete") %>% as.dendrogram()
iris12 <- dendlist(iris1, iris2) %>%
  set("branches_k_color", k = 3) %>%
  set("branches_lwd", 3) %>%
  highlight_distinct_edges(value = 1, edgePar = "lwd")
iris12 %>%
  untangle(method = "step2side") %>%
  tanglegram(
    sub = "Iris dataset", main_left = "'single' clustering",
    main_right = "'complete' clustering"
  )
## End(Not run)
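What makes an edge "distinct" can be sketched with base R by describing each internal node through the set of leaf labels below it. The helper clade_keys and the toy data are assumptions for illustration only; the package reports distinct edges in its own way.

```r
# Two small trees over the same four labels, built from 1-D toy data.
x1 <- c(a = 0, b = 1, c = 3, d = 7)
x2 <- c(a = 0, b = 1, c = 6, d = 8)
dend1 <- as.dendrogram(hclust(dist(x1), method = "single"))
dend2 <- as.dendrogram(hclust(dist(x2), method = "single"))

# Hypothetical helper: describe every internal node by the sorted set of
# leaf labels below it.
clade_keys <- function(node) {
  if (is.leaf(node)) {
    return(character(0))
  }
  own <- paste(sort(labels(node)), collapse = ",")
  c(own, unlist(lapply(node, clade_keys)))
}

# Label sets that appear under some node of one tree but not of the other.
only_in_1 <- setdiff(clade_keys(dend1), clade_keys(dend2))
only_in_2 <- setdiff(clade_keys(dend2), clade_keys(dend1))
print(only_in_1)
print(only_in_2)
```

Here the first tree groups a, b and c before joining d, while the second groups c with d, so each tree has exactly one clade the other lacks; the branches leading to such clades are the ones a highlight would restyle.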
Just like identify.hclust: reads the position of the graphics pointer when the (first) mouse button is pressed. It then cuts the tree at the vertical position of the pointer and highlights the cluster containing the horizontal position of the pointer. Optionally a function is applied to the index of data points contained in the cluster.
## S3 method for class 'dendrogram'
identify(
  x,
  FUN = NULL,
  N = 20,
  MAXCLUSTER,
  DEV.FUN = NULL,
  horiz = FALSE,
  stop_if_out = FALSE,
  ...
)
x |
a dendrogram object. |
FUN |
(optional) function to be applied to the index numbers of the data points in a cluster (see 'Details' below). |
N |
the maximum number of clusters to be identified. |
MAXCLUSTER |
the maximum number of clusters that can be produced by a cut (limits the effective vertical range of the pointer). |
DEV.FUN |
(optional) integer scalar. If specified, the corresponding graphics device is made active before FUN is applied. |
horiz |
logical (FALSE), indicating if the rectangles should be drawn horizontally or not (for when using plot(dend, horiz = TRUE)). |
stop_if_out |
logical (default is FALSE). This default makes the function NOT stop if the k of the locator is outside the range (this differs from the behavior of the identify.hclust function, but it is nicer for the user). |
... |
further arguments to FUN. |
By default clusters can be identified using the mouse and an invisible list of indices of the respective data points is returned. If FUN is not NULL, then the index vector of data points is passed to this function as first argument, see the examples below. The active graphics device for FUN can be specified using DEV.FUN. The identification process is terminated by pressing any mouse button other than the first, see also identify.
(Invisibly) returns a list where each element contains a vector of data points contained in the respective cluster.
This function is based on identify.hclust, with slight modifications to have it work with a dendrogram, as well as adding the "horiz" argument.
identify.hclust, rect.hclust, order.dendrogram, cutree.dendrogram
## Not run: 
set.seed(23235)
ss <- sample(1:150, 10)
hc <- iris[ss, -5] %>% dist() %>% hclust()
dend <- hc %>% as.dendrogram()
plot(dend)
identify(dend)
plot(dend, horiz = TRUE)
identify(dend, horiz = TRUE)
## End(Not run)
Return two trees after pruning them so that the only leaves left are the intersection of their labels.
intersect_trees(dend1, dend2, warn = dendextend_options("warn"), ...)
dend1 |
tree object (dendrogram/hclust/phylo) |
dend2 |
tree object (dendrogram/hclust/phylo) |
warn |
logical (default from dendextend_options("warn") is FALSE). Whether a warning should be issued if an intersection had to be performed. It is safer to set this to TRUE, but to keep the noise down, the default is FALSE. |
... |
passed on |
A dendlist with two pruned trees
hc <- hclust(dist(USArrests[1:5, ]), "ave")
dend <- as.dendrogram(hc)
labels(dend) <- 1:5
dend1 <- prune(dend, 1)
dend2 <- prune(dend, 5)
intersect_dend <- intersect_trees(dend1, dend2)
layout(matrix(c(1, 1, 2, 3, 4, 5), 3, 2, byrow = TRUE))
plot(dend, main = "Original tree")
plot(dend1, main = "Tree 1:\n original with label 1 pruned")
plot(dend2, main = "Tree 2:\n original with label 5 pruned")
plot(intersect_dend[[1]],
  main = "Tree 1 pruned with the labels that intersected with those of Tree 2"
)
plot(intersect_dend[[2]],
  main = "Tree 2 pruned with the labels that intersected with those of Tree 1"
)
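Which leaves survive the pruning can be worked out with base R before calling the function. This sketch only computes the label sets; the two trees built from overlapping rows of USArrests are an assumption chosen for illustration.

```r
# Two trees whose leaves only partly overlap.
dend1 <- as.dendrogram(hclust(dist(USArrests[1:4, ]), "ave"))
dend2 <- as.dendrogram(hclust(dist(USArrests[2:5, ]), "ave"))

# Labels present in both trees: the leaves that would remain after pruning.
shared <- intersect(labels(dend1), labels(dend2))

# Labels that would have to be pruned from each tree.
drop_from_1 <- setdiff(labels(dend1), shared)
drop_from_2 <- setdiff(labels(dend2), shared)

print(shared)
print(drop_from_1)
print(drop_from_2)
```

If both drop sets are empty, the trees already share all of their labels and no pruning is needed.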
Checks if the value is an empty list(). Can be useful.
is_null_list(x)
x |
whatever object to check |
logical
# I can run this only if I'd make is_null_list exported
## Not run: 
# TRUE:
is_null_list(list())
# FALSE
is_null_list(list(1))
is_null_list(1)
x <- list(1, list(), 123)
ss_list <- sapply(x, is_null_list)
x <- x[!ss_list]
x
x <- list(1, list(), 123)
ss_list <- sapply(x, is_null_list)
x <- list(list())
x
## End(Not run)
## Not run: 
# error
is_null_list()
## End(Not run)
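Since the function is not exported, a stand-in can be written in base R. The name is_empty_list and its definition are assumptions for this sketch; it may treat other zero-length list-like objects differently from the package's function.

```r
# Hypothetical stand-in for the check: TRUE only for a list of length zero.
is_empty_list <- function(x) is.list(x) && length(x) == 0

x <- list(1, list(), 123)

# Flag the empty-list elements, then drop them.
empty <- sapply(x, is_empty_list)
x_clean <- x[!empty]

print(empty)
print(length(x_clean))
```

This is the typical use: cleaning a list of results in which some entries came back as empty lists.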
Returns TRUE if the object is of the class named in the function (e.g., is.hclust checks for class "hclust").
is.hclust(x)
is.dendrogram(x)
is.phylo(x)
is.dendlist(x)
is.dist(x)
x |
an object. |
Returns TRUE if the object is of the class named in the function (e.g., is.hclust checks for class "hclust").
# TRUE:
is.dendlist(dendlist())
# FALSE
is.dendlist(1)
# TRUE:
is.dist(dist(mtcars))
# FALSE
is.dist(mtcars)
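Class checks of this kind are commonly written with base R's inherits(). The two functions below are hypothetical re-implementations for illustration; the package's own definitions may differ.

```r
# Hypothetical class predicates built on inherits().
is_dist_sketch <- function(x) inherits(x, "dist")
is_hclust_sketch <- function(x) inherits(x, "hclust")

d <- dist(mtcars)
hc <- hclust(d)

# Each predicate accepts only objects of its own class.
print(c(is_dist_sketch(d), is_dist_sketch(mtcars)))
print(c(is_hclust_sketch(hc), is_hclust_sketch(d)))
```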
Vectorized function for checking if numbers are natural or not. Helps in checking if a vector is of type "order".
is.natural.number(x, tol = .Machine$double.eps^0.5, ...)
x |
a vector of numbers |
tol |
tolerance to floating point issues. |
... |
(not currently in use) |
logical - is the entered number natural or not.
Marco Gallotta (a.k.a: marcog), Tal Galili
This function was written by marcog, as an answer to my question here: https://stackoverflow.com/questions/4562257/what-is-the-fastest-way-to-check-if-a-number-is-a-positive-natural-number-in-r
is.numeric, is.double, is.integer
is.natural.number(1) # is TRUE
(x <- seq(-1, 5, by = 0.5))
is.natural.number(x)
# is.natural.number( "a" )
all(is.natural.number(x))
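A vectorized test of this kind can be sketched in base R as "positive, and within tol of a whole number". The function name is_natural_sketch and the exact comparison are assumptions for illustration and are not claimed to match the package's code.

```r
# Hypothetical vectorized check: positive and within `tol` of an integer.
is_natural_sketch <- function(x, tol = .Machine$double.eps^0.5) {
  x > tol & abs(x - round(x)) < tol
}

x <- seq(-1, 5, by = 0.5)
natural <- is_natural_sketch(x)

# Only the positive whole numbers are kept.
print(x[natural])

# A value that is off by less than the tolerance still counts as natural.
print(is_natural_sketch(3 + 1e-10))
```

The tolerance is what lets values produced by floating point arithmetic pass the test even when they are not stored as exact integers.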
Khan contains gene expression profiles of four types of small round blue cell tumours of childhood (SRBCT) published by Khan et al. (2001). It also contains further gene annotation retrieved from SOURCE at http://source.stanford.edu/.
khan
Khan is a dataset containing the following:
data.frame of 306 rows and 64 columns. The training dataset of 64 arrays and 306 gene expression values.
data.frame of 306 rows and 25 columns. The test dataset of 25 arrays and 306 gene expression values.
vector of 306 Image clone identifiers corresponding to the rownames of train and test.
factor with 4 levels "EWS", "BL-NHL", "NB" and "RMS", which correspond to the four groups in the train dataset.
factor with 5 levels "EWS", "BL-NHL", "NB", "RMS" and "Norm", which correspond to the five groups in the test dataset.
data.frame of 306 rows and 8 columns. This table contains further gene annotation retrieved from SOURCE http://SOURCE.stanford.edu in May 2004. For each of the 306 genes, it contains:
Image Clone ID
The Unigene cluster to which the gene is assigned
The HUGO gene symbol
The locus ID
Nucleotide sequence accession number
Protein sequence accession number
chromosome location
cytoband location
Khan et al., 2001 used cDNA microarrays containing 6567 clones, of which 3789 were known genes and 2778 were ESTs, to study the expression of genes in four types of small round blue cell tumours of childhood (SRBCT). These were neuroblastoma (NB), rhabdomyosarcoma (RMS), Burkitt lymphoma, a subset of non-Hodgkin lymphoma (BL), and the Ewing family of tumours (EWS). Gene expression profiles from both tumour biopsy and cell line samples were obtained and are contained in this dataset. The dataset downloaded from the website contained the filtered dataset of 2308 gene expression profiles as described by Khan et al., 2001. This dataset is available from http://bioinf.ucd.ie/people/aedin/R/.
In order to reduce the size of the MADE4 package, and produce small example datasets, the top 50 genes from the ends of 3 axes following bga were selected. This produced a reduced dataset of 306 genes.
khan contains a filtered data of 2308 gene expression profiles as published and provided by Khan et al. (2001) on the supplementary web site to their publication.
OLD (site no longer found): https://research.nhgri.nih.gov/microarray/
The data was copied from the made4 package (https://www.bioconductor.org/packages/release/bioc/html/made4.html)
Culhane AC, et al., 2002 Between-group analysis of microarray data. Bioinformatics. 18(12):1600-8.
Khan,J., Wei,J.S., Ringner,M., Saal,L.H., Ladanyi,M., Westermann,F., Berthold,F., Schwab,M., Antonescu,C.R., Peterson,C. et al. (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med., 7, 673-679.
data(khan)
summary(khan)
Retrieve/assign cex to the labels of a dendrogram
labels_cex(dend, ...)
labels_cex(dend, ...) <- value
dend |
a dendrogram object |
... |
not used |
value |
a vector of cex to be used as new label's size for the dendrogram |
A vector with the dendrogram's labels sizes (NULL if none are supplied).
# define dendrogram object to play with:
dend <- as.dendrogram(hclust(dist(USArrests[1:3, ]), "ave"))
# Defaults:
labels_cex(dend)
plot(dend)
# let's change the label sizes:
labels_cex(dend) <- 1:3
labels_cex(dend)
plot(dend)
labels_cex(dend) <- 1
labels_cex(dend)
plot(dend)
Retrieve/assign colors to the labels of a dendrogram. Note that usually dend objects come without any color assignment (and the output will be NULL, until colors are assigned).
labels_colors(dend, labels = TRUE, ...)
labels_col(dend, labels = TRUE, ...)
labels_colors(dend, ...) <- value
dend |
a dendrogram object |
labels |
Boolean (default is TRUE). Should the returned vector of colors be named with the leaves' labels? |
... |
not used |
value |
a vector of colors to be used as new label's colors for the dendrogram |
A vector with the dendrogram's labels colors (or a colored dendrogram, in case assignment is used). The colors are labeled.
Heavily inspired by the code in the example of dendrapply, so credit should go to Martin Maechler. I also implemented some ideas from Gregory Jefferis's dendroextras package (having the "names" of the returned vector be the labels).
cutree, dendrogram, hclust, color_labels, color_branches, assign_values_to_leaves_edgePar, get_leaves_branches_col
# define dendrogram object to play with:
hc <- hclust(dist(USArrests[1:3, ]), "ave")
dend <- as.dendrogram(hc)
# Defaults:
labels_colors(dend)
plot(dend)
# let's add some color:
labels_colors(dend) <- 2:4
labels_colors(dend)
plot(dend)
# doesn't work...
# get_nodes_attr(dend, "nodePar", include_branches = FALSE)
# changing color to black
labels_colors(dend) <- 1
labels_colors(dend)
plot(dend)
# removing color (and the nodePar completely - if it has no other attributes but lab.col)
suppressWarnings(labels_colors(dend) <- NULL)
labels_colors(dend)
plot(dend)
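The underlying mechanism, in the spirit of the dendrapply example credited above, can be sketched with base R: label colors live in the lab.col component of each leaf's nodePar. The helper names color_leaf and read_label_cols and the chosen colors are assumptions for illustration.

```r
dend <- as.dendrogram(hclust(dist(USArrests[1:3, ]), "ave"))

# Assumed colour for each label (any valid R colours would do).
label_cols <- c(Alabama = "red", Alaska = "darkgreen", Arizona = "blue")

# Store the colour in each leaf's nodePar; pch = NA suppresses the node
# points that plot() would otherwise draw once nodePar is set.
color_leaf <- function(node) {
  if (is.leaf(node)) {
    attr(node, "nodePar") <- list(
      lab.col = label_cols[[attr(node, "label")]],
      pch = NA
    )
  }
  node
}
dend_col <- dendrapply(dend, color_leaf)

# Hypothetical reader: collect lab.col from the leaves, named by label.
read_label_cols <- function(node) {
  if (is.leaf(node)) {
    return(setNames(attr(node, "nodePar")$lab.col, attr(node, "label")))
  }
  unlist(lapply(node, read_label_cols))
}

stored <- read_label_cols(dend_col)
print(stored)
plot(dend_col)
```

The colors are looked up by label rather than by position, so the result does not depend on the order in which the leaves are visited.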
"label" assignment operator for vectors, dendrogram, and hclust classes.
labels(object, ...) <- value
## Default S3 replacement method:
labels(object, ...) <- value
## S3 replacement method for class 'dendrogram'
labels(object, ...) <- value
## S3 method for class 'hclust'
labels(object, order = TRUE, ...)
## S3 replacement method for class 'hclust'
labels(object, ...) <- value
## S3 method for class 'phylo'
labels(object, ...)
## S3 replacement method for class 'phylo'
labels(object, ...) <- value
object |
a variable name (possibly quoted) whose labels are to be updated |
... |
parameters passed (not currently in use) |
value |
a value to be assigned to object's label |
order |
logical (default is TRUE, as in the usage above). Only relevant for extracting labels from an hclust object (with labels.hclust). Setting order = TRUE returns the labels in their order in the dendrogram; order = FALSE returns the original labels order retained from object$labels, which usually corresponds to the row or column names of the dist object provided to the hclust function. |
The updated object
Gavin Simpson, Tal Galili (with some ideas from Gregory Jefferis's dendroextras package)
The functions here are based on code by Gavin and kohske (adapted to dendrogram by Tal Galili) from: https://stackoverflow.com/questions/4614223/how-to-have-the-following-work-labelsx-some-value-r-question Also with some ideas from Gregory Jefferis's dendroextras package.
x <- 1:3
labels(x)
labels(x) <- letters[1:3]
labels(x) # [1] "a" "b" "c"
x
# a b c
# 1 2 3
# get("labels<-")
################
# Example for using the assignment with dendrogram and hclust objects:
hc <- hclust(dist(USArrests[1:3, ]), "ave")
dend <- as.dendrogram(hc)
labels(hc) # "Arizona" "Alabama" "Alaska"
labels(hc) <- letters[1:3]
labels(hc) # "a" "b" "c"
labels(dend) # "Arizona" "Alabama" "Alaska"
labels(dend) <- letters[1:3]
labels(dend) # "a" "b" "c"
suppressWarnings(labels(dend) <- LETTERS[1:2]) # will produce a warning
labels(dend) # "A" "B" "A"
labels(dend) <- LETTERS[4:6] # will replace the labels correctly
# (the fact the tree had duplicate labels will not cause a problem)
labels(dend) # "D" "E" "F"
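The two label orders that the order argument distinguishes for hclust objects can be seen with base R alone. This sketch only inspects the components of an hclust object; it does not call the package's methods.

```r
hc <- hclust(dist(USArrests[1:3, ]), "ave")

# Labels in the order of the original data (the rows given to dist()).
original_order <- hc$labels

# Labels in the order in which the leaves appear along the dendrogram.
dendrogram_order <- hc$labels[hc$order]

# The second ordering is the one base R reports for the dendrogram itself.
same_as_dend <- identical(dendrogram_order, labels(as.dendrogram(hc)))

print(original_order)
print(dendrogram_order)
print(same_as_dend)
```

Keeping the two orders apart matters when new labels are assigned: a vector prepared in data order would label the wrong leaves if it were applied in dendrogram order.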
This function reorganizes the internal structure of the tree to get the ladderized effect when plotted.
ladderize(x, right = TRUE, ...)
## S3 method for class 'dendrogram'
ladderize(x, right = TRUE, ...)
## S3 method for class 'phylo'
ladderize(x, right = TRUE, phy, ...)
## S3 method for class 'dendlist'
ladderize(x, right = TRUE, which, ...)
x |
a tree object (either a dendrogram, dendlist, or phylo) |
right |
a logical (TRUE) specifying whether the smallest clade is on the right-hand side (when the tree is plotted upwards), or the opposite (if FALSE). |
... |
Currently ignored. |
phy |
a placeholder in case the user uses "phy =" |
which |
an integer (can have any number of elements). It indicates the elements in the dendlist to ladderize. If missing, it will ladderize all the dendrograms in the dendlist. |
A rotated tree object
ladderize, rev.dendrogram, rotate
dend <- USArrests[1:8, ] %>% dist() %>% hclust() %>% as.dendrogram() %>%
  set("labels_colors") %>% set("branches_k_color", k = 5)
set.seed(123)
dend <- shuffle(dend)
par(mfrow = c(1, 3))
dend %>% plot(main = "Original")
dend %>% ladderize(TRUE) %>% plot(main = "Right (default)")
dend %>% ladderize(FALSE) %>% plot(main = "Left (rev of right)")
The returned Colors will be in dendrogram order.
leaf_Colors(d, col_to_return = c("edge", "node", "label"))
d |
the dendrogram |
col_to_return |
Character scalar - kind of Color attribute to return |
named character vector of Colors, NA_character_ where missing
jefferis
dend <- USArrests %>% dist() %>% hclust(method = "ave") %>% as.dendrogram()
d5 <- color_branches(dend, 5)
leaf_Colors(d5)
Given two vectors of cluster membership, one for each of two items, the function finds the lowest branch (e.g., the largest number of k clusters) for which the two items are in the same cluster for the two trees.
lowest_common_branch(item1, item2, ...)
item1 |
a named numeric vector (of cluster group with names of k level) |
item2 |
a named numeric vector (of cluster group with names of k level) |
... |
not used |
The first location (from left) where the two vectors have the same cluster value.
item1 <- structure(c(1L, 1L, 1L, 1L), .Names = c("1", "2", "3", "4"))
item2 <- structure(c(1L, 1L, 2L, 2L), .Names = c("1", "2", "3", "4"))
lowest_common_branch(item1, item2)
Takes one dendrogram and adjusts its leaves' order values based on the order of another dendrogram. The values are matched based on the labels of the two dendrograms.
This allows for a faster entanglement running time, since the leaves' order values can then be relied on to match the two trees just as their labels would.
match_order_by_labels( dend_change, dend_template, check_that_labels_match = TRUE )
dend_change |
tree object (dendrogram) |
dend_template |
tree object (dendrogram) |
check_that_labels_match |
logical (TRUE). Whether to check that the labels in the two dendrograms match (if they do not, the function aborts). |
Returns dend_change after adjusting its order values to be like dend_template.
## Not run: 
dend <- USArrests[1:4, ] %>% dist() %>% hclust() %>% as.dendrogram()
order.dendrogram(dend) # c(4L, 3L, 1L, 2L)
dend_changed <- dend
order.dendrogram(dend_changed) <- 1:4
order.dendrogram(dend_changed) # c(1:4)
# now let's fix the order of the new object to be as it was:
dend_changed <- match_order_by_labels(dend_changed, dend)
# these two are now the same:
order.dendrogram(dend_changed)
order.dendrogram(dend)
## End(Not run)
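The label-based matching can be sketched with base R: each leaf of the tree to change receives the order value that the leaf with the same label holds in the template. Using rev() to build the second tree is an assumption for illustration; it is convenient because rev() keeps the correct order values, which can then serve as a check.

```r
dend_template <- as.dendrogram(hclust(dist(USArrests[1:4, ])))

# A tree with the same leaves arranged differently (base R's rev() method).
dend_change <- rev(dend_template)

# For each label of dend_change, find its position in the template and take
# the order value stored there.
pos_in_template <- match(labels(dend_change), labels(dend_template))
new_order <- order.dendrogram(dend_template)[pos_in_template]

# The recomputed values agree with the ones rev() preserved.
print(new_order)
print(identical(new_order, order.dendrogram(dend_change)))
```

The lookup relies on the labels being unique and present in both trees, which is what the check_that_labels_match argument guards against.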
Takes one dendrogram and adjusts its leaves' order values based on the order of another dendrogram. The values are matched based on the order of the two dendrograms.
This allows for a faster entanglement running time, since the leaves' order values can then be relied on to match the two trees just as their labels would.
This function is FASTER than match_order_by_labels, but it assumes that the order and the labels of the two trees are matching!!
This will allow for a faster calculation of entanglement.
match_order_dendrogram_by_old_order( dend_change, dend_template, dend_change_old_order, check_that_labels_match = FALSE, check_that_leaves_order_match = FALSE )
dend_change |
tree object (dendrogram) |
dend_template |
tree object (dendrogram) |
dend_change_old_order |
a numeric vector with the order of leaves in dend_change (at least before it was changed for some reason). This is the vector based on which we adjust the new values of dend_change. |
check_that_labels_match |
logical (FALSE). Whether to check that the labels in the two dendrograms match (if they do not, the function aborts). |
check_that_leaves_order_match |
logical (FALSE). If to check that the order in the two dendrogram match. (if they do not, the function aborts) |
Returns dend_change after adjusting its order values to be like dend_template.
entanglement, tanglegram, match_order_by_labels
## Not run: 
library(testthat) # provides expect_false() and expect_identical()
dend <- USArrests[1:4, ] %>% dist() %>% hclust() %>% as.dendrogram()
order.dendrogram(dend) # c(4L, 3L, 1L, 2L)

# Watch this!
dend_changed <- dend
dend_changed <- rev(dend_changed)
expect_false(identical(order.dendrogram(dend_changed), order.dendrogram(dend)))

# we keep the order of dend_changed, so that the leaves order are synced
# with their labels JUST LIKE dend:
old_dend_changed_order <- order.dendrogram(dend_changed)

# now we change dend_changed leaves order values:
order.dendrogram(dend_changed) <- 1:4

# and we can fix them again, based on their old kept leaves order:
dend_changed <- match_order_dendrogram_by_old_order(
  dend_changed, dend,
  old_dend_changed_order
)
expect_identical(order.dendrogram(dend_changed), order.dendrogram(dend))

## End(Not run)
Finds the minimum or maximum depth of a tree, as the function names imply. This can also work for non-dendrogram nested lists.
min_depth(dend, ...)

max_depth(dend, ...)
dend |
Any nested list object (including dendrogram). |
... |
unused at the moment. |
Integer, the (min/max) number of nodes from the root to the leaves.
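The idea of a minimum and maximum depth for a plain nested list can be sketched in base R as follows. The helper `list_depth` is hypothetical (it is not part of the package), and its way of counting levels is an assumption that may differ from what min_depth and max_depth report.

```r
# Hypothetical helper (not part of dendextend): depth of a nested list,
# counting one level for every list that has to be entered to reach a leaf.
list_depth <- function(x, fun = max) {
  if (!is.list(x) || length(x) == 0L) {
    return(0L) # a non-list element is a leaf
  }
  child_depths <- vapply(x, function(child) list_depth(child, fun), integer(1))
  1L + fun(child_depths)
}

nested <- list(1, list(2, list(3, 4)))
shallowest <- list_depth(nested, fun = min) # shortest root-to-leaf path
deepest <- list_depth(nested, fun = max) # longest root-to-leaf path
```

Passing `min` or `max` as `fun` is what switches between the two notions of depth; everything else in the recursion stays the same.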
hc <- hclust(dist(USArrests), "ave")
(dend1 <- as.dendrogram(hc)) # "print()" method
is.list(dend1)
is.list(dend1[[1]][[1]][[1]])
dend1[[1]][[1]][[1]]
plot(dend1)
min_depth(dend1)
max_depth(dend1)
A function for replacing each NA with the most recent non-NA prior to it.
na_locf(x, first_na_value = 0, recursive = TRUE, ...)
x |
some vector |
first_na_value |
If the first observation is NA, fill it with "first_na_value" |
recursive |
logical (TRUE). Should na_locf be re-run until all NA values are filled? |
... |
ignored. |
The original vector, but with all the missing values filled by the value before them.
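A minimal base R sketch of "last observation carried forward" is shown below. `locf_sketch` is a hypothetical helper written for illustration only; it is not the package's implementation and it ignores the `recursive` argument.

```r
# Hypothetical helper illustrating last-observation-carried-forward.
locf_sketch <- function(x, first_na_value = 0) {
  if (length(x) == 0L) {
    return(x)
  }
  if (is.na(x[1])) {
    x[1] <- first_na_value # nothing to carry forward, so use the fallback
  }
  # For each position, the index of the most recent non-NA element.
  last_seen <- cummax(seq_along(x) * (!is.na(x)))
  x[last_seen]
}

filled <- locf_sketch(c(1, NA, NA, NA, 2, 2, NA, 3, NA, 4))
leading <- locf_sketch(c(NA, NA))
```

Here `filled` is `1 1 1 1 2 2 2 3 3 4`, and `leading` is `0 0` because the first element falls back to `first_na_value`.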
https://stat.ethz.ch/pipermail/r-help/2003-November/042126.html
https://stackoverflow.com/questions/5302049/last-observation-carried-forward-na-locf-on-panel-cross-section-time-series
This could probably be solved MUCH faster using Rcpp.
na_locf(c(NA, NA))
na_locf(c(1, NA))
na_locf(c(1, NA, NA, NA))
na_locf(c(1, NA, NA, NA, 2, 2, NA, 3, NA, 4))
na_locf(c(1, NA, NA, NA, 2, 2, NA, 3, NA, 4), recursive = FALSE)

## Not run: 
# library(microbenchmark)
# library(zoo)
# microbenchmark(
#   na_locf = na_locf(c(1, NA, NA, NA, 2, 2, NA, 3, NA, 4)),
#   na.locf = na.locf(c(1, NA, NA, NA, 2, 2, NA, 3, NA, 4))
# ) # my implementation is 6 times faster :)
# microbenchmark(
#   na_locf = na_locf(rep(c(1, NA, NA, NA, 2, 2, NA, 3, NA, 4), 1000)),
#   na.locf = na.locf(rep(c(1, NA, NA, NA, 2, 2, NA, 3, NA, 4), 1000))
# ) # my implementation is 3 times faster

## End(Not run)
Counts the number of leaves in a tree (dendrogram or hclust).
nleaves(x, ...)

## Default S3 method:
nleaves(x, ...)

## S3 method for class 'dendrogram'
nleaves(x, method = c("members", "order"), ...)

## S3 method for class 'dendlist'
nleaves(x, ...)

## S3 method for class 'hclust'
nleaves(x, ...)

## S3 method for class 'phylo'
nleaves(x, ...)
x |
tree object (dendrogram/hclust/phylo/dendlist) |
... |
not used |
method |
a character scalar (default is "members"). If "order", then nleaves is based on the length of order.dendrogram. If "members", then the length is taken from what is written in the "members" attribute of the dendrogram's root. "members" is about 4 times faster than "order". |
The idea for the name is from functions like ncol, and nrow.
Also, it is worth noting that nleaves.dendrogram is based on order.dendrogram instead of labels.dendrogram since the former is MUCH faster than the latter.
The phylo method is based on turning the phylo to hclust and then to dendrogram. It may not work for complex phylo trees.
The number of leaves in the tree
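The two counting strategies described for the dendrogram method can be reproduced with base R alone. This is a sketch of what the "members" and "order" options presumably correspond to, not the package's code.

```r
hc <- hclust(dist(USArrests[1:5, ]), "ave")
dend <- as.dendrogram(hc)

# "members": trust the count stored as an attribute on the root node.
n_by_members <- attr(dend, "members")

# "order": collect the leaf order values from the whole tree and count them.
n_by_order <- length(order.dendrogram(dend))
```

Both give 5 for this tree. Reading the attribute avoids walking the tree, which is consistent with "members" being described as the faster option.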
hc <- hclust(dist(USArrests[1:5, ]), "ave")
dend <- as.dendrogram(hc)
nleaves(dend) # 5
nleaves(hc) # 5
Counts the number of nodes in a tree (dendrogram, hclust, phylo).
nnodes(x, ...)

## Default S3 method:
nnodes(x, ...)

## S3 method for class 'dendrogram'
nnodes(x, ...)

## S3 method for class 'hclust'
nnodes(x, ...)

## S3 method for class 'phylo'
nnodes(x, ...)
x |
tree object (dendrogram, hclust or phylo) |
... |
not used |
The idea for the name is from functions like ncol, and nrow.
The phylo method is based on turning the phylo to hclust and then to dendrogram. It may not work for complex phylo trees.
The number of nodes in the tree
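Counting nodes (branch nodes plus leaves) amounts to a simple recursion over the tree. The base R sketch below uses a hypothetical helper, `count_nodes`, to show the idea; the package may compute the count differently.

```r
# Hypothetical helper: count every node (branches and leaves) of a dendrogram.
count_nodes <- function(node) {
  if (is.leaf(node)) {
    return(1L)
  }
  total <- 1L # the current branch node
  for (i in seq_along(node)) {
    total <- total + count_nodes(node[[i]])
  }
  total
}

hc <- hclust(dist(USArrests[1:5, ]), "ave")
dend <- as.dendrogram(hc)
n_nodes <- count_nodes(dend)
```

A binary tree with 5 leaves has 4 branch nodes, so `n_nodes` is 9.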
nrow, count_terminal_nodes, nleaves
hc <- hclust(dist(USArrests[1:5, ]), "ave")
dend <- as.dendrogram(hc)
nnodes(dend) # 9
nnodes(hc) # 9
Goes through a tree's nodes in order to return a vector with whether (TRUE/FALSE) each node satisfies some condition (function).
noded_with_condition( dend, condition, include_leaves = TRUE, include_branches = TRUE, na.rm = FALSE, ... )
dend |
a dendrogram object |
condition |
a function that gets a node and returns TRUE or FALSE (based on whether or not that node/tree fulfills the "condition") |
include_leaves |
logical. Should leaves attributes be included as well? |
include_branches |
logical. Should non-leaf (branch node) attributes be included as well? |
na.rm |
logical. Should NA attributes be REMOVED from the resulting vector? |
... |
passed to the condition function |
A logical vector with TRUE/FALSE, specifying for each of the dendrogram's nodes if it fulfills the condition or not.
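The general pattern (visit every node, apply a predicate, collect the answers) can be sketched in base R. The helper `nodes_where` is hypothetical, and the depth-first, parent-before-children visiting order is an assumption that may not match the order used by noded_with_condition.

```r
# Hypothetical helper: apply `condition` to every node of a dendrogram,
# visiting each parent before its children.
nodes_where <- function(node, condition, ...) {
  result <- condition(node, ...)
  if (!is.leaf(node)) {
    for (i in seq_along(node)) {
      result <- c(result, nodes_where(node[[i]], condition, ...))
    }
  }
  result
}

dend <- as.dendrogram(hclust(dist(USArrests[1:4, ])))
is_high <- function(sub_dend, height) attr(sub_dend, "height") > height
flags <- nodes_where(dend, is_high, height = 0)
```

With four leaves there are seven nodes; the leaves sit at height 0, so only the three branch nodes satisfy the condition.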
branches_attr_by_labels, get_leaves_attr, nnodes, nleaves
## Not run: 
library(dendextend)

set.seed(23235)
ss <- sample(1:150, 10)

# Getting the dend object
dend <- iris[ss, -5] %>% dist() %>% hclust() %>% as.dendrogram()
dend %>% plot()

# this is the basis for branches_attr_by_labels
has_any_labels <- function(sub_dend, the_labels) any(labels(sub_dend) %in% the_labels)
cols <- noded_with_condition(dend, has_any_labels,
  the_labels = c("126", "109", "59")
) %>%
  ifelse(2, 1)
set(dend, "branches_col", cols) %>% plot()

# Similar to branches_attr_by_labels - but for heights!
high_enough <- function(sub_dend, height) attr(sub_dend, "height") > height
cols <- noded_with_condition(dend, high_enough, height = 1) %>% ifelse(2, 1)
set(dend, "branches_col", cols) %>% plot()

## End(Not run)
order.dendrogram<- assignment operator. This is useful in cases where some object is turned into a dendrogram but its leaves values (the order) are all mixed up.
order.dendrogram(object, ...) <- value
object |
a variable name (possibly quoted) whose leaf order values are to be updated |
... |
parameters passed (not currently in use) |
value |
a value to be assigned to object's leaves value (their "order") |
dendrogram with updated order leaves values
################
# Example for using the assignment with dendrogram and hclust objects:
hc <- hclust(dist(USArrests[1:4, ]), "ave")
dend <- as.dendrogram(hc)

str(dend)
order.dendrogram(dend) # 4 3 1 2
order.dendrogram(dend) <- 1:4
order.dendrogram(dend) # 1 2 3 4
str(dend) # the structure is still fine.

# This function is very useful if we try playing with subtrees
# For example:
hc <- hclust(dist(USArrests[1:6, ]), "ave")
dend <- as.dendrogram(hc)
sub_dend <- dend[[1]]
order.dendrogram(sub_dend) # 4 6
# now using as.hclust(sub_dend) will cause trouble:
# labels(as.hclust(sub_dend))
# As of R 3.1.1-patched - this will produce an Error (as it should) :)
# let's fix it:
order.dendrogram(sub_dend) <- rank(order.dendrogram(sub_dend), ties.method = "first")
labels(as.hclust(sub_dend)) # We now have labels :)
Ordering of the Leaves in a hclust Dendrogram. Like order.dendrogram.
order.hclust(x, ...)
x |
an hclust object. |
... |
Ignored. |
A vector with length equal to the number of leaves in the hclust dendrogram is returned. From r <- order.hclust(), each element is the index into the original data (from which the hclust was computed).
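In base R the same information is stored in the `order` component of an hclust object. The sketch below assumes (it is not confirmed by this page) that order.hclust reports that component, and checks that it agrees with order.dendrogram on the converted tree.

```r
hc <- hclust(dist(USArrests[1:4, ]))

# Leaf order as stored on the hclust object itself.
leaf_order <- hc$order

# The same tree converted to a dendrogram reports the same order.
dend_order <- order.dendrogram(as.dendrogram(hc))
orders_agree <- all(leaf_order == dend_order)
```

Each element of `leaf_order` is a row index into the data the clustering was computed from, so `USArrests[1:4, ][leaf_order, ]` lists the rows in the order they appear along the bottom of the tree.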
set.seed(23235)
ss <- sample(1:150, 10)
hc <- iris[ss, -5] %>% dist() %>% hclust()
# dend <- hc %>% as.dendrogram
order.hclust(hc)
Returns the set of all bipartitions from all edges, that is: a list with the labels for each of the nodes in the dendrogram.
partition_leaves(dend, ...)
dend |
a dendrogram |
... |
Ignored. |
A list with the labels for each of the nodes in the dendrogram.
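To make "the labels for each of the nodes" concrete, here is a base R sketch with a hypothetical helper, `labels_per_node`. The ordering of the resulting list (each parent before its children) is an assumption and may differ from what partition_leaves returns.

```r
# Hypothetical helper: for every node, the labels of the leaves below it.
labels_per_node <- function(node) {
  if (is.leaf(node)) {
    return(list(attr(node, "label")))
  }
  children <- lapply(seq_along(node), function(i) labels_per_node(node[[i]]))
  # The first entry of each child's result holds that child's own label set.
  own <- unlist(lapply(children, function(child) child[[1]]))
  c(list(own), unlist(children, recursive = FALSE))
}

dend <- as.dendrogram(hclust(dist(USArrests[1:3, ])))
parts <- labels_per_node(dend)
```

For three leaves the list has five entries: one with all three labels (the root), one with two labels (the inner branch) and three single-label entries (the leaves).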
A dendrogram implementation for partition.leaves from the distory package
distinct_edges, highlight_distinct_edges, dist.dendlist, tanglegram, partition.leaves
x <- 1:3 %>% dist() %>% hclust() %>% as.dendrogram()
plot(x)
partition_leaves(x)

## Not run: 
set.seed(23235)
ss <- sample(1:150, 10)
dend1 <- iris[ss, -5] %>% dist() %>% hclust("com") %>% as.dendrogram()
dend2 <- iris[ss, -5] %>% dist() %>% hclust("single") %>% as.dendrogram()

partition_leaves(dend1)
partition_leaves(dend2)

## End(Not run)
The default plot(dend, horiz = TRUE) gives us a dendrogram tree plot with the tips turned right. The current function enables the creation of the same tree, but with the tips turned left. The main challenge in doing this is finding the distance of the labels from the leaves' tips, which is solved with this function.
plot_horiz.dendrogram( x, type = c("rectangle", "triangle"), center = FALSE, edge.root = is.leaf(x) || !is.null(attr(x, "edgetext")), dLeaf = NULL, horiz = TRUE, xaxt = "n", yaxt = "s", xlim = NULL, ylim = NULL, nodePar = NULL, edgePar = list(), leaflab = c("perpendicular", "textlike", "none"), side = TRUE, text_pos = 2, ... )
x |
tree object (dendrogram) |
type |
a character vector with either "rectangle" or "triangle" (passed to plot.dendrogram) |
center |
logical; if TRUE, nodes are plotted centered with respect to the leaves in the branch. Otherwise (default), plot them in the middle of all direct child nodes. |
edge.root |
logical; if true, draw an edge to the root node. |
dLeaf |
a number specifying the distance in user coordinates between the tip of a leaf and its label. If NULL as per default, 3/4 of a letter width is used. |
horiz |
logical indicating if the dendrogram should be drawn horizontally or not. In this function it MUST be TRUE! |
xaxt |
graphical parameters, or arguments for other methods. |
yaxt |
graphical parameters, or arguments for other methods. |
xlim |
(NULL) optional x- and y-limits of the plot, passed to plot.default. The defaults for these show the full dendrogram. |
ylim |
(NULL) optional x- and y-limits of the plot, passed to plot.default. The defaults for these show the full dendrogram. |
nodePar |
NULL. |
edgePar |
list() |
leaflab |
c("perpendicular", "textlike", "none") |
side |
logical (TRUE). Should the tips of the drawn tree be facing the left side. This is the important feature of this function. |
text_pos |
an integer from 1 to 4 (default: 2). The two relevant values are 2 and 4. 2 (default) means that the labels are aligned to the tips of the tree leaves. 4 will have the labels aligned to the left, making them look as they did when the tree was on the left side (with leaf tips facing to the right). |
... |
passed to plot. |
The dLeaf value, returned invisibly.
This function is based on replicating plot.dendrogram. In fact, I'd be happy if in the future, some tweaks could be made to plot.dendrogram, so that it would remove the need for this function.
## Not run: 
dend <- USArrests[1:10, ] %>% dist() %>% hclust() %>% as.dendrogram()

par(mfrow = c(1, 2), mar = rep(6, 4))
plot_horiz.dendrogram(dend, side = FALSE)
plot_horiz.dendrogram(dend, side = TRUE)
# plot_horiz.dendrogram(dend, side=TRUE, dLeaf= 0)
# plot_horiz.dendrogram(dend, side=TRUE, nodePar = list(pos = 1))
# sadly, lab.pos is not implemented yet,
## so the labels can not be right aligned...

plot_horiz.dendrogram(dend, side = FALSE)
plot_horiz.dendrogram(dend, side = TRUE, dLeaf = 0, xlim = c(100, -10)) # bad
plot_horiz.dendrogram(dend, side = TRUE, text_offset = 0)
plot_horiz.dendrogram(dend, side = TRUE, text_offset = 0, text_pos = 4)

## End(Not run)
Trims a tree (dendrogram, hclust) from a set of leaves based on their labels.
prune(dend, ...)

## Default S3 method:
prune(dend, ...)

## S3 method for class 'dendrogram'
prune(dend, leaves, reindex_dend = TRUE, ...)

## S3 method for class 'hclust'
prune(dend, leaves, ...)

## S3 method for class 'phylo'
prune(dend, ...)

## S3 method for class 'rpart'
prune(dend, ...)
dend |
tree object (dendrogram/hclust/phylo) |
... |
passed on |
leaves |
a character vector of the label(s) of the tip(s) (leaves) we wish to prune off the tree. |
reindex_dend |
logical (default is TRUE). If TRUE, the leaves of the new dendrogram include the rank of the old order.dendrogram. This ensures that their values run from 1 to the number of leaves. When FALSE, the values in the leaves are those of the original dendrogram. This is useful when pruning a dendrogram but then wanting to use order.dendrogram with the original values. When using prune.hclust, reindex_dend is used by default, since otherwise the as.hclust function would return an error. |
I was not sure whether to call this function drop.tip (from ape), snip/prune (from rpart) or just remove.leaves. I ended up deciding on prune.
A pruned tree
hc <- hclust(dist(USArrests[1:5, ]), "ave")
dend <- as.dendrogram(hc)

par(mfrow = c(1, 2))
plot(dend, main = "original tree")
plot(prune(dend, c("Alaska", "California")), main = "tree without Alaska and California")

# this works because prune uses reindex_dend = TRUE by default
as.hclust(prune(dend, c("Alaska", "California")))
prune(hc, c("Alaska", "California"))
Prune trees to their common subtrees
prune_common_subtrees.dendlist(dend, ...)
dend |
a dendlist of length two |
... |
ignored |
A dendlist after pruning the labels to only include those that are part of common subtrees in both dendrograms.
# NULL
Trims (prunes) one leaf from a dendrogram.
prune_leaf(dend, leaf_name, ...)
dend |
dendrogram object |
leaf_name |
a character string as the label of the tip we wish to prune |
... |
passed on |
Used through prune
A dendrogram with a leaf pruned
hc <- hclust(dist(USArrests[1:5, ]), "ave")
dend <- as.dendrogram(hc)

par(mfrow = c(1, 2))
plot(dend, main = "original tree")
plot(prune_leaf(dend, "Alaska"), main = "tree without Alaska")
Gets pvclust edge information such as au and bp and returns a data frame with proper sample labels. This function is useful when there are a lot of samples involved.
pvclust_edges(pvclust_obj)
pvclust_obj |
pvclust object |
data.frame with leaves on column 1 and 2, followed by the rest of the information from edge
hclust object descriptions https://stat.ethz.ch/R-manual/R-patched/library/stats/html/hclust.html
## Not run: 
library(pvclust)
data(lung) # 916 genes for 73 subjects
set.seed(13134)
result <- pvclust(lung[, 1:20], method.dist = "cor", method.hclust = "average", nboot = 100)
pvclust_edges(result)

## End(Not run)
Shows the significant branches in a dendrogram, based on a pvclust object
pvclust_show_signif( dend, pvclust_obj, signif_type = c("bp", "au"), alpha = 0.05, signif_value = c(5, 1), show_type = c("lwd", "col"), ... )
dend |
a dendrogram object |
pvclust_obj |
a pvclust object |
signif_type |
a character scalar (either "bp" or "au"), indicating which of the two should be used to update the dendrogram. |
alpha |
a number between 0 and 1, default is .05. Indicates the cutoff from which branches will be updated. |
signif_value |
a vector of length 2 (default: c(5,1)). The first element is the value the significant branches will get, and the second element is the value the non-significant branches will get. |
show_type |
a character scalar (either "lwd" or "col"), indicating which parameter of the branches should be updated based on significance. |
... |
not used |
A dendrogram with updated branches
pvclust_show_signif, pvclust_show_signif_gradient
## Not run: 
library(pvclust)
data(lung) # 916 genes for 73 subjects
set.seed(13134)
result <- pvclust(lung[, 1:20], method.dist = "cor", method.hclust = "average", nboot = 100)

dend <- as.dendrogram(result)
result %>%
  as.dendrogram() %>%
  hang.dendrogram() %>%
  plot(main = "Cluster dendrogram with AU/BP values (%)")
result %>% text()
result %>% pvrect(alpha = 0.95)

dend %>% pvclust_show_signif(result) %>% plot()
dend %>% pvclust_show_signif(result, show_type = "lwd") %>% plot()
result %>% text()
result %>% pvrect(alpha = 0.95)

dend %>% pvclust_show_signif_gradient(result) %>% plot()

dend %>%
  pvclust_show_signif_gradient(result) %>%
  pvclust_show_signif(result) %>%
  plot(main = "Cluster dendrogram with AU/BP values (%)\n bp values are highlighted by signif")
result %>% text()
result %>% pvrect(alpha = 0.95)

## End(Not run)
Shows the gradient of significance of branches in a dendrogram, based on a pvclust object
pvclust_show_signif_gradient( dend, pvclust_obj, signif_type = c("bp", "au"), signif_col_fun = colorRampPalette(c("black", "darkred", "red")), ... )
dend |
a dendrogram object |
pvclust_obj |
a pvclust object |
signif_type |
a character scalar (either "bp" or "au"), indicating which of the two should be used to update the dendrogram. |
signif_col_fun |
a function to create colors for the significant gradient. Default is: colorRampPalette(c("black", "darkred", "red")) |
... |
not used |
A dendrogram with updated branches
pvclust_show_signif, pvclust_show_signif_gradient
## Not run: 
library(pvclust)
data(lung) # 916 genes for 73 subjects
set.seed(13134)
result <- pvclust(lung[, 1:20], method.dist = "cor", method.hclust = "average", nboot = 100)

dend <- as.dendrogram(result)
result %>%
  as.dendrogram() %>%
  hang.dendrogram() %>%
  plot(main = "Cluster dendrogram with AU/BP values (%)")
result %>% text()
result %>% pvrect(alpha = 0.95)

dend %>% pvclust_show_signif(result) %>% plot()
dend %>% pvclust_show_signif(result, show_type = "lwd") %>% plot()
result %>% text()
result %>% pvrect(alpha = 0.95)

dend %>% pvclust_show_signif_gradient(result) %>% plot()

dend %>%
  pvclust_show_signif_gradient(result) %>%
  pvclust_show_signif(result) %>%
  plot(main = "Cluster dendrogram with AU/BP values (%)\n bp values are highlighted by signif")
result %>% text()
result %>% pvrect(alpha = 0.95)

## End(Not run)
Draws rectangles around the branches of a dendrogram, highlighting the corresponding clusters with low p-values. This is based on pvrect, but allows the rectangles to be drawn down to the bottom of the labels.
pvrect2( x, alpha = 0.95, pv = "au", type = "geq", max.only = TRUE, border = 2, xpd = TRUE, lower_rect, ... )
x |
object of class pvclust. |
alpha |
threshold value for p-values. Default: 0.95 |
pv |
character string which specifies the p-value to be used. It should be either "au" or "bp", corresponding to AU p-value or BP value, respectively. See plot.pvclust for details. Default: 'au' |
type |
one of "geq", "leq", "gt" or "lt". If "geq" is specified, clusters with p-value greater than or equal to the threshold given by "alpha" are returned or displayed. Likewise "leq" stands for lower than or equal to, "gt" for greater than and "lt" for lower than the threshold value. Default: 'geq' |
max.only |
logical. If some of the clusters with high/low p-values have an inclusion relation, only the largest cluster is returned (or displayed) when max.only=TRUE. Default: TRUE |
border |
numeric value which specifies the color of the borders of the rectangles. Default: 2 |
xpd |
A logical value (or NA), passed to par. Default is TRUE, in order to allow the rect to be below the labels. If FALSE, all plotting is clipped to the plot region, if TRUE, all plotting is clipped to the figure region, and if NA, all plotting is clipped to the device region. See also clip. |
lower_rect |
a (scalar) value of how low the lower part of the rect should be. If missing, it will take the value of par("usr")[3L] (or par("usr")[2L], depending on whether horiz = TRUE or not), together with the width of the labels. (Notice that we would like to keep xpd = TRUE if we want the rect to extend past the labels!) You can use a value such as 0 to get the rect above the labels. |
... |
passed to rect |
## Not run: 
library(dendextend)
library(pvclust)
data(lung) # 916 genes for 73 subjects
set.seed(13134)
result <- pvclust(lung[, 1:20], method.dist = "cor", method.hclust = "average", nboot = 10)

par(mar = c(9, 2.5, 2, 0))
dend <- as.dendrogram(result)
dend %>%
  pvclust_show_signif(result, signif_value = c(3, .5)) %>%
  pvclust_show_signif(result, signif_value = c("black", "grey"), show_type = "col") %>%
  plot(main = "Cluster dendrogram with AU/BP values (%)")
pvrect2(result, alpha = 0.95)
# getting the rects to the tips / above the labels
pvrect2(result, lower_rect = .15, border = 4, alpha = 0.95, lty = 2)
# Original function
# pvrect(result, alpha=0.95)
text(result, alpha = 0.95)

## End(Not run)
Raise the height of nodes in a dendrogram tree.
raise.dendrogram(dend, heiget_to_add, ...)
dend |
dendrogram object |
heiget_to_add |
how much height to add to all the branches (not leaves) in the dendrogram |
... |
passed on (not used) |
A raised dendrogram
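The effect described (adding a constant to the height of every branch node while leaving the leaves alone) can be sketched with base R's dendrapply. `raise_sketch` is a hypothetical stand-in written for illustration, not the package's implementation.

```r
# Hypothetical helper: add `height_to_add` to every non-leaf node.
raise_sketch <- function(dend, height_to_add) {
  dendrapply(dend, function(node) {
    if (!is.leaf(node)) {
      attr(node, "height") <- attr(node, "height") + height_to_add
    }
    node
  })
}

hc <- hclust(dist(USArrests[2:9, ]), "com")
dend <- as.dendrogram(hc)
raised <- raise_sketch(dend, 100)

root_before <- attr(dend, "height")
root_after <- attr(raised, "height")
```

Because the leaves keep their original height, the vertical lines leading down to them get longer by the added amount while the shape of the upper tree is unchanged.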
hc <- hclust(dist(USArrests[2:9, ]), "com")
dend <- as.dendrogram(hc)

par(mfrow = c(1, 2))
plot(dend, main = "original tree")
plot(raise.dendrogram(dend, 100), main = "Raised tree")
Adjusts the height attr in all of the dendrogram nodes so that the tree will have a distance of 1 unit between each parent and child node. It can be thought of as ranking the branches between themselves.
This is intended for easier comparison of the topology of two trees.
Notice that this function changes the height of all the leaves into 0, thus erasing the effect of hang.dendrogram (which should be run again, if that is the visualization you are interested in).
rank_branches(dend, diff_height = 1, ...)
dend |
a dendrogram object |
diff_height |
Numeric scalar (1). Affects the difference in height between two branches. |
... |
not used |
A dendrogram, after adjusting the height attr in all of its branches.
get_branches_heights, get_childrens_heights, hang.dendrogram, tanglegram
# define dendrogram object to play with:
dend <- USArrests[1:5, ] %>% dist() %>% hclust() %>% as.dendrogram()

par(mfrow = c(1, 3))
plot(dend)
plot(rank_branches(dend))
plot(hang.dendrogram(rank_branches(dend)))
Generally, leaves order values should be a sequence of integer values, from 1 to nleaves(dend). This function fixes trees by using rank on existing leaves order values.
rank_order.dendrogram(dend, ...)
dend |
a dendrogram object |
... |
not used |
A dendrogram, after fixing its leaves order values.
# define dendrogram object to play with:
dend <- USArrests[1:4, ] %>%
  dist() %>%
  hclust(method = "ave") %>%
  as.dendrogram()
# plot(dend)
order.dendrogram(dend)
dend2 <- prune(dend, "Alaska")
order.dendrogram(dend2)
order.dendrogram(rank_order.dendrogram(dend2))
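As the description says, the fix relies on rank. The following base R sketch shows only that step, with no dendrogram involved; the order values 4, 3, 1 are made up to mimic a tree that has lost a leaf. Ranking maps the values back onto 1 to the number of leaves while keeping their relative order.

```r
# Hypothetical leaf order values left over after one leaf was pruned:
# the value 2 is missing and 4 exceeds the number of remaining leaves.
old_order <- c(4L, 3L, 1L)

# rank() replaces each value by its position among the sorted values.
new_order <- rank(old_order, ties.method = "first")

new_order
# [1] 3 2 1
```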
Rank a vector based on clusters
rank_values_with_clusters(x, ignore0 = FALSE, ...)
x |
numeric vector |
ignore0 |
logical (FALSE). If TRUE, will ignore the 0's in the vector |
... |
not used |
An integer vector with the same number of unique values as the original vector. The values are ranked from 1 (at the beginning of the vector) up to the number of unique clusters.
rank_values_with_clusters(c(1, 2, 3))
rank_values_with_clusters(c(1, 1, 3))
rank_values_with_clusters(c(0.1, 0.1, 3000))
rank_values_with_clusters(c(3, 1, 2))
rank_values_with_clusters(c(1, 3, 3, 3, 3, 3, 3, 4, 2, 2))
rank_values_with_clusters(c(3, 1, 2), ignore0 = TRUE)
rank_values_with_clusters(c(3, 1, 2), ignore0 = FALSE)
rank_values_with_clusters(c(3, 1, 0, 2), ignore0 = TRUE)
rank_values_with_clusters(c(3, 1, 0, 2), ignore0 = FALSE)
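As a rough mental model only (this is not the package's implementation, and it ignores the ignore0 argument), numbering values by their order of first appearance can be sketched in base R with match and unique. The helper name first_appearance_rank is made up for this illustration.

```r
# Hypothetical helper: label each value by the order in which
# it first appears in the vector.
first_appearance_rank <- function(x) {
  match(x, unique(x))
}

first_appearance_rank(c(0.1, 0.1, 3000))
# [1] 1 1 2
first_appearance_rank(c(3, 1, 2))
# [1] 1 2 3
```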
Draws rectangles around the branches of a dendrogram highlighting the corresponding clusters. First the dendrogram is cut at a certain level, then a rectangle is drawn around selected branches.
rect.dendrogram(
  tree,
  k = NULL,
  which = NULL,
  x = NULL,
  h = NULL,
  border = 2,
  cluster = NULL,
  horiz = FALSE,
  density = NULL,
  angle = 45,
  text = NULL,
  text_cex = 1,
  text_col = 1,
  xpd = TRUE,
  lower_rect,
  upper_rect = 0,
  prop_k_height = 0.5,
  stop_if_out = FALSE,
  ...
)
tree |
a dendrogram object. |
k |
Scalar. Cut the dendrogram such that exactly k clusters (if possible) are produced. |
which |
A vector selecting the clusters around which a rectangle should be drawn. which selects clusters by number (from left to right in the tree). Default is which = 1:k. |
x |
A vector selecting the clusters around which a rectangle should be drawn. x selects clusters containing the respective horizontal coordinates. |
h |
Scalar. Cut the dendrogram by cutting at height h. (k overrides h) |
border |
Vector with border colors for the rectangles. |
cluster |
Optional vector with cluster memberships as returned by cutree(dend_obj, k = k), can be specified for efficiency if already computed. |
horiz |
logical (FALSE), indicating if the rectangles should be drawn horizontally or not (for when using plot(dend, horiz = TRUE)). |
density |
Passed to rect: the density of shading lines, in lines per inch. The default value of NULL means that no shading lines are drawn. A zero value of density means no shading lines whereas negative values (and NA) suppress shading (and so allow color filling). If border is a vector of colors, the color of density will default to 1. |
angle |
Passed to rect: angle (in degrees) of the shading lines. (default is 45) |
text |
a character vector of labels to plot underneath the clusters. When NULL (default), no text is displayed. |
text_cex |
a numeric (scalar) value of the text's cex value. |
text_col |
a (scalar) value of the text's col(or) value. |
xpd |
A logical value (or NA), passed to par. Default is TRUE, in order to allow the rect to be below the labels. If FALSE, all plotting is clipped to the plot region; if TRUE, all plotting is clipped to the figure region; and if NA, all plotting is clipped to the device region. See also clip. |
lower_rect |
a (scalar) value of how low the lower part of the rect should be. If missing, it takes the value of par("usr")[3L] (or par("usr")[2L], depending on whether horiz = TRUE or not), together with the width of the labels. (Notice that we would like to keep xpd = TRUE if we want the rect to extend past the labels!) You can use a value such as 0 to get the rect above the labels. Notice that for a plot with small margins, it is better to set this parameter manually. |
upper_rect |
a (scalar) value to add (default is 0) to the height of the upper part of the rect. |
prop_k_height |
a (scalar) value (should be between 0 and 1), indicating at what proportion of the distance between the heights needed for k and k+1 clusters the top of the rect is placed. |
stop_if_out |
logical (default is FALSE). If TRUE, the function stops when k (or the locator) is outside the range (setting it to TRUE reproduces the behavior of the rect.hclust function). |
... |
parameters passed to rect (such as lwd, lty, etc.) |
(Invisibly) returns a list where each element contains a vector of data points contained in the respective cluster.
This function is based on rect.hclust, with slight modifications to have it work with a dendrogram, as well as a few added features (e.g: ... to rect, and horiz)
The idea of adding text and shading lines under the clusters comes from skullkey from here: https://stackoverflow.com/questions/4720307/change-dendrogram-leaves
rect.hclust, order.dendrogram, cutree.dendrogram
set.seed(23235)
ss <- sample(1:150, 10)
hc <- iris[ss, -5] %>%
  dist() %>%
  hclust()
dend <- hc %>% as.dendrogram()
plot(dend)
rect.dendrogram(dend, 2, border = 2)
rect.dendrogram(dend, 3, border = 4)
Vectorize(rect.dendrogram, "k")(dend, 4:5, border = 6)
plot(dend)
rect.dendrogram(dend, 3,
  border = 1:3,
  density = 2, text = c("1", "b", "miao"), text_cex = 3
)
plot(dend)
rect.dendrogram(dend, 4, which = c(1, 3), border = c(2, 3))
rect.dendrogram(dend, 4, x = 5, border = c(4))
rect.dendrogram(dend, 3, border = 3, lwd = 2, lty = 2)
# now THIS, you can not do with the old rect.hclust
plot(dend, horiz = TRUE)
rect.dendrogram(dend, 2, border = 2, horiz = TRUE)
rect.dendrogram(dend, 4, border = 4, lty = 2, lwd = 3, horiz = TRUE)
# This had previously failed since it worked with a wrong k.
dend15 <- c(1:5) %>%
  dist() %>%
  hclust(method = "average") %>%
  as.dendrogram()
# dend15 <- c(1:25) %>% dist %>% hclust(method = "average") %>% as.dendrogram
dend15 %>%
  set("branches_k_color") %>%
  plot()
dend15 %>% rect.dendrogram(
  k = 3,
  border = 8, lty = 5, lwd = 2
)
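Since the list of clusters is returned invisibly, it has to be assigned in order to be inspected. A minimal sketch, assuming dendextend is attached and a graphics device is available (the data subset and variable names are illustrative):

```r
library(dendextend)

dend <- as.dendrogram(hclust(dist(USArrests[1:8, ])))

plot(dend)
# Assign the result to capture the (invisible) return value:
# one list element per rectangle that was drawn.
clusters <- rect.dendrogram(dend, k = 3, border = 2:4)

length(clusters)   # number of rectangles drawn
lengths(clusters)  # number of leaves inside each rectangle
```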
prune_leaf does not update leaf indices as it prunes leaves. As a result, some leaves of the pruned dendrogram may have leaf indices larger than the number of leaves in the pruned dendrogram, which may cause errors in downstream functions such as as.hclust.
This function re-indexes the leaves such that the leaf indices are no larger than the total number of leaves.
reindex_dend(dend)
dend |
dendrogram object |
A dendrogram object with the leaves reindexed
hc <- hclust(dist(USArrests[1:5, ]), "ave")
dend <- as.dendrogram(hc)
dend_pruned <- prune(dend, c("Alaska", "California"), reindex_dend = FALSE)
## A leaf has an index larger than the number of leaves:
unlist(dend_pruned)
# [1] 4 3 1
dend_pruned_reindexed <- reindex_dend(dend_pruned)
## All leaf indices are no larger than the number of leaves:
unlist(dend_pruned_reindexed)
# [1] 3 2 1
## The dendrograms are equal:
all.equal(dend_pruned, dend_pruned_reindexed)
# TRUE
Go through the dendrogram branches and remove their edgePar.
remove_branches_edgePar(dend, ...)
dend |
a dendrogram object |
... |
not used |
A dendrogram, after removing the edgePar attribute in all of its branches.
get_root_branches_attr, assign_values_to_branches_edgePar
## Not run:
dend <- USArrests[1:5, ] %>%
  dist() %>%
  hclust() %>%
  as.dendrogram()
dend <- color_branches(dend, 3)
par(mfrow = c(1, 2))
plot(dend)
plot(remove_branches_edgePar(dend))
## End(Not run)
Go through the dendrogram leaves and remove their nodePar.
remove_leaves_nodePar(dend, ...)
dend |
a dendrogram object |
... |
not used |
A dendrogram, after removing the nodePar attribute in all of its leaves.
get_leaves_attr, assign_values_to_leaves_nodePar
## Not run:
dend <- USArrests[1:5, ] %>%
  dist() %>%
  hclust() %>%
  as.dendrogram()
dend <- color_labels(dend, 3)
par(mfrow = c(1, 2))
plot(dend)
plot(remove_leaves_nodePar(dend))
get_leaves_attr(dend, "nodePar")
get_leaves_attr(remove_leaves_nodePar(dend), "nodePar")
## End(Not run)
Go through the dendrogram nodes and remove their nodePar.
remove_nodes_nodePar(dend, ...)
dend |
a dendrogram object |
... |
not used |
A dendrogram, after removing the nodePar attribute in all of its nodes.
get_root_branches_attr, assign_values_to_branches_edgePar
## Not run:
dend <- USArrests[1:5, ] %>%
  dist() %>%
  hclust() %>%
  as.dendrogram()
# give every node a point type, so that there is a nodePar to remove
dend <- set(dend, "nodes_pch", 19)
par(mfrow = c(1, 2))
plot(dend)
plot(remove_nodes_nodePar(dend))
## End(Not run)
Recursively applies a function on a list and returns the output as a list, following the naming convention in the plyr package.
The big difference between this and rapply is that this will also apply the function on EACH element of the list, even if it's not a "terminal node" inside the list tree.
An attribute is added to indicate if the value returned is from a branch or a leaf.
rllply(x, FUN, add_notation = FALSE, ...)
x |
a list. |
FUN |
a function to apply on each element of the list |
add_notation |
logical. Should a "position_type" attribute be added to each node, stating if it is a "Branch" or a "Leaf"? |
... |
not used. |
a list with ALL of the nodes (from the original "x" list), that FUN was applied on.
## Not run:
x <- list(1)
x
rllply(x, function(x) {
  x
}, add_notation = TRUE)
x <- list(1, 2, list(31))
x
rllply(x, function(x) {
  x
}, add_notation = TRUE)
# the first element is the entire tree
# after FUN was applied to its root element.
hc <- hclust(dist(USArrests[1:4, ]), "ave")
dend <- as.dendrogram(hc)
rllply(dend, function(x) {
  attr(x, "height")
})
rllply(dend, function(x) {
  attr(x, "members")
})
## End(Not run)
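To make the contrast with rapply concrete, here is a minimal base R sketch. The helper visit_all is hypothetical: it is not part of dendextend and, unlike rllply, it returns a flat list rather than a nested one. It only illustrates the idea of applying a function to every node, not just to the terminal ones.

```r
x <- list(1, 2, list(31))

# rapply() only visits the terminal (non-list) elements.
leaf_classes <- rapply(x, class, how = "unlist")

# Hypothetical helper: apply FUN to the node itself,
# then recurse into its children (if it has any).
visit_all <- function(node, FUN) {
  out <- list(FUN(node))
  if (is.list(node)) {
    for (child in node) {
      out <- c(out, visit_all(child, FUN))
    }
  }
  out
}
all_classes <- unlist(visit_all(x, class))

leaf_classes
# [1] "numeric" "numeric" "numeric"
all_classes
# [1] "list"    "numeric" "numeric" "list"    "numeric"
```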
Rotates, reverses and sorts the branches of a tree object (dendrogram, hclust) based on a vector - either of the labels' order (numbers) or of the labels in their new order (character).
rotate(x, ...)
## Default S3 method:
rotate(x, order, ...)
## S3 method for class 'dendrogram'
rotate(x, order, ...)
## S3 method for class 'hclust'
rotate(x, order, ...)
## S3 method for class 'phylo'
rotate(x, ..., phy)
## S3 method for class 'dendrogram'
sort(x, decreasing = FALSE, type = c("labels", "nodes"), ...)
## S3 method for class 'hclust'
sort(x, decreasing = FALSE, ...)
## S3 method for class 'dendlist'
sort(x, ...)
## S3 method for class 'hclust'
rev(x, ...)
x |
a tree object (either a |
... |
parameters passed (for example, in case of sort) |
order |
Either a numeric or a character vector. If numeric: it is a numeric vector with the order of the values to be assigned to the object's labels. The numbers work just like when you use order: which of the items on the tree-plot should be "first" (e.g: most left), second etc. (this is relevant only to |
phy |
a placeholder in case the user uses "phy =" |
decreasing |
logical. Should the sort be increasing or decreasing? Not available for partial sorting. (relevant only to |
type |
a character indicating how to sort. If "labels" then by lexicographic order of the labels. If "nodes", then by using ladderize (order so that recursively, the leftmost branch will be the smallest) |
The motivation for this function came from the function order.dendrogram NOT being very intuitive. What rotate aims to do is give a simple tree rotation function which is based on the order which the user would like to see the tree rotated by (just as order works for numeric vectors).
rev.dendrogram is part of base R, and returns the tree object after rotating it so that the order of the labels is reversed. Here we added an S3 method for hclust objects.
The sort methods sort the labels of the tree (using order) and then attempt to rotate the tree to fit that order.
The hclust method of "rotate" works by first changing the object into a dendrogram, performing the rotation, and then changing it back to hclust. Special care is taken in preserving some of the properties of the hclust object.
The ape package has its own rotate function (which is sadly not S3, so cannot be easily connected with the current implementation). Still, there is an S3 plug that makes sure people loading ape first and then dendextend will still be able to use rotate without a problem.
Notice that if you first load dendextend and only then ape, using "rotate" will fail with the error: "Error in rotate(dend, ____) : object 'phy' is not of class 'phylo'" - this is because rotate in ape is not S3 and will fail to find the rotate.dendrogram function. In such a case, simply run unloadNamespace("ape"). Or, you can run: unloadNamespace("dendextend"); attachNamespace("dendextend")
The solution for this is that if you have ape installed on your machine, it will be loaded when you load dendextend (but after it). This way, rotate will work fine for both dendrogram and phylo objects.
A rotated tree object
order.dendrogram, order, rev.dendrogram, rotate, ladderize
hc <- hclust(dist(USArrests[c(1, 6, 13, 20, 23), ]), "ave")
dend <- as.dendrogram(hc)
# For dendrogram objects:
labels_colors(dend) <- rainbow(nleaves(dend))
# let's color the labels to make the followup of the rotation easier
par(mfrow = c(1, 2))
plot(dend, main = "Original tree")
plot(rotate(dend, c(2:5, 1)),
  main = "Rotates the left most leaf \n into the right side of the tree"
)
par(mfrow = c(1, 2))
plot(dend, main = "Original tree")
plot(sort(dend), main = "Sorts the labels by alphabetical order \n and rotates the tree to give the best fit possible")
par(mfrow = c(1, 2))
plot(dend, main = "Original tree")
plot(rev(dend), main = "Reverses the order of the tree labels")
# For hclust objects:
plot(hc)
plot(rotate(hc, c(2:5, 1)), main = "Rotates the left most leaf \n into the right side of the tree")
par(mfrow = c(1, 3))
dend %>% plot(main = "Original tree")
dend %>%
  sort() %>%
  plot(main = "labels sort")
dend %>%
  sort(type = "nodes") %>%
  plot(main = "nodes (ladderize) sort")
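The examples above pass a numeric order. The character form described for the order argument can be sketched as follows, assuming dendextend is attached (and that ape's rotate is not masking it). The target order here is simply the current labels reversed, an order that a rotation can always reach.

```r
library(dendextend)

hc <- hclust(dist(USArrests[c(1, 6, 13, 20, 23), ]), "ave")
dend <- as.dendrogram(hc)

# The target order is given as labels (character) rather than positions.
target <- rev(labels(dend))
rotated <- rotate(dend, target)

labels(dend)
labels(rotated)
```

Not every label order is compatible with the tree's topology, so for an arbitrary target the result is only the closest rotation, not necessarily the exact order requested.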
Rotates a dendrogram based on its seriation.
The function tries to turn the dend into hclust using DendSer.dendrogram (based on DendSer).
Also, if a distance matrix is missing, it will try to use the cophenetic distance.
rotate_DendSer(dend, ser_weight, ...)
dend |
An object of class dendrogram |
ser_weight |
Used by the cost function to evaluate the ordering. For cost=costLS, this is a vector of object weights. Otherwise it is a dist or symmetric matrix. Passed to DendSer.dendrogram and from there to DendSer. If it is missing, the cophenetic distance is used instead. |
... |
parameters passed to DendSer |
A dendrogram, rotated according to the optimal order found by DendSer.dendrogram
DendSer, DendSer.dendrogram, untangle_DendSer, rotate_DendSer
## Not run:
library(DendSer) # already used from within the function
dend <- USArrests[1:4, ] %>%
  dist() %>%
  hclust("ave") %>%
  as.dendrogram()
DendSer.dendrogram(dend)
tanglegram(dend, rotate_DendSer(dend))
## End(Not run)
Samples a tree, either by permuting the labels (which is useful for a permutation test), or by repeated sampling of the same labels (essential for bootstrapping when we don't have access to the original data which produced the tree).
Duplicates a leaf in a tree. Useful for non-parametric bootstrapping of trees, since it emulates what would have happened if the tree was constructed based on a row-sample with replacement from the original data matrix.
sample.dendrogram(
  dend,
  replace = FALSE,
  dend_labels,
  sampled_labels,
  fix_members = TRUE,
  fix_order = TRUE,
  fix_midpoint = TRUE,
  ...
)
dend |
a dendrogram object |
replace |
logical (FALSE). Should we shuffle the labels (if FALSE), or should we replicate the same leaf over and over, while omitting other leaves? (this is when set to TRUE). |
dend_labels |
a character vector of the tree's labels. This can save the time it takes to get the tree labels (in case we run a simulation, computing this once might save some running time). If missing, it uses labels in order to get the labels. |
sampled_labels |
a character vector of the tree's sampled labels. This can help us if we wish to compare two trees. In such a case we'd like to be able to have the same sample of labels used on both trees. If missing, it uses sample in order to get the sampled labels. Only works when replace=TRUE! |
fix_members |
logical (TRUE). Fix the number of members in attr using fix_members_attr.dendrogram |
fix_order |
logical (TRUE). Fix the leaves order values using rank_order.dendrogram |
fix_midpoint |
logical (TRUE). Fix the midpoint value. If TRUE, it overrides "fix_members" and turns it into TRUE (since it must have a correct number of members in order to work). |
... |
not used |
A dendrogram, after "sampling" its leaves.
## Not run:
# define dendrogram object to play with:
dend <- USArrests[1:5, ] %>%
  dist() %>%
  hclust(method = "ave") %>%
  as.dendrogram()
plot(dend)
# # same tree, with different order of labels
plot(sample.dendrogram(dend, replace = FALSE))
# # A different tree (!), with some labels duplicated,
# while others are pruned
plot(sample.dendrogram(dend, replace = TRUE))
## End(Not run)
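The two modes selected by replace differ in the same way as the two modes of base R's sample. The sketch below works on a plain label vector only (it does not touch a tree, and the label values are just the first five row names of USArrests written out by hand), so it illustrates the label sampling step rather than what sample.dendrogram does to the tree structure.

```r
set.seed(42)
labs <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California")

# replace = FALSE: a permutation, every label appears exactly once.
permuted <- sample(labs, replace = FALSE)

# replace = TRUE: a bootstrap-style draw, labels may repeat or be left out.
resampled <- sample(labs, replace = TRUE)

permuted
resampled
```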
Rotates a dendrogram so that it conforms to the order of a provided distance object. The seriation algorithm is based on seriate, which tries to find a linear order for objects using data in the form of a dissimilarity matrix (one mode data).
This is useful for heatmap visualization.
seriate_dendrogram(dend, x, method = c("OLO", "GW"), ...)
dend |
An object of class dendrogram or hclust |
x |
a dist object. |
method |
a character vector of either "OLO" or "GW". "OLO": Optimal leaf ordering, optimizes the Hamiltonian path length that is restricted by the dendrogram structure (works in O(n^4)). "GW": Gruvaeus and Wainer heuristic to optimize the Hamiltonian path length that is restricted by the dendrogram structure. |
... |
parameters passed to seriate |
A dendrogram that is rotated based on the optimal ordering of the distance matrix
## Not run:
# library(dendextend)
d <- dist(USArrests)
hc <- hclust(d, "ave")
dend <- as.dendrogram(hc)
heatmap(as.matrix(USArrests))
dend2 <- seriate_dendrogram(dend, d)
# use the seriated dendrogram for the rows
heatmap(as.matrix(USArrests), Rowv = dend2)
## End(Not run)
A master function for updating various attributes and features of dendrogram objects.
set(dend, ...)
## S3 method for class 'dendrogram'
set(
  dend,
  what = c("labels", "labels_colors", "labels_cex", "labels_to_character",
    "leaves_pch", "leaves_cex", "leaves_col", "leaves_bg",
    "nodes_pch", "nodes_cex", "nodes_col", "nodes_bg",
    "hang_leaves", "rank_branches",
    "branches_k_color", "branches_k_lty",
    "branches_col", "branches_lwd", "branches_lty",
    "by_labels_branches_col", "by_labels_branches_lwd", "by_labels_branches_lty",
    "by_lists_branches_col", "by_lists_branches_lwd", "by_lists_branches_lty",
    "highlight_branches_col", "highlight_branches_lwd",
    "clear_branches", "clear_leaves"),
  value,
  order_value = FALSE,
  ...
)
## S3 method for class 'dendlist'
set(dend, ..., which)
## S3 method for class 'data.table'
set(...)
dend |
a tree (dendrogram, or dendlist) |
... |
passed to the specific function for more options. |
what |
a character indicating what is the property of the tree that should be set/updated. (see the usage and the example section for the different options) |
value |
an object with the value to set in the dendrogram tree. (the type of the value depends on the "what") |
order_value |
logical. Default is FALSE. If TRUE, it means the order of the value is in the order of the data which produced the hclust or dendrogram - and will reorder the value to conform with the order of the labels in the dendrogram. |
which |
an integer vector indicating, in the case "dend" is a dendlist, on which of the trees should the modification be performed. If missing - the change will be performed on all of dends in the dendlist. |
This is a wrapper function for many of the main tasks we might wish to perform on a dendrogram before plotting.
The options by_labels_branches_col, by_labels_branches_lwd and by_labels_branches_lty have extra parameters: type, attr, TF_values. The options by_lists_branches_col, by_lists_branches_lwd and by_lists_branches_lty have extra parameters: attr, TF_value. You can read more about them here: branches_attr_by_labels and branches_attr_by_lists
The "what" parameter can accept the following options:
labels - set the labels (labels<-.dendrogram)
labels_colors - set the labels' colors (color_labels)
labels_cex - set the labels' size (assign_values_to_leaves_nodePar)
labels_to_character - set the labels to be characters
leaves_pch - set the leaves' point type (assign_values_to_leaves_nodePar). A leaf is a terminal node of the tree.
leaves_cex - set the leaves' point size (assign_values_to_leaves_nodePar). For using this you MUST also set leaves_pch, a good value to use is 19.
leaves_col - set the leaves' point color (assign_values_to_leaves_nodePar). For using this you MUST also set leaves_pch, a good value to use is 19.
leaves_bg - set the leaves' point fill color (assign_values_to_leaves_nodePar). For using this you MUST also set leaves_pch with values from 21-25.
nodes_pch - set the nodes' point type (assign_values_to_nodes_nodePar)
nodes_cex - set the nodes' point size (assign_values_to_nodes_nodePar)
nodes_col - set the nodes' point color (assign_values_to_nodes_nodePar)
nodes_bg - set the nodes' point fill color (assign_values_to_nodes_nodePar). For using this you MUST also set nodes_pch with values from 21-25.
hang_leaves - hang the leaves (hang.dendrogram)
rank_branches - rank the branches' heights (rank_branches)
branches_k_color - color the branches (color_branches), a k parameter needs to be supplied.
branches_k_lty - updates the lty of the branches (similar to branches_k_color), a k parameter needs to be supplied.
branches_col - set the color of branches (assign_values_to_branches_edgePar)
branches_lwd - set the line width of branches (assign_values_to_branches_edgePar)
branches_lty - set the line type of branches (assign_values_to_branches_edgePar)
by_labels_branches_col - set the color of branches with specific labels (branches_attr_by_labels)
by_labels_branches_lwd - set the line width of branches with specific labels (branches_attr_by_labels)
by_labels_branches_lty - set the line type of branches with specific labels (branches_attr_by_labels)
by_lists_branches_col - set the color of branches from the root of the tree down to (possibly inner) nodes with specified members (branches_attr_by_lists)
by_lists_branches_lwd - set the line width of branches from the root of the tree down to (possibly inner) nodes with specified members (branches_attr_by_lists)
by_lists_branches_lty - set the line type of branches from the root of the tree down to (possibly inner) nodes with specified members (branches_attr_by_lists)
highlight_branches_col - highlight branches color based on branches' heights (highlight_branches_col)
highlight_branches_lwd - highlight branches line-width based on branches' heights (highlight_branches_lwd)
clear_branches - clear branches' attributes (remove_branches_edgePar)
clear_leaves - clear leaves' attributes (remove_leaves_nodePar)
An updated dendrogram (or dendlist), with some of its parameters changed.
labels<-.dendrogram, labels_colors<-, hang.dendrogram, color_branches, assign_values_to_leaves_nodePar, assign_values_to_branches_edgePar, remove_branches_edgePar, remove_leaves_nodePar, noded_with_condition, branches_attr_by_labels, branches_attr_by_lists, dendrogram
## Not run:
set.seed(23235)
ss <- sample(1:150, 10)
# Getting the dend object
dend <- iris[ss, -5] %>%
  dist() %>%
  hclust() %>%
  as.dendrogram()
dend %>% plot()
dend %>% labels()
dend %>% set("labels", 1:10) %>% labels()
dend %>% set("labels", 1:10) %>% plot()
dend %>% set("labels_color") %>% plot()
dend %>% set("labels_col", c(1, 2)) %>% plot() # Works also with partial matching :)
dend %>% set("labels_cex", c(1, 1.2)) %>% plot()
dend %>% set("leaves_pch", NA) %>% plot()
dend %>% set("leaves_pch", c(1:5)) %>% plot()
dend %>%
  set("leaves_pch", c(19, 19, NA)) %>%
  set("leaves_cex", c(1, 2)) %>%
  plot()
dend %>%
  set("leaves_pch", c(19, 19, NA)) %>%
  set("leaves_cex", c(1, 2)) %>%
  set("leaves_col", c(1, 1, 2, 2)) %>%
  plot()
dend %>% set("hang") %>% plot()
# using bg for leaves and nodes
set.seed(23235)
ss <- sample(1:150, 25)
# Getting the dend object
dend25 <- iris[ss, -5] %>%
  dist() %>%
  hclust() %>%
  as.dendrogram()
dend25 %>%
  set("labels", 1:25) %>%
  set("nodes_pch", 21) %>% # set all nodes to be pch 21
  set("nodes_col", "darkred") %>%
  set("nodes_bg", "gold") %>%
  set("leaves_pch", 1:25) %>% # Change the leaves pch to move from 1 to 25
  set("leaves_col", "darkred") %>%
  set("leaves_bg", "gold") %>%
  plot(main = "pch 21 to 25 supports the\nnodes_bg and leaves_bg parameters")
dend %>% set("branches_k_col") %>% plot()
dend %>% set("branches_k_col", c(1, 2)) %>% plot()
dend %>% set("branches_k_col", c(1, 2, 3), k = 3) %>% plot()
dend %>% set("branches_k_col", k = 3) %>% plot()
dend %>% set("branches_k_lty", k = 3) %>% plot()
dend %>%
  set("branches_k_col", k = 3) %>%
  set("branches_k_lty", k = 3) %>%
  plot()
dend %>% set("branches_col", c(1, 2, 1, 2, NA)) %>% plot()
dend %>% set("branches_lwd", c(2, 1, 2)) %>% plot()
dend %>% set("branches_lty", c(1, 2, 1)) %>% plot()
# clears all of the things added to the leaves
dend %>%
  set("labels_color", c(19, 19, NA)) %>%
  set("leaves_pch", c(19, 19, NA)) %>% # plot
  set("clear_leaves") %>% # remove all of what was done until this point
  plot()
# Different order
dend %>%
  set("leaves_pch", c(19, 19, NA)) %>%
  set("labels_color", c(19, 19, NA)) %>%
  set("clear_leaves") %>%
  plot()
# doing this without chaining (%>%) will NOT be fun:
dend %>%
  set("labels", 1:10) %>%
  set("labels_color") %>%
  set("branches_col", c(1, 2, 1, 2, NA)) %>%
  set("branches_lwd", c(2, 1, 2)) %>%
  set("branches_lty", c(1, 2, 1)) %>%
  set("hang") %>%
  plot()
par(mfrow = c(1, 3))
dend %>% set("highlight_branches_col") %>% plot()
dend %>% set("highlight_branches_lwd") %>% plot()
dend %>%
  set("highlight_branches_col") %>%
  set("highlight_branches_lwd") %>%
  plot()
par(mfrow = c(1, 1))
#----------------------------
# Examples for: by_labels_branches_col, by_labels_branches_lwd, by_labels_branches_lty
old_labels <- labels(dend)
dend %>%
  set("labels", seq_len(nleaves(dend))) %>%
  set("by_labels_branches_col", c(1:4, 7)) %>%
  set("by_labels_branches_lwd", c(1:4, 7)) %>%
  set("by_labels_branches_lty", c(1:4, 7)) %>%
  set("labels", old_labels) %>%
  plot()
dend %>%
  set("labels", seq_len(nleaves(dend))) %>%
  set("by_labels_branches_col", c(1:4, 7), type = "any", TF_values = c(4, 2)) %>%
  set("by_labels_branches_lwd", c(1:4, 7), type = "all", TF_values = c(4, 1)) %>%
  set("by_labels_branches_lty", c(1:4, 7), TF_values = c(4, 1)) %>%
  plot()
#---- using order_value
# This is probably not what you want, since cutree
# returns clusters in the order of the original data:
dend %>%
  set("labels_colors", cutree(dend, k = 3)) %>%
  plot()
# The way to fix it, is to use order_value = TRUE
# so that value is assumed to be in the order of the data:
dend %>%
  set("labels_colors", cutree(dend, k = 3), order_value = TRUE) %>%
  plot()
#----------------------------
# Example for: by_lists_branches_col, by_lists_branches_lwd, by_lists_branches_lty
L <- list(c("109", "123", "126", "145"), "29", c("59", "67", "97"))
dend %>%
  set("by_lists_branches_col", L, TF_value = "blue") %>%
  set("by_lists_branches_lwd", L, TF_value = 4) %>%
  set("by_lists_branches_lty", L, TF_value = 3) %>%
  plot()
#----------------------------
# A few dendlist examples:
dendlist(dend, dend) %>% set("hang") %>% plot()
dendlist(dend, dend) %>% set("branches_k_col", k = 3) %>% plot()
dendlist(dend, dend) %>% set("labels_col", c(1, 2)) %>% plot()
dendlist(dend, dend) %>%
  set("hang") %>%
  set("labels_col", c(1, 2), which = 1) %>%
  set("branches_k_col", k = 3, which = 2) %>%
  set("labels_cex", 1.2) %>%
  plot()
#----------------------------
# example of modifying the dendrogram in a heatmap:
library(gplots)
data(mtcars)
x <- as.matrix(mtcars)
rc <- rainbow(nrow(x), start = 0, end = .3)
cc <- rainbow(ncol(x), start = 0, end = .3)
##
##' demonstrate the effect of row and column dendrogram options
##
# "branches_k_color" is spelled out, since "branches_k" alone
# is ambiguous between branches_k_color and branches_k_lty
Rowv_dend <- x %>%
  dist() %>%
  hclust() %>%
  as.dendrogram() %>%
  set("branches_k_color", k = 3) %>%
  set("branches_lwd", 2) %>%
  ladderize()
# rotate_DendSer
Colv_dend <- t(x) %>%
  dist() %>%
  hclust() %>%
  as.dendrogram() %>%
  set("branches_k_color", k = 3) %>%
  set("branches_lwd", 2) %>%
  ladderize()
# rotate_DendSer
heatmap.2(x, Rowv = Rowv_dend, Colv = Colv_dend)
## End(Not run)
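The order_value comments in the examples can be checked numerically rather than by eye. The following is a minimal sketch, assuming the dendextend versions of cutree, set and labels_colors behave as the examples describe; the object names dend_colored, applied and expected are only illustrative.

```r
library(dendextend)

set.seed(23235)
ss <- sample(1:150, 10)
dend <- iris[ss, -5] %>% dist() %>% hclust() %>% as.dendrogram()

# cutree() returns one cluster id per item, named by label, in data order
clusters <- cutree(dend, k = 3)

# order_value = TRUE tells set() that `clusters` is in data order,
# so it is rearranged to the leaf order before being applied
dend_colored <- dend %>% set("labels_colors", clusters, order_value = TRUE)

# Check: every label is colored by the cluster it belongs to
applied <- labels_colors(dend_colored)
expected <- clusters[labels(dend_colored)]
all(unname(applied) == unname(expected))
```

Looking values up by label, as in the last lines, is a convenient way to confirm that a per-item vector ended up on the intended leaves.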
Convenience functions for updating the labels of a dendrogram. set_labels and place_labels differ in their assumption about the order of the labels.
* set_labels assumes the labels are in the same order as the labels in the dendrogram.
* place_labels assumes the labels have the same order as the items in the original data matrix. This is useful for renaming labels based on some other columns in the data matrix.
set_labels(dend, labels, ...)
dend |
a dendrogram object |
labels |
A vector of values to insert in the labels of a dendrogram. |
... |
Currently ignored. |
The updated dendrogram object
Tal Galili, Garrett Grolemund
ss <- c(
  50, 114, 17, 102, 76, 10, 107, 84, 31, 37, 49, 106, 44,
  119, 104, 145, 67, 85, 12, 77, 22, 136, 38, 135, 70
)
small_iris <- iris[ss, ]
small_iris[, -5] %>%
  dist() %>%
  hclust(method = "complete") %>%
  as.dendrogram() %>%
  color_branches(k = 3) %>%
  color_labels(k = 3) %>%
  plot()

# example for using place_labels
small_iris[, -5] %>%
  dist() %>%
  hclust(method = "complete") %>%
  as.dendrogram() %>%
  color_branches(k = 3) %>%
  color_labels(k = 3) %>%
  place_labels(paste(small_iris$Species, 1:25, sep = "_")) %>%
  plot()

# example for using set_labels
small_iris[, -5] %>%
  dist() %>%
  hclust(method = "complete") %>%
  as.dendrogram() %>%
  color_branches(k = 3) %>%
  color_labels(k = 3) %>%
  set_labels(1:25) %>%
  plot()
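The difference between the two functions is easier to see without a plot. This is a small sketch on made-up data (the points in x and the names in new_labels are hypothetical), assuming that order.dendrogram gives the data row held by each leaf from left to right.

```r
library(dendextend)

# Five one-dimensional points; rows 1..5 are the "original data order"
x <- c(1, 10, 2, 11, 30)
dend <- as.dendrogram(hclust(dist(x)))
new_labels <- c("a", "b", "c", "d", "e") # hypothetical names, one per data row

# set_labels: new_labels[1] goes to the first (leftmost) leaf, and so on
by_leaf <- set_labels(dend, new_labels)

# place_labels: new_labels[i] goes to the leaf that holds data row i
by_row <- place_labels(dend, new_labels)

labels(by_leaf)
labels(by_row)
order.dendrogram(dend) # data row held by each leaf, left to right
```

In short, use place_labels when the new names come from a column of the data, and set_labels when they were written down by reading the tree from left to right.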
'shuffle' randomly rotates ("shuffles") a tree, changing its presentation while preserving its topology. 'shuffle' is based on rotate, and through its methods it works for any of the major tree objects in R (dendrogram/hclust/phylo).
This function is useful in combination with tanglegram and entanglement.
shuffle(dend, ...)

## Default S3 method:
shuffle(dend, ...)

## S3 method for class 'dendrogram'
shuffle(dend, ...)

## S3 method for class 'dendlist'
shuffle(dend, which, ...)

## S3 method for class 'hclust'
shuffle(dend, ...)

## S3 method for class 'phylo'
shuffle(dend, ...)
dend |
a tree object (dendrogram/hclust/phylo) |
... |
Ignored. |
which |
an integer vector indicating which of the trees in the dendlist object should be shuffled. Default is missing, in which case all the dends in the dendlist are shuffled. |
'shuffle' randomly rotates ("shuffles") a tree, changing the order of the dendrogram's leaves by means of rotation.
A randomly rotated tree object
tanglegram, entanglement, rotate
dend <- USArrests %>%
  dist() %>%
  hclust() %>%
  as.dendrogram()
set.seed(234238)
dend2 <- shuffle(dend)

tanglegram(dend, dend2, margin_inner = 7)
entanglement(dend, dend2) # 0.3983

# although these ARE the SAME tree:
tanglegram(sort(dend), sort(dend2), margin_inner = 7)
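That a shuffled tree is still the same tree can also be confirmed without plotting. The sketch below compares cophenetic distances, which depend only on the tree's structure and heights and not on the left-to-right leaf order; this check is an illustration chosen here, not something shuffle does internally.

```r
library(dendextend)

dend <- USArrests %>% dist() %>% hclust() %>% as.dendrogram()
set.seed(234238)
dend2 <- shuffle(dend)

# Same leaves, possibly drawn in a different left-to-right order
same_leaves <- setequal(labels(dend), labels(dend2))

# Cophenetic distances depend only on the tree structure and heights,
# so they should match once rows and columns are put in a common order
lab <- sort(labels(dend))
coph1 <- as.matrix(cophenetic(dend))[lab, lab]
coph2 <- as.matrix(cophenetic(dend2))[lab, lab]
same_tree <- isTRUE(all.equal(coph1, coph2))

same_leaves
same_tree
```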
Sorts two cluster vectors by their names and returns a list with the sorted vectors.
sort_2_clusters_vectors(
  A1_clusters,
  A2_clusters,
  assume_sorted_vectors = FALSE,
  warn = dendextend_options("warn"),
  ...
)
A1_clusters |
a numeric vector of cluster grouping (numeric) of items, with a name attribute of item name for each element from group A1. These are often obtained by using some k cut on a dendrogram. |
A2_clusters |
a numeric vector of cluster grouping (numeric) of items, with a name attribute of item name for each element from group A2. These are often obtained by using some k cut on a dendrogram. |
assume_sorted_vectors |
logical (FALSE). Can we assume the two group vectors are sorted so that they have the same order of items? If FALSE (default), the vectors are sorted based on their name attribute. |
warn |
logical (default from dendextend_options("warn") is FALSE). Whether warnings are to be issued. It is safer to keep this at TRUE, but to keep the noise down, the default is FALSE. |
... |
Ignored. |
A list with two elements, corresponding to the two clustering vectors.
## Not run: 
set.seed(23235)
ss <- sample(1:150, 4)
hc1 <- hclust(dist(iris[ss, -5]), "com")
hc2 <- hclust(dist(iris[ss, -5]), "single")
# dend1 <- as.dendrogram(hc1)
# dend2 <- as.dendrogram(hc2)
# cutree(dend1)

A1_clusters <- cutree(hc1, k = 3)
A2_clusters <- sample(cutree(hc1, k = 3))

sort_2_clusters_vectors(A1_clusters, A2_clusters, assume_sorted_vectors = TRUE) # no sorting
sort_2_clusters_vectors(A1_clusters, A2_clusters, assume_sorted_vectors = FALSE) # Sorted

## End(Not run)
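The point of sorting by name is that, afterwards, the same position in both vectors refers to the same item. A minimal sketch with two hand-written cluster vectors (the item names and cluster ids are hypothetical) shows this:

```r
library(dendextend)

# Two hypothetical cluster assignments for the same four items,
# stored in different orders
A1_clusters <- c(a = 1, b = 1, c = 2, d = 3)
A2_clusters <- c(d = 3, b = 2, a = 1, c = 2)

sorted <- sort_2_clusters_vectors(A1_clusters, A2_clusters)

# After sorting by name, position i refers to the same item in both vectors
names(sorted[[1]])
names(sorted[[2]])
```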
Sorts a distance matrix by the names of the rows and columns.
sort_dist_mat(dist_mat, by_rows = TRUE, by_cols = TRUE, ...)
dist_mat |
a distance matrix. |
by_rows |
logical (TRUE). Sort the distance matrix by rows? |
by_cols |
logical (TRUE). Sort the distance matrix by columns? |
... |
Ignored. |
A distance matrix (after sorting)
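No example accompanies this function, so here is a small sketch on made-up data. It assumes the input is supplied as a matrix with row and column names; the points in pts are hypothetical, and the result is wrapped in as.matrix only to keep the sketch independent of the exact return class.

```r
library(dendextend)

# A small distance matrix whose row/column names are out of order
pts <- c(c = 3, a = 1, b = 2) # hypothetical one-dimensional points
dist_mat <- as.matrix(dist(pts))
rownames(dist_mat) # "c" "a" "b"

sorted_mat <- as.matrix(sort_dist_mat(dist_mat))
rownames(sorted_mat)
colnames(sorted_mat)

# Distances still belong to the same pairs of items
sorted_mat["a", "c"] == dist_mat["a", "c"]
```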
Takes a numeric vector and sorts its values so that they increase from left to right. It differs from sort in that the function only "sorts" the levels of the values, and not the vector itself.
This function is useful for cutree, where it makes the sort_cluster_numbers parameter possible. Setting that parameter to TRUE orders the cluster ids from cutree from left to right (e.g., the leftmost cluster in the tree is numbered "1", the one after it "2", and so on).
sort_levels_values(
  x,
  MARGIN = 2,
  decreasing = FALSE,
  force_integer = FALSE,
  warn = dendextend_options("warn"),
  ...
)
x |
a numeric vector. |
MARGIN |
passed to apply. It is a vector giving the subscripts which the function will be applied over. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2) indicates rows and columns. Where X has named dimnames, it can be a character vector selecting dimension names. |
decreasing |
logical (FALSE). Should the sort be increasing or decreasing? |
force_integer |
logical (FALSE). Should the values returned be integers? |
warn |
logical (default from dendextend_options("warn") is FALSE). Whether warnings are to be issued (for example, when x has NA values in it). It is safer to keep this at TRUE, but to keep the noise down, the default is FALSE. |
... |
ignored. |
A vector like x in which the values have been relabeled so that they increase from left to right, in order of first appearance (e.g., c(2, 2, 3, 2, 1) becomes 1 1 2 1 3).
x <- 1:4
sort_levels_values(x) # 1 2 3 4

x <- c(4:1)
names(x) <- letters[x]
attr(x, "keep_me") <- "a cat"
sort_levels_values(x) # 1 2 3 4

x <- c(4:1, 4, 2)
sort_levels_values(x) # 1 2 3 4 1 3

x <- c(2, 2, 3, 2, 1)
sort_levels_values(x) # 1 1 2 1 3

x <- matrix(16:1, 4, 4)
rownames(x) <- letters[1:4]
x
apply(x, 2, sort_levels_values)
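One way to think about the relabeling in the examples is "number each value by when it first appears". The helper below, relabel_by_first_appearance, is a hypothetical base-R illustration of that idea for an NA-free vector with the default arguments; it is not the package's implementation.

```r
library(dendextend)

x <- c(2, 2, 3, 2, 1)

# Base-R sketch of the idea (an illustration, not the package's code):
# relabel each value by the position of its first appearance
relabel_by_first_appearance <- function(v) match(v, unique(v))

relabel_by_first_appearance(x) # 1 1 2 1 3
as.vector(sort_levels_values(x)) # 1 1 2 1 3, as in the examples above
```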
Plots a tanglegram: two trees side by side, with lines connecting their matching labels.
tanglegram(dend1, ...)

## Default S3 method:
tanglegram(dend1, ...)

## S3 method for class 'hclust'
tanglegram(dend1, ...)

## S3 method for class 'phylo'
tanglegram(dend1, ...)

## S3 method for class 'dendlist'
tanglegram(
  dend1,
  which = c(1L, 2L),
  main_left,
  main_right,
  just_one = TRUE,
  ...
)

## S3 method for class 'dendrogram'
tanglegram(
  dend1,
  dend2,
  sort = FALSE,
  color_lines,
  lwd = 3.5,
  edge.lwd = NULL,
  columns_width = c(5, 3, 5),
  margin_top = 3,
  margin_bottom = 2.5,
  margin_inner = 3,
  margin_outer = 0.5,
  left_dendo_mar = c(margin_bottom, margin_outer, margin_top, margin_inner),
  right_dendo_mar = c(margin_bottom, margin_inner, margin_top, margin_outer),
  intersecting = TRUE,
  dLeaf = NULL,
  dLeaf_left = dLeaf,
  dLeaf_right = dLeaf,
  axes = TRUE,
  type = "r",
  lab.cex = NULL,
  remove_nodePar = FALSE,
  main = "",
  main_left = "",
  main_right = "",
  sub = "",
  k_labels = NULL,
  k_branches = NULL,
  rank_branches = FALSE,
  hang = FALSE,
  match_order_by_labels = TRUE,
  cex_main = 2,
  cex_main_left = cex_main,
  cex_main_right = cex_main,
  cex_sub = cex_main,
  highlight_distinct_edges = TRUE,
  common_subtrees_color_lines = TRUE,
  common_subtrees_color_lines_default_single_leaf_color = "grey",
  common_subtrees_color_branches = FALSE,
  highlight_branches_col = FALSE,
  highlight_branches_lwd = TRUE,
  faster = FALSE,
  just_one = TRUE,
  ...
)

dendbackback(
  dend1,
  dend2,
  sort = FALSE,
  color_lines,
  lwd = 3.5,
  edge.lwd = NULL,
  columns_width = c(5, 3, 5),
  margin_top = 3,
  margin_bottom = 2.5,
  margin_inner = 3,
  margin_outer = 0.5,
  left_dendo_mar = c(margin_bottom, margin_outer, margin_top, margin_inner),
  right_dendo_mar = c(margin_bottom, margin_inner, margin_top, margin_outer),
  intersecting = TRUE,
  dLeaf = NULL,
  dLeaf_left = dLeaf,
  dLeaf_right = dLeaf,
  axes = TRUE,
  type = "r",
  lab.cex = NULL,
  remove_nodePar = FALSE,
  main = "",
  main_left = "",
  main_right = "",
  sub = "",
  k_labels = NULL,
  k_branches = NULL,
  rank_branches = FALSE,
  hang = FALSE,
  match_order_by_labels = TRUE,
  cex_main = 2,
  cex_main_left = cex_main,
  cex_main_right = cex_main,
  cex_sub = cex_main,
  highlight_distinct_edges = TRUE,
  common_subtrees_color_lines = TRUE,
  common_subtrees_color_lines_default_single_leaf_color = "grey",
  common_subtrees_color_branches = FALSE,
  highlight_branches_col = FALSE,
  highlight_branches_lwd = TRUE,
  faster = FALSE,
  just_one = TRUE,
  ...
)
dend1 |
tree object (dendrogram/dendlist/hclust/phylo), plotted on the left |
... |
not used. |
which |
an integer vector of length 2, indicating which of the trees in the dendlist object should be plotted |
main_left |
Character. Title of the left dendrogram. |
main_right |
Character. Title of the right dendrogram. |
just_one |
logical (TRUE). If FALSE, it means at least two tanglegrams will be plotted on the same page and so layout is not passed. See: https://stackoverflow.com/q/39784746/4137985 |
dend2 |
tree object (dendrogram/hclust/phylo), plotted on the right |
sort |
logical (FALSE). Should the dendrogram's labels be "sorted"? (might give a better tree in some cases). |
color_lines |
a vector of colors for the lines connecting the labels. If the colors are shorter than the number of labels, they are recycled (and a warning is issued). The colors in the vector are applied on the lines from the bottom up. |
lwd |
width of the lines connecting the labels. (default is 3.5) |
edge.lwd |
width of the dendrograms lines. Default is NULL. If set, then it switches 'highlight_branches_lwd' to FALSE. If you want thicker lines which reflect the height, please use highlight_branches_lwd on the dendrograms/dendlist. |
columns_width |
a vector with three elements, giving the relative sizes of the three plots (left dendrogram, connecting lines, right dendrogram). This is passed to layout if parameter just_one is TRUE. The default is: c(5,3,5) |
margin_top |
the number of lines of margin to be specified on the top of the plots. |
margin_bottom |
the number of lines of margin to be specified on the bottom of the plots. |
margin_inner |
the number of lines of margin on the inner side of each dendrogram (the distance between the dendrograms and the connecting lines). |
margin_outer |
the number of lines of margin on the outer side of each dendrogram (the side facing away from the connecting lines). |
left_dendo_mar |
mar parameters of the left dendrogram. |
right_dendo_mar |
mar parameters of the right dendrogram. |
intersecting |
logical (TRUE). Should the leaves of the two dendrograms be pruned so that the two trees will have the same labels? |
dLeaf |
a number specifying the distance in user coordinates between the tip of a leaf and its label. If NULL, as per default, 3/4 of a letter width or height is used. Notice that if we are comparing two dendrograms with different heights, manually changing dLeaf will affect both trees differently. In such a case, it is recommended to manually change dLeaf_left and dLeaf_right. This can be especially important when changing the lab.cex of the dendrogram's labels. Alternatively, one could manually set the xlim parameter for both trees, which will force the proportion of distances of the labels from the trees to remain the same. |
dLeaf_left |
dLeaf of the left dendrogram, by default it is equal to dLeaf (often negative). |
dLeaf_right |
dLeaf of the right dendrogram, by default it is equal to minus dLeaf (often positive). |
axes |
logical (TRUE). Should plot axes be plotted? |
type |
type of plot ("t"/"r" = triangle or rectangle) |
lab.cex |
numeric scalar, influencing the cex size of the labels. |
remove_nodePar |
logical (FALSE). Should the nodePar of the leaves be removed? (useful when the trees' leaves have too many parameters on them) |
main |
Character. Title above the connecting lines. |
sub |
Character. Title below the connecting lines. |
k_labels |
integer. Number of groups by which to color the leaves. |
k_branches |
integer. Number of groups by which to color the branches. |
rank_branches |
logical (FALSE). Should the branches heights be adjusted? (setting this to TRUE - can make it easier for comparing topological differences) |
hang |
logical (FALSE). Should we hang the leaves of the trees? |
match_order_by_labels |
logical (TRUE). Should the leaves value order be matched between the two trees based on labels? This is a MUST in order to have the lines connect the correct labels. Set this to FALSE if you want to make the plotting a bit faster, and only after you are sure the labels and orders are correctly aligned. |
cex_main |
A numerical value giving the amount by which plotting title should be magnified relative to the default. |
cex_main_left |
see cex_main. |
cex_main_right |
see cex_main. |
cex_sub |
see cex_main. |
highlight_distinct_edges |
logical (default is TRUE). Whether to highlight distinct edges in each tree (by changing their line types to 2). Notice that this can be slow on large trees. This parameter will automatically be turned off if the tree already comes with a "lty" edgePar (this is checked using has_edgePar). A "lty" can be removed by using set("clear_branches"), which removes all of the edgePar parameters of the dendrogram. |
common_subtrees_color_lines |
logical (default is TRUE). color the connecting line based on the common subtrees of both dends. This only works if (notice that this can be slow on large trees) |
common_subtrees_color_lines_default_single_leaf_color |
When representing edges between common subtrees (i.e. common_subtrees_color_branches = TRUE), this parameter sets the color of edges for subtrees that are NOT common. Default is "grey" |
common_subtrees_color_branches |
logical (default is FALSE). Color the branches of both dends based on the common subtrees. (notice that this can be slow on large trees) This is FALSE by default since it will override the colors of the existing tree. |
highlight_branches_col |
logical (default is FALSE). Should highlight_branches_col be used on the dendrograms? This parameter will automatically be turned off if the tree already comes with a "col" edgePar (this is checked using has_edgePar). A "col" can be removed by using set("clear_branches"), which removes all of the edgePar parameters of the dendrogram. |
highlight_branches_lwd |
logical (default is TRUE). Should highlight_branches_lwd be used on the dendrograms? This parameter will automatically be turned off if the tree already comes with a "lwd" edgePar (this is checked using has_edgePar). A "lwd" can be removed by using set("clear_branches"), which removes all of the edgePar parameters of the dendrogram. |
faster |
logical (FALSE). If TRUE, it overrides some other parameters to have them turned off so that the plotting will go a tiny bit faster. |
Notice that tanglegram does not "resize" well. In case you are resizing your window you would need to re-run the function.
An invisible dendlist, with two trees after being modified during the creation of the tanglegram.
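Because the result is returned invisibly, it is easy to miss that the modified trees can be captured and reused. The sketch below also shows the which argument of the dendlist method; the third tree (average linkage) and the use of a null graphics device are choices made here for illustration only.

```r
library(dendextend)

set.seed(23235)
ss <- sample(1:150, 10)
dat <- iris[ss, -5]
dend1 <- dat %>% dist() %>% hclust("complete") %>% as.dendrogram()
dend2 <- dat %>% dist() %>% hclust("single") %>% as.dendrogram()
dend3 <- dat %>% dist() %>% hclust("average") %>% as.dendrogram()

pdf(NULL) # draw to a null device so the sketch also runs non-interactively

# `which` picks the two trees to compare; here the first and the third
result <- dendlist(dend1, dend2, dend3) %>% tanglegram(which = c(1, 3))

invisible(dev.off())

# The two plotted trees are returned invisibly, after any modification
is.dendlist(result)
length(result)
```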
Tal Galili, Johan Renaudie
The function is based on code from Johan Renaudie (plannapus), after major revisions. See: https://stackoverflow.com/questions/12456768/duelling-dendrograms-in-r-placing-dendrograms-back-to-back-in-r
As far as I could tell, this code was originally inspired by Dylan Beaudette's function dueling.dendrograms from the sharpshootR package: https://CRAN.R-project.org/package=sharpshootR
tanglegram
remove_leaves_nodePar, plot_horiz.dendrogram, rank_branches, hang.dendrogram
## Not run: 
set.seed(23235)
ss <- sample(1:150, 10)
dend1 <- iris[ss, -5] %>%
  dist() %>%
  hclust("com") %>%
  as.dendrogram()
dend2 <- iris[ss, -5] %>%
  dist() %>%
  hclust("sin") %>%
  as.dendrogram()
dend12 <- dendlist(dend1, dend2)

dend12 %>% tanglegram()

tanglegram(dend1, dend2)
tanglegram(dend1, dend2, sort = TRUE)
tanglegram(dend1, dend2, remove_nodePar = TRUE)
tanglegram(dend1, dend2, k_labels = 6, k_branches = 4)

tanglegram(dend1, dend2,
  lab.cex = 2, edge.lwd = 3,
  margin_inner = 5, type = "t", center = TRUE
)

## works nicely:
tanglegram(dend1, dend2,
  lab.cex = 2, edge.lwd = 3,
  margin_inner = 3.5, type = "t", center = TRUE,
  dLeaf = -0.1, xlim = c(7, 0),
  k_branches = 3
)

# using rank_branches can make the comparison even easier
tanglegram(rank_branches(dend1), rank_branches(dend2),
  lab.cex = 2, edge.lwd = 3,
  margin_inner = 3.5, type = "t", center = TRUE,
  dLeaf = -0.1, xlim = c(5.1, 0), columns_width = c(5, 1, 5),
  k_branches = 3
)

########
## Nice example of some colored trees

# see the coloring of common sub trees:
set.seed(23235)
ss <- sample(1:150, 10)
dend1 <- iris[ss, -5] %>%
  dist() %>%
  hclust("com") %>%
  as.dendrogram()
dend2 <- iris[ss, -5] %>%
  dist() %>%
  hclust("sin") %>%
  as.dendrogram()
dend12 <- dendlist(dend1, dend2)
# dend12 %>% untangle %>% tanglegram
dend12 %>% tanglegram(common_subtrees_color_branches = TRUE)

set.seed(22133513)
ss <- sample(1:150, 10)
dend1 <- iris[ss, -5] %>%
  dist() %>%
  hclust("com") %>%
  as.dendrogram()
dend2 <- iris[ss, -5] %>%
  dist() %>%
  hclust("sin") %>%
  as.dendrogram()
dend12 <- dendlist(dend1, dend2)
# dend12 %>% untangle %>% tanglegram
dend12 %>% tanglegram(common_subtrees_color_branches = TRUE)
dend12 %>% tanglegram()

## End(Not run)
Sets most of the ggplot options to blank, by returning blank theme elements for the panel grid, panel background, axis title, axis text, axis line and axis ticks.
theme_dendro()
Andrie de Vries
This function is from Andrie de Vries's ggdendro package.
The motivation for this fork is the need to add more graphical parameters to the plotted tree. This required a strong mixture of functions from ggdendro and dendextend (to the point that it seemed better to just fork the code into its current form).
Unbranches a tree and merges the subtree into the parent node.
unbranch(dend, ...)

## Default S3 method:
unbranch(dend, ...)

## S3 method for class 'dendrogram'
unbranch(dend, branch_becoming_root = 1, new_root_height, ...)

## S3 method for class 'hclust'
unbranch(dend, branch_becoming_root = 1, new_root_height, ...)

## S3 method for class 'phylo'
unbranch(dend, ...)
dend |
a dendrogram (or hclust) object |
... |
passed on |
branch_becoming_root |
a numeric choosing the branch of the root which will become the new root (from left to right) |
new_root_height |
the new height of the branch which will become the new root. If the parameter is not given - the height of the original root is used. |
An unbranched dendrogram
hc <- hclust(dist(USArrests[2:9, ]), "com")
dend <- as.dendrogram(hc)

par(mfrow = c(1, 3))
plot(dend, main = "original tree")
plot(unbranch(dend, 1), main = "unbranched tree (left branch)")
plot(unbranch(dend, 2), main = "unbranched tree (right branch)")
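A quick way to see what unbranching changes, beyond the plots, is to compare leaf counts and the number of branches directly under the root. This is a minimal sketch on the same data as the example; it relies only on `nleaves()` from dendextend and base `length()` and `labels()`.

```r
library(dendextend)

hc <- hclust(dist(USArrests[2:9, ]), "com")
dend <- as.dendrogram(hc)
unb <- unbranch(dend, 1)

# Unbranching rearranges the branches under the root;
# it should not add or drop leaves.
nleaves(dend)
nleaves(unb)
setequal(labels(dend), labels(unb))

# Number of branches directly under the root, before and after.
length(dend)
length(unb)
```

If the chosen branch is itself a leaf there is nothing to merge, so the two root lengths can be equal in that case.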
Unclasses all the nodes in a dendrogram tree. (Helps in cases where a dendrapply function was used incorrectly.)
unclass_dend(dend, ...)
dend |
a dendrogram object |
... |
not used |
The list which was the dendrogram (but without a class)
# define dendrogram object to play with:
hc <- hclust(dist(USArrests[1:3, ]), "ave")
dend <- as.dendrogram(hc)

itself <- function(x) x
dend <- dendrapply(dend, itself)

unclass(dend) # this only returns a list with
# two dendrogram objects inside it.
str(dend) # this is a great way to show a dendrogram,
# but it doesn't help us understand how the R object is built.
str(unclass(dend)) # the nested nodes are still dendrograms,
# so this still doesn't show how the R object is built.
unclass_dend(dend) # this returns a nested list
# with the class removed from every node.
str(unclass_dend(dend)) # NOW we can more easily understand
# how the dendrogram object is structured...
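The difference between base `unclass()` and `unclass_dend()` can be checked directly with `inherits()`. The sketch below uses the same three-row tree as the example; the variable names `shallow` and `deep` are only illustrative.

```r
library(dendextend)

dend <- as.dendrogram(hclust(dist(USArrests[1:3, ]), "ave"))

shallow <- unclass(dend)    # only the root loses its class
deep <- unclass_dend(dend)  # every node loses its class

inherits(shallow[[1]], "dendrogram") # the child is still a dendrogram
inherits(deep[[1]], "dendrogram")    # the child is now a plain object
```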
One untangle function to rule them all.
This function untangles dendrogram lists (dendlist) using various heuristics.
untangle(dend1, ...)

## Default S3 method:
untangle(dend1, ...)

untangle_labels(dend1, dend2, ...)

## S3 method for class 'dendrogram'
untangle(
  dend1,
  dend2,
  method = c("labels", "ladderize", "random", "step1side", "step2side", "stepBothSides", "DendSer"),
  ...
)

## S3 method for class 'dendlist'
untangle(
  dend1,
  method = c("labels", "ladderize", "random", "step1side", "step2side", "DendSer"),
  which = c(1L, 2L),
  ...
)
dend1 |
a dendrogram or a dendlist object |
... |
passed to the relevant untangle function |
dend2 |
A second dendrogram (to untangle against) |
method |
a character indicating the type of untangle heuristic to use. The options are: ("labels", "ladderize", "random", "step1side", "step2side", "stepBothSides", "DendSer") |
which |
an integer vector of length 2, indicating which of the trees in the dendlist object should be untangled |
This function wraps all of the untangle functions, in order to make it easier to find out about (and use) them.
A dendlist, with two trees after they have been untangled.
If the dendlist originally contained more than two trees, the original dendlist is returned with the relevant trees properly rotated.
Tal Galili
tanglegram, untangle_random_search, untangle_step_rotate_1side, untangle_step_rotate_2side, untangle_DendSer, entanglement
## Not run:
set.seed(23235)
ss <- sample(1:150, 10)
dend1 <- iris[ss, -5] %>% dist() %>% hclust("com") %>% as.dendrogram()
dend2 <- iris[ss, -5] %>% dist() %>% hclust("sin") %>% as.dendrogram()
dend12 <- dendlist(dend1, dend2)

dend12 %>% tanglegram()

untangle(dend1, dend2, method = "random", R = 5) %>% tanglegram()

# it works, and we get something different:
set.seed(1234)
dend12 %>%
  untangle(method = "random", R = 5) %>%
  tanglegram()

set.seed(1234)
# fixes it completely:
dend12 %>%
  untangle(method = "random", R = 5) %>%
  untangle(method = "step1") %>%
  tanglegram()

# not good enough
dend12 %>%
  untangle(method = "step1") %>%
  tanglegram()

# not good enough
dend12 %>%
  untangle(method = "step2") %>%
  tanglegram()

# How we might wish to use it:
set.seed(12777)
dend12 %>%
  untangle(method = "random", R = 1) %>%
  untangle(method = "step2") %>%
  tanglegram()
## End(Not run)
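Because `untangle()` only dispatches to a heuristic, it can help to compare several methods numerically rather than by eye. The following is a small sketch, not a benchmark: it scores a few of the documented `method` values with `entanglement()` at its default settings. The names `methods_to_try` and `scores` are illustrative.

```r
library(dendextend)

set.seed(23235)
ss <- sample(1:150, 10)
dend1 <- as.dendrogram(hclust(dist(iris[ss, -5]), "complete"))
dend2 <- as.dendrogram(hclust(dist(iris[ss, -5]), "single"))
dend12 <- dendlist(dend1, dend2)

# Try a few deterministic heuristics and record the entanglement of each result.
methods_to_try <- c("labels", "ladderize", "step1side", "step2side")
scores <- sapply(methods_to_try, function(m) {
  res <- untangle(dend12, method = m)
  entanglement(res[[1]], res[[2]])
})

c(original = entanglement(dend1, dend2), scores)
```

Lower values mean a less tangled layout; which method wins depends on the trees, so the ranking here should not be generalized.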
The function tries to turn the dend into hclust. It then uses the cophenetic distance matrix for optimizing the tree's rotation.
This is a good (and fast) starting point for untangle_step_rotate_2side
untangle_DendSer(dend, ...)
dend |
An object of class dendlist |
... |
NOT USED |
A dendlist object with ordered dends
DendSer, DendSer.dendrogram, untangle_DendSer, rotate_DendSer
## Not run:
set.seed(232)
ss <- sample(1:150, 20)
dend1 <- iris[ss, -5] %>% dist() %>% hclust("com") %>% as.dendrogram()
dend2 <- iris[ss, -5] %>% dist() %>% hclust("sin") %>% as.dendrogram()
dend12 <- dendlist(dend1, dend2)

# bad solutions
dend12 %>% tanglegram()
dend12 %>% untangle("step2") %>% tanglegram()
dend12 %>% untangle_DendSer() %>% tanglegram()

# but the combination is quite awesome:
dend12 %>% untangle_DendSer() %>% untangle("step2") %>% tanglegram()
## End(Not run)
Searches for two untangled dendrograms by randomly shuffling them and checking each time whether their entanglement has improved.
untangle_random_search( dend1, dend2, R = 100L, L = 1, leaves_matching_method = c("labels", "order"), ... )
dend1 |
a tree object (of class dendrogram/hclust/phylo). |
dend2 |
a tree object (of class dendrogram/hclust/phylo). |
R |
numeric (default is 100). The number of shuffles to perform. |
L |
the distance norm to use for measuring the distance between the two trees. It can be any positive number, often one will want to use 0, 1, 1.5, 2 (see 'details' for more). It is passed to entanglement. |
leaves_matching_method |
a character scalar passed to entanglement. It can be either "order" or "labels" (default). If using "labels", then we use the labels for matching the leaves order value. And if "order" then we use the old leaves order value for matching the leaves order value. Using "order" is faster, but "labels" is safer. "order" will assume that the original two trees had their labels and order values MATCHED. Hence, it is best to make sure that the trees used here have the same labels and the SAME values matched to these values - and then use "order" (for fastest results). If "order" is used, the function first calls match_order_by_labels in order to make sure that the two trees have their labels synced with their leaves order values. |
... |
not used |
Untangling two trees is a hard combinatorial problem without a closed-form solution. One way of doing it is to run through a random spectrum of options and look for the "best" two trees. This is what this function offers.
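The idea of the random search can be sketched in a few lines. The helper `random_search_sketch()` below is hypothetical and is not the package's implementation: it shuffles both trees `R` times with `shuffle()`, scores each pair with `entanglement()`, and keeps the best pair seen so far.

```r
library(dendextend)

# Hypothetical helper, not part of dendextend: keep the best of R random shuffles.
random_search_sketch <- function(dend1, dend2, R = 20, L = 1) {
  best <- list(dend1, dend2)
  best_score <- entanglement(dend1, dend2, L = L)
  for (i in seq_len(R)) {
    cand1 <- shuffle(dend1)
    cand2 <- shuffle(dend2)
    score <- entanglement(cand1, cand2, L = L)
    if (score < best_score) { # keep a candidate only if it is strictly better
      best <- list(cand1, cand2)
      best_score <- score
    }
  }
  list(dends = dendlist(best[[1]], best[[2]]), entanglement = best_score)
}

d <- dist(USArrests[1:10, ])
dend1 <- as.dendrogram(hclust(d, "complete"))
dend2 <- as.dendrogram(hclust(d, "single"))

set.seed(1)
before <- entanglement(dend1, dend2, L = 1)
found <- random_search_sketch(dend1, dend2, R = 20, L = 1)
c(before = before, after = found$entanglement)
```

Since a candidate replaces the current best only when it scores lower, the result can never be worse than the starting pair, but nothing guarantees it is close to optimal.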
A dendlist with two trees with the best entanglement that was found.
tanglegram, match_order_by_labels, entanglement.
## Not run:
dend1 <- iris[, -5] %>% dist() %>% hclust("com") %>% as.dendrogram()
dend2 <- iris[, -5] %>% dist() %>% hclust("sin") %>% as.dendrogram()
tanglegram(dend1, dend2)

set.seed(65168)
dend12 <- untangle_random_search(dend1, dend2, R = 10)
tanglegram(dend12[[1]], dend12[[2]])
tanglegram(dend12)

entanglement(dend1, dend2, L = 2) # 0.8894
entanglement(dend12[[1]], dend12[[2]], L = 2) # 0.0998
## End(Not run)
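To make the role of the norm `L` concrete, here is a base R sketch of an entanglement-style score. It assumes the measure is the sum of absolute position differences raised to the power `L`, divided by the same sum for a fully reversed layout; that normalisation is my reading of the entanglement help page, not something stated in this section. The function name `entanglement_sketch()` and its arguments are hypothetical.

```r
# Hypothetical helper: pos1 and pos2 give each leaf's position (1..n)
# in the two trees, with leaves listed in the same order in both vectors.
entanglement_sketch <- function(pos1, pos2, L = 1.5) {
  n <- length(pos1)
  observed <- sum(abs(pos1 - pos2)^L)
  worst <- sum(abs(seq_len(n) - rev(seq_len(n)))^L) # fully reversed layout
  observed / worst
}

same <- entanglement_sketch(1:5, 1:5)                      # identical layouts
flipped <- entanglement_sketch(1:5, 5:1)                   # reversed layouts
swapped <- entanglement_sketch(1:5, c(2, 1, 3, 4, 5), L = 1) # one adjacent swap
c(same = same, flipped = flipped, swapped = swapped)
```

A larger `L` penalises a few long crossing lines more heavily than many short ones, which is one reason to try several values.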
Given a fixed tree and a tree we wish to rotate, this function goes through all of the k number of clusters (from 2 onward), and each time rotates the branch which was introduced in the new k'th cluster. This rotated tree is compared with the fixed tree, and if it has a better entanglement, it will be used for the following iterations.
This is a greedy forward selection algorithm for rotating the tree and looking for a better match.
This is useful for finding good trees for a tanglegram.
untangle_step_rotate_1side( dend1, dend2_fixed, L = 1.5, direction = c("forward", "backward"), k_seq = NULL, dend_heights_per_k, leaves_matching_method = c("labels", "order"), ... )
dend1 |
a dendrogram object. The one we will rotate to best fit dend2_fixed. |
dend2_fixed |
a dendrogram object. This one is kept fixed. |
L |
the distance norm to use for measuring the distance between the two trees. It can be any positive number, often one will want to use 0, 1, 1.5, 2 (see 'details' in entanglement). |
direction |
a character scalar, either "forward" (default) or "backward". Sets the order in which the numbers of clusters (k) are tried: from 2 upward (for "forward"), or from nleaves downward (for "backward"). If k_seq is not NULL, it overrides "direction". |
k_seq |
a sequence of k clusters to go through for improving dend1. If NULL (default), then we use the "direction" parameter. |
dend_heights_per_k |
a numeric vector of values which indicate which height will produce which number of clusters (k) |
leaves_matching_method |
a character scalar passed to entanglement. It can be either "order" or "labels" (default). If using "labels", then we use the labels for matching the leaves order value. And if "order" then we use the old leaves order value for matching the leaves order value. Using "order" is faster, but "labels" is safer. "order" will assume that the original two trees had their labels and order values MATCHED. Hence, it is best to make sure that the trees used here have the same labels and the SAME values matched to these values - and then use "order" (for fastest results). If "order" is used, the function first calls match_order_by_labels in order to make sure that the two trees have their labels synced with their leaves order values. |
... |
not used |
A dendlist with 1) dend1 after it was rotated to best fit dend2_fixed. 2) dend2_fixed.
tanglegram, match_order_by_labels, entanglement, flip_leaves, all_couple_rotations_at_k, untangle_step_rotate_2side.
## Not run:
dend1 <- USArrests[1:10, ] %>% dist() %>% hclust() %>% as.dendrogram()
set.seed(3525)
dend2 <- shuffle(dend1)
tanglegram(dend1, dend2)
entanglement(dend1, dend2, L = 2) # 0.4727

dend2_corrected <- untangle_step_rotate_1side(dend2, dend1)[[1]]
tanglegram(dend1, dend2_corrected) # FIXED.
entanglement(dend1, dend2_corrected, L = 2) # 0
## End(Not run)
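The `direction` and `k_seq` arguments control which cluster counts the greedy search visits. The sketch below compares a forward pass, a backward pass, and a restricted `k_seq`; the choice `2:4` is arbitrary and only for illustration.

```r
library(dendextend)

dend1 <- as.dendrogram(hclust(dist(USArrests[1:10, ])))
set.seed(3525)
dend2 <- shuffle(dend1)

# Rotate dend2 while dend1 stays fixed, visiting k in different orders.
fwd <- untangle_step_rotate_1side(dend2, dend1, direction = "forward")
bwd <- untangle_step_rotate_1side(dend2, dend1, direction = "backward")
few <- untangle_step_rotate_1side(dend2, dend1, k_seq = 2:4)

c(
  start = entanglement(dend2, dend1),
  forward = entanglement(fwd[[1]], fwd[[2]]),
  backward = entanglement(bwd[[1]], bwd[[2]]),
  k_2_to_4 = entanglement(few[[1]], few[[2]])
)
```

Each result is a dendlist whose first element is the rotated tree and whose second element is the fixed tree, so `[[1]]` extracts the corrected dendrogram.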
This is a greedy forward selection algorithm for rotating the tree and looking for a better match.
This is useful for finding good trees for a tanglegram.
It goes through rotating dend1, then dend2, and so on - until a locally optimal solution is found.
Similar to "step1side", one tree is held fixed and the other tree is rotated. This function goes through all of the k number of clusters (from 2 onward), and each time rotates the branch which was introduced in the new k'th cluster. This rotated tree is compared with the fixed tree, and if it has a better entanglement, it will be used for the following iterations. Once finished, the rotated tree is held fixed, and the previously fixed tree is now rotated. This continues until a locally optimal solution is reached.
untangle_step_rotate_2side( dend1, dend2, L = 1.5, direction = c("forward", "backward"), max_n_iterations = 10L, print_times = dendextend_options("warn"), k_seq = NULL, ... )
dend1 |
a dendrogram object. The one we will rotate to best fit dend2. |
dend2 |
a dendrogram object. The one we will rotate to best fit dend1. |
L |
the distance norm to use for measuring the distance between the two trees. It can be any positive number, often one will want to use 0, 1, 1.5, 2 (see 'details' in entanglement). |
direction |
a character scalar, either "forward" (default) or "backward". Sets the order in which the numbers of clusters (k) are tried: from 2 upward (for "forward"), or from nleaves downward (for "backward"). If k_seq is not NULL, it overrides "direction". |
max_n_iterations |
integer. The maximal number of times to switch between optimizing one tree with another. |
print_times |
logical (default: dendextend_options("warn")). Should we print how many times we switched between rotating the two trees? |
k_seq |
a sequence of k clusters to go through for improving dend1. If NULL (default), then we use the "direction" parameter. |
... |
not used |
A list with two dendrograms (dend1/dend2), after they are rotated to best fit one another.
tanglegram, match_order_by_labels, entanglement, flip_leaves, all_couple_rotations_at_k, untangle_step_rotate_1side.
## Not run:
dend1 <- USArrests[1:20, ] %>% dist() %>% hclust() %>% as.dendrogram()
dend2 <- USArrests[1:20, ] %>% dist() %>% hclust(method = "single") %>% as.dendrogram()
set.seed(3525)
dend2 <- shuffle(dend2)
tanglegram(dend1, dend2, margin_inner = 6.5)
entanglement(dend1, dend2, L = 2) # 0.79

# untangle_step_rotate_1side returns a dendlist; [[1]] is the rotated tree
dend2_corrected <- untangle_step_rotate_1side(dend2, dend1)[[1]]
tanglegram(dend1, dend2_corrected, margin_inner = 6.5) # Good.
entanglement(dend1, dend2_corrected, L = 2) # 0.0067
# it is better, but not perfect. Can we improve it?

dend12_corrected <- untangle_step_rotate_2side(dend1, dend2)
tanglegram(dend12_corrected[[1]], dend12_corrected[[2]], margin_inner = 6.5) # Better...
entanglement(dend12_corrected[[1]], dend12_corrected[[2]], L = 2) # 0.0045

# best combination:
dend12_corrected_1 <- untangle_random_search(dend1, dend2)
dend12_corrected_2 <- untangle_step_rotate_2side(dend12_corrected_1[[1]], dend12_corrected_1[[2]])
tanglegram(dend12_corrected_2[[1]], dend12_corrected_2[[2]], margin_inner = 6.5) # Better...
entanglement(dend12_corrected_2[[1]], dend12_corrected_2[[2]], L = 2) # 0 - PERFECT.
## End(Not run)
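The alternating scheme described for this function can be approximated with repeated calls to `untangle_step_rotate_1side()`. The helper `alternate_sketch()` below is hypothetical and is not the package's code: it rotates the first tree against the second, then the second against the updated first, and stops when a full round no longer lowers the entanglement.

```r
library(dendextend)

# Hypothetical helper, not the package's implementation.
alternate_sketch <- function(dend1, dend2, L = 1.5, max_n_iterations = 10L) {
  score <- entanglement(dend1, dend2, L = L)
  for (i in seq_len(max_n_iterations)) {
    # Rotate dend1 against a fixed dend2, then dend2 against the new dend1.
    cand1 <- untangle_step_rotate_1side(dend1, dend2, L = L)[[1]]
    cand2 <- untangle_step_rotate_1side(dend2, cand1, L = L)[[1]]
    new_score <- entanglement(cand1, cand2, L = L)
    if (new_score >= score) break # no improvement: treat as a local optimum
    dend1 <- cand1
    dend2 <- cand2
    score <- new_score
  }
  list(dends = dendlist(dend1, dend2), entanglement = score)
}

d <- dist(USArrests[1:12, ])
dend1 <- as.dendrogram(hclust(d, method = "complete"))
dend2 <- as.dendrogram(hclust(d, method = "single"))

before <- entanglement(dend1, dend2, L = 1.5)
res <- alternate_sketch(dend1, dend2)
c(before = before, after = res$entanglement)
```

The explicit `new_score >= score` check means the sketch only ever accepts improving rounds, which is what makes the stopping point a local rather than a global optimum.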
This is a greedy forward selection algorithm for rotating the tree and looking for a better match.
This is useful for finding good trees for a tanglegram.
It goes through simultaneously rotating branches of dend1 and dend2 until a locally optimal solution is found.
Step 1: The algorithm begins by executing the 'step2side' operation on the pair of dendrograms.
Step 2: The algorithm generates new alternative tanglegrams by simultaneously rotating one branch from tree 1 and one branch from tree 2. This rotation is applied to every possible combination of branches between tree 1 and tree 2, resulting in a set of new alternative tanglegrams. The tanglegram with the lowest entanglement is retained.
Step 3: Steps 1 and 2 are repeated until either a locally optimal solution is found or the maximum number of iterations is reached.
untangle_step_rotate_both_side( dend1, dend2, L = 1.5, max_n_iterations = 10L, print_times = dendextend_options("warn"), ... )
dend1 |
a dendrogram object. The one we will rotate to best fit dend2. |
dend2 |
a dendrogram object. The one we will rotate to best fit dend1. |
L |
the distance norm to use for measuring the distance between the two trees. It can be any positive number, often one will want to use 0, 1, 1.5, 2 (see 'details' in entanglement). |
max_n_iterations |
integer. The maximal number of times to switch between optimizing one tree with another. |
print_times |
logical (default: dendextend_options("warn")). Should we print how many times we executed steps 1 and 2? |
... |
not used |
A list with two dendrograms (dend1/dend2), after they are rotated to best fit one another.
Nghia Nguyen, Kurdistan Chawshin, Carl Fredrik Berg, Damiano Varagnolo, Shuffle & untangle: novel untangle methods for solving the tanglegram layout problem, Bioinformatics Advances, Volume 2, Issue 1, 2022, vbac014, https://doi.org/10.1093/bioadv/vbac014
tanglegram, match_order_by_labels, entanglement, flip_leaves, all_couple_rotations_at_k, untangle_step_rotate_1side, untangle_step_rotate_2side.
## Not run:
# Figures recreated from 'Shuffle & untangle: novel untangle
# methods for solving the tanglegram layout problem' (Nguyen et al. 2022)
library(tidyverse)
example_labels <- c(
  "Versicolor 90", "Versicolor 54", "Versicolor 81", "Versicolor 63",
  "Versicolor 72", "Versicolor 99", "Virginica 135", "Virginica 117",
  "Virginica 126", "Virginica 108", "Virginica 144", "Setosa 27",
  "Setosa 18", "Setosa 36", "Setosa 45", "Setosa 9"
)

iris_modified <-
  iris %>%
  mutate(Row = row_number()) %>%
  mutate(Label = paste(str_to_title(Species), Row)) %>%
  filter(Label %in% example_labels)
iris_numeric <- iris_modified[, 1:4]
rownames(iris_numeric) <- iris_modified$Label

# Single Linkage vs. Complete Linkage comparison (Fig. 1)
dend1 <- as.dendrogram(hclust(dist(iris_numeric), method = "single"))
dend2 <- as.dendrogram(hclust(dist(iris_numeric), method = "complete"))
tanglegram(dend1, dend2, color_lines = TRUE, lwd = 2, margin_inner = 6) # Good.
entanglement(dend1, dend2, L = 2) # 0.207

# The step2side algorithm (Fig. 2)
result <- untangle_step_rotate_2side(dend1, dend2)
tanglegram(result[[1]], result[[2]], color_lines = TRUE, lwd = 2, margin_inner = 6) # Better...
entanglement(result[[1]], result[[2]], L = 2) # 0.185

# The stepBothSides algorithm (Fig. 4)
result <- untangle_step_rotate_both_side(dend1, dend2)
tanglegram(result[[1]], result[[2]],
  color_lines = TRUE, lwd = 2, margin_inner = 6, lty = 1
) # PERFECT.
entanglement(result[[1]], result[[2]], L = 2) # 0.000
## End(Not run)
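For a lighter comparison that does not need tidyverse, the two stepwise functions can be run side by side on a small built-in data set. This is only a sketch on eight rows of `USArrests`; the resulting numbers are specific to this toy input and say nothing about the figures in the paper.

```r
library(dendextend)

d <- dist(USArrests[1:8, ])
dend1 <- as.dendrogram(hclust(d, method = "single"))
dend2 <- as.dendrogram(hclust(d, method = "complete"))

two <- untangle_step_rotate_2side(dend1, dend2, print_times = FALSE)
both <- untangle_step_rotate_both_side(dend1, dend2, print_times = FALSE)

c(
  original = entanglement(dend1, dend2, L = 2),
  step2side = entanglement(two[[1]], two[[2]], L = 2),
  stepBothSides = entanglement(both[[1]], both[[2]], L = 2)
)
```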
Returns a logical vector with one element per node (its length equals nnodes), which is TRUE when the node is a leaf.
which_leaf(dend, ...)
dend |
a dendrogram object |
... |
ignored. |
A logical vector with the length of nnodes, which gives a TRUE when a node is a leaf.
noded_with_condition, is.leaf, nnodes
## Not run:
library(dendextend)

# Getting the dend object
set.seed(23235)
ss <- sample(1:150, 10)
dend <- iris[ss, -5] %>% dist() %>% hclust() %>% as.dendrogram()
dend %>% plot()

which_leaf(dend)
## End(Not run)
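Two simple identities tie `which_leaf()` to `nnodes()` and `nleaves()`, and the logical vector can be used to subset per-node attributes. A minimal sketch on a five-leaf tree:

```r
library(dendextend)

dend <- as.dendrogram(hclust(dist(USArrests[1:5, ])))
is_leaf <- which_leaf(dend)

# One entry per node, TRUE exactly at the leaves.
length(is_leaf) == nnodes(dend)
sum(is_leaf) == nleaves(dend)

# Use the logical vector to keep only the leaves' labels, in node order.
get_nodes_attr(dend, "label")[is_leaf]
```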
This function identifies which edge(s) in a tree have a given group of labels ("tips") in common. By default it only returns the edge (node) with the highest id.
which_node(dend, labels, max_id = TRUE, ...)
dend |
a dendrogram object |
labels |
a character vector of labels from the tree |
max_id |
logical (default TRUE). Whether to return only the maximal id |
... |
ignored. |
An integer vector with the id(s) of the node(s) that include all of the labels.
noded_with_condition, branches_attr_by_clusters, nnodes, branches_attr_by_labels, get_nodes_attr, which.edge
dend <- iris[1:10, -5] %>% dist() %>% hclust() %>% as.dendrogram() %>% set("labels", 1:10)
dend %>% plot()

which_node(dend, c(1, 2), max_id = FALSE)
which_node(dend, c(2, 3), max_id = FALSE)
which_node(dend, c(2, 3))

dend %>% plot()
the_h <- get_nodes_attr(dend, "height", which_node(dend, c(4, 6)))
the_h
abline(h = the_h, lty = 2, col = 2)
get_nodes_attr(dend, "height", which_node(dend, c(4, 6)))
get_nodes_attr(dend, "members", which_node(dend, c(4, 6)))
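The effect of `max_id` is easiest to see by comparing the two return values for the same pair of labels. In this sketch the labels are taken from the tree itself as character strings; it assumes node ids follow a depth-first numbering starting at the root, so the root is id 1 and the highest matching id is the node furthest from the root.

```r
library(dendextend)

dend <- as.dendrogram(hclust(dist(iris[1:10, -5])))
lbls <- labels(dend)[1:2] # the two left-most leaves in the plotted order

all_ids <- which_node(dend, lbls, max_id = FALSE) # every node containing both labels
deepest <- which_node(dend, lbls)                 # only the highest id

all_ids
deepest

# Number of leaves under the returned node: at least the two we asked for.
n_members <- get_nodes_attr(dend, "members", deepest)
n_members
```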