Library for the Development and Use of Phylogenetic Network Methods
Gene tree container and analysis utilities including consensus tree construction and concordance factors.
Error class for all errors relating to gene trees.
A container for a set of binary networks, each representing a gene tree. Supports two strategies for grouping gene labels into species:

1. An explicit ``species_gene_mapping`` dict (species -> gene labels).
2. A ``naming_rule`` callable that derives the species key from a gene label string.

When neither is provided, every gene label is treated as its own species (identity mapping).
Return the explicit species-to-gene mapping, if set.
Wrapper class for a set of networks that represent gene trees.
| Parameter | Type | Description |
|---|---|---|
| gene_tree_list | list[Network], optional | A list of networks, each of which must be a binary tree. Defaults to None. |
| naming_rule | Callable[..., Any], optional | A function f : str -> str that maps a gene label to its species key. Ignored when *species_gene_mapping* is provided. Defaults to None (identity mapping). |
| species_gene_mapping | dict[str, list[str]], optional | Explicit mapping from species name to a list of gene/allele labels. Takes priority over *naming_rule* when both are given. |
Add a gene tree to the collection. Any new gene labels that belong to this tree will also be added to the collection of all gene tree leaf labels.
| Parameter | Type | Description |
|---|---|---|
| tree | Network | A network that is a tree; it must be binary. |
Create a subgenome mapping from the stored set of gene trees. Uses the explicit species_gene_mapping if available, otherwise falls back to the naming_rule, and finally to an identity mapping.
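
The fallback order described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name `resolve_mapping` and its signature are hypothetical, and gene labels are assumed to be plain strings.

```python
from collections import defaultdict

def resolve_mapping(gene_labels, species_gene_mapping=None, naming_rule=None):
    """Hypothetical helper: build a species -> gene labels dict."""
    # 1. An explicit mapping takes priority over everything else.
    if species_gene_mapping is not None:
        return {sp: list(genes) for sp, genes in species_gene_mapping.items()}
    # 2. Otherwise derive the species key from each label with the naming rule.
    if naming_rule is not None:
        grouped = defaultdict(list)
        for label in sorted(gene_labels):
            grouped[naming_rule(label)].append(label)
        return dict(grouped)
    # 3. Finally fall back to the identity mapping.
    return {label: [label] for label in sorted(gene_labels)}

labels = {"A_1", "A_2", "B_1"}
prefix_rule = lambda label: label.split("_")[0]  # example rule, an assumption

by_rule = resolve_mapping(labels, naming_rule=prefix_rule)
identity = resolve_mapping(labels)
explicit = resolve_mapping(
    labels,
    species_gene_mapping={"X": ["A_1", "B_1"], "Y": ["A_2"]},
    naming_rule=prefix_rule,  # ignored because an explicit mapping is given
)
```
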
Check the explicit species_gene_mapping against the actual taxa present in the gene trees.
Aggregate support for all rooted clusters across the gene tree set.
| Parameter | Type | Description |
|---|---|---|
| include_trivial | bool | include size-1 clusters. Defaults to False. |
| normalize | bool | return frequencies in [0,1] instead of counts. |
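
To make the counting concrete, here is a small sketch that tallies rooted clusters over a list of trees. It assumes trees are written as nested tuples of leaf-label strings rather than `Network` objects, and the helper names `clusters` and `cluster_support` are hypothetical. In this sketch the root cluster (all taxa) is counted like any other; whether the library does the same is not stated in the source.

```python
from collections import Counter

def clusters(tree, include_trivial=False):
    """Return the rooted clusters (frozensets of leaf labels) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        if include_trivial or len(leaves) > 1:
            found.add(leaves)
        return leaves

    walk(tree)
    return found

def cluster_support(trees, include_trivial=False, normalize=True):
    """Count, for each cluster, the number (or fraction) of trees that contain it."""
    counts = Counter()
    for tree in trees:
        counts.update(clusters(tree, include_trivial))
    if normalize and trees:
        return {c: n / len(trees) for c, n in counts.items()}
    return dict(counts)

trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]
support = cluster_support(trees)  # {A, B} appears in 2 of the 3 trees
```
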
Aggregate support for unrooted splits (bipartitions), canonicalized to the smaller side of the split. Note: Only non-trivial splits (both parts size >= 2) are counted.
| Parameter | Type | Description |
|---|---|---|
| normalize | bool | return frequencies in [0,1] instead of counts. |
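
The canonicalization step can be illustrated with a short sketch over nested-tuple trees (an assumed representation, not the library's `Network` type). The names `leaf_set`, `splits`, and `split_support` are hypothetical. When both sides of a split have the same size, this sketch breaks the tie by sorted label order, which is an assumption; the source does not say how ties are handled.

```python
from collections import Counter

def leaf_set(node):
    """Leaf labels below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))

def splits(tree):
    """Non-trivial bipartitions of a tree, each represented by its smaller side."""
    taxa = leaf_set(tree)
    found = set()

    def walk(node):
        below = leaf_set(node)
        other = taxa - below
        if len(below) >= 2 and len(other) >= 2:
            # Keep the smaller side; tie-break by sorted labels (assumption).
            found.add(min(below, other, key=lambda s: (len(s), sorted(s))))
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(tree)
    return found

def split_support(trees, normalize=True):
    counts = Counter()
    for tree in trees:
        counts.update(splits(tree))
    if normalize and trees:
        return {s: n / len(trees) for s, n in counts.items()}
    return dict(counts)

trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
]
support = split_support(trees)
```

Here the split ABC | DE is stored under its smaller side {D, E}, so both trees contribute to the same key.
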
Compute support of each rooted cluster present in a reference tree.
| Parameter | Type | Description |
|---|---|---|
| ref_tree | Network | reference binary tree. |
| include_trivial | bool | include size-1 clusters. Defaults to False. |
| normalize | bool | return frequencies in [0,1]. |
Annotate the reference tree's internal edges with support values stored in the edge weight field. For edge (u->v), the cluster is the set of leaf descendants of v.
| Parameter | Type | Description |
|---|---|---|
| ref_tree | Network | reference binary tree to annotate in place. |
| include_trivial | bool | include size-1 clusters. Defaults to False. |
| normalize | bool | store frequencies in [0,1] instead of counts. |
Return the set of rooted clusters whose support >= threshold. Note: This does not resolve incompatibilities; it is a simple filter.
| Parameter | Type | Description |
|---|---|---|
| threshold | float | minimum frequency in [0,1]. |
| include_trivial | bool | include size-1 clusters. |
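
A brief sketch of why this filter does not resolve incompatibilities: with a threshold of 0.5 or lower, two overlapping clusters can both pass. The support values below are made up for illustration, and `frequent_clusters` is a hypothetical name.

```python
def frequent_clusters(support, threshold, include_trivial=False):
    """Keep clusters whose frequency is at least the threshold."""
    return {
        cluster
        for cluster, freq in support.items()
        if freq >= threshold and (include_trivial or len(cluster) > 1)
    }

# Illustrative frequencies, not real results.
support = {
    frozenset({"A", "B"}): 0.5,
    frozenset({"B", "C"}): 0.5,
    frozenset({"C", "D"}): 0.2,
    frozenset({"A"}): 1.0,
}
kept = frequent_clusters(support, threshold=0.5)

ab, bc = frozenset({"A", "B"}), frozenset({"B", "C"})
# Two clusters fit in one tree only if they are disjoint or nested.
conflicting = not (ab.isdisjoint(bc) or ab <= bc or bc <= ab)
```

Both {A, B} and {B, C} survive the filter even though no single tree can contain both.
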
Compute the average Robinson-Foulds distance (rooted, using clusters) between each gene tree and the reference.
| Parameter | Type | Description |
|---|---|---|
| ref_tree | Network | reference tree. |
| normalize | bool | if True, divide by the maximum possible RF for the shared taxon set of each comparison. |
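
The rooted RF computation can be sketched as the size of the symmetric difference between two cluster sets, restricted to the shared taxa. This sketch assumes nested-tuple trees and takes the maximum RF for n shared taxa to be 2(n - 2), which holds for rooted binary trees; how the library computes the maximum is not stated in the source. All function names are hypothetical.

```python
def leaves(node):
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def clusters(node):
    """Clusters of all internal nodes of a nested-tuple tree."""
    if isinstance(node, str):
        return set()
    found = {leaves(node)}
    for child in node:
        found |= clusters(child)
    return found

def rf_distance(tree, ref, normalize=False):
    shared = leaves(tree) & leaves(ref)

    def restrict(cluster_set):
        # Project onto shared taxa; drop trivial clusters and the full set.
        return {c & shared for c in cluster_set if 1 < len(c & shared) < len(shared)}

    dist = len(restrict(clusters(tree)) ^ restrict(clusters(ref)))
    if not normalize:
        return dist
    max_rf = 2 * (len(shared) - 2)  # assumption: both trees are binary
    return dist / max_rf if max_rf > 0 else 0.0

def average_rf(gene_trees, ref, normalize=False):
    return sum(rf_distance(t, ref, normalize) for t in gene_trees) / len(gene_trees)

ref = (("A", "B"), ("C", "D"))
genes = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))]
raw = average_rf(genes, ref)
scaled = average_rf(genes, ref, normalize=True)
```
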
Construct a majority-rule (or threshold) consensus tree from the gene trees. Greedy compatibility: sort clusters by support descending; add if compatible with current set; then realize the set into a (possibly multifurcating) tree over the union of taxa.
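
The greedy procedure can be sketched as follows, assuming cluster frequencies are already available as a dict and that the output tree is a nested tuple. The names `compatible`, `greedy_consensus`, and `realize` are hypothetical, the support values are made up, and accepting only clusters with frequency strictly above the threshold is an assumption.

```python
def compatible(a, b):
    """Two clusters fit in one tree if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a

def greedy_consensus(support, taxa, threshold=0.5):
    # Sort by support descending; break ties by label order for determinism.
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    accepted = []
    for cluster, freq in ranked:
        if freq <= threshold or not 1 < len(cluster) < len(taxa):
            continue
        if all(compatible(cluster, other) for other in accepted):
            accepted.append(cluster)
    return accepted

def realize(cluster, accepted):
    """Turn a compatible cluster set into a (possibly multifurcating) nested tuple."""
    inside = [c for c in accepted if c < cluster]
    maximal = sorted((c for c in inside if not any(c < d for d in inside)), key=sorted)
    covered = frozenset().union(*maximal)
    children = [realize(c, accepted) for c in maximal]
    children += sorted(cluster - covered)  # leaves not in any accepted subcluster
    return tuple(children)

taxa = frozenset("ABCDE")
support = {  # illustrative frequencies only
    frozenset("AB"): 1.0,
    frozenset("DE"): 0.7,
    frozenset("ABC"): 0.6,
    frozenset("BC"): 0.4,
    frozenset("CD"): 0.3,
}
accepted = greedy_consensus(support, taxa)
tree = realize(taxa, accepted)
relaxed = greedy_consensus(support, taxa, threshold=0.35)
```

Clusters with support above 0.5 are always pairwise compatible, so the compatibility check only matters for lower thresholds. In the `relaxed` call, {B, C} passes the threshold but is rejected because it conflicts with the already accepted {A, B}.
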
Compute a split-based concordance factor (gCF) per internal edge of the reference. For each edge (u->v) with split (A | B), compute the fraction of informative gene trees (those with at least one taxon from each of A and B) that contain the induced split on their leaf set. Returns a map from an edge key to its gCF. The key is (min(child_label), max(child_label)), a name representative of the edge's child cluster. If the child is an internal node, the key uses a canonical name derived from its cluster (string-joined).
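
A minimal sketch of the gCF computation, under stated assumptions: trees are nested tuples, rooted gene trees are read as unrooted (a split is present if some node's cluster equals either side), and results are keyed by the child cluster as a frozenset rather than by the library's key format. The names `node_clusters` and `gcf` are hypothetical.

```python
def node_clusters(tree):
    """Return (all taxa, set of clusters for every node, leaves included)."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    return walk(tree), found

def gcf(ref_tree, gene_trees):
    ref_taxa, ref_clusters = node_clusters(ref_tree)
    gene_info = [node_clusters(g) for g in gene_trees]
    result = {}
    for side_a in ref_clusters:
        side_b = ref_taxa - side_a
        if len(side_a) < 2 or not side_b:
            continue  # skip leaf edges and the root
        informative = concordant = 0
        for taxa, clusters in gene_info:
            ind_a, ind_b = side_a & taxa, side_b & taxa
            if not ind_a or not ind_b:
                continue  # tree lacks taxa on one side: not informative
            informative += 1
            if ind_a in clusters or ind_b in clusters:
                concordant += 1
        result[side_a] = concordant / informative if informative else 0.0
    return result

ref = ((("A", "B"), "C"), ("D", "E"))
genes = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    (("A", "B"), "C"),  # missing D and E
]
result = gcf(ref, genes)
```

The third gene tree is informative for the edge above {A, B} but not for the edges above {A, B, C} or {D, E}, so the denominators differ per edge.
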
Infer a species tree using ASTRAL from the stored gene trees. Requires a path to the ASTRAL .jar. The trees are written to a temp file along with a multi-allele mapping; ASTRAL is then called and its result parsed. The mapping is resolved in this order:

1. The instance-level ``species_gene_mapping`` (if set).
2. The *mapping_rule* argument passed here.
3. The instance-level ``naming_rule``.
4. Identity mapping (each gene = its own species).
| Parameter | Type | Description |
|---|---|---|
| astral_jar_path | str | Path to the ASTRAL jar file. |
| mapping_rule | Callable, optional | Override callable for deriving species from gene labels. Ignored when an explicit species_gene_mapping is available. |
| extra_args | list[str], optional | Additional CLI args for ASTRAL. |
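
As a rough illustration of the external call, the sketch below formats a multi-allele mapping and assembles a command line without running it. It assumes ASTRAL-III style conventions (`-i` for input trees, `-o` for output, `-a` for a mapping file with one `species:gene1,gene2` line per species); check these against the ASTRAL version in use. The function names and file paths are hypothetical, and this is not necessarily how the library builds its call.

```python
def astral_mapping_lines(mapping):
    """Format a species -> gene labels dict as mapping-file lines."""
    return [f"{species}:{','.join(genes)}" for species, genes in sorted(mapping.items())]

def astral_command(jar_path, trees_path, mapping_path, out_path, extra_args=None):
    """Build the argument list; pass it to subprocess.run to execute."""
    cmd = ["java", "-jar", jar_path, "-i", trees_path, "-a", mapping_path, "-o", out_path]
    return cmd + list(extra_args or [])

lines = astral_mapping_lines({"B": ["B_1"], "A": ["A_1", "A_2"]})
cmd = astral_command("astral.jar", "genes.tre", "map.txt", "species.tre")
```
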
Reconcile each gene tree against a species tree using LCA mapping and report total duplications and losses (parsimony-based estimate). The gene-to-species mapping is resolved in the same priority order as :meth:`_resolve_mapping`:

1. Explicit ``species_gene_mapping``.
2. *naming_rule* argument (if provided and no explicit mapping).
3. Instance ``naming_rule``.
4. Identity mapping.
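
The duplication half of LCA reconciliation can be sketched as below. Each gene tree node is mapped to the smallest species-tree cluster containing the species of its leaves, and a node counts as a duplication when it maps to the same place as one of its children. Loss counting is omitted to keep the sketch short. Trees are assumed to be nested tuples, and the names `species_clusters`, `count_duplications`, and `species_of` are hypothetical.

```python
def species_clusters(tree):
    """Clusters of every node of a nested-tuple species tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found

def count_duplications(gene_tree, species_tree, species_of):
    clusters = species_clusters(species_tree)

    def lca(species):
        # The LCA is the smallest species-tree cluster containing the set.
        return min((c for c in clusters if species <= c), key=len)

    dups = 0

    def walk(node):
        nonlocal dups
        if isinstance(node, str):
            species = frozenset([species_of(node)])
            return species, lca(species)
        results = [walk(child) for child in node]
        species = frozenset().union(*(s for s, _ in results))
        here = lca(species)
        if any(here == child_map for _, child_map in results):
            dups += 1  # node maps to the same species node as a child
        return species, here

    walk(gene_tree)
    return dups

species_tree = (("A", "B"), "C")
rule = lambda label: label.split("_")[0].upper()  # example naming rule, an assumption

duplicated = ((("a_1", "b_1"), ("a_2", "b_2")), "c_1")
congruent = (("a_1", "b_1"), "c_1")
n_dup = count_duplications(duplicated, species_tree, rule)
n_none = count_duplications(congruent, species_tree, rule)
```
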
The default method for sorting taxon labels into groups.
| Parameter | Type | Description |
|---|---|---|
| taxa_name | str | a taxon label from a nexus file |
GeneTreeError: if there is a problem applying the naming rule.

TODO: Examine the need for this function and remove if not needed.