PhyNetPy Documentation

Library for the Development and Use of Phylogenetic Network Methods

GeneTrees Module v2.0.0

Gene tree container and analysis utilities including consensus tree construction and concordance factors.

Author:
Mark Kessler
Last Edit:
9/16/25
Source:
GeneTrees.py

Exceptions

exception GeneTreeError(Exception)

Error class for all errors relating to gene trees.

GeneTrees

class GeneTrees

A container for a set of binary networks, each representing a gene tree. Supports two strategies for grouping gene labels into species:

1. An explicit ``species_gene_mapping`` dict (species -> gene labels).
2. A ``naming_rule`` callable that derives the species key from a gene label string.

When neither is provided, every gene label is treated as its own species (identity mapping).

Properties

species_gene_mapping -> Optional[Dict[str, List[str]]] property

Return the explicit species-to-gene mapping, if set.

Constructor

__init__(gene_tree_list: Optional[List[Network]] = None, naming_rule: Optional[Callable[..., Any]] = None, species_gene_mapping: Optional[Dict[str, List[str]]] = None) -> None

Wrapper class for a set of networks that represent gene trees.

Parameter Type Description
gene_tree_list list[Network], optional A list of networks, of the binary tree variety. Defaults to None.
naming_rule Callable[..., Any], optional A function f: str -> str that maps a gene label to its species key. Ignored when *species_gene_mapping* is provided. Defaults to None (identity mapping).
species_gene_mapping dict[str, list[str]], optional Explicit mapping from species name to a list of gene/allele labels. Takes priority over *naming_rule* when both are given.

Methods

add(tree: Network) -> None

Add a gene tree to the collection. Any new gene labels that belong to this tree will also be added to the collection of all gene tree leaf labels.

Parameter Type Description
tree Network A network that is a tree, must be binary.

species_gene_mapping(value: Optional[Dict[str, List[str]]]) -> None

Set the explicit species-to-gene mapping (setter for the species_gene_mapping property).

mp_allop_map -> Dict[str, List[str]]

Create a subgenome mapping from the stored set of gene trees. Uses the explicit species_gene_mapping if available, otherwise falls back to the naming_rule, and finally to an identity mapping.

Returns: dict[str, list[str]]: subgenome mapping
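
The fallback order described above can be sketched in plain Python. This is an illustration only, not the library's implementation: `resolve_mapping` is a hypothetical helper, and the underscore-based rule is an assumed example of a naming rule.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional


def resolve_mapping(
    labels: Iterable[str],
    species_gene_mapping: Optional[Dict[str, List[str]]] = None,
    naming_rule: Optional[Callable[[str], str]] = None,
) -> Dict[str, List[str]]:
    """Hypothetical helper mirroring the documented priority order."""
    if species_gene_mapping is not None:
        # 1. An explicit mapping wins; the naming rule is ignored.
        return {sp: list(genes) for sp, genes in species_gene_mapping.items()}
    # 2. Use the naming rule if given, otherwise 3. the identity mapping.
    rule = naming_rule if naming_rule is not None else (lambda label: label)
    grouped: Dict[str, List[str]] = defaultdict(list)
    for label in sorted(set(labels)):
        grouped[rule(label)].append(label)
    return dict(grouped)


labels = ["A_1", "A_2", "B_1"]
by_rule = resolve_mapping(labels, naming_rule=lambda s: s.split("_")[0])
identity = resolve_mapping(labels)
explicit = resolve_mapping(labels, {"X": ["A_1", "B_1"]}, naming_rule=str.upper)
```

Here `by_rule` groups `A_1` and `A_2` under `A`, `identity` gives each label its own key, and `explicit` returns the supplied dict unchanged even though a naming rule was also passed.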
validate_mapping -> List[str]

Check the explicit species_gene_mapping against the actual taxa present in the gene trees.

Returns: list[str]: A list of warning/error messages. Empty means valid.
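
A check of this kind might look like the sketch below. The specific conditions tested (genes missing from the trees, genes assigned to two species, taxa with no species) are assumptions about what a reasonable validation covers; the source does not list which checks validate_mapping performs, and `check_mapping` is a hypothetical name.

```python
from typing import Dict, List, Set


def check_mapping(mapping: Dict[str, List[str]], tree_taxa: Set[str]) -> List[str]:
    """Hypothetical validation: return human-readable problems, empty if none."""
    messages: List[str] = []
    owner: Dict[str, str] = {}  # gene label -> first species that claimed it
    for species, genes in mapping.items():
        for gene in genes:
            if gene in owner:
                messages.append(
                    f"{gene!r} is mapped to both {owner[gene]!r} and {species!r}"
                )
            else:
                owner[gene] = species
            if gene not in tree_taxa:
                messages.append(f"{gene!r} does not appear in any gene tree")
    # Taxa seen in the trees that no species claims.
    for taxon in sorted(tree_taxa - set(owner)):
        messages.append(f"{taxon!r} appears in the trees but has no species")
    return messages


taxa = {"a1", "a2", "b1"}
ok = check_mapping({"A": ["a1", "a2"], "B": ["b1"]}, taxa)
problems = check_mapping({"A": ["a1", "zz"], "B": ["a1"]}, taxa)
```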
cluster_support(include_trivial: bool = False, normalize: bool = True) -> Dict[FrozenSet[str], float]

Aggregate support for all rooted clusters across the gene tree set.

Parameter Type Description
include_trivial bool include size-1 clusters. Defaults to False.
normalize bool return frequencies in [0,1] instead of counts.
Returns: dict[FrozenSet[str], float]: map cluster (as frozenset of leaf labels) to count or frequency.
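
To make the notion of cluster support concrete, here is a minimal sketch that uses nested tuples as a stand-in for Network objects (an assumption made so the example runs without PhyNetPy). Whether the library counts the root cluster is not stated in the source; this sketch includes it.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set


def rooted_clusters(tree, include_trivial: bool = False) -> Set[FrozenSet[str]]:
    """Collect the leaf set below every node of a nested-tuple tree."""
    found: Set[FrozenSet[str]] = set()

    def leaves_below(node) -> FrozenSet[str]:
        if isinstance(node, str):  # a leaf is just its label
            leaves = frozenset([node])
        else:  # an internal node is a tuple of children
            leaves = frozenset().union(*(leaves_below(child) for child in node))
        if include_trivial or len(leaves) > 1:
            found.add(leaves)
        return leaves

    leaves_below(tree)
    return found


def cluster_support_sketch(
    trees: List, include_trivial: bool = False, normalize: bool = True
) -> Dict[FrozenSet[str], float]:
    counts: Counter = Counter()
    for tree in trees:
        counts.update(rooted_clusters(tree, include_trivial))
    denom = len(trees) if normalize and trees else 1
    return {cluster: n / denom for cluster, n in counts.items()}


trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]
support = cluster_support_sketch(trees)
counts = cluster_support_sketch(trees, normalize=False)
```

With these three trees, the cluster {A, B} appears in two of three trees, so its normalized support is 2/3 and its raw count is 2.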
split_support(normalize: bool = True) -> Dict[FrozenSet[str], float]

Aggregate support for unrooted splits (bipartitions), canonicalized to the smaller side of the split. Note: Only non-trivial splits (both parts size >= 2) are counted.

Parameter Type Description
normalize bool return frequencies in [0,1] instead of counts.
Returns: dict[FrozenSet[str], float]: map smaller-side cluster to count/freq.
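
The canonicalization to the smaller side can be illustrated as follows, again with nested tuples standing in for Network objects. The tie-break for equal-sized sides (alphabetical order of labels) is an assumption of this sketch; the source does not say how ties are resolved.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set


def leaf_sets(tree) -> List[FrozenSet[str]]:
    """Leaf set below every node (leaves and root included)."""
    out: List[FrozenSet[str]] = []

    def walk(node) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def nontrivial_splits(tree) -> Set[FrozenSet[str]]:
    sets = leaf_sets(tree)
    taxa = max(sets, key=len)  # the root's leaf set
    result: Set[FrozenSet[str]] = set()
    for side in sets:
        other = taxa - side
        if len(side) < 2 or len(other) < 2:
            continue  # trivial split: one side has fewer than two taxa
        # Canonical form: smaller side; ties broken alphabetically (assumed).
        result.add(min(side, other, key=lambda s: (len(s), sorted(s))))
    return result


def split_support_sketch(trees: List, normalize: bool = True) -> Dict[FrozenSet[str], float]:
    counts: Counter = Counter()
    for tree in trees:
        counts.update(nontrivial_splits(tree))
    denom = len(trees) if normalize and trees else 1
    return {split: n / denom for split, n in counts.items()}


trees = [
    ((("A", "B"), "C"), ("D", "E")),
    (("A", "B"), ("C", ("D", "E"))),  # same unrooted tree, different rooting
    ((("A", "C"), "B"), ("D", "E")),
]
support = split_support_sketch(trees)
```

The first two trees differ only in where the root sits, so they contribute identical splits; this is the difference between split support and rooted cluster support.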
support_on_reference(ref_tree: Network, include_trivial: bool = False, normalize: bool = True) -> Dict[FrozenSet[str], float]

Compute support of each rooted cluster present in a reference tree.

Parameter Type Description
ref_tree Network reference binary tree.
include_trivial bool include size-1 clusters. Defaults to False.
normalize bool return frequencies in [0,1].
Returns: dict[FrozenSet[str], float]: map of ref clusters to support.
annotate_reference_support(ref_tree: Network, include_trivial: bool = False, normalize: bool = True) -> None

Annotate the reference tree's internal edges with support values stored in the edge weight field. For edge (u->v), the cluster is the set of leaf descendants of v.

Parameter Type Description
ref_tree Network reference binary tree to annotate in place.
include_trivial bool include size-1 clusters. Defaults to False.
normalize bool frequencies vs counts.
consensus_clusters(threshold: float = 0.5, include_trivial: bool = False) -> List[Set[str]]

Return the set of rooted clusters whose support >= threshold. Note: This does not resolve incompatibilities; it is a simple filter.

Parameter Type Description
threshold float minimum frequency in [0,1].
include_trivial bool include size-1 clusters.
Returns: list[Set[str]]: clusters passing threshold.
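
Because the filter does not resolve incompatibilities, a threshold below 0.5 can return clusters that cannot coexist in one tree. The sketch below shows this with a hand-written support table; the function names are hypothetical and the support values are invented for illustration.

```python
from typing import Dict, FrozenSet, List, Set


def consensus_clusters_sketch(
    support: Dict[FrozenSet[str], float], threshold: float = 0.5
) -> List[Set[str]]:
    """Keep every cluster whose frequency meets the threshold; nothing more."""
    return [set(cluster) for cluster, freq in support.items() if freq >= threshold]


def compatible(a: Set[str], b: Set[str]) -> bool:
    """Two clusters fit in one rooted tree if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


# Illustrative frequencies, not real data.
support = {frozenset("AB"): 0.6, frozenset("AC"): 0.4, frozenset("DE"): 1.0}
majority = consensus_clusters_sketch(support, threshold=0.5)
relaxed = consensus_clusters_sketch(support, threshold=0.4)
clash = not compatible({"A", "B"}, {"A", "C"})
```

At threshold 0.4 the result contains both {A, B} and {A, C}, which overlap without nesting, so a caller who needs a tree must resolve the conflict separately.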
rf_distance(ref_tree: Network, normalize: bool = False) -> float

Compute the average Robinson-Foulds distance (rooted, using clusters) between each gene tree and the reference.

Parameter Type Description
ref_tree Network reference tree.
normalize bool if True, divide by the maximum possible RF for the shared taxon set of each comparison.
Returns: float: average RF distance across the gene trees.
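
A rooted, cluster-based RF distance can be sketched as the size of the symmetric difference between two trees' non-trivial cluster sets. The sketch assumes every gene tree has the same taxa as the reference and normalizes by the total number of clusters in the two trees; the library's handling of partially overlapping taxon sets is not reproduced here.

```python
from typing import FrozenSet, List, Set


def proper_clusters(tree) -> Set[FrozenSet[str]]:
    """Non-trivial clusters of a nested-tuple tree, excluding the root cluster."""
    found: Set[FrozenSet[str]] = set()

    def walk(node) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)  # shared by all trees on the same taxa
    return found


def rf(tree_a, tree_b, normalize: bool = False) -> float:
    a, b = proper_clusters(tree_a), proper_clusters(tree_b)
    distance = len(a ^ b)  # clusters present in exactly one tree
    if normalize:
        largest = len(a) + len(b)  # reached when no cluster is shared
        return distance / largest if largest else 0.0
    return float(distance)


def average_rf(gene_trees: List, ref_tree, normalize: bool = False) -> float:
    return sum(rf(g, ref_tree, normalize) for g in gene_trees) / len(gene_trees)


ref = (("A", "B"), ("C", "D"))
genes = [
    (("A", "B"), ("C", "D")),    # identical: distance 0
    (("A", "C"), ("B", "D")),    # nothing shared: distance 4
    ((("A", "B"), "C"), "D"),    # shares {A, B}: distance 2
]
avg = average_rf(genes, ref)
avg_norm = average_rf(genes, ref, normalize=True)
```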
build_majority_rule_consensus_tree(threshold: float = 0.5) -> Network

Construct a majority-rule (or threshold) consensus tree from the gene trees. The method uses greedy compatibility: clusters are sorted by descending support, each cluster is added if it is compatible with the clusters already accepted, and the accepted set is then realized as a (possibly multifurcating) tree over the union of taxa.
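
The greedy procedure can be sketched on a precomputed support table. The output here is a nested tuple rather than a Network, the support values are invented for illustration, and the alphabetical tie-break among equally supported clusters is an assumption.

```python
from typing import Dict, FrozenSet, Iterable, List


def compatible(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    return a.isdisjoint(b) or a <= b or b <= a


def build(cluster: FrozenSet[str], accepted: List[FrozenSet[str]]):
    """Realize a nested family of clusters as nested tuples."""
    inside = [c for c in accepted if c < cluster]
    # Children are the maximal accepted clusters strictly inside this one.
    maximal = [c for c in inside if not any(c < d for d in inside)]
    covered = frozenset().union(*maximal)
    children = [build(c, accepted) for c in sorted(maximal, key=sorted)]
    children += sorted(cluster - covered)  # leftover taxa attach directly
    return tuple(children)


def greedy_consensus(
    support: Dict[FrozenSet[str], float], taxa: Iterable[str], threshold: float = 0.5
):
    all_taxa = frozenset(taxa)
    ranked = sorted(
        (c for c, s in support.items() if s >= threshold and 1 < len(c) < len(all_taxa)),
        key=lambda c: (-support[c], sorted(c)),  # highest support first
    )
    accepted: List[FrozenSet[str]] = []
    for cluster in ranked:
        if all(compatible(cluster, other) for other in accepted):
            accepted.append(cluster)
    return build(all_taxa, accepted)


# Illustrative frequencies, not real data.
support = {
    frozenset("ABC"): 0.9,
    frozenset("AB"): 0.6,
    frozenset("AC"): 0.35,  # conflicts with {A, B}, so it is skipped
    frozenset("DE"): 0.2,   # below the threshold
}
tree = greedy_consensus(support, "ABCDE", threshold=0.3)
```

Since {D, E} falls below the threshold, D and E attach directly to the root, producing a multifurcation.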

gene_concordance_factors(ref_tree: Network) -> Dict[Tuple[str, str], float]

Compute a split-based gene concordance factor (gCF) for each internal edge of the reference tree. For each edge (u->v) with split (A | B), the gCF is the fraction of informative gene trees (those with at least one taxon from each of A and B) that contain the induced split on their own leaf set. Returns a map to gCF keyed by (min(child_label), max(child_label)), the name representative of the edge's child cluster. If the child is an internal node, the key uses a canonical name derived from its cluster (string-joined).
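
For a single reference edge, the computation can be sketched as below. Nested tuples stand in for Network objects, gene-tree taxa are assumed to be a subset of the reference taxa, and the function takes the child cluster directly instead of producing the keyed map described above.

```python
from typing import FrozenSet, Iterable, List, Set


def all_clusters(tree) -> Set[FrozenSet[str]]:
    """Leaf set below every node, leaves and root included."""
    found: Set[FrozenSet[str]] = set()

    def walk(node) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def gcf_for_edge(child_cluster: Iterable[str], ref_taxa: Iterable[str], gene_trees: List) -> float:
    side_a = frozenset(child_cluster)
    side_b = frozenset(ref_taxa) - side_a
    informative = concordant = 0
    for gene_tree in gene_trees:
        clusters = all_clusters(gene_tree)
        leaves = max(clusters, key=len)
        a_here, b_here = side_a & leaves, side_b & leaves
        if not a_here or not b_here:
            continue  # uninformative: one side of the split is absent
        informative += 1
        # The induced split is present if either side is a cluster of the tree.
        if a_here in clusters or b_here in clusters:
            concordant += 1
    return concordant / informative if informative else 0.0


genes = [
    (("A", "B"), ("C", "D")),  # concordant
    (("A", "C"), ("B", "D")),  # discordant
    (("A", "B"), "C"),         # concordant on its reduced leaf set
    ("C", "D"),                # uninformative for this edge
]
value = gcf_for_edge("AB", "ABCD", genes)
```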

astral(astral_jar_path: str, mapping_rule: Optional[Callable[[str], str]] = None, extra_args: Optional[List[str]] = None) -> Network

Infer a species tree using ASTRAL from the stored gene trees. Requires a path to the ASTRAL .jar. The method writes the trees to a temp file along with a multi-allele mapping, then calls ASTRAL and parses the result. The mapping is resolved in this order:

1. The instance-level ``species_gene_mapping`` (if set).
2. The *mapping_rule* argument passed here.
3. The instance-level ``naming_rule``.
4. Identity mapping (each gene = its own species).

Parameter Type Description
astral_jar_path str Path to the ASTRAL jar file.
mapping_rule Callable, optional Override callable for deriving species from gene labels. Ignored when an explicit species_gene_mapping is available.
extra_args list[str], optional Additional CLI args for ASTRAL.
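
The sketch below shows one way the mapping file and command line could be assembled. It assumes the standard ASTRAL command-line options (`-i` for input trees, `-o` for output, `-a` for the name mapping file) and the `species:gene1,gene2` mapping format; the file names and both helper functions are hypothetical, and the source does not show how PhyNetPy builds the call. Nothing is executed here.

```python
from typing import Dict, List, Optional


def mapping_file_lines(mapping: Dict[str, List[str]]) -> List[str]:
    """One 'species:gene1,gene2' line per species (assumed ASTRAL -a format)."""
    return [f"{species}:{','.join(genes)}" for species, genes in sorted(mapping.items())]


def astral_command(
    jar_path: str,
    trees_path: str,
    output_path: str,
    mapping_path: Optional[str] = None,
    extra_args: Optional[List[str]] = None,
) -> List[str]:
    """Build an argument list suitable for subprocess.run (not run here)."""
    command = ["java", "-jar", jar_path, "-i", trees_path, "-o", output_path]
    if mapping_path is not None:
        command += ["-a", mapping_path]
    command += list(extra_args or [])
    return command


lines = mapping_file_lines({"B": ["b1"], "A": ["a1", "a2"]})
command = astral_command(
    "astral.jar", "genes.tre", "species.tre", "map.txt", extra_args=["-t", "2"]
)
```

Passing the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.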
duplication_loss_summary(species_tree: Network, naming_rule: Optional[Callable[[str], str]] = None) -> Dict[str, Any]

Reconcile each gene tree against a species tree using LCA mapping and report total duplications and losses (parsimony-based estimate). The gene-to-species mapping is resolved in the same priority order as _resolve_mapping:

1. Explicit ``species_gene_mapping``.
2. *naming_rule* argument (if provided and no explicit mapping).
3. Instance ``naming_rule``.
4. Identity mapping.

Returns: dict with ``"totals"`` and ``"per_tree"`` breakdowns.
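
LCA mapping can be sketched as follows for the duplication count only; loss counting is omitted to keep the example short. Nested tuples stand in for Network objects, species-tree nodes are represented by their clusters, and the returned values are plain integers, not the documented result dict.

```python
from typing import Dict, FrozenSet, List


def species_clusters(tree) -> List[FrozenSet[str]]:
    """Leaf set below every species-tree node, leaves and root included."""
    out: List[FrozenSet[str]] = []

    def walk(node) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def lca(clusters: List[FrozenSet[str]], taxa: FrozenSet[str]) -> FrozenSet[str]:
    """The smallest species cluster containing all of the given taxa."""
    return min((c for c in clusters if taxa <= c), key=len)


def count_duplications(gene_tree, clusters, gene_to_species: Dict[str, str]) -> int:
    duplications = 0

    def visit(node) -> FrozenSet[str]:
        nonlocal duplications
        if isinstance(node, str):
            return frozenset([gene_to_species[node]])
        child_maps = [visit(child) for child in node]
        here = lca(clusters, frozenset().union(*child_maps))
        # A node mapped to the same species node as a child is a duplication.
        if any(here == child_map for child_map in child_maps):
            duplications += 1
        return here

    visit(gene_tree)
    return duplications


species = (("A", "B"), "C")
gene_to_species = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
gene_trees = [
    (("a1", "b1"), "c1"),
    ((("a1", "a2"), "b1"), "c1"),
    (("a1", "b1"), ("a2", "c1")),
]
clusters = species_clusters(species)
per_tree = [count_duplications(g, clusters, gene_to_species) for g in gene_trees]
total = sum(per_tree)
```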

Module Functions

phynetpy_naming(taxa_name: str) -> str

The default method for sorting taxa labels into groups.

Parameter Type Description
taxa_name str a taxon label from a nexus file
Returns: str: a string that is the key for this label
Raises: GeneTreeError: if there is a problem applying the naming rule
external_naming(taxa_name: str) -> str

TODO: Examine the need for this function and remove if not needed.
