PhyNetPy Documentation

Library for the Development and Use of Phylogenetic Network Methods

GraphUtils Module v1.0.0

Graph and network utility functions for topology analysis, manipulation, and ASCII rendering.

Author:
Mark Kessler
Last Edit:
5/12/26
Source:
GraphUtils.py

Contents

Module Functions

get_all_clusters(net: Network, node: Union[Node, None] = None, include_trivial: bool = False) -> set[frozenset[Node]]

Compile a list of clusters that make up this graph. Ie: for a graph ((A, B)C, D); , set of all clusters is {(A,B), (A,B,C)}. Can optionally allow the trivial leaf clusters in the set as desired.

Parameter Type Description
net Network the network to operate on
node Node For any user call, this should be the root. For internal calls, it is the starting point for search.
include_trivial bool If set to True, includes clusters of size 1. Defaults to False.
Returns: set: A set of all clusters in this graph. Each cluster is represented as a set of either names or nodes.
merge_networks(left: Network, right: Network) -> Network

Combine two networks into a single network object by making the roots of each network the children of a new root node.

Parameter Type Description
left Network Left subnetwork
right Network Right subnetwork
Returns: Network: The resulting network object, containing copies of the nodes and edges of the original networks.
subnet_given_leaves(net: Network, leaf_set: list[Node]) -> Network

Compute the minimally sized subnetwork of a network such that the leaf set of the subnetwork is a subset of the original network's leaf set.

Parameter Type Description
net Network A network
leaf_set list[Node] A set of leaf nodes of the given network
Returns: Network: A new Network object with node and edge copies of the original.
get_all_subtrees(net: Network) -> list[Network]

Generate all possible trees that can be derived from the given network by removing hybrid edges and creating copies with subtrees that start at each non-reticulation node.

Parameter Type Description
net Network A network object
Returns: list[Network]: A list of network objects, each representing a tree that is derived from the original network.
dominant_tree(net: Network) -> Network

Generate the dominant tree from a given network by retaining only the reticulation edges with the highest inheritance probability and removing all other reticulation edges.

Parameter Type Description
net Network A network object
Returns: Network: A new network object representing the dominant tree derived from the original network.
count_reticulations(net: Network) -> int

Count the number of reticulation nodes in a network.

Parameter Type Description
net Network A network object.
Returns: int: Number of nodes with indegree >= 2 (reticulations by flag).
validate_binary(net: Network, require_tree_nodes: bool = True, require_retic_nodes: bool = True) -> tuple[bool, dict[str, list[Node]]]

Validate standard binary constraints. - Tree/internal nodes: indegree=1 (except root with 0), outdegree=2 - Reticulation nodes: indegree=2, outdegree=1 - Leaves: outdegree=0

Parameter Type Description
net Network A network object.
require_tree_nodes bool Enforce tree node constraints.
require_retic_nodes bool Enforce retic node constraints.
Returns: tuple[bool, dict[str, list[Node]]]: (is_valid, violations)
detect_bubbles(net: Network) -> list[tuple[Edge, Edge]]

Detect bubble structures as parallel directed edges (same (src, dest)).

Parameter Type Description
net Network A network object.
Returns: list[tuple[Edge, Edge]]: List of edge pairs representing bubbles.
sample_displayed_tree(net: Network, rng: Union[np.random.Generator, None] = None) -> Network

Sample one displayed tree by selecting, for each reticulation, a single incoming edge with probability proportional to its inheritance probability.

Parameter Type Description
net Network A network object.
rng Union[np.random.Generator, None] Optional RNG; defaults to new generator.
Returns: Network: A cleaned, displayed tree copy of the input network.
contract_edge(net: Network, edge: Edge, keep: str = 'parent') -> None

Contract an internal edge by merging its endpoints.

Parameter Type Description
net Network Network to modify in place.
edge Edge The edge (u->v) to contract.
keep str Which endpoint to keep as the merged node: "parent" keeps u, "child" keeps v.
Raises: NetworkError: On invalid keep value.
reroot(net: Network, new_root: Node) -> Network

Return a copy of the network re-rooted at `new_root` by orienting edges away from the new root using undirected BFS levels. This preserves node/edge attributes; reticulation indegrees may change depending on topology.

Parameter Type Description
net Network A network object.
new_root Node Node in `net` to be the new root.
Returns: Network: Re-rooted copy.
Raises: NetworkError: If `new_root` is not a node in the network.
pairwise_leaf_distance(net: Network, use_branch_lengths: bool = True) -> dict[tuple[str, str], float]

Compute pairwise distances between leaves on the underlying undirected graph. If use_branch_lengths is True, sum edge lengths; otherwise, use unit weights.

Parameter Type Description
net Network A network object.
use_branch_lengths bool Whether to sum edge lengths.
Returns: dict[tuple[str, str], float]: Map from sorted (leaf_i, leaf_j) to distance.
bridges_and_articulations(net: Network) -> tuple[list[tuple[str, str]], list[str]]

Compute bridges and articulation points on the underlying undirected graph using Tarjan's algorithm.

Parameter Type Description
net Network A network object.
Returns: tuple[list[tuple[str, str]], list[str]]: A pair (bridges, articulations) where bridges are (u, v) tuples with u < v by node label, and articulations are node labels.
transitive_reduction(net: Network) -> Network

Compute the transitive reduction of an acyclic network: remove edges (u,v) for which an alternate directed path u -> ... -> v exists.

Parameter Type Description
net Network Input network (must be acyclic).
Returns: Network: A reduced network with identical reachability.
Raises: NetworkError: If the network contains a directed cycle.
blobs(net: Network) -> list[set[Node]]

Return the biconnected components ("blobs") of the underlying undirected graph.

Parameter Type Description
net Network A network object.
Returns: list[set[Node]]: A list where each element is the node set of one biconnected component in the undirected view of the network.
tree_of_blobs(net: Network) -> list[Network]

Decompose a phylogenetic network into its biconnected components (blobs). Each blob is a maximal biconnected subgraph of the underlying undirected graph. Returns a list of Network objects, one per biconnected component, containing copies of the original nodes and directed edges. Node names are preserved but all objects are distinct from the original network.

Parameter Type Description
net Network A phylogenetic network.
Returns: list[Network]: One Network per biconnected component (blob).
level(net: Network) -> int

Compute the level of a phylogenetic network. The level is the maximum number of reticulation nodes contained in any biconnected component ("blob") of the underlying undirected graph. Returns 0 for trees.

Parameter Type Description
net Network Input network.
Returns: int: The network level.
count_displayed_trees(net: Network) -> int

Estimate the number of displayed trees of a network. Computed as the product, over reticulation nodes, of their inbound edge counts (typically 2). For general networks, this is an upper bound when some choices may be incompatible; for level-1 networks this often matches the exact count.

Parameter Type Description
net Network A network object.
Returns: int: Estimated number of displayed trees.
is_tree(net: Network) -> bool

Check whether a network is a tree (i.e., has no reticulation nodes).

Parameter Type Description
net Network A network object.
Returns: bool: True if there are no reticulation nodes, False otherwise.
validate_times_and_lengths(net: Network, tol: float = 1e-05) -> tuple[bool, list[str]]

Validate node times and branch lengths consistency across the network. For each edge (parent u -> child v), verify: - child_time - parent_time ≈ edge.length within tolerance. - child_time >= parent_time (monotonicity).

Parameter Type Description
net Network A network object.
tol float Allowed absolute tolerance for length mismatch.
Returns: tuple[bool, list[str]]: (is_valid, diagnostics) where diagnostics is a list of messages describing any violations or missing values encountered.
ancestors_descendants(net: Network) -> tuple[dict[Node, set[Node]], dict[Node, set[Node]]]

Compute the ancestor and descendant sets for each node in a network.

Parameter Type Description
net Network A network object.
Returns: tuple[dict[Node, set[Node]], dict[Node, set[Node]]]: A pair (ancestors, descendants) where: - ancestors[n] is the set of all nodes with a directed path to n. - descendants[n] is the set of all nodes reachable from n.
topological_order(net: Network) -> list[Node]

Compute a topological order of nodes in an acyclic network. Delegates to :pymeth:`Network.topological_order`.

Parameter Type Description
net Network A network object.
Returns: list[Node]: Nodes in topological order from roots to leaves.
Raises: NetworkError: If the network contains a directed cycle.
break_cycles(net: Network, strategy: str = 'greedy') -> Network

Produce an acyclic copy by removing a small set of cycle-forming edges. Strategy 'greedy' removes the first encountered back edge per DFS pass until the graph is acyclic. This is not guaranteed minimal but is fast and effective for cleaning up accidental cycles.

Parameter Type Description
net Network A network object.
strategy str Currently only 'greedy' is supported.
Returns: Network: An acyclic copy of the input network.
Raises: NetworkError: If an unsupported strategy is specified.
reticulation_parent_pairs(net: Network) -> dict[Node, list[Edge]]

Map each reticulation node to its inbound edges, sorted by gamma.

Parameter Type Description
net Network A network object.
Returns: dict[Node, list[Edge]]: A mapping from each reticulation node to the list of inbound edges sorted by decreasing inheritance probability (gamma).
induced_subnetwork_by_taxa(net: Network, taxa: list[str]) -> Network

Construct the subnetwork induced by a set of leaf labels. The induced subnetwork is formed by taking the MRCA of the target leaves and retaining only those descendant branches that lead to at least one target leaf. Node/edge attributes (gamma, length) are copied where applicable.

Parameter Type Description
net Network A network object.
taxa list[str] List of leaf labels to induce on.
Returns: Network: The induced subnetwork as a new Network object.
Raises: NetworkError: If any leaf name is not found in the network.
extract_topology(newick_str: str) -> str

Extract the topology from a newick string by removing branch lengths, inheritance probabilities, and other metadata. This function handles PhyloNet-style newick strings that may contain: - Branch lengths after single colons (:) - Inheritance probabilities after double colons (::) - Additional metadata after triple colons (:::) - Comments in square brackets [...]

Parameter Type Description
newick_str str A newick string potentially with PhyloNet extensions
Returns: str: A cleaned newick string showing only the topology Examples: >>> extract_topology("(A:0.1,B:0.2)C:0.3;") '(A,B)C;' >>> extract_topology("((A:1.0,B:2.0)#H1:3.0::0.4,C:4.0)D;") '((A,B)#H1,C)D;' >>> extract_topology("(A:1.0[&comment],B:2.0)C;") '(A,B)C;'
extract_topology_from_file(filename: str, output_filename: str = None) -> list[str]

Extract topology from all newick strings in a file.

Parameter Type Description
filename str Path to file containing newick strings.
output_filename str, optional Path to save cleaned newick strings. If None, results are only returned without writing to disk.
Returns: list[str]: List of topology-only newick strings.
Raises: FileNotFoundError: If `filename` does not exist., IOError: If reading or writing the file fails.
is_isomorphic(net1: Network, net2: Network) -> bool

Returns True, if net1 and net2 are topologically identical networks. Even if branch lengths are different, or node labels are different, networks can be isomorphic. ie-- all trees with 3 taxa are isomorphic. quartet trees have only a few different variations. Networks of any size, have infinitely many possible topologies. (A,B)C; is isomorphic with (D,E)F; (A,B),(C,D); is not isomorphic with ((A,B)C, D);

Parameter Type Description
net1 Network A Network object
net2 Network Another Network object
Returns: bool: True if the networks are topologically isomorphic, False otherwise
hardwired_cluster_distance(net1: Network, net2: Network, normalize: bool = False) -> Union[int, float]

Hardwired cluster distance between two phylogenetic networks. The hardwired cluster of a node v is the set of all leaf labels reachable from v by following every directed edge. The distance is the cardinality of the symmetric difference of the two hardwired cluster sets: d_HC(N1, N2) = |HC(N1) triangle HC(N2)| When *normalize* is True the result is divided by |HC(N1) | HC(N2)|, yielding a value in [0, 1]. Reference --------- Huson, Rupp & Scornavacca. *Phylogenetic Networks: Concepts, Algorithms and Applications* (2010), Chapter 7.

Parameter Type Description
net1 First network.
net2 Second network.
normalize Return a value in [0, 1] instead of a raw count.
Returns: The hardwired cluster distance (int, or float when normalized).
softwired_cluster_distance(net1: Network, net2: Network, normalize: bool = False) -> Union[int, float]

Softwired cluster distance between two phylogenetic networks. The softwired cluster set of a network is the union of cluster sets across all of its displayed trees. The distance is the cardinality of the symmetric difference: d_SC(N1, N2) = |SC(N1) triangle SC(N2)| .. note:: This enumerates all 2^k displayed trees where k is the number of reticulations. For networks with many reticulations the cost is exponential. When *normalize* is True the result is divided by |SC(N1) | SC(N2)|, yielding a value in [0, 1]. Reference --------- Huson, Rupp & Scornavacca. *Phylogenetic Networks: Concepts, Algorithms and Applications* (2010), Chapter 7.

Parameter Type Description
net1 First network.
net2 Second network.
normalize Return a value in [0, 1] instead of a raw count.
Returns: The softwired cluster distance (int, or float when normalized).
mu_distance(net1: Network, net2: Network) -> int

Mu-representation distance between two phylogenetic networks. For every node v in a network with leaf set X = {l_1, ..., l_n} the *mu-vector* is mu(v) = (m(v, l_1), ..., m(v, l_n)) where m(v, l_i) is the number of directed paths from v to leaf l_i. The *mu-representation* of a network is the multiset of mu-vectors over all nodes. The distance is the size of the multiset symmetric difference of the two mu-representations. This is a metric on the space of *reduced* phylogenetic networks (networks with no degree-2 tree nodes and no parallel edges). It is also referred to as the *topological distance for reduced networks*. For trees it reduces to the Robinson--Foulds distance. Both networks must share the same leaf label set. Reference --------- Cardona, Llabres, Rossello & Valiente. *Metrics for Phylogenetic Networks I: Generalizations of the Robinson-Foulds Metric*. IEEE/ACM Trans. Comput. Biol. Bioinformatics **6** (2009), 46--61.

Parameter Type Description
net1 First network.
net2 Second network.
Returns: The mu-distance (non-negative integer).
Raises: NetworkError: If the two networks have different leaf label sets.
tripartition_distance(net1: Network, net2: Network, normalize: bool = False) -> Union[int, float]

Tripartition-based distance between two phylogenetic networks. For each tree node u with exactly two children c_1, c_2 the tripartition is (desc(c_1), desc(c_2), X \ (desc(c_1) | desc(c_2))) where X is the full leaf set. The distance is the cardinality of the symmetric difference of the tripartition sets: d_T(N1, N2) = |T(N1) triangle T(N2)| Reticulation nodes (in-degree >= 2) are excluded from tripartition generation. When *normalize* is True the result is divided by |T(N1) | T(N2)|, yielding a value in [0, 1]. Reference --------- Moret *et al.*; Huson, Rupp & Scornavacca. *Phylogenetic Networks: Concepts, Algorithms and Applications* (2010), Chapter 7.

Parameter Type Description
net1 First network.
net2 Second network.
normalize Return a value in [0, 1] instead of a raw count.
Returns: The tripartition distance (int, or float when normalized).
displayed_tree_distance(net1: Network, net2: Network, normalize: bool = False) -> Union[int, float]

Tree-based distance between two phylogenetic networks. Each network's set of unique displayed-tree topologies (identified by cluster representation) is computed. The distance is the cardinality of the symmetric difference: d_DT(N1, N2) = |DT(N1) triangle DT(N2)| .. note:: This enumerates all 2^k displayed trees per network. For networks with many reticulations the cost is exponential. When *normalize* is True the result is divided by |DT(N1) | DT(N2)|, yielding a value in [0, 1].

Parameter Type Description
net1 First network.
net2 Second network.
normalize Return a value in [0, 1] instead of a raw count.
Returns: The displayed-tree distance (int, or float when normalized).
robinson_foulds_distance(net1: Network, net2: Network, normalize: bool = False) -> Union[int, float]

Robinson--Foulds distance between two phylogenetic trees or networks. For rooted trees the RF distance equals the cardinality of the symmetric difference of the non-trivial cluster sets (clusters of size >= 2 and < n, where n is the number of leaves). For networks the function uses hardwired clusters, which is the natural generalization of RF to DAGs. Both inputs must share the same leaf label set. When *normalize* is True the result is divided by |C(N1)| + |C(N2)| (the maximum possible RF for the given cluster counts), yielding a value in [0, 1]. Reference --------- Robinson & Foulds. *Comparison of Phylogenetic Trees*. Mathematical Biosciences **53** (1981), 131--147. Cardona, Llabres, Rossello & Valiente. *Metrics for Phylogenetic Networks I: Generalizations of the Robinson-Foulds Metric*. IEEE/ACM Trans. Comput. Biol. Bioinformatics **6** (2009), 46--61.

Parameter Type Description
net1 First tree / network.
net2 Second tree / network.
normalize Return a value in [0, 1] instead of a raw count.
Returns: The RF distance (int, or float when normalized).
Raises: NetworkError: If the two inputs have different leaf label sets.
average_path_distance(net1: Network, net2: Network, normalize: bool = False) -> float

Average Path Distance (APD) between two phylogenetic networks. For each network a pairwise leaf distance matrix is computed by averaging the path lengths across all displayed trees (uniform weighting). The APD is the L1 norm of the difference between the two matrices: APD(N1, N2) = sum_{i<j} |D_N1(l_i,l_j) - D_N2(l_i,l_j)| When *normalize* is True the sum is divided by the number of leaf pairs C(n, 2). Branch lengths on every edge are required. .. note:: This enumerates all 2^k displayed trees per network. For networks with many reticulations the cost is exponential. Reference --------- Yakici, Ogilvie & Nakhleh. *Phylogenetic Network Dissimilarity Measures that Take Branch Lengths into Account*. RECOMB-CG 2022, LNCS 13234, pp. 86--102.

Parameter Type Description
net1 First network.
net2 Second network.
normalize Divide by C(n, 2) to get a per-pair average.
Returns: The APD (float).
Raises: NetworkError: If the two networks have different leaf label sets.
weighted_average_path_distance(net1: Network, net2: Network, normalize: bool = False) -> float

Weighted Average Path Distance (WAPD) between two networks. Identical to :func:`average_path_distance` except that the average over displayed trees is weighted by each tree's probability (the product of the gamma values of the kept reticulation edges). WAPD(N1, N2) = sum_{i<j} |WD_N1(l_i,l_j) - WD_N2(l_i,l_j)| When *normalize* is True the sum is divided by the number of leaf pairs C(n, 2). Branch lengths and gamma values on every reticulation edge are required. .. note:: This enumerates all 2^k displayed trees per network. For networks with many reticulations the cost is exponential. Reference --------- Yakici, Ogilvie & Nakhleh. *Phylogenetic Network Dissimilarity Measures that Take Branch Lengths into Account*. RECOMB-CG 2022, LNCS 13234, pp. 86--102.

Parameter Type Description
net1 First network.
net2 Second network.
normalize Divide by C(n, 2) to get a per-pair average.
Returns: The WAPD (float).
Raises: NetworkError: If the two networks have different leaf label sets.
ascii(net: Network, show_edge_lengths: bool = False) -> str

Prints out an ascii art depiction of this Network object as a vertical tree/network with the root at the top and leaves at the bottom.

Parameter Type Description
net Network A Network
show_edge_lengths bool If True, display edge lengths. Defaults to False.
Returns: str: The ASCII art representation of the network Example: For newick string "((C,D)A, E)Root;", outputs:: Root / \ A E / \ C D
ascii_extended(net: Network) -> str

Prints a more detailed vertical ASCII art depiction of the Network with extended connecting lines between levels for wide trees.

Parameter Type Description
net Network A Network
Returns: str: The ASCII art representation of the network

Navigation

Modules

This Page