class GeneTrees
A container for a set of networks that are binary and represent a gene tree.
__init__(self, gene_tree_list, naming_rule) -> None:
Wrapper class for a set of networks that represent gene trees
Inputs:
gene_tree_list (list[Network], optional): A list of networks, of the binary tree variety. Defaults to None.
naming_rule (Callable[..., Any], optional): A function f : str -> str. Defaults to phynetpy_naming.
Returns:
(None): --
add(self, tree) -> None:
Add a gene tree to the collection. Any new gene labels that belong to this tree will also be added to the collection of all gene tree leaf labels.
Inputs:
tree (Network): A network that is a tree, must be binary.
Returns:
(None): --
mp_allop_map(self) -> dict[str, list[str]]:
Create a subgenome mapping from the stored set of gene trees
Inputs:
N/A
Returns:
dict[str, list[str]]: subgenome mapping
class Node
Node class that provides support for managing network constructs like reticulation nodes and other phylogenetic attributes.
__init__(self, name, is_reticulation, attr) -> None:
Initialize a node with a name, attribute mapping, and a hybrid flag.
Inputs:
name (str): A Node label.
is_reticulation (bool, optional): Flag that marks a node as a reticulation node if set to True. Defaults to False.
attr (dict[Union[str, Any], Any], optional): Fill a mapping with any other user defined values. Defaults to an empty dictionary.
Returns:
(None): --
get_attributes(self) -> dict[Union[str, Any], Any]:
Retrieve the attribute mapping.
Inputs:
N/A
Returns:
dict[Union[str, Any], Any]: A storage of key value pairs that correspond to user defined node attributes. Typically keys are string labels, but they can be anything.
set_attributes(self, new_attr) -> None:
Set a node's attributes to a mapping of key labels to attribute values.
Inputs:
new_attr (dict[Union[str, Any], Any]): Attribute storage mapping.
Returns:
(None): --
get_time(self) -> float:
Get the speciation time for this node. Closer to 0 implies a time closer to the origin (the root). A larger time implies a time closer to the present (leaves).
Inputs:
N/A
Returns:
float: Speciation time, typically in coalescent units.
set_time(self, new_t) -> None:
Set the speciation time for this node. The arg 't' must be a non-negative number.
Inputs:
new_t (float): The new speciation/hybridization time for this node.
Returns:
(None): --
to_string(self) -> str:
Create a description of a node and summarize its attributes.
Inputs:
N/A
Returns:
str: A string description of the node.
label(self) -> str:
Returns the name of the node
Inputs:
N/A
Returns:
str: Node label.
set_name(self, new_name) -> None:
Sets the name of the node to new_name.
Inputs:
new_name (str): A new string label for this node.
Returns:
(None): --
set_is_reticulation(self, new_is_retic) -> None:
Sets whether a node is a reticulation Node (or not).
Inputs:
new_is_retic (bool): Hybrid flag. True if this node is a reticulation node, false otherwise
Returns:
(None): --
is_reticulation(self) -> bool:
Retrieves whether a node is a reticulation Node (or not)
Inputs:
N/A
Returns:
bool: True, if this node is a reticulation. False otherwise.
add_attribute(self, key, value, append) -> None:
Put a key and value pair into the node attribute dictionary. If the key is already present, it will overwrite the old value.
Inputs:
key (Any): Attribute key.
value (Any): Attribute value for the key.
append (bool, optional): If True, appends the given value to the existing value for the key. If false, simply replaces. Defaults to False.
Returns:
(None): --
attribute_value(self, key) -> object:
If key is a key in the attributes mapping, then its value will be returned. Otherwise, returns None.
Inputs:
key (Any): A lookup key.
Returns:
object: The value of key, if key is present.
set_seq(self, new_sequence) -> None:
Associate a data sequence with this node, if this node is a leaf in a network.
Inputs:
new_sequence (DataSequence): A data sequence wrapper. Grab from MSA object upon parsing.
Returns:
(None): --
get_seq(self) -> DataSequence:
Gets the data sequence associated with this node.
Inputs:
N/A
Returns:
DataSequence: Data sequence wrapper.
copy(self) -> Node:
Duplicate this node by copying all data into a separate Node object. Useful for crafting copies of networks without having to deep copy.
Inputs:
N/A
Returns:
Node: An equivalent node to this node, with all the same data but technically are not "=="
class Network(Graph)
This class represents a directed (and potentially acyclic) graph containing nodes and edges. An 'Edge' object is a wrapper class for a tuple of two nodes, (a, b), where a and b are Node objects, and the direction of the edge is from a to b (a is b's parent) -- thus (a, b) is NOT the same as (b, a). Use of 'UEdge' objects is strictly prohibited here. Notes and Allowances: 1) You may create cycles -- however we have provided a method to check if this graph object is acyclic. This method is internally called on methods that assume that a network has no cycles, so be mindful of the state of networks that are passed as arguments. 2) You may have multiple roots. Be mindful of whether this graph is connected and what root you wish to operate on. 3) You may end up with floater nodes/edges, ie this may be an unconnected network with multiple connected components. We will provide a method to check for whether your object is one single connected component. We have also provided methods to remove such artifacts.
__init__(self, edges, nodes) -> None:
Initialize a Network object. You may initialize with any combination of edges/nodes, or provide none at all. If you provide an EdgeSet and no nodes, each node present in the EdgeSet *WILL* be added to the network.
Inputs:
edges (EdgeSet, optional): A set of Edges. Defaults to None.
nodes (NodeSet, optional): A set of Nodes. Defaults to None.
Returns:
(None): --
add_edges(self, edges) -> None:
If edges is a list of Edges, then add each Edge to the list of edges. If edges is a singleton Edge then just add to the edge array. Note: Each edge that you attempt to add must be between two nodes that exist in the network. Otherwise, an error will be thrown. Raises: NetworkError: if input edge/edges are malformed in any way
Inputs:
edges (Edge | list[Edge]): a single edge, or multiple.
Returns:
(None): --
remove_nodes(self, node) -> None:
Removes node from the list of nodes. Also prunes all edges from the graph that are connected to the node. Has no effect if node is not in this network.
Inputs:
node (Node): a Node obj
Returns:
(None): --
remove_edge(self, edge) -> None:
Removes edge from the list of edges. Does not delete nodes with no edges Has no effect if 'edge' is not in the graph.
Inputs:
edge (Edge | UEdge | list[Node]): an edge to remove from the graph gamma (float): an inheritance probability from [0,1], if the edge is provided as a list of nodes, and there is an identifiability issue that needs resolving (ie, the edge that needs to be removed is a bubble edge). Optional. Defaults to None.
Returns:
(None): --
get_edge(self, n1, n2, gamma, tag) -> Edge:
Note, that in the event of bubbles, 2 edges will exist with the same source and destination. If this is possible, please supply the inheritance probability of the correct branch. If both edges are known to be identical (gamma = 0.5), then one will be chosen at random.
Inputs:
n1 (Node): parent node
n2 (Node): child node
gamma (float): inheritance probability. Optional. Defaults to None
tag (str): A name/identifiability tag for hybrid edges should both gammas be = .5. Optional. Defaults to None.
Returns:
Edge: the edge containing n1 and n2 and has the proper gamma value (if applicable).
root(self) -> Node:
Return the root of the Network. Phylogenetic networks only have one root, but for generality and practical use, multiple roots have been allowed. To get all roots, should multiple exist, call the function "roots". This function only returns 1 root. Raises: NetworkError: If there are no roots in the network (cycle, or empty) Warning: If there is more than 1 root in the network.
Inputs:
N/A
Returns:
Node: root Node object
roots(self) -> list[Node]:
Return the root(s) of the Network. Phylogenetic networks only have one root, but for generality and practical use, multiple roots have been allowed. Raises: NetworkError: If there are no roots in the network (cycle, or empty)
Inputs:
N/A
Returns:
list[Node]: a list of root Node objects.
get_leaves(self) -> list[Node]:
Returns the set X (a subset of V), the set of all leaves (nodes with out-degree 0). Only returns the leaves that are connected/reachable from the root.
Inputs:
N/A
Returns:
list[Node]: the connected elements of X, in list format.
get_parents(self, node) -> list[Node]:
Returns a list of the parents of a node. There are no assumptions placed on the length of this array. Raises: NetworkError: if the node is not in the network.
Inputs:
node (Node): any node in V.
Returns:
list[Node]: the list of nodes in the network that have the given node as a child node via an edge.
get_children(self, node) -> list[Node]:
Returns a list of the children of a node. There are no assumptions placed on the length of this array. Raises: NetworkError: if the node is not in the network.
Inputs:
node (Node): any node in V.
Returns:
list[Node]: the list of nodes in the network are child nodes of the given node.
clean(self, options) -> None:
All the various ways that the graph can be cleaned up and streamlined while not altering topology or results of algorithms. Algorithm Indeces: 0) Remove nodes that have in/out degree of 0 (floater nodes) 1) Remove a spurious root/root edge (root node with only one out edge) 2) Consolidate all chains of nodes with in/out degree of 1 into 1 edge. Default behavior is to run all three. To not run a certain routine, set the options list at the indeces listed above to False. Ie. To run the first and third algo, use [True, False, True].
Inputs:
options (list[bool], optional): a list of booleans that designate which of the cleaning algorithms to run. Defaults to [True, True, True], aka runs all 3.
Returns:
(None): --
mrca(self, set_of_nodes) -> Node:
Computes the Least Common Ancestor of a set of graph nodes
Inputs:
set_of_nodes (set[Node] | set[str]): A set of Nodes, or node names.
Returns:
Node: The node that is the LCA of the set.
leaf_descendants(self, node) -> set[Node]:
Compute the set of all leaf nodes that are descendants of the parameter node. Uses DFS to find paths to leaves.
Inputs:
node (Node): The node for which to compute leaf children
Returns:
set[Node]: The list of all leaves that descend from 'node'
diff_subtree_edges(self, rng) -> list[Edge]:
Returns 2 random edges such that there does not exist a directed path from one edge source node to the other edge source node.
Inputs:
rng (np.random.Generator): an rng object.
Returns:
list[Edge]: a list of 2 edges such that neither edge is reachable from either starting point.
subgenome_count(self, n) -> int:
Given a node in this graph, return the subgenome count.
Inputs:
n (Node): Any node in the graph. It is an error to input a node that is not in the graph.
Returns:
int: subgenome count
edges_downstream_of_node(self, n) -> list[Edge]:
Returns the set (as a list) of edges that are in the subgraph of a node.
Inputs:
n (Node): A node in a graph.
Returns:
list[Edge]: The set of all edges in the subgraph of n.
edges_upstream_of_node(self, n) -> list[Edge]:
Returns the set (as a list) of edges that are in all paths from the root to this node. Useful in avoiding the creation of cycles when adding edges.
Inputs:
n (Node): A node in a graph.
Returns:
list[Edge]: The set of all edges in the subgraph of n.
subgenome_ct_edges(self, downstream_node, delta, start_node) -> dict[Edge, int]:
Maps edges to their subgenome counts. Raises: NetworkError: If the graph has more than one root to start.
Inputs:
downstream_node (Node, optional): No edges will be included in the map that are in a subgraph of this node. Defaults to None.
delta (float, optional): Only include edges in the mapping that have subgenome counts <= delta. Defaults to math.inf.
start_node (Node, optional): Provide a node only if you don't want to start at the root. Defaults to None.
Returns:
dict[Edge, int]: a map from edges to subgenome counts
edges_to_subgenome_count(self, downstream_node, delta, start_node) -> dict[int, list[Edge]]:
Maps edges to their subgenome counts. Raises: NetworkError: If the graph has more than one root to start.
Inputs:
downstream_node (Node, optional): No edges will be included in the map that are in a subgraph of this node. Defaults to None.
delta (float, optional): Only include edges in the mapping that have subgenome counts <= delta. Defaults to math.inf.
start_node (Node, optional): Provide a node only if you don't want to start at the root. Defaults to None.
Returns:
dict[Edge, int]: a map from edges to subgenome counts
leaf_descendants_all(self) -> dict[Node, set[Node]]:
Map each node in the graph to its set of leaf descendants
Inputs:
N/A
Returns:
dict[Node, set[Node]]: map from graph nodes to their leaf descendants
newick(self) -> str:
Generates the extended newick representation of this Network
Inputs:
N/A
Returns:
str: a newick string
is_acyclic(self) -> bool:
Checks if each of this graph's connected components is acyclic
Inputs:
N/A
Returns:
bool: True if acyclic, False if cyclic.
bfs_dfs(self, start_node, dfs, is_connected, accumulator, accumulated) -> tuple[dict[Node, int], Any]:
General bfs-dfs routine, with the added utility of checking whether or not this graph is made up of multiple connected components.
Inputs:
start_node (Node, optional): Give a node to start the search from. Defaults to None, in which case the search will start at the root.
dfs (bool, optional): Flag that specifies whether to use bfs or dfs. Defaults to False (bfs), if true is passed, will run dfs.
is_connected (bool, optional): Flag that, if enabled, will check for the connected component status. Defaults to False (won't run).
accumulator (Callable, optional): A function that takes the currently searched Node in the graph and does some sort of bookkeeping.
accumulated (Any): Any type of structure that stores the data given by the accumulator function.
Returns:
dict[Node, int]: Mapping from nodes to their distance from the start node.
rootpaths(self, start) -> list[list[Edge]]:
# Get all paths (list of edges) #
Inputs:
start (Node): Start the search from this node #
Returns:
# list[list[Edge]]: a list of all paths (lists of edges) to the root # from 'start' #
subnet(self, retic_node) -> Network:
Make a copy of a subnetwork of this Network, rooted at 'retic_node', with unique node names.
Inputs:
retic_node (Node): A node in this network that is a reticulation node
Returns:
Network: A subnetwork of the DAG being operated on
copy(self) -> tuple[Network, dict[Node, Node]]:
Copy this network into a new network object, also with new node and edge objects.
Inputs:
N/A
Returns:
tuple[Network, dict[Node, Node]]: A carbon copy of this Network, along with a mapping from the old network's nodes to the nodes in the new network.
to_networkx(self) -> nx.Graph:
Generates a networkx object from this network, for viewing purposes!
Inputs:
N/A
Returns:
nx.Graph: _description_
compare_network(self, net, measure) -> float:
Compares the topology of this network compared to another network. The following are options for the distance measure: [tree|tri|cluster|luay|rnbs|apd|normapd|wapd|normwapd] If [tree | tri | cluster] is used, the return value will be the average of the false positive and false negative rates. If [luay] is used, the return value will be the distance between the two networks. If [rnbs | apd | normapd | wapd | normwapd] is used, the return value is the dissimilarity between the two networks.
Inputs:
net (Network): The other network.
measure (str): The key for using one of the available distance measures, must be selected from [tree|tri|cluster|luay|rnbs|apd|normapd|wapd|normwapd]
Returns:
float: Measure of the two networks similarity/distance.
class Edge
Class for directed edges. Instead of being a wrapper for a "set" of member nodes, we can now think about an Edge as a wrapper for a tuple of member nodes (a, b), where now the direction is encoded in the ordering (where a is the source, and b the destination).
Network.Edge.__init__(self, source : Node, destination : Node, length : float | None = None, gamma : float | None = None, weight : float | None = None, tag : str | None = None ) -> None:
An Edge has a source (parent) and a destination (child). Edges in a phylogenetic context are *generally* directed. source -----> destination Raises: ValueError: If @gamma is not a probabilistic value between 0 and 1 (inclusive, but if a hybrid edge pair has 0 and 1 then those hybrid edges may as well not exist). EdgeError: If @length does not match the difference between @source.get_time() and @destination.get_time().
Inputs:
source (Node): The parent node. destination (Node): The child node. length (float, optional): Branch length value. Defaults to None. gamma (float, optional): Inheritance Probability, MUST be from [0,1]. Defaults to None. weight (float, optional): Edge weight, can be any real number. Defaults to None. tag (str, optional): A name tag for identifiability of hybrid edges should each have a gamma of 0.5. Defaults to None.
Returns:
N/A
Network.Edge.src(self) -> Node:
Get the source (parent) node.
Inputs:
N/A
Returns:
Node: source Node obj
Network.Edge.dest(self) -> Node:
Get the dest (child) node.
Inputs:
N/A
Returns:
Node: destination Node obj
Network.Edge.set_tag(self, new_tag : str) -> None:
Set the name/identifiability tag of this edge.
Inputs:
new_tag (str): a unique string identifier.
Returns:
N/A
Network.Edge.get_tag(self) -> str:
Get the name/identifiability tag for this edge.
Inputs:
N/A
Returns:
str: The name/identifiabity tag for this edge.
Network.Edge.set_gamma(self, gamma : float) -> None:
Set the inheritance probability of this edge. Only applicable to hybrid edges, but no warning will be raised if you attempt to set the probability of a non-hybrid edge.
Inputs:
gamma (float): A probability (between 0 and 1 inclusive).
Returns:
N/A
Network.Edge.get_gamma(self) -> float:
Gets the inheritance probability for this edge.
Inputs:
N/A
Returns:
float: A probability (between 0 and 1).
Network.Edge.copy(self, new_src : Node | None = None, new_dest : Node | None = None) -> Edge:
Craft an identical edge to this edge object, just in a new object. Useful in building subnetworks of a network that is in hand.
Inputs:
new_src (Node | None, optional): _description_. Defaults to None. new_dest (Node | None, optional): _description_. Defaults to None.
Returns:
Edge: An identical edge to this one, with respect to the data they hold.
Network.Edge.set_length(self, branch_length : float, warn_times : bool = False, enforce_times : bool = False) -> None:
Set the length of this Edge, and optionally let the user decide if they allow possible discrepancies between Edge lengths and the Edge's src/dest .t (time) attribute values. Raises: EdgeError: if user chooses to enforce that the branch length and .t attributes of src and dest match, but they don't.
Inputs:
branch_length (float): The new length/weight of the Edge warn_times (bool, optional): Option that, if enabled, warns instead of crashes the program in the event that the branch_length parameter does not match the time attributes of the edge's src and dest Node objects. Defaults to False (will not warn). enforce_times (bool, optional): Option that, if enabled, will allow this method to raise an exception if the branch_length parameter does not match the time attributes of the edge's src and dest Node objects. Defaults to False (will not throw).
Returns:
N/A
Network.Edge.get_length(self) -> float:
Get the Edge length.
Inputs:
N/A
Returns:
float: Edge length/branch length
Network.Edge.set_weight(self, new_weight : float) -> None:
Set a weight value for this Edge (not! equivalent to its length in a phylogenetic context, this is for any other potential use for the inclusion of weighting edges).
Inputs:
new_weight (float): A new weight value
Returns:
N/A
Network.Edge.get_weight(self) -> float:
Get the weight value for this Edge (not! equivalent to its length in a phylogenetic context, this is for any other potential use for the inclusion of weighting edges).
Inputs:
N/A
Returns:
float: The weight of this Edge
Network.Edge.to_names(self) -> tuple[str, str]:
Return this Edge as a two-tuple where the first element is the src Node object's label, and the second element is the dest Node object's label
Inputs:
N/A
Returns:
tuple[str, str]: a two-tuple of names, (src name , dest name)
class MUL(Network)
A subclass of a Network, that is a binary tree that results from the transformation of a standard network into a Multilabeled Species Tree.
__init__(self, gene_map) -> None:
Initialize a MUL tree (Multilabeled Species Tree) with a map of species to the genes associates with them, and a rng.
Inputs:
gene_map (dict[str, list[str]]): A subgenome mapping. rng (np.random.Generator): A random seed generator
Returns:
N/A
to_mul(self, net) -> Network:
Creates a (MU)lti-(L)abeled Species Tree from a network Raises: NetworkError: If the network is malformed with regards to ploidy
Inputs:
net (Network): A Network
Returns:
Network: a MUL tree (as a Network obj)
class UEdge
Undirected dge class that is essentially a wrapper class for a set {a, b} where a and b are Node objects. There is no ordering/direction.
__init__(self, n1, n2, length, weight) -> None:
Initialize an undirected Edge. Raises: EdgeError: If 'length' does not match the difference in node times should both n1 and n2 have set time attributes.
Inputs:
n1 (Node): A Node (designated member #1 for efficient retrieval) n2 (Node): A Node (designated member #2 for efficient retrieval) length (float, optional): _description_. Defaults to None. weight (float, optional): _description_. Defaults to None.
Returns:
N/A
n1(self) -> Node:
Retrieves the first node given as a parameter to this undirected edge
Inputs:
N/A
Returns:
Node: The first node given as a parameter to this undirected edge
n2(self) -> Node:
Retrieves the second node given as a parameter to this undirected edge
Inputs:
N/A
Returns:
Node: The second node given as a parameter to this undirected edge
__contains__(self, x) -> bool:
An undirected edge contains two member nodes. IE. e : UEdge n : Node "n in e" gives True or False.
Inputs:
x (Node): A Node
Returns:
bool: True if 'x' is a member node of this UEdge, False if not.
get_length(self) -> float:
Gets the branch length of this edge. If it is not equivalent to the current times of the source and destination nodes, then a warning is raised that the branch lengths are not equivalent, and that either the times are outdated or the branch length field is outdated.
Inputs:
N/A
Returns:
float: branch length.
set_length(self, branch_length, enforce_times) -> None:
Sets the branch length of this edge.
Inputs:
length (float): a branch length value (>0).
Returns:
N/A
copy(self, new_n1, new_n2) -> UEdge:
Make an equivalent UEdge (same as a deepcopy essentially).
Inputs:
new_n1 (Node | None, optional): Provide new nodes in case you'd like finer control over naming. Defaults to None. new_n2 (Node | None, optional): Provide new nodes in case you'd like finer control over naming. Defaults to None.
Returns:
UEdge: A new but functionally equivalent UEdge
to_directed(self, src) -> Edge:
Convert this edge to a directed edge. Raises: EdgeError: If 'src' is not a node in this Edge.
Inputs:
src (Node): Must be one of the Nodes that is in this edge. Sets this Node as the source node, and the other node as the destination
Returns:
Edge: An equivalent edge to this undirected edge, but with a direction now enforced.
get_weight(self) -> float:
Get the weight value for this UEdge (not! equivalent to its length in a phylogenetic context, this is for any other potential use for the inclusion of weighting edges).
Inputs:
N/A
Returns:
float: The weight of this UEdge
set_weight(self, new_weight) -> None:
Set a weight value for this UEdge (not! equivalent to its length in a phylogenetic context, this is for any other potential use for the inclusion of weighting edges).
Inputs:
new_weight (float): A new weight value
Returns:
N/A
to_names(self) -> tuple[str, str]:
Return this UEdge as a two-tuple of the two member nodes' labels. While you should not presume an order, the order will be n1 then n2 if you remember which order you passed them in as during initialization.
Inputs:
N/A
Returns:
tuple[str, str]: a two-tuple of names, (n1 name , n2 name)