PhyNetPy Graph and Network Tutorial

There are two main objects in PhyNetPy used to represent phylogenetic relationships -- the Graph object and the Network object. The Graph object is both unrooted and undirected, while the Network object has 4 different "levels", but is always rooted and directed. Notice that there is no object level distinction between trees and networks, algorithms in PhyNetPy apply to either rooted and directed networks or unrooted and undirected graphs; under the hood, methods will recognize shortcuts they can take based on topology or network designation (if applicable). This way of dealing with the structures has the benefit of abstracting away the details of implementation and grants the user utmost flexibility.

To create a Network is very easy and straightforward:

                        from PhyNetPy.Network import Network, DiEdge, Node

                        # Initialize as an empty network
                        phynet : Network = Network()

                        # To build the network, simply create node and edge
                        # relationships

                        a : Node = Node("A")
                        b : Node = Node("B")
                        c : Node = Node("C")
                        h0 : Node = Node("#H0")
                        i1 : Node = Node("A-Parent")
                        i2 : Node = Node("C-Parent")
                        i3 : Node = Node("Root")
                  
                        # Parent is listed first, then child, for directed edges
                        e1 : DiEdge = DiEdge(i1, a)
                        e2 : DiEdge = DiEdge(h0, b)
                        e3 : DiEdge = DiEdge(i2, c)
                        e4 : DiEdge = DiEdge(i1, h0)
                        e5 : DiEdge = DiEdge(i2, h0)
                        e6 : DiEdge = DiEdge(i3, i2)
                        e7 : DiEdge = DiEdge(i3, i1)

                        # Add all nodes first, then edges
                        phynet.add_nodes([a, b, c, h0, i1, i2, i3])
                        phynet.add_edges([e1, e2, e3, e4, e5, e6, e7])
                        
                        # Now you have a complete network!! :D 
                        # Will look like this:
                        
                        """
                                  i3
                                 /  \
                                /    \
                               i1     i2
                              /  \   /  \
                             /     h0    \
                            /      |      \
                           a       b       c
                        """

                        # Some built in methods :D

                        # Most recent common ancestor (returns i1)
                        should_be_i1 : Node = phynet.mrca(a, b) 

                        # Leaf Descendant Set
                        should_be_bc : set[Node] = phynet.leaf_descendants(i2)

                        # Newick string representation
                        newick_str : str = phynet.newick()

                        # Cycle check
                        has_cycle : bool = phynet.is_acyclic()

                        # BFS/DFS with custom accumulators. Calling the function
                        # mutates the data variable, but the function by default
                        # returns the distance from each node to the root too.
                        
                        accumulated_v1 : dict = dict()
                        accumulated_v2 : list = list()
                        accumulated_v3 : set = set()
                        accumulated_v4 : Network = Network()
                       
                        def myfunc(net : Network, cur_node : Node, data) -> None:
                            """
                            Counts the number of paths from the network's root to
                            the current node that is being searched from within
                            the bfs/dfs function. Mutates the data parameter!

                            Args:
                                net (Network): The network that cur_node is in.
                                cur_node (Node): The node currently being 
                                                  searched.
                                data (Any) : Any data structure of your choice!
                              
                            """
                            #Get some sort of data from the current state of the
                            #search and add it to the accumulated data.
                            
                            # IE, count the number of paths from the root to the
                            # current node
                            if net.in_degree(cur_node) == 0:
                              data[cur_node] = 1
                            else:
                              # The number of paths to a node from the root is the 
                              # sum of the number of paths to each parent node.
                              data[cur_node] = sum([data[par] for par in net.get_parents(cur_node)])
                          

                        #use dfs kwarg = True to use DFS. This version runs BFS
                        phynet.bfs_dfs(start_node = i3, 
                                       accumulator = myfunc, 
                                       accumulated = accumulated_v1)

                        # accumulated_v1 = {i3 : 1, i2 : 1, i1 : 1, h0 : 2, 
                        #                   b : 2, a : 1, c : 1}

                        # Subnetwork Copying!

                        # Copies the whole network. Faster than running a 
                        # copy.deepcopy(phynet)
                        copy_net : Network = phynet.subnet_copy(start_node = i3)
                        
                        # Copy only a portion
                        copy_subnet : Network = phynet.subnet_copy(start_node = i1)

                        # In this instance, copy_subnet will have a node that has
                        # one parent and one child. Such nodes are topologically
                        # pointless. To remove this artifact...
                        
                        copy_subnet.clean()

                        # copy_subnet will now look like: 
                        """
                            i1
                           /  \
                          /    \
                         a      b 

                         instead of
                             i1
                            /  \
                           /    h0
                          /      \
                         a        b 
                         
                        """

                        # Network copies copy all data and retain the same node 
                        # labels. However, they are different python objects, 
                        # exactly as would be accomplished with a deepcopy/memcopy.

Nodes and Edges are the internal backbones of a Network. The following code snippets show how to initialize and work with these objects.

                        from PhyNetPy.Network import Node, DiEdge, Edge
                        from PhyNetPy.MSA import MSA, SeqRecord

                        #########################
                        ######### NODES #########
                        #########################

                        # The minimum requirement, when initializing nodes, is 
                        # to provide a label. 
                        new_node : Node = Node("my_node")

                        # There are additional fields, however, for which there 
                        # are getters and setters provided:
                        
                        new_node_2 : Node = Node("reticulation", is_reticulation = True)
                        
                        new_node_3 : Node = Node("store anything", attr = {"color" : blue})
                        
                        new_node_4 : Node = Node("speciation time", t = 3.1415)

                        req : SeqRecord = SeqRecord("CCGTAACA", "Dwayne (THE ROCK) Johnson's DNA")
                        new_node_5 : Node = Node("leaf w/ seq data", seq = req)
                        
                        ###############################################
                        ######### DIRECTED EDGES AND NETWORKS #########
                        ###############################################

                        # Edges can be directed or undirected-- most contexts 
                        # require directed edges, however the differences in 
                        # usage are minimal

                        # Let's use the following network structure to illustrate...
                        """
                                  i3
                                 /  \
                                /    \
                               i1     i2
                              /  \   /  \
                             /     h0    \
                            /      |      \
                           a       b       c
                        """

                        a : Node = Node("A")
                        b : Node = Node("B")
                        c : Node = Node("C")
                        h0 : Node = Node("#H0")
                        i1 : Node = Node("A-Parent")
                        i2 : Node = Node("C-Parent")
                        i3 : Node = Node("Root")

                        # In a directed context, the parent is listed first, then child
                        new_edge : DiEdge = DiEdge(i1, a)

                        # Suppose we have:
                        c.set_time(10)
                        i2.set_time(5)
                        
                        # Root time is always 0 in PhyNetPy, and child times are 
                        # always greater than their parent's times.

                        # Then we should do the following:
                        new_edge_2 : DiEdge = DiEdge(i2, c)
                        
                        branch_length : float = new_edge_2.get_length()
                      
                        print(branch_length) 
                        # Will print '5.0'. If both nodes have defined lengths 
                        #prior to initialization, it will be auto calculated!

                        # Times for node "A" and "A-Parent" have not been set. 
                        # Times in general do not need to be set. In this case,
                        # simply do the following:

                        new_edge.set_length(5.0)

                        # Or, provide the length during initialization.
                      
                        new_edge_3 : DiEdge(i3, i1, 5.0)

                        # To access the source (parent) and destination (child) 
                        # of a directed edge, use the following getter methods:
                        src : Node = new_edge.get_src() # "A-Parent" / i1
                        dest : Node = new_edge.get_dest() # "A" / a


                        # Edges also may have weights associated with them 
                        # (such as support values). 
                        new_edge.set_weight(.99)
                        w : float = new_edge.get_weight() 

                        # To check/view an edge's node members, use:
                        edge_node_names : tuple[str] = new_edge.to_names() # ("A-Parent" , "A")
                      
                        ###############################################
                        ######### DIRECTED EDGES AND NETWORKS #########
                        ###############################################

                        """
                                  m
                                 /  \
                                /    \
                               n      o
                              /  \   /  \
                             /     p     \
                            /      |      \
                           q       r       s
                        """

                        m : Node = Node("M")
                        n : Node = Node("N")
                        o : Node = Node("O")
                        p : Node = Node("P")
                        q : Node = Node("Q")
                        r : Node = Node("R")
                        s : Node = Node("S")

                        #Order of inputs for these edges does not matter.
                        e1 : Edge = Edge(q, n)
                        e2 : Edge = Edge(o, s)
                        e3 : Edge = Edge(p, r)
                        e4 : Edge = Edge(n, p)
                        e5 : Edge = Edge(p, o)
                        e6 : Edge = Edge(m, n)
                        e7 : Edge = Edge(o, m)

                        # Undirected Graphs (and therefore also unrooted) have
                        # undirected edges. 

                        # Two ways to initialize (Networks are the same way)
                        
                        # 1) empty constructor 2) use the add_nodes() and add_edges() methods
                        g_1 : Graph = Graph()

                        g_1.add_nodes(m, n, o, p, q, r, s)
                        # g_1.add_nodes([m, n, o, p, q, r, s]) #this works too

                        g_1.add_edges(e1, e2, e3, e4, e5, e6, e7)
                        # g_1.add_edges([e1, e2, e3, e4, e5, e6, e7]) #this works too

                        # Or... build and EdgeSet and NodeSet and pass those as parameters
                        ns : NodeSet = NodeSet(directed = False) #True is default
                        es : EdgeSet = EdgeSet(directed = False) #True is default

                        ns.add