PhyNetPy Documentation

Library for the Development and Use of Phylogenetic Network Methods

BiMarkers Module v1.0.0

SNP (biallelic marker) likelihood computation for phylogenetic networks, with optional GPU acceleration.

Author:
Mark Kessler
Last Edit:
2/5/26
Source:
BiMarkers.py

Constants

CUPY_AVAILABLE = False
CUPY_RUNTIME_OK = False
NUMBA_CUDA_AVAILABLE = False
CUDA_IMPORTS_AVAILABLE = CUPY_RUNTIME_OK or NUMBA_CUDA_AVAILABLE
GPU_SPECS = _detect_gpu()
THREADS_PER_BLOCK = 256
MAX_BLOCKS = 65535
GPU_THRESHOLD = {0: float('inf'), 1: 20, 2: 12, 3: 8}
GPU_VRAM_SAFETY_FACTOR = 0.8

Exceptions

exception SNPResourceError(RuntimeError)

Raised when the SNP likelihood computation would exceed available hardware resources (GPU VRAM or system RAM).

GPUSpecs

class GPUSpecs

Hardware specs for the user's GPU (detected at import time).

Properties

vram_gb -> float property

Methods

__repr__ -> str

NodeVPI

class NodeVPI

Partial-likelihood vector bundle carried by a model node. Attributes: tensor: Partial-likelihood array of shape ``(S, d1, d2, ...)``, where *S* is the number of sites and each subsequent axis corresponds to an interface. interfaces: Labels for each tensor axis beyond the site axis (length equals ``tensor.ndim - 1``). max_lineages: Maximum lineage count per interface. log_scale: Per-site log rescaling factors, shape ``(S,)``.

SparseMerge

class SparseMerge

Sparse COO representation of the merge tensor M[i, j, k]. For each non-zero entry: stores (i_arr[n], j_arr[n], k_arr[n], coeff_arr[n]). There is exactly one k for each (i, j) pair (since nz = nx + ny is unique). nnz = state_dim(mx) * state_dim(my) (one per valid (i,j) pair)

SparseSplit

class SparseSplit

Sparse COO representation of the split tensor S[i, j, k]. For each non-zero entry: stores (i_arr[n], j_arr[n], k_arr[n], coeff_arr[n]). Multiple (j, k) pairs map to each i (since n = nb + nd has many valid splits).

BiMarkersTransition

class BiMarkersTransition

Class that encodes the probabilities of transitioning from one (n,r) pair to another under a Biallelic model. Includes method for efficiently computing e^Qt Inputs: 1) n-- the total number of samples in the species tree 2) u-- the probability of going from the red allele to the green one 3) v-- the probability of going from the green allele to the red one 4) coal-- the coalescent rate constant, theta Assumption: Matrix indexes start with n=1, r=0, so Q[0][0] is Q(1,0);(1,0) Q Matrix is given by Equation 15 from: David Bryant, Remco Bouckaert, Joseph Felsenstein, Noah A. Rosenberg, Arindam RoyChoudhury, Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis, Molecular Biology and Evolution, Volume 29, Issue 8, August 2012, Pages 1...

Constructor

__init__(n: int, u: float, v: float, coal: float) -> None

Initialize the Q matrix

Parameter Type Description
n int sample count
u float probability of a lineage going from red to green
v float probability of a lineage going from green to red
coal float coal rate, theta.

Methods

expt(t: float = 1) -> np.ndarray

Compute e^(Q*t) efficiently.

Parameter Type Description
t float time, generally in coalescent units. Optional, defaults to 1, in which case e^Q is computed.
Returns: np.ndarray: e^(Q*t).
cols -> int

return the dimension of the Q matrix

Returns: int: the number of columns in the Q matrix
getQ -> np.ndarray

Retrieve the Q matrix.

Returns: np.ndarray: The Q matrix.

SNPStrategy

class SNPStrategy(Strategy)

Visitor for the SNP model. Supports both CPU (NumPy) and GPU (CuPy) execution. When use_gpu=True, all tensor operations run on the GPU via CuPy's drop-in NumPy API.

Constructor

__init__(q: BiMarkersTransition, u: float, v: float, coal: float, sites: int, max_samples: int, site_slice: tuple[int, int] | None = None, use_gpu: bool = False) -> None

Methods

compute_at_leaf(n: LeafNode) -> NodeVPI

Compute the partial likelihoods at a leaf node. The format for the partial likelihoods is a two dimensional array where the first dimension (rows) is the site index and the second dimension (columns) is the number of samples for this leaf. Uses vectorized indexing for GPU efficiency (no Python per-site loop).

compute_at_internal(n: InternalNode, x: NodeVPI, y: NodeVPI) -> NodeVPI

Compute the partial likelihoods at an internal node.

compute_at_reticulation(n: ReticulationNode, x: NodeVPI) -> NodeVPI

Compute the partial likelihoods at a reticulation node. The split creates two new interfaces (one per parent branch). During the two transition steps, we use global per-site rescaling (axis-order agnostic) to prevent intermediate underflow. After both transitions, we do a final per-vector rescale. GPU-compatible via self.xp.

compute_at_root(n: RootNode, x: NodeVPI, y: NodeVPI) -> NodeVPI

Compute the partial likelihoods at the root node.

compute_at_aggregator(n: RootAggregatorNode, root: NodeVPI) -> None

Compute the partial likelihoods at a root aggregator node. Uses the accumulated per-vector log_scale factors to recover the true log-likelihood without numerical underflow. The root VPI should have shape (S, d) with log_scale shape (S,) after equalization at the root node. GPU-compatible: pi is built on CPU and transferred; final log-likelihood is brought back to CPU as a Python float.

SNPModelVisitor

class SNPModelVisitor(Visitor)

Visitor for the SNP model.

Constructor

__init__(strategy: SNPStrategy) -> None

Methods

visit_leaf(n: LeafNode) -> None
visit_internal(n: InternalNode) -> None
visit_reticulation(n: ReticulationNode) -> None
visit_root(n: RootNode) -> None
visit_aggregator(n: RootAggregatorNode) -> None
visit(n: ModelNode) -> None

Visit a node.

Module Functions

n_to_index(n: int) -> int

Computes the starting index in computing a linear index for an (n,r) pair. Returns the index, if r is 0. i.e n=1 returns 0, since (1,0) is index 0 i.e n=3 returns 5 since (3,0) is preceded by (1,0), (1,1), (2,0), (2,1), and (2,2)

Parameter Type Description
n int an n value (number of lineages) from an (n,r) pair
Returns: int: starting index for that block of n values
index_to_nr(index: int) -> list[int]

Takes an index from the linear vector and turns it into an (n,r) pair i.e 7 -> [3,2]

Parameter Type Description
index int the index
Returns: list[int]: a 2-tuple (n,r)
nr_to_index(n: int, r: int) -> int

Takes an (n,r) pair and maps it to a 1d vector index (1,0) -> 0 (1,1) -> 1 (2,0) -> 2 ...

Parameter Type Description
n int the number of lineages
r int the number of red lineages (<= n)
Returns: int: the index into the linear vector, that represents by (n, r)
state_dim(m: int) -> int

Return the state-space dimension for *m* lineages. The dimension equals the number of valid ``(n, r)`` pairs where ``1 <= n <= m`` and ``0 <= r <= n``.

build_split_tensor(m: int, gamma: float) -> np.ndarray

Build the split coefficient tensor S for a reticulation node with max lineages m and inheritance probability gamma. S[i, j, k] = C(n, n_b) * C(r, r_b) * gamma^{n_b} * (1-gamma)^{n_d} where i ↔ (n, r), j ↔ (n_b, r_b), k ↔ (n_d, r_d) and n = n_b + n_d, r = r_b + r_d (zero otherwise). Shape: [dim(m), dim(m), dim(m)]

build_merge_tensor(mx: int, my: int) -> np.ndarray

Build the merge coefficient tensor M for combining two interfaces with max lineages mx and my. M[i, j, k] = C(rz, rx) * C(nz-rz, nx-rx) / C(nz, nx) where i ↔ (nx, rx), j ↔ (ny, ry), k ↔ (nz, rz) and nz = nx + ny, rz = rx + ry (zero otherwise). Used by both Rule 2 (disjoint merge) and Rule 4 (overlapping merge). Shape: [dim(mx), dim(my), dim(mx + my)]

build_sparse_merge(mx: int, my: int) -> SparseMerge

Build sparse COO merge tensor. ~50× less memory and dim_z × faster contraction than the dense version.

Parameter Type Description
mx Max lineages for the left interface.
my Max lineages for the right interface.
Returns: SparseMerge with coordinate arrays.
build_sparse_split(m: int, gamma: float) -> SparseSplit

Build sparse COO split tensor.

Parameter Type Description
m Max lineages at the node being split.
gamma Inheritance probability for the left parent branch.
Returns: SparseSplit with coordinate arrays.
get_sparse_merge(mx: int, my: int) -> SparseMerge

Get or build a cached sparse merge tensor.

get_sparse_split(m: int, gamma: float) -> SparseSplit

Get or build a cached sparse split tensor.

get_sparse_merge_gpu(mx: int, my: int) -> SparseMerge

Get or build a GPU-resident cached sparse merge tensor.

get_sparse_split_gpu(m: int, gamma: float) -> SparseSplit

Get or build a GPU-resident cached sparse split tensor.

deduplicate_vpis(vpis: list[NodeVPI]) -> list[NodeVPI]

Remove duplicates by identity (``is``), preserving order. Uses an ``id()``-based set for O(n) performance instead of O(n^2) linear scans.

Parameter Type Description
vpis list[NodeVPI] VPI objects, possibly with duplicate references.
Returns: list[NodeVPI]: De-duplicated list in original order.
MCMC_BIMARKERS(filename: str, u: float = 0.5, v: float = 0.5, coal: float = 1) -> dict[Network, float]

Given a set of taxa with SNP data, perform a Markov Chain Monte Carlo chain to infer the most likely phylogenetic network that describes the taxa and data.

Parameter Type Description
filename str string path destination of a nexus file that contains SNP data
u float, optional Parameter for the probability of an allele changing from red to green. Defaults to .5.
v float, optional Parameter for the probability of an allele changing from green to red. Defaults to .5.
coal float, optional Parameter for the rate of coalescence. Defaults to 1.
Returns: dict[Network, float]: The log likelihood (a negative number) of the most probable network, along with the network itself that achieved that score.
SNP_LIKELIHOOD(filename: str, u: float, v: float, coal: float, samples: dict[str, int], max_workers: int = 8, sequential: bool = True, executor: Executor = None) -> float

Given a set of taxa with SNP data and a phylogenetic network, calculate the likelihood of the network given the data using the SNP likelihood algorithm. Automatically selects CPU or GPU execution based on network complexity: - Level-0 (tree): always CPU - Level-1 (1 retic): GPU when taxa > 20 - Level-2 (2 retics): GPU when taxa > 12 Before starting computation, a pre-flight feasibility check predicts peak memory usage and raises SNPResourceError if the computation would exceed GPU VRAM or system RAM.

Parameter Type Description
filename str string path destination of a nexus file that contains SNP data and a network
u float Parameter for the probability of an allele changing from red to green.
v float Parameter for the probability of an allele changing from green to red.
coal float Parameter for the rate of coalescence.
samples dict[str, int] Mapping from taxon names to sample counts.
max_workers int, optional The number of workers to use for parallel computation. Only used if sequential is False. Defaults to 8.
sequential bool, optional Whether to use sequential computation. Defaults to True.
executor Executor, optional The executor to use. If None, one is auto-selected based on network complexity. Defaults to None.
Returns: float: The log likelihood (a negative number) of the network.
Raises: SNPResourceError: If the computation would exceed available hardware resources (GPU VRAM or system RAM).
build_model(filename: str, net: Network) -> Model

Build a SNP model from a data file and network.

Navigation

Modules

This Page