BiMarkers Module v1.0.0

SNP (biallelic marker) likelihood computation for phylogenetic networks, with optional GPU acceleration.

Author:: Mark Kessler
Last Edit:: 2/5/26
Source:: BiMarkers.py

Constants

CUPY_AVAILABLE = False

CUPY_RUNTIME_OK = False

NUMBA_CUDA_AVAILABLE = False

CUDA_IMPORTS_AVAILABLE = CUPY_RUNTIME_OK or NUMBA_CUDA_AVAILABLE

GPU_SPECS = _detect_gpu()

THREADS_PER_BLOCK = 256

MAX_BLOCKS = 65535

GPU_THRESHOLD = {0: float('inf'), 1: 20, 2: 12, 3: 8}

GPU_VRAM_SAFETY_FACTOR = 0.8

Exceptions

exception SNPResourceError(RuntimeError)

Raised when the SNP likelihood computation would exceed available hardware resources (GPU VRAM or system RAM).

GPUSpecs

class GPUSpecs

Hardware specs for the user's GPU (detected at import time).

Properties

vram_gb -> float property

Methods

__repr__ -> str

NodeVPI

class NodeVPI

Partial-likelihood vector bundle carried by a model node. Attributes: tensor: Partial-likelihood array of shape ``(S, d1, d2, ...)``, where *S* is the number of sites and each subsequent axis corresponds to an interface. interfaces: Labels for each tensor axis beyond the site axis (length equals ``tensor.ndim - 1``). max_lineages: Maximum lineage count per interface. log_scale: Per-site log rescaling factors, shape ``(S,)``.

SparseMerge

class SparseMerge

Sparse COO representation of the merge tensor M[i, j, k]. For each non-zero entry: stores (i_arr[n], j_arr[n], k_arr[n], coeff_arr[n]). There is exactly one k for each (i, j) pair (since nz = nx + ny is unique). nnz = state_dim(mx) * state_dim(my) (one per valid (i,j) pair)

SparseSplit

class SparseSplit

Sparse COO representation of the split tensor S[i, j, k]. For each non-zero entry: stores (i_arr[n], j_arr[n], k_arr[n], coeff_arr[n]). Multiple (j, k) pairs map to each i (since n = nb + nd has many valid splits).

BiMarkersTransition

class BiMarkersTransition

Class that encodes the probabilities of transitioning from one (n,r) pair to another under a Biallelic model. Includes method for efficiently computing e^Qt Inputs: 1) n-- the total number of samples in the species tree 2) u-- the probability of going from the red allele to the green one 3) v-- the probability of going from the green allele to the red one 4) coal-- the coalescent rate constant, theta Assumption: Matrix indexes start with n=1, r=0, so Q[0][0] is Q(1,0);(1,0) Q Matrix is given by Equation 15 from: David Bryant, Remco Bouckaert, Joseph Felsenstein, Noah A. Rosenberg, Arindam RoyChoudhury, Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis, Molecular Biology and Evolution, Volume 29, Issue 8, August 2012, Pages 1...

Constructor

__init__(n: int, u: float, v: float, coal: float) -> None

Initialize the Q matrix

Parameter	Type	Description
n	int	sample count
u	float	probability of a lineage going from red to green
v	float	probability of a lineage going from green to red
coal	float	coal rate, theta.

Methods

expt(t: float = 1) -> np.ndarray

Compute e^(Q*t) efficiently.

Parameter	Type	Description
t	float	time, generally in coalescent units. Optional, defaults to 1, in which case e^Q is computed.

Returns: np.ndarray: e^(Q*t).

cols -> int

return the dimension of the Q matrix

Returns: int: the number of columns in the Q matrix

getQ -> np.ndarray

Retrieve the Q matrix.

Returns: np.ndarray: The Q matrix.

SNPStrategy

class SNPStrategy(Strategy)

Visitor for the SNP model. Supports both CPU (NumPy) and GPU (CuPy) execution. When use_gpu=True, all tensor operations run on the GPU via CuPy's drop-in NumPy API.

Constructor

__init__(q: BiMarkersTransition, u: float, v: float, coal: float, sites: int, max_samples: int, site_slice: tuple[int, int] | None = None, use_gpu: bool = False) -> None

Methods

compute_at_leaf(n: LeafNode) -> NodeVPI

Compute the partial likelihoods at a leaf node. The format for the partial likelihoods is a two dimensional array where the first dimension (rows) is the site index and the second dimension (columns) is the number of samples for this leaf. Uses vectorized indexing for GPU efficiency (no Python per-site loop).

compute_at_internal(n: InternalNode, x: NodeVPI, y: NodeVPI) -> NodeVPI

Compute the partial likelihoods at an internal node.

compute_at_reticulation(n: ReticulationNode, x: NodeVPI) -> NodeVPI

Compute the partial likelihoods at a reticulation node. The split creates two new interfaces (one per parent branch). During the two transition steps, we use global per-site rescaling (axis-order agnostic) to prevent intermediate underflow. After both transitions, we do a final per-vector rescale. GPU-compatible via self.xp.

compute_at_root(n: RootNode, x: NodeVPI, y: NodeVPI) -> NodeVPI

Compute the partial likelihoods at the root node.

compute_at_aggregator(n: RootAggregatorNode, root: NodeVPI) -> None

Compute the partial likelihoods at a root aggregator node. Uses the accumulated per-vector log_scale factors to recover the true log-likelihood without numerical underflow. The root VPI should have shape (S, d) with log_scale shape (S,) after equalization at the root node. GPU-compatible: pi is built on CPU and transferred; final log-likelihood is brought back to CPU as a Python float.

SNPModelVisitor

class SNPModelVisitor(Visitor)

Visitor for the SNP model.

Constructor

__init__(strategy: SNPStrategy) -> None

Methods

visit_leaf(n: LeafNode) -> None

visit_internal(n: InternalNode) -> None

visit_reticulation(n: ReticulationNode) -> None

visit_root(n: RootNode) -> None

visit_aggregator(n: RootAggregatorNode) -> None

visit(n: ModelNode) -> None

Visit a node.

Module Functions

n_to_index(n: int) -> int

Computes the starting index in computing a linear index for an (n,r) pair. Returns the index, if r is 0. i.e n=1 returns 0, since (1,0) is index 0 i.e n=3 returns 5 since (3,0) is preceded by (1,0), (1,1), (2,0), (2,1), and (2,2)

Parameter	Type	Description
n	int	an n value (number of lineages) from an (n,r) pair

Returns: int: starting index for that block of n values

index_to_nr(index: int) -> list[int]

Takes an index from the linear vector and turns it into an (n,r) pair i.e 7 -> [3,2]

Parameter	Type	Description
index	int	the index

Returns: list[int]: a 2-tuple (n,r)

nr_to_index(n: int, r: int) -> int

Takes an (n,r) pair and maps it to a 1d vector index (1,0) -> 0 (1,1) -> 1 (2,0) -> 2 ...

Parameter	Type	Description
n	int	the number of lineages
r	int	the number of red lineages (<= n)

Returns: int: the index into the linear vector, that represents by (n, r)

state_dim(m: int) -> int

Return the state-space dimension for *m* lineages. The dimension equals the number of valid ``(n, r)`` pairs where ``1 <= n <= m`` and ``0 <= r <= n``.

build_split_tensor(m: int, gamma: float) -> np.ndarray

Build the split coefficient tensor S for a reticulation node with max lineages m and inheritance probability gamma. S[i, j, k] = C(n, n_b) * C(r, r_b) * gamma^{n_b} * (1-gamma)^{n_d} where i ↔ (n, r), j ↔ (n_b, r_b), k ↔ (n_d, r_d) and n = n_b + n_d, r = r_b + r_d (zero otherwise). Shape: [dim(m), dim(m), dim(m)]

build_merge_tensor(mx: int, my: int) -> np.ndarray

Build the merge coefficient tensor M for combining two interfaces with max lineages mx and my. M[i, j, k] = C(rz, rx) * C(nz-rz, nx-rx) / C(nz, nx) where i ↔ (nx, rx), j ↔ (ny, ry), k ↔ (nz, rz) and nz = nx + ny, rz = rx + ry (zero otherwise). Used by both Rule 2 (disjoint merge) and Rule 4 (overlapping merge). Shape: [dim(mx), dim(my), dim(mx + my)]

build_sparse_merge(mx: int, my: int) -> SparseMerge

Build sparse COO merge tensor. ~50× less memory and dim_z × faster contraction than the dense version.

Parameter	Type	Description
mx		Max lineages for the left interface.
my		Max lineages for the right interface.

Returns: SparseMerge with coordinate arrays.

build_sparse_split(m: int, gamma: float) -> SparseSplit

Build sparse COO split tensor.

Parameter	Type	Description
m		Max lineages at the node being split.
gamma		Inheritance probability for the left parent branch.

Returns: SparseSplit with coordinate arrays.

get_sparse_merge(mx: int, my: int) -> SparseMerge

Get or build a cached sparse merge tensor.

get_sparse_split(m: int, gamma: float) -> SparseSplit

Get or build a cached sparse split tensor.

get_sparse_merge_gpu(mx: int, my: int) -> SparseMerge

Get or build a GPU-resident cached sparse merge tensor.

get_sparse_split_gpu(m: int, gamma: float) -> SparseSplit

Get or build a GPU-resident cached sparse split tensor.

deduplicate_vpis(vpis: list[NodeVPI]) -> list[NodeVPI]

Remove duplicates by identity (``is``), preserving order. Uses an ``id()``-based set for O(n) performance instead of O(n^2) linear scans.

Parameter	Type	Description
vpis	list[NodeVPI]	VPI objects, possibly with duplicate references.

Returns: list[NodeVPI]: De-duplicated list in original order.

MCMC_BIMARKERS(filename: str, u: float = 0.5, v: float = 0.5, coal: float = 1) -> dict[Network, float]

Given a set of taxa with SNP data, perform a Markov Chain Monte Carlo chain to infer the most likely phylogenetic network that describes the taxa and data.

Parameter	Type	Description
filename	str	string path destination of a nexus file that contains SNP data
u	float, optional	Parameter for the probability of an allele changing from red to green. Defaults to .5.
v	float, optional	Parameter for the probability of an allele changing from green to red. Defaults to .5.
coal	float, optional	Parameter for the rate of coalescence. Defaults to 1.

Returns: dict[Network, float]: The log likelihood (a negative number) of the most probable network, along with the network itself that achieved that score.

SNP_LIKELIHOOD(filename: str, u: float, v: float, coal: float, samples: dict[str, int], max_workers: int = 8, sequential: bool = True, executor: Executor = None) -> float

Given a set of taxa with SNP data and a phylogenetic network, calculate the likelihood of the network given the data using the SNP likelihood algorithm. Automatically selects CPU or GPU execution based on network complexity: - Level-0 (tree): always CPU - Level-1 (1 retic): GPU when taxa > 20 - Level-2 (2 retics): GPU when taxa > 12 Before starting computation, a pre-flight feasibility check predicts peak memory usage and raises SNPResourceError if the computation would exceed GPU VRAM or system RAM.

Parameter	Type	Description
filename	str	string path destination of a nexus file that contains SNP data and a network
u	float	Parameter for the probability of an allele changing from red to green.
v	float	Parameter for the probability of an allele changing from green to red.
coal	float	Parameter for the rate of coalescence.
samples	dict[str, int]	Mapping from taxon names to sample counts.
max_workers	int, optional	The number of workers to use for parallel computation. Only used if sequential is False. Defaults to 8.
sequential	bool, optional	Whether to use sequential computation. Defaults to True.
executor	Executor, optional	The executor to use. If None, one is auto-selected based on network complexity. Defaults to None.

Returns: float: The log likelihood (a negative number) of the network.

Raises: SNPResourceError: If the computation would exceed available hardware resources (GPU VRAM or system RAM).

build_model(filename: str, net: Network) -> Model

Build a SNP model from a data file and network.

PhyNetPy Documentation

BiMarkers Module v1.0.0

Contents

Constants

Exceptions

GPUSpecs

Properties

Methods

NodeVPI

SparseMerge

SparseSplit

BiMarkersTransition

Constructor

Methods

SNPStrategy

Constructor

Methods

SNPModelVisitor

Constructor

Methods

Module Functions

Navigation

Modules

This Page