Library for the Development and Use of Phylogenetic Network Methods
SNP (biallelic marker) likelihood computation for phylogenetic networks, with optional GPU acceleration.
Raised when the SNP likelihood computation would exceed available hardware resources (GPU VRAM or system RAM).
Hardware specs for the user's GPU (detected at import time).
Partial-likelihood vector bundle carried by a model node. Attributes: tensor: Partial-likelihood array of shape ``(S, d1, d2, ...)``, where *S* is the number of sites and each subsequent axis corresponds to an interface. interfaces: Labels for each tensor axis beyond the site axis (length equals ``tensor.ndim - 1``). max_lineages: Maximum lineage count per interface. log_scale: Per-site log rescaling factors, shape ``(S,)``.
Sparse COO representation of the merge tensor M[i, j, k]. For each non-zero entry: stores (i_arr[n], j_arr[n], k_arr[n], coeff_arr[n]). There is exactly one k for each (i, j) pair (since nz = nx + ny is unique). nnz = state_dim(mx) * state_dim(my) (one per valid (i,j) pair)
Sparse COO representation of the split tensor S[i, j, k]. For each non-zero entry: stores (i_arr[n], j_arr[n], k_arr[n], coeff_arr[n]). Multiple (j, k) pairs map to each i (since n = nb + nd has many valid splits).
Class that encodes the probabilities of transitioning from one (n,r) pair to another under a Biallelic model. Includes method for efficiently computing e^Qt Inputs: 1) n-- the total number of samples in the species tree 2) u-- the probability of going from the red allele to the green one 3) v-- the probability of going from the green allele to the red one 4) coal-- the coalescent rate constant, theta Assumption: Matrix indexes start with n=1, r=0, so Q[0][0] is Q(1,0);(1,0) Q Matrix is given by Equation 15 from: David Bryant, Remco Bouckaert, Joseph Felsenstein, Noah A. Rosenberg, Arindam RoyChoudhury, Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis, Molecular Biology and Evolution, Volume 29, Issue 8, August 2012, Pages 1...
Initialize the Q matrix
| Parameter | Type | Description |
|---|---|---|
| n | int | sample count |
| u | float | probability of a lineage going from red to green |
| v | float | probability of a lineage going from green to red |
| coal | float | coal rate, theta. |
Compute e^(Q*t) efficiently.
| Parameter | Type | Description |
|---|---|---|
| t | float | time, generally in coalescent units. Optional, defaults to 1, in which case e^Q is computed. |
return the dimension of the Q matrix
Retrieve the Q matrix.
Visitor for the SNP model. Supports both CPU (NumPy) and GPU (CuPy) execution. When use_gpu=True, all tensor operations run on the GPU via CuPy's drop-in NumPy API.
Compute the partial likelihoods at a leaf node. The format for the partial likelihoods is a two dimensional array where the first dimension (rows) is the site index and the second dimension (columns) is the number of samples for this leaf. Uses vectorized indexing for GPU efficiency (no Python per-site loop).
Compute the partial likelihoods at an internal node.
Compute the partial likelihoods at a reticulation node. The split creates two new interfaces (one per parent branch). During the two transition steps, we use global per-site rescaling (axis-order agnostic) to prevent intermediate underflow. After both transitions, we do a final per-vector rescale. GPU-compatible via self.xp.
Compute the partial likelihoods at the root node.
Compute the partial likelihoods at a root aggregator node. Uses the accumulated per-vector log_scale factors to recover the true log-likelihood without numerical underflow. The root VPI should have shape (S, d) with log_scale shape (S,) after equalization at the root node. GPU-compatible: pi is built on CPU and transferred; final log-likelihood is brought back to CPU as a Python float.
Visitor for the SNP model.
Visit a node.
Computes the starting index in computing a linear index for an (n,r) pair. Returns the index, if r is 0. i.e n=1 returns 0, since (1,0) is index 0 i.e n=3 returns 5 since (3,0) is preceded by (1,0), (1,1), (2,0), (2,1), and (2,2)
| Parameter | Type | Description |
|---|---|---|
| n | int | an n value (number of lineages) from an (n,r) pair |
Takes an index from the linear vector and turns it into an (n,r) pair i.e 7 -> [3,2]
| Parameter | Type | Description |
|---|---|---|
| index | int | the index |
Takes an (n,r) pair and maps it to a 1d vector index (1,0) -> 0 (1,1) -> 1 (2,0) -> 2 ...
| Parameter | Type | Description |
|---|---|---|
| n | int | the number of lineages |
| r | int | the number of red lineages (<= n) |
Return the state-space dimension for *m* lineages. The dimension equals the number of valid ``(n, r)`` pairs where ``1 <= n <= m`` and ``0 <= r <= n``.
Build the split coefficient tensor S for a reticulation node with max lineages m and inheritance probability gamma. S[i, j, k] = C(n, n_b) * C(r, r_b) * gamma^{n_b} * (1-gamma)^{n_d} where i ↔ (n, r), j ↔ (n_b, r_b), k ↔ (n_d, r_d) and n = n_b + n_d, r = r_b + r_d (zero otherwise). Shape: [dim(m), dim(m), dim(m)]
Build the merge coefficient tensor M for combining two interfaces with max lineages mx and my. M[i, j, k] = C(rz, rx) * C(nz-rz, nx-rx) / C(nz, nx) where i ↔ (nx, rx), j ↔ (ny, ry), k ↔ (nz, rz) and nz = nx + ny, rz = rx + ry (zero otherwise). Used by both Rule 2 (disjoint merge) and Rule 4 (overlapping merge). Shape: [dim(mx), dim(my), dim(mx + my)]
Build sparse COO merge tensor. ~50× less memory and dim_z × faster contraction than the dense version.
| Parameter | Type | Description |
|---|---|---|
| mx | Max lineages for the left interface. | |
| my | Max lineages for the right interface. |
Build sparse COO split tensor.
| Parameter | Type | Description |
|---|---|---|
| m | Max lineages at the node being split. | |
| gamma | Inheritance probability for the left parent branch. |
Get or build a cached sparse merge tensor.
Get or build a cached sparse split tensor.
Get or build a GPU-resident cached sparse merge tensor.
Get or build a GPU-resident cached sparse split tensor.
Remove duplicates by identity (``is``), preserving order. Uses an ``id()``-based set for O(n) performance instead of O(n^2) linear scans.
| Parameter | Type | Description |
|---|---|---|
| vpis | list[NodeVPI] | VPI objects, possibly with duplicate references. |
Given a set of taxa with SNP data, perform a Markov Chain Monte Carlo chain to infer the most likely phylogenetic network that describes the taxa and data.
| Parameter | Type | Description |
|---|---|---|
| filename | str | string path destination of a nexus file that contains SNP data |
| u | float, optional | Parameter for the probability of an allele changing from red to green. Defaults to .5. |
| v | float, optional | Parameter for the probability of an allele changing from green to red. Defaults to .5. |
| coal | float, optional | Parameter for the rate of coalescence. Defaults to 1. |
Given a set of taxa with SNP data and a phylogenetic network, calculate the likelihood of the network given the data using the SNP likelihood algorithm. Automatically selects CPU or GPU execution based on network complexity: - Level-0 (tree): always CPU - Level-1 (1 retic): GPU when taxa > 20 - Level-2 (2 retics): GPU when taxa > 12 Before starting computation, a pre-flight feasibility check predicts peak memory usage and raises SNPResourceError if the computation would exceed GPU VRAM or system RAM.
| Parameter | Type | Description |
|---|---|---|
| filename | str | string path destination of a nexus file that contains SNP data and a network |
| u | float | Parameter for the probability of an allele changing from red to green. |
| v | float | Parameter for the probability of an allele changing from green to red. |
| coal | float | Parameter for the rate of coalescence. |
| samples | dict[str, int] | Mapping from taxon names to sample counts. |
| max_workers | int, optional | The number of workers to use for parallel computation. Only used if sequential is False. Defaults to 8. |
| sequential | bool, optional | Whether to use sequential computation. Defaults to True. |
| executor | Executor, optional | The executor to use. If None, one is auto-selected based on network complexity. Defaults to None. |
SNPResourceError: If the computation would exceed available hardware resources (GPU VRAM or system RAM).Build a SNP model from a data file and network.