Library for the Development and Use of Phylogenetic Network Methods
Computation backend abstraction layer providing CPU (NumPy) and GPU (CuPy) array operations.
Supported compute device types.
Configuration for executor initialization.

Attributes:
- device_type: Target compute device
- device_id: GPU device ID (for multi-GPU systems)
- dtype: Default data type for arrays
- enable_jit: Enable JIT compilation (JAX only)
- enable_autodiff: Enable automatic differentiation (JAX only)
- memory_pool_size: GPU memory pool size in bytes (optional)
- seed: Random seed for reproducibility
Abstract base class for computation executors.

An Executor provides a unified interface for array/matrix operations across different backends (NumPy, CuPy, JAX). Strategies use executors to perform computations without knowing the underlying implementation.

Design Principles:
- Executors are stateless computation providers
- All array operations go through the executor
- Strategies receive an executor at construction time
- Executors handle device placement and memory management

Example:
>>> executor = GPUExecutor(ExecutorConfig(device_type=DeviceType.CUDA))
>>> strategy = SNPStrategy(u=0.1, v=0.2, executor=executor)
>>> # Strategy now uses GPU for all computations
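A minimal sketch of this pattern, showing why strategy code never needs to import a backend directly. The class and method names below are illustrative, not the library's actual API:

```python
from abc import ABC, abstractmethod
import numpy as np

class Executor(ABC):
    """Backend-agnostic array operations (illustrative minimal sketch)."""

    @abstractmethod
    def array(self, data, dtype=None):
        """Create an array on this executor's device."""

    @abstractmethod
    def matmul(self, a, b):
        """Matrix multiplication on this executor's device."""

class CPUExecutor(Executor):
    """Reference backend: plain NumPy."""

    def array(self, data, dtype=None):
        return np.asarray(data, dtype=dtype)

    def matmul(self, a, b):
        return a @ b

# Strategy-style code only talks to the executor interface
ex = CPUExecutor()
m = ex.array([[1.0, 0.0], [0.0, 2.0]])
v = ex.array([[3.0], [4.0]])
result = ex.matmul(m, v)
```

Swapping `CPUExecutor` for a GPU-backed subclass leaves the calling code unchanged; that is the whole point of the abstraction.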
Human-readable device name.
Initialize executor with configuration.
| Parameter | Type | Description |
|---|---|---|
| config | ExecutorConfig | Configuration specifying device and options |
Create an array on the executor's device.
| Parameter | Type | Description |
|---|---|---|
| data | array-like | Input data (list, NumPy array, etc.) |
| dtype | dtype, optional | Optional dtype override |
Create zero-filled array.
Create one-filled array.
Create uninitialized array (faster than zeros).
Create identity matrix.
Create diagonal matrix from vector, or extract diagonal.
Create array with evenly spaced values.
Element-wise exponential.
Element-wise natural logarithm.
Element-wise square root.
Element-wise absolute value.
Sum of array elements along axis.
Product of array elements along axis.
Maximum along axis.
Minimum along axis.
Mean along axis.
Clip values to range.
Matrix multiplication.
Einstein summation - critical for phylogenetic computations.

Common patterns:
- 'ij,sj->si': Transform partials through P matrix
- 'si,i->s': Weight by frequencies
- 'ij,jk->ik': Matrix multiply
- 'bij,bsj->bsi': Batched transform
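The patterns above can be checked directly against their plain-NumPy equivalents (shapes and values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, n_states = 5, 4
P = rng.random((n_states, n_states))          # transition matrix
partials = rng.random((n_sites, n_states))    # per-site conditional likelihoods
freqs = np.full(n_states, 0.25)               # equilibrium frequencies

# 'ij,sj->si': push partials through P (child -> parent direction)
transformed = np.einsum('ij,sj->si', P, partials)
assert np.allclose(transformed, partials @ P.T)

# 'si,i->s': weight each site's partials by the frequencies
site_lik = np.einsum('si,i->s', partials, freqs)
assert np.allclose(site_lik, partials @ freqs)

# 'bij,bsj->bsi': the same transform, batched over b matrices
Pb = rng.random((3, n_states, n_states))
pb = rng.random((3, n_sites, n_states))
batched = np.einsum('bij,bsj->bsi', Pb, pb)
```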
Outer product of two vectors.
Dot product.
Transpose array.
Matrix inverse.
Eigendecomposition.
Solve linear system Ax = b.
Matrix exponential: P(t) = exp(Qt). This is the critical operation for phylogenetic likelihood; implementations should optimize it heavily.
| Parameter | Type | Description |
|---|---|---|
| Q | (n_states, n_states) array | Rate matrix |
| t | float | Branch length (time) |
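Basic properties of the operation can be checked with SciPy's general-purpose `expm`; the 2-state rate matrix below is illustrative:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative 2-state rate matrix; rows of a valid Q sum to zero
u, v = 0.1, 0.2
Q = np.array([[-u,  u],
              [ v, -v]])

t = 0.5
P = expm(Q * t)

# Rows of a transition probability matrix sum to 1
assert np.allclose(P.sum(axis=1), 1.0)
# Zero branch length gives the identity: P(0) = I
assert np.allclose(expm(Q * 0.0), np.eye(2))
```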
Batched matrix exponential for multiple branch lengths. Computes P(t) for all t values simultaneously - critical for GPU.
| Parameter | Type | Description |
|---|---|---|
| Q | (n_states, n_states) array | Rate matrix |
| t_values | (n_branches,) array | Branch lengths |
Standard Felsenstein pruning step.

Computes: L_parent[s] = ∏_c ( ∑_t P_c[s,t] × L_c[t] )

Default implementation using einsum. Subclasses may override for optimized versions.
| Parameter | Type | Description |
|---|---|---|
| child_partials | list of (n_sites, n_states) arrays | Conditional likelihoods of each child |
| transition_matrices | list of (n_states, n_states) arrays | P matrix for each child branch |
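The default einsum-based step described above can be sketched as follows (function name and test values are illustrative):

```python
import numpy as np

def pruning_step(child_partials, transition_matrices):
    """Felsenstein pruning sketch: product over children of (P_c @ L_c)."""
    result = None
    for L_c, P_c in zip(child_partials, transition_matrices):
        # sum over child states t: term[s, i] = sum_t P_c[i, t] * L_c[s, t]
        term = np.einsum('ij,sj->si', P_c, L_c)
        result = term if result is None else result * term
    return result

n_sites = 4
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
leaf0 = np.tile([1.0, 0.0], (n_sites, 1))  # every site observed in state 0
leaf1 = np.tile([0.0, 1.0], (n_sites, 1))  # every site observed in state 1

parent = pruning_step([leaf0, leaf1], [P, P])
# For these one-hot leaves: parent[s, i] = P[i, 0] * P[i, 1]
assert np.allclose(parent[0], [0.9 * 0.1, 0.2 * 0.8])
```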
Batched pruning step for multiple nodes or trees.
| Parameter | Type | Description |
|---|---|---|
| child_partials | (n_children, n_sites, n_states) array | Stacked child partials |
| transition_matrices | (n_children, n_states, n_states) array | Stacked P matrices |
Compute log-likelihood at root.
| Parameter | Type | Description |
|---|---|---|
| root_partials | (n_sites, n_states) array | Conditional likelihoods at the root |
| frequencies | (n_states,) array | Equilibrium frequencies |
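A minimal sketch of the root computation (helper name and values illustrative): weight the root partials by the equilibrium frequencies per site, then sum the log site likelihoods.

```python
import numpy as np

def root_log_likelihood(root_partials, frequencies):
    """Sketch: sum over states at the root, then sum log site likelihoods."""
    site_lik = root_partials @ frequencies      # (n_sites,)
    return float(np.sum(np.log(site_lik)))

partials = np.array([[0.50, 0.50],
                     [0.25, 0.75]])
freqs = np.array([0.5, 0.5])

ll = root_log_likelihood(partials, freqs)
# Both site likelihoods are 0.5 here, so ll = 2 * log(0.5)
assert np.isclose(ll, 2 * np.log(0.5))
```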
Reshape array.
Concatenate arrays along axis.
Stack arrays along new axis.
Split array at indices.
Remove single-dimensional entries.
Add dimension at axis.
Broadcast array to shape.
Take elements along axis.
Element-wise conditional selection.
Convert array to numpy (CPU).
Convert numpy array to executor's array type.
Create a copy of array.
Synchronize device (wait for pending operations). No-op for CPU. GPU implementations should override.
Generate uniform random values.
Generate normal random values.
Random choice from array.
CPU executor using NumPy. This serves as the reference implementation and fallback when GPU is not available.
Matrix exponential using eigendecomposition. P(t) = V @ diag(exp(λt)) @ V^{-1}
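The eigendecomposition route can be sketched in plain NumPy (assumes Q is diagonalizable; helper name illustrative):

```python
import numpy as np

def expm_eig(Q, t):
    """P(t) = V @ diag(exp(lambda * t)) @ V^{-1} (sketch; assumes Q diagonalizable)."""
    lam, V = np.linalg.eig(Q)
    # V * d scales column j of V by d[j], i.e. V @ diag(d)
    return (V * np.exp(lam * t)) @ np.linalg.inv(V)

Q = np.array([[-0.1,  0.1],
              [ 0.2, -0.2]])

P = np.real(expm_eig(Q, 1.0))
assert np.allclose(P.sum(axis=1), 1.0)               # rows of P(t) sum to 1
assert np.allclose(np.real(expm_eig(Q, 0.0)), np.eye(2))  # P(0) = I
```

The payoff of this route over a general `expm` is that the eigendecomposition of Q can be computed once and reused across every branch length, which is what the batched version below exploits.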
Batch matrix exponential for multiple branch lengths.
GPU executor using CuPy (NVIDIA CUDA).

Provides significant speedups for:
- Large alignment matrices
- Batched transition matrix exponentials
- Vectorized partial likelihood updates

Performance Notes:
- Best for n_sites > 1000 (GPU overhead dominates for small data)
- Memory transfers are expensive - keep data on GPU
- Use batch operations whenever possible

Example:
>>> config = ExecutorConfig(device_type=DeviceType.CUDA, device_id=0)
>>> executor = GPUExecutor(config)
>>> strategy = SNPStrategy(u=0.1, v=0.2, executor=executor)
GPU-accelerated matrix exponential. Uses eigendecomposition with caching for repeated Q matrices.
Fully vectorized batch matrix exponential.

This is where the GPU really shines - computing many P(t) matrices simultaneously.

┌─────────────────────────────────────────────────────────────┐
│ BATCH MATRIX EXPONENTIAL (GPU)                              │
│                                                             │
│ t_values: [t₁, t₂, ..., tₙ]             (n_branches,)       │
│     ▼                                                       │
│ exp(λ·t): [[e^λ₁t₁, e^λ₂t₁, ...],  (n_branches, n_states)   │
│            [e^λ₁t₂, e^λ₂t₂, ...],                           │
│            ...]                                             │
│     ▼  (all done in parallel on GPU)                        │
│ P(t) = V @ diag(exp(λt)) @ V⁻¹                              │
│        (n_branches, n_states, n_states)                     │
└─────────────────────────────────────────────────────────────┘
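The diagram above can be sketched in NumPy with a single eigendecomposition and one einsum; a GPU implementation would follow the same shapes on device arrays (helper name illustrative):

```python
import numpy as np

def batch_expm_eig(Q, t_values):
    """Batched P(t): decompose Q once, vectorize over all branch lengths."""
    lam, V = np.linalg.eig(Q)
    V_inv = np.linalg.inv(V)
    # exp(lambda * t) for every branch length: (n_branches, n_states)
    exp_lt = np.exp(np.outer(t_values, lam))
    # V @ diag(exp_lt[b]) @ V_inv for all b simultaneously
    return np.einsum('ij,bj,jk->bik', V, exp_lt, V_inv)

Q = np.array([[-0.1,  0.1],
              [ 0.2, -0.2]])
ts = np.array([0.0, 0.5, 1.0])

Ps = np.real(batch_expm_eig(Q, ts))
assert np.allclose(Ps[0], np.eye(2))      # t = 0 gives the identity
assert np.allclose(Ps.sum(axis=2), 1.0)   # every P(t) has unit row sums
```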
GPU-optimized pruning step. For GPU efficiency, we stack inputs and use batched operations.
Batched pruning across multiple trees.
| Parameter | Type | Description |
|---|---|---|
| child_partials | (n_trees, n_children, n_sites, n_states) array | Stacked child partials for all trees |
| transition_matrices | (n_trees, n_children, n_states, n_states) array | Stacked P matrices for all trees |
GPU-vectorized leaf partial initialization.
| Parameter | Type | Description |
|---|---|---|
| sequences | (n_leaves, n_sites) array | Integer-encoded sequences |
| n_states | int | Number of states (2 for SNP, 4 for DNA) |
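One-hot encoding handles this initialization in a single vectorized indexing operation, which is why it maps well to the GPU; a sketch (helper name illustrative):

```python
import numpy as np

def init_leaf_partials(sequences, n_states):
    """Sketch: one-hot encode integer sequences into leaf partials."""
    # Indexing the identity matrix by state turns each integer s
    # into the one-hot row e_s, vectorized over leaves and sites.
    return np.eye(n_states)[sequences]   # (n_leaves, n_sites, n_states)

# Two leaves, three sites, binary (SNP-style) states
seqs = np.array([[0, 1, 1],
                 [1, 0, 0]])
partials = init_leaf_partials(seqs, 2)

assert partials.shape == (2, 3, 2)
assert np.allclose(partials[0, 0], [1.0, 0.0])   # state 0 -> [1, 0]
assert np.allclose(partials[0, 1], [0.0, 1.0])   # state 1 -> [0, 1]
```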
Transfer array from GPU to CPU.
Transfer array from CPU to GPU.
Wait for all pending GPU operations to complete.
Clear eigendecomposition cache and free GPU memory.
Factory function to create an executor.
| Parameter | Type | Description |
|---|---|---|
| device | str | One of 'cpu', 'cuda', 'gpu', 'jax' |
| **kwargs | | Additional config options |
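A sketch of how such a factory might dispatch on the device string (the registry and stub classes below are illustrative, not the library's actual code):

```python
# Illustrative stand-ins for the executor classes
class CPUExecutor:
    pass

class GPUExecutor:
    pass

# 'cuda' and 'gpu' are aliases for the same backend
_REGISTRY = {'cpu': CPUExecutor, 'cuda': GPUExecutor, 'gpu': GPUExecutor}

def create_executor(device='cpu', **kwargs):
    """Sketch: look up the executor class by device name and instantiate it."""
    try:
        cls = _REGISTRY[device.lower()]
    except KeyError:
        raise ValueError(f"Unknown device: {device!r}") from None
    return cls()

assert isinstance(create_executor('cpu'), CPUExecutor)
assert isinstance(create_executor('GPU'), GPUExecutor)  # case-insensitive
```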