Library for the Development and Use of Phylogenetic Network Methods
Computation backend abstraction layer providing CPU (NumPy) and GPU (CuPy) array operations.
Supported compute device types.
Configuration for executor initialization.

Attributes:
- device_type: Target compute device
- device_id: GPU device ID (for multi-GPU systems)
- dtype: Default data type for arrays
- enable_jit: Enable JIT compilation (JAX only)
- enable_autodiff: Enable automatic differentiation (JAX only)
- memory_pool_size: GPU memory pool size in bytes (optional)
- seed: Random seed for reproducibility
Abstract base class for computation executors.

An Executor provides a unified interface for array/matrix operations across different backends (NumPy, CuPy, JAX). Strategies use executors to perform computations without knowing the underlying implementation.

Design Principles:
- Executors are stateless computation providers
- All array operations go through the executor
- Strategies receive an executor at construction time
- Executors handle device placement and memory management

Example:
>>> executor = GPUExecutor(ExecutorConfig(device_type=DeviceType.CUDA))
>>> strategy = SNPStrategy(u=0.1, v=0.2, executor=executor)
>>> # Strategy now uses GPU for all computations
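A minimal sketch of this pattern, showing why strategy code never needs to import a backend directly. The class and method names below are illustrative, not the library's actual API:

```python
from abc import ABC, abstractmethod
import numpy as np

class Executor(ABC):
    """Backend-agnostic array operations (illustrative minimal sketch)."""

    @abstractmethod
    def array(self, data, dtype=None):
        """Create an array on this executor's device."""

    @abstractmethod
    def matmul(self, a, b):
        """Matrix multiplication on this executor's device."""

class CPUExecutor(Executor):
    """Reference backend: plain NumPy."""

    def array(self, data, dtype=None):
        return np.asarray(data, dtype=dtype)

    def matmul(self, a, b):
        return a @ b

# Strategy-style code only talks to the executor interface
ex = CPUExecutor()
m = ex.array([[1.0, 0.0], [0.0, 2.0]])
v = ex.array([[3.0], [4.0]])
result = ex.matmul(m, v)
```

Swapping `CPUExecutor` for a GPU-backed subclass leaves the calling code unchanged; that is the whole point of the abstraction.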
Human-readable device name.
Initialize executor with configuration.
| Parameter | Type | Description |
|---|---|---|
| config | ExecutorConfig | Configuration specifying device and options |
Create an array on the executor's device.
| Parameter | Type | Description |
|---|---|---|
| data | array-like | Input data (list, NumPy array, etc.) |
| dtype | dtype, optional | Optional dtype override |
Create zero-filled array.
Create one-filled array.
Create uninitialized array (faster than zeros).
Create identity matrix.
Create diagonal matrix from vector, or extract diagonal.
Create array with evenly spaced values.
Element-wise exponential.
Element-wise natural logarithm.
Element-wise square root.
Element-wise absolute value.
Sum of array elements along axis.
Product of array elements along axis.
Maximum along axis.
Minimum along axis.
Mean along axis.
Clip values to range.
Matrix multiplication.
Einstein summation - critical for phylogenetic computations.

Common patterns:
- 'ij,sj->si': Transform partials through P matrix
- 'si,i->s': Weight by frequencies
- 'ij,jk->ik': Matrix multiply
- 'bij,bsj->bsi': Batched transform
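The patterns above can be checked directly against their plain-NumPy equivalents (shapes and values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, n_states = 5, 4
P = rng.random((n_states, n_states))          # transition matrix
partials = rng.random((n_sites, n_states))    # per-site conditional likelihoods
freqs = np.full(n_states, 0.25)               # equilibrium frequencies

# 'ij,sj->si': push partials through P (child -> parent direction)
transformed = np.einsum('ij,sj->si', P, partials)
assert np.allclose(transformed, partials @ P.T)

# 'si,i->s': weight each site's partials by the frequencies
site_lik = np.einsum('si,i->s', partials, freqs)
assert np.allclose(site_lik, partials @ freqs)

# 'bij,bsj->bsi': the same transform, batched over b matrices
Pb = rng.random((3, n_states, n_states))
pb = rng.random((3, n_sites, n_states))
batched = np.einsum('bij,bsj->bsi', Pb, pb)
```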
Outer product of two vectors.
Dot product.
Transpose array.
Matrix inverse.
Eigendecomposition.
Solve linear system Ax = b.
Matrix exponential: P(t) = exp(Qt). This is the critical operation for phylogenetic likelihood; implementations should optimize it heavily.
| Parameter | Type | Description |
|---|---|---|
| Q | (n_states, n_states) array | Rate matrix |
| t | float | Branch length (time) |
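Basic properties of the operation can be checked with SciPy's general-purpose `expm`; the 2-state rate matrix below is illustrative:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative 2-state rate matrix; rows of a valid Q sum to zero
u, v = 0.1, 0.2
Q = np.array([[-u,  u],
              [ v, -v]])

t = 0.5
P = expm(Q * t)

# Rows of a transition probability matrix sum to 1
assert np.allclose(P.sum(axis=1), 1.0)
# Zero branch length gives the identity: P(0) = I
assert np.allclose(expm(Q * 0.0), np.eye(2))
```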
Batched matrix exponential for multiple branch lengths. Computes P(t) for all t values simultaneously - critical for GPU.
| Parameter | Type | Description |
|---|---|---|
| Q | (n_states, n_states) array | Rate matrix |
| t_values | (n_branches,) array | Branch lengths |
Standard Felsenstein pruning step.

Computes: L_parent[s] = ∏_c ( ∑_t P_c[s,t] × L_c[t] )

Default implementation using einsum. Subclasses may override for optimized versions.
| Parameter | Type | Description |
|---|---|---|
| child_partials | list of (n_sites, n_states) arrays | Conditional likelihoods of each child |
| transition_matrices | list of (n_states, n_states) arrays | P matrix for each child branch |
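The default einsum-based step described above can be sketched as follows (function name and test values are illustrative):

```python
import numpy as np

def pruning_step(child_partials, transition_matrices):
    """Felsenstein pruning sketch: product over children of (P_c @ L_c)."""
    result = None
    for L_c, P_c in zip(child_partials, transition_matrices):
        # sum over child states t: term[s, i] = sum_t P_c[i, t] * L_c[s, t]
        term = np.einsum('ij,sj->si', P_c, L_c)
        result = term if result is None else result * term
    return result

n_sites = 4
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
leaf0 = np.tile([1.0, 0.0], (n_sites, 1))  # every site observed in state 0
leaf1 = np.tile([0.0, 1.0], (n_sites, 1))  # every site observed in state 1

parent = pruning_step([leaf0, leaf1], [P, P])
# For these one-hot leaves: parent[s, i] = P[i, 0] * P[i, 1]
assert np.allclose(parent[0], [0.9 * 0.1, 0.2 * 0.8])
```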
Batched pruning step for multiple nodes or trees.
| Parameter | Type | Description |
|---|---|---|
| child_partials | (n_children, n_sites, n_states) array | Stacked child partials |
| transition_matrices | (n_children, n_states, n_states) array | Stacked P matrices |
Compute log-likelihood at root.
| Parameter | Type | Description |
|---|---|---|
| root_partials | (n_sites, n_states) array | Conditional likelihoods at the root |
| frequencies | (n_states,) array | Equilibrium frequencies |
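A minimal sketch of the root computation (helper name and values illustrative): weight the root partials by the equilibrium frequencies per site, then sum the log site likelihoods.

```python
import numpy as np

def root_log_likelihood(root_partials, frequencies):
    """Sketch: sum over states at the root, then sum log site likelihoods."""
    site_lik = root_partials @ frequencies      # (n_sites,)
    return float(np.sum(np.log(site_lik)))

partials = np.array([[0.50, 0.50],
                     [0.25, 0.75]])
freqs = np.array([0.5, 0.5])

ll = root_log_likelihood(partials, freqs)
# Both site likelihoods are 0.5 here, so ll = 2 * log(0.5)
assert np.isclose(ll, 2 * np.log(0.5))
```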
Reshape array.
Concatenate arrays along axis.
Stack arrays along new axis.
Split array at indices.
Remove single-dimensional entries.
Add dimension at axis.
Broadcast array to shape.
Take elements along axis.
Element-wise conditional selection.
Convert array to numpy (CPU).
Convert numpy array to executor's array type.
Create a copy of array.
Synchronize device (wait for pending operations). No-op for CPU. GPU implementations should override.
Generate uniform random values.
Generate normal random values.
Random choice from array.
CPU executor using NumPy. This serves as the reference implementation and fallback when GPU is not available.
Matrix exponential using eigendecomposition. P(t) = V @ diag(exp(λt)) @ V^{-1}
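The eigendecomposition route can be sketched in plain NumPy (assumes Q is diagonalizable; helper name illustrative):

```python
import numpy as np

def expm_eig(Q, t):
    """P(t) = V @ diag(exp(lambda * t)) @ V^{-1} (sketch; assumes Q diagonalizable)."""
    lam, V = np.linalg.eig(Q)
    # V * d scales column j of V by d[j], i.e. V @ diag(d)
    return (V * np.exp(lam * t)) @ np.linalg.inv(V)

Q = np.array([[-0.1,  0.1],
              [ 0.2, -0.2]])

P = np.real(expm_eig(Q, 1.0))
assert np.allclose(P.sum(axis=1), 1.0)               # rows of P(t) sum to 1
assert np.allclose(np.real(expm_eig(Q, 0.0)), np.eye(2))  # P(0) = I
```

The payoff of this route over a general `expm` is that the eigendecomposition of Q can be computed once and reused across every branch length, which is what the batched version below exploits.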
Batch matrix exponential for multiple branch lengths.
GPU executor using CuPy (NVIDIA CUDA).

Provides significant speedups for:
- Large alignment matrices
- Batched transition matrix exponentials
- Vectorized partial likelihood updates

Performance Notes:
- Best for n_sites > 1000 (GPU overhead dominates for small data)
- Memory transfers are expensive - keep data on GPU
- Use batch operations whenever possible

Example:
>>> config = ExecutorConfig(device_type=DeviceType.CUDA, device_id=0)
>>> executor = GPUExecutor(config)
>>> strategy = SNPStrategy(u=0.1, v=0.2, executor=executor)
GPU-accelerated matrix exponential. Uses eigendecomposition with caching for repeated Q matrices.
Fully vectorized batch matrix exponential.

This is where the GPU really shines - computing many P(t) matrices simultaneously.

┌─────────────────────────────────────────────────────────────┐
│ BATCH MATRIX EXPONENTIAL (GPU)                              │
│                                                             │
│ t_values: [t₁, t₂, ..., tₙ]             (n_branches,)       │
│     ▼                                                       │
│ exp(λ·t): [[e^λ₁t₁, e^λ₂t₁, ...],  (n_branches, n_states)   │
│            [e^λ₁t₂, e^λ₂t₂, ...],                           │
│            ...]                                             │
│     ▼  (all done in parallel on GPU)                        │
│ P(t) = V @ diag(exp(λt)) @ V⁻¹                              │
│        (n_branches, n_states, n_states)                     │
└─────────────────────────────────────────────────────────────┘
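The diagram above can be sketched in NumPy with a single eigendecomposition and one einsum; a GPU implementation would follow the same shapes on device arrays (helper name illustrative):

```python
import numpy as np

def batch_expm_eig(Q, t_values):
    """Batched P(t): decompose Q once, vectorize over all branch lengths."""
    lam, V = np.linalg.eig(Q)
    V_inv = np.linalg.inv(V)
    # exp(lambda * t) for every branch length: (n_branches, n_states)
    exp_lt = np.exp(np.outer(t_values, lam))
    # V @ diag(exp_lt[b]) @ V_inv for all b simultaneously
    return np.einsum('ij,bj,jk->bik', V, exp_lt, V_inv)

Q = np.array([[-0.1,  0.1],
              [ 0.2, -0.2]])
ts = np.array([0.0, 0.5, 1.0])

Ps = np.real(batch_expm_eig(Q, ts))
assert np.allclose(Ps[0], np.eye(2))      # t = 0 gives the identity
assert np.allclose(Ps.sum(axis=2), 1.0)   # every P(t) has unit row sums
```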
GPU-optimized pruning step. For GPU efficiency, we stack inputs and use batched operations.
Batched pruning across multiple trees.
| Parameter | Type | Description |
|---|---|---|
| child_partials | (n_trees, n_children, n_sites, n_states) array | Stacked child partials for all trees |
| transition_matrices | (n_trees, n_children, n_states, n_states) array | Stacked P matrices for all trees |
GPU-vectorized leaf partial initialization.
| Parameter | Type | Description |
|---|---|---|
| sequences | (n_leaves, n_sites) array | Integer-encoded sequences |
| n_states | int | Number of states (2 for SNP, 4 for DNA) |
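One-hot encoding handles this initialization in a single vectorized indexing operation, which is why it maps well to the GPU; a sketch (helper name illustrative):

```python
import numpy as np

def init_leaf_partials(sequences, n_states):
    """Sketch: one-hot encode integer sequences into leaf partials."""
    # Indexing the identity matrix by state turns each integer s
    # into the one-hot row e_s, vectorized over leaves and sites.
    return np.eye(n_states)[sequences]   # (n_leaves, n_sites, n_states)

# Two leaves, three sites, binary (SNP-style) states
seqs = np.array([[0, 1, 1],
                 [1, 0, 0]])
partials = init_leaf_partials(seqs, 2)

assert partials.shape == (2, 3, 2)
assert np.allclose(partials[0, 0], [1.0, 0.0])   # state 0 -> [1, 0]
assert np.allclose(partials[0, 1], [0.0, 1.0])   # state 1 -> [0, 1]
```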
Transfer array from GPU to CPU.
Transfer array from CPU to GPU.
Wait for all pending GPU operations to complete.
Clear eigendecomposition cache and free GPU memory.
Factory function to create an executor.
| Parameter | Type | Description |
|---|---|---|
| device | str | One of 'cpu', 'cuda', 'gpu', 'jax' |
| **kwargs | | Additional config options |
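A sketch of how such a factory might dispatch on the device string (the registry and stub classes below are illustrative, not the library's actual code):

```python
# Illustrative stand-ins for the executor classes
class CPUExecutor:
    pass

class GPUExecutor:
    pass

# 'cuda' and 'gpu' are aliases for the same backend
_REGISTRY = {'cpu': CPUExecutor, 'cuda': GPUExecutor, 'gpu': GPUExecutor}

def create_executor(device='cpu', **kwargs):
    """Sketch: look up the executor class by device name and instantiate it."""
    try:
        cls = _REGISTRY[device.lower()]
    except KeyError:
        raise ValueError(f"Unknown device: {device!r}") from None
    return cls()

assert isinstance(create_executor('cpu'), CPUExecutor)
assert isinstance(create_executor('GPU'), GPUExecutor)  # case-insensitive
```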