PhyNetPy Documentation

Library for the Development and Use of Phylogenetic Network Methods

SNPSimulator Module v1.1.0

SNP data simulator for phylogenetic networks using a forward-in-time 2-state CTMC.

Author:
Mark Kessler
Last Edit:
2/12/26
Source:
SNPSimulator.py

SimulatedSNPData

class SimulatedSNPData

Container for simulated SNP data and the generating network.

Attributes:
network (Network): The phylogenetic network used for simulation.
data (dict[str, list[int]]): Mapping from taxon name to a list of red allele counts per site.
n_taxa (int): Number of leaf taxa.
n_sites (int): Number of simulated SNP sites.
samples (dict[str, int]): Number of sampled individuals per taxon.
u (float): Mutation rate from red to green allele.
v (float): Mutation rate from green to red allele.
coal (float): Coalescent rate parameter (theta).
seed (int): Random seed used for simulation.
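The attribute descriptions imply a few consistency rules, for example that every taxon has one count per site and that a red allele count cannot exceed the number of sampled individuals. The sketch below illustrates them with a hypothetical stand-in class, `SNPDataSketch`, and a hypothetical helper, `is_consistent`. Neither is part of PhyNetPy, and `network` is left as an untyped placeholder.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SNPDataSketch:
    """Hypothetical stand-in mirroring the documented attributes."""
    network: Any                 # placeholder for a Network object
    data: dict[str, list[int]]   # taxon name -> red allele count per site
    n_taxa: int
    n_sites: int
    samples: dict[str, int]      # taxon name -> number of sampled individuals
    u: float
    v: float
    coal: float
    seed: Optional[int]

    def is_consistent(self) -> bool:
        """Check the invariants implied by the attribute descriptions."""
        if len(self.data) != self.n_taxa or set(self.data) != set(self.samples):
            return False
        for taxon, counts in self.data.items():
            if len(counts) != self.n_sites:
                return False
            # A red allele count lies between 0 and the sample size.
            if any(c < 0 or c > self.samples[taxon] for c in counts):
                return False
        return True


example = SNPDataSketch(
    network=None,
    data={"A": [0, 1, 1], "B": [1, 1, 0], "C": [0, 0, 1]},
    n_taxa=3,
    n_sites=3,
    samples={"A": 1, "B": 1, "C": 1},
    u=1.0, v=1.0, coal=0.005, seed=42,
)
```

With samples=1 for every taxon, each entry of `data` is 0 or 1, which matches the 0/1 site patterns that write_nexus describes.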

Methods

taxa_names -> list[str]

Return sorted list of taxon names.

write_nexus(filepath: str) -> None

Write the simulated data and network to a NEXUS file compatible with SNP_LIKELIHOOD. The output file contains:
- TAXA block with taxon labels
- DATA block with SNP site patterns (0/1 per site for samples=1)
- TREES block with the network in extended newick format (including branch lengths and gamma annotations)

Parameter Type Description
filepath str Path where the NEXUS file will be written.
Returns: None
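To make the three-block layout concrete, here is a minimal sketch that assembles a generic NEXUS string with TAXA, DATA and TREES blocks. This page does not specify the exact header and FORMAT lines that write_nexus emits, so the ones below are assumptions based on common NEXUS usage. The function `nexus_text`, the tree name `net`, and the example network string are all hypothetical.

```python
def nexus_text(data: dict[str, list[int]], network_newick: str) -> str:
    """Build an illustrative NEXUS string; not the library's exact output."""
    taxa = sorted(data)
    n_sites = len(data[taxa[0]])
    width = max(len(t) for t in taxa)
    lines = [
        "#NEXUS",
        "BEGIN TAXA;",
        f"    DIMENSIONS NTAX={len(taxa)};",
        "    TAXLABELS " + " ".join(taxa) + ";",
        "END;",
        "BEGIN DATA;",
        f"    DIMENSIONS NTAX={len(taxa)} NCHAR={n_sites};",
        '    FORMAT DATATYPE=STANDARD SYMBOLS="01";',  # assumed FORMAT line
        "    MATRIX",
    ]
    for t in taxa:
        # One row per taxon: padded label, then the 0/1 pattern across sites.
        lines.append(f"    {t.ljust(width)} " + "".join(str(x) for x in data[t]))
    lines += [
        "    ;",
        "END;",
        "BEGIN TREES;",
        f"    TREE net = {network_newick}",
        "END;",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical extended newick string with one reticulation (#H1);
# the third colon-separated field on the #H1 edges is gamma.
newick = "((A:1.0,(B:0.5)#H1:0.5::0.7):1.0,(#H1:0.5::0.3,C:1.0):1.0);"
text = nexus_text({"A": [0, 1, 1], "B": [1, 1, 0], "C": [0, 0, 1]}, newick)
```

Calling `print(text)` shows the three blocks in order. For real output, use write_nexus itself.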

Module Functions

random_network(n: int, level: int = 1, birth_rate: float = 1.0, gamma_range: tuple[float, float] = (0.2, 0.8), seed: int | None = None) -> Network

Generate a random phylogenetic network with n taxa and a given number of reticulations. The function first generates a random tree using a Yule (pure-birth) process, then rebuilds the topology with clean node names, and finally grafts reticulation edges until the desired reticulation count is reached.

Note: The `level` parameter specifies the number of reticulation nodes to add, not the network level in the strict sense (the maximum number of reticulations per biconnected component). Depending on placement, the resulting network's true level may be less than this value.
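The effect of birth_rate on branch lengths can be seen in a small sketch of Yule waiting times. It assumes the textbook formulation in which, with k lineages alive, the time to the next split is exponential with rate k * birth_rate. The function names `yule_waiting_times` and `draw_gamma` are hypothetical, and the library's own tree builder may differ in detail.

```python
import random


def yule_waiting_times(n: int, birth_rate: float, rng: random.Random) -> list[float]:
    """Waiting times between splits while growing from 1 to n lineages."""
    times = []
    for k in range(1, n):
        # With k lineages, each splitting at birth_rate, the next split
        # arrives after an Exponential(k * birth_rate) waiting time.
        times.append(rng.expovariate(k * birth_rate))
    return times


def draw_gamma(gamma_range: tuple[float, float], rng: random.Random) -> float:
    """Draw an inheritance probability uniformly from gamma_range."""
    low, high = gamma_range
    return rng.uniform(low, high)


# Same seed, different birth rates: doubling the rate halves every waiting time.
slow = yule_waiting_times(5, birth_rate=1.0, rng=random.Random(7))
fast = yule_waiting_times(5, birth_rate=2.0, rng=random.Random(7))
gamma = draw_gamma((0.2, 0.8), random.Random(7))
```

Comparing `slow` and `fast` shows why larger birth rates produce shorter branch lengths: the waiting times scale as 1 / birth_rate.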

Parameter Type Description
n int Number of leaf taxa (must be >= 3 for level >= 1).
level int Number of reticulation nodes to add. Defaults to 1.
birth_rate float Birth rate for the Yule process. Larger values produce shorter branch lengths. Defaults to 1.0.
gamma_range tuple[float, float] Range from which inheritance probabilities (gamma) are uniformly drawn. Defaults to (0.2, 0.8).
seed int | None Random seed for reproducibility. Defaults to None.
Returns: Network: A phylogenetic network with n leaves and `level` reticulation nodes.
Raises: ValueError: If n < 2 or level < 0, or if there are not enough edges to place all reticulations.

simulate(n: int, s: int, net: Network, samples: dict[str, int] | None = None, u: float = 1.0, v: float = 1.0, coal: float = 0.005, seed: int | None = None) -> SimulatedSNPData

Simulate SNP (biallelic marker) data over a phylogenetic network.

The simulation runs forward in time. At the root, an allele state (red or green) is drawn from the stationary distribution. The state is then propagated down the network along each branch using the 2-state CTMC mutation model. At reticulation nodes, the parent lineage is chosen at random according to the inheritance probability (gamma).

For samples=1 per taxon, this is an exact simulation under the biallelic mutation model. For samples > 1, each sample is drawn independently conditional on the leaf's evolved frequency. This is an approximation: an exact multi-sample simulation would require a full coalescent simulation within each population branch.
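The 2-state CTMC has closed-form stationary and transition probabilities. The sketch below uses the standard formulas with u as the red-to-green rate and v as the green-to-red rate, as documented for this function. It assumes the rates are applied as given, without rescaling, which this page does not state. The helper names `stationary_red` and `prob_red_after` are hypothetical.

```python
import math


def stationary_red(u: float, v: float) -> float:
    """Stationary probability of the red allele: v / (u + v)."""
    return v / (u + v)


def prob_red_after(start_red: bool, t: float, u: float, v: float) -> float:
    """Probability that a lineage is red after a branch of length t.

    Standard 2-state CTMC solution:
        P(red at t) = pi_red + (start - pi_red) * exp(-(u + v) * t)
    where start is 1 for a red start and 0 for a green start.
    """
    pi_red = stationary_red(u, v)
    start = 1.0 if start_red else 0.0
    return pi_red + (start - pi_red) * math.exp(-(u + v) * t)


# With equal rates, a long branch forgets its starting state.
p_short = prob_red_after(True, 0.01, u=1.0, v=1.0)
p_long = prob_red_after(True, 50.0, u=1.0, v=1.0)
```

On a short branch the state is almost surely unchanged, while on a long branch the probability approaches the stationary value, which is 0.5 when u equals v.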

Parameter Type Description
n int Expected number of taxa. Used only for validation; the actual taxa come from the network's leaves.
s int Number of SNP sites to simulate.
net Network The phylogenetic network to simulate data on.
samples dict[str, int] | None Number of sampled individuals per taxon. Keys must match leaf names. Defaults to 1 per taxon.
u float Mutation rate from red allele to green. Defaults to 1.0.
v float Mutation rate from green allele to red. Defaults to 1.0.
coal float Coalescent rate parameter (theta). Stored in output but not used in the forward simulation. Defaults to 0.005.
seed int | None Random seed for reproducibility. Defaults to None.
Returns: SimulatedSNPData: Container with simulated data, network, and metadata.
Raises: ValueError: If the number of leaves in the network doesn't match n, or if sample keys don't match leaf names.
