Library for the Development and Use of Phylogenetic Network Methods
SNP data simulator for phylogenetic networks using a forward-in-time 2-state CTMC.
Container for simulated SNP data and the generating network.

| Attribute | Type | Description |
|---|---|---|
| network | Network | The phylogenetic network used for simulation. |
| data | dict[str, list[int]] | Mapping from taxon name to a list of red allele counts per site. |
| n_taxa | int | Number of leaf taxa. |
| n_sites | int | Number of simulated SNP sites. |
| samples | dict[str, int] | Number of sampled individuals per taxon. |
| u | float | Mutation rate from red to green allele. |
| v | float | Mutation rate from green to red allele. |
| coal | float | Coalescent rate parameter (theta). |
| seed | int | Random seed used for simulation. |

Return sorted list of taxon names.
Write the simulated data and network to a NEXUS file compatible with SNP_LIKELIHOOD. The output file contains:

- TAXA block with taxon labels
- DATA block with SNP site patterns (0/1 per site for samples=1)
- TREES block with the network in extended newick format (including branch lengths and gamma annotations)

| Parameter | Type | Description |
|---|---|---|
| filepath | str | Path where the nexus file will be written. |
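
The three-block layout described above can be sketched as follows. This is a minimal illustration, not the library's writer: the function name `format_nexus`, the `FORMAT` line, the tree label `net`, and the hand-written extended newick string are all assumptions, and the exact keywords SNP_LIKELIHOOD expects may differ. It only handles the samples=1 case, where every entry is 0 or 1.

```python
def format_nexus(data: dict[str, list[int]], newick: str) -> str:
    """Build NEXUS text with TAXA, DATA and TREES blocks (hypothetical helper)."""
    taxa = sorted(data)
    n_sites = len(next(iter(data.values())))
    lines = [
        "#NEXUS",
        "",
        "BEGIN TAXA;",
        f"    DIMENSIONS NTAX={len(taxa)};",
        "    TAXLABELS " + " ".join(taxa) + ";",
        "END;",
        "",
        "BEGIN DATA;",
        f"    DIMENSIONS NTAX={len(taxa)} NCHAR={n_sites};",
        # Assumed format line: one 0/1 symbol per site (samples=1 only).
        '    FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=?;',
        "    MATRIX",
    ]
    for name in taxa:
        lines.append(f"    {name} " + "".join(str(x) for x in data[name]))
    lines += [
        "    ;",
        "END;",
        "",
        "BEGIN TREES;",
        f"    TREE net = {newick}",
        "END;",
        "",
    ]
    return "\n".join(lines)


# Hand-written example network: #H1 marks the reticulation node, and the
# third colon-separated field on each hybrid edge carries gamma.
EXAMPLE_NETWORK = "((A:0.4,(B:0.2)#H1:0.1::0.6):0.5,(#H1:0.3::0.4,C:0.6):0.3);"
EXAMPLE_DATA = {"A": [0, 1, 0, 1], "B": [1, 1, 0, 0], "C": [0, 0, 0, 1]}

if __name__ == "__main__":
    print(format_nexus(EXAMPLE_DATA, EXAMPLE_NETWORK))
```

Building the text in memory and writing it in one call keeps the formatting logic testable without touching the filesystem.
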
Generate a random phylogenetic network with n taxa and a given number of reticulations.

First generates a random tree using a Yule (pure-birth) process, then rebuilds the topology with clean node names. Finally, grafts reticulation edges to reach the desired reticulation count.

Note: The ``level`` parameter specifies the number of reticulation nodes to add, not the network level in the strict sense (max reticulations per biconnected component). Depending on placement, the resulting network's true level may be less than this value.

| Parameter | Type | Description |
|---|---|---|
| n | int | Number of leaf taxa (must be >= 3 for level >= 1). |
| level | int | Number of reticulation nodes to add. Defaults to 1. |
| birth_rate | float | Birth rate for the Yule process. Larger values produce shorter branch lengths. Defaults to 1.0. |
| gamma_range | tuple[float, float] | Range from which inheritance probabilities (gamma) are uniformly drawn. Defaults to (0.2, 0.8). |
| seed | int \| None | Random seed for reproducibility. Defaults to None. |
ValueError: If n < 2 or level < 0, or if there are not enough edges to place all reticulations.

Simulate SNP (biallelic marker) data over a phylogenetic network.

Uses a forward-in-time simulation: at the root, an allele state (red or green) is drawn from the stationary distribution. The state is then propagated down the network along each branch using the 2-state CTMC mutation model. At reticulation nodes, the parent lineage is chosen probabilistically based on the inheritance probability (gamma).

For samples=1 per taxon, this is an exact simulation under the biallelic mutation model. For samples > 1, the simulation draws each sample independently conditional on the leaf's evolved frequency. This is an approximation: a full coalescent simulation within each population branch would be needed for exact multi-sample simulation.

| Parameter | Type | Description |
|---|---|---|
| n | int | Expected number of taxa; used only for validation. The actual taxa come from the network's leaves. |
| s | int | Number of SNP sites to simulate. |
| net | Network | The phylogenetic network to simulate data on. |
| samples | dict[str, int] \| None | Number of sampled individuals per taxon. Keys must match leaf names. Defaults to 1 per taxon. |
| u | float | Mutation rate from red allele to green. Defaults to 1.0. |
| v | float | Mutation rate from green allele to red. Defaults to 1.0. |
| coal | float | Coalescent rate parameter (theta). Stored in output but not used in the forward simulation. Defaults to 0.005. |
| seed | int \| None | Random seed for reproducibility. Defaults to None. |
ValueError: If the number of leaves in the network doesn't match n, or if sample keys don't match leaf names.
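
The forward simulation for the samples=1 case can be sketched as below. This is a minimal illustration under stated assumptions rather than the library's code: the toy network encoding (`TOY_NETWORK`, `ORDER`, `LEAVES`) and the helper names are hypothetical and do not use the `Network` class. With u the red-to-green rate and v the green-to-red rate, the stationary probability of red is v / (u + v), and over a branch of length t the chain relaxes toward it by a factor exp(-(u + v) t).

```python
import math
import random


def stationary_red(u: float, v: float) -> float:
    """Stationary probability of the red allele (u: red->green, v: green->red)."""
    return v / (u + v)


def prob_red_after(is_red: bool, t: float, u: float, v: float) -> float:
    """P(red at the bottom of a branch of length t | state at the top)."""
    pi_red = stationary_red(u, v)
    decay = math.exp(-(u + v) * t)
    if is_red:
        return pi_red + (1.0 - pi_red) * decay
    return pi_red * (1.0 - decay)


# Hypothetical toy network: child -> list of (parent, branch_length, gamma).
# "h" is a reticulation node with two parents whose gammas sum to 1.
TOY_NETWORK = {
    "x": [("root", 0.5, 1.0)],
    "y": [("root", 0.3, 1.0)],
    "h": [("x", 0.1, 0.6), ("y", 0.3, 0.4)],
    "A": [("x", 0.4, 1.0)],
    "B": [("h", 0.2, 1.0)],
    "C": [("y", 0.6, 1.0)],
}
ORDER = ["x", "y", "h", "A", "B", "C"]  # parents always precede children
LEAVES = ["A", "B", "C"]


def simulate_site(u: float, v: float, rng: random.Random) -> dict[str, int]:
    """Simulate one site; returns 1 for red and 0 for green at each leaf."""
    state = {"root": rng.random() < stationary_red(u, v)}
    for node in ORDER:
        edges = TOY_NETWORK[node]
        # At a reticulation, pick the parent edge with probability gamma.
        weights = [gamma for _, _, gamma in edges]
        parent, length, _ = rng.choices(edges, weights=weights, k=1)[0]
        state[node] = rng.random() < prob_red_after(state[parent], length, u, v)
    return {leaf: int(state[leaf]) for leaf in LEAVES}


if __name__ == "__main__":
    rng = random.Random(0)
    sites = [simulate_site(1.0, 1.0, rng) for _ in range(5)]
    print({leaf: [s[leaf] for s in sites] for leaf in LEAVES})
```

Collecting the per-site results by leaf gives the same shape as the `data` mapping described earlier: one list of red allele counts per taxon.
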