Library for the Development and Use of Phylogenetic Network Methods
Central I/O hub for reading and writing phylogenetic file formats (FASTA, VCF, Newick, Nexus).
Exception raised when file I/O operations fail within PhyNetPy.
Read a FASTA file and return a list of DataSequence objects. This is the lower-level reader that returns raw DataSequence objects without wrapping them in an MSA. Useful for attaching sequences directly to Node objects in an existing Network (via Node.set_seq()).

A FASTA file looks like:

    >sequence_name_1
    ATCGATCGATCG...
    >sequence_name_2
    GCTAGCTAGCTA...

Each record becomes a DataSequence where:

- name = the FASTA header (sequence ID)
- seq = list of characters from the sequence string
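The record layout described above can be illustrated with a minimal, library-free sketch. It is not PhyNetPy's reader (which, per the error notes, relies on BioPython); `parse_fasta` is a hypothetical name, plain `(name, residues)` tuples stand in for DataSequence objects, and taking the first whitespace-delimited token of the header as the name is an assumption.

```python
from typing import Iterable


def parse_fasta(lines: Iterable[str]) -> list[tuple[str, list[str]]]:
    """Return (name, residues) pairs from FASTA-formatted lines (hypothetical helper)."""
    records: list[tuple[str, list[str]]] = []
    name = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, list("".join(chunks))))
            header = line[1:].strip()
            # Assumption: the sequence ID is the first token of the header.
            name = header.split()[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence text may span several lines
    if name is not None:
        records.append((name, list("".join(chunks))))
    if not records:
        raise IOError("no valid sequences found")
    return records


example = """>sequence_name_1
ATCGATCG
ATCG
>sequence_name_2
GCTAGCTAGCTA
"""
records = parse_fasta(example.splitlines())
for rec_name, residues in records:
    print(rec_name, "".join(residues))
```

Each tuple mirrors the name/seq split described above: the name is a string and the sequence is a list of single characters.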
| Parameter | Type | Description |
|---|---|---|
| filepath | str | Path to a FASTA file (.fasta, .fas, .fa, .fna, .ffn, .faa). |
Raises:

- FileNotFoundError: If the file does not exist.
- IOError: If BioPython cannot parse the file or it contains no valid sequences.

Read a FASTA file and return an MSA object containing all sequences. This function parses a FASTA file, converts each record into a DataSequence, and wraps them in an MSA for downstream phylogenetic analyses such as distance calculations, alignment inspection, or model-based inference.
| Parameter | Type | Description |
|---|---|---|
| filepath | str | Path to a FASTA file (.fasta, .fas, .fa, .fna, .ffn, .faa). |
| grouping | dict[str, list], optional | A mapping from group names to lists of sequence names that belong to that group. If provided, sequences will be assigned group IDs accordingly. Defaults to None. |
| grouping_auto_detect | bool, optional | If True, attempt to automatically group sequences by name similarity. Defaults to False. |
Raises:

- FileNotFoundError: If the file does not exist.
- IOError: If the file cannot be parsed or contains no valid sequences.

Write an MSA object to a FASTA file. Each DataSequence in the MSA is written as a FASTA record:

    >sequence_name
    ATCGATCG... (wrapped at line_width characters)
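The wrapping rule can be sketched as follows. This is an illustration only, not the library's writer: `format_fasta` is a hypothetical name, and `(name, residues)` tuples stand in for the DataSequence objects held by an MSA. The error conditions follow the ones documented for this function.

```python
def format_fasta(records, line_width: int = 80) -> str:
    """Render (name, residues) pairs as FASTA text (hypothetical helper)."""
    if line_width < 1:
        raise ValueError("line_width must be at least 1")
    if not records:
        raise IOError("no records to write")
    out: list[str] = []
    for name, residues in records:
        text = "".join(residues)
        out.append(f">{name}")
        # Slice the sequence into chunks of at most line_width characters.
        for start in range(0, len(text), line_width):
            out.append(text[start:start + line_width])
    return "\n".join(out) + "\n"


wrapped = format_fasta([("sequence_name", list("ATCGATCGAT"))], line_width=4)
print(wrapped)
```

With `line_width=4`, the ten-character sequence is emitted as three lines of 4, 4, and 2 characters under its header.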
| Parameter | Type | Description |
|---|---|---|
| msa | MSA | The Multiple Sequence Alignment to write. |
| filepath | str | The output file path. Will be created or overwritten. |
| line_width | int, optional | Number of characters per sequence line. Standard FASTA convention is 80. Defaults to 80. |
Raises:

- IOError: If the MSA has no records to write, or if the file cannot be written.
- ValueError: If line_width is less than 1.

Extract sequences from the leaf nodes of a Network and write them to a FASTA file. Only leaf nodes that have an associated DataSequence (set via Node.set_seq()) will be written. The node label becomes the FASTA header, and the attached sequence becomes the FASTA sequence. This is useful when a Network has been annotated with molecular data and the user wants to export just the sequence data.
| Parameter | Type | Description |
|---|---|---|
| network | Network | A phylogenetic network whose leaf nodes may carry DataSequence objects. |
| filepath | str | The output FASTA file path. |
| line_width | int, optional | Characters per line for sequence wrapping. Defaults to 80. |
Raises:

- IOError: If no leaf nodes in the network have sequence data attached, or if the file cannot be written.
- ValueError: If line_width is less than 1.

Read a VCF (Variant Call Format) file and return an MSA object. Each sample in the VCF becomes a DataSequence whose sequence is the vector of ALT allele counts across all variant sites. This maps directly to the SNP/BiMarkers pipeline used in PhyNetPy.

A typical VCF file looks like:

    ##fileformat=VCFv4.1
    ##INFO=<...>
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Samp1 Samp2
    chr1 100 . A T 30 PASS . GT 0/0 0/1
    chr1 200 . G C 50 PASS . GT 1/1 0/1

Genotype encoding:

- 0/0 -> 0 (homozygous reference, 0 copies of ALT allele)
- 0/1 -> 1 (heterozygous, 1 copy of ALT allele)
- 1/1 -> 2 (homozygous alternate, 2 copies of ALT allele)
- ./. -> missing_value (missing genotype)
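The genotype encoding above can be sketched without any VCF library. The names `alt_allele_count` and `sample_vectors` are hypothetical, and two choices are assumptions rather than documented behavior: phased genotypes (`0|1`) are treated like unphased ones, and any non-zero allele index is counted as an ALT copy.

```python
import re


def alt_allele_count(genotype: str, missing_value: str = "?"):
    """Map a GT field such as '0/1' to its ALT allele count (hypothetical helper)."""
    gt = genotype.split(":")[0]          # GT is the first colon-separated subfield
    alleles = re.split(r"[/|]", gt)      # assumption: accept unphased and phased calls
    if any(allele == "." for allele in alleles):
        return missing_value
    # Assumption: every non-reference allele index counts as one ALT copy.
    return sum(1 for allele in alleles if allele != "0")


def sample_vectors(vcf_text: str, missing_value: str = "?") -> dict:
    """Collect one vector of ALT counts per sample column."""
    samples: list[str] = []
    vectors: dict[str, list] = {}
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue                      # skip blank and metadata lines
        fields = line.split("\t")         # VCF columns are tab-delimited
        if line.startswith("#CHROM"):
            samples = fields[9:]          # sample names follow the 9 fixed columns
            vectors = {name: [] for name in samples}
            continue
        for name, gt in zip(samples, fields[9:]):
            vectors[name].append(alt_allele_count(gt, missing_value))
    return vectors


rows = [
    "##fileformat=VCFv4.1",
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Samp1 Samp2",
    "chr1 100 . A T 30 PASS . GT 0/0 0/1",
    "chr1 200 . G C 50 PASS . GT 1/1 0/1",
]
vcf_text = "\n".join(row.replace(" ", "\t") for row in rows)
vectors = sample_vectors(vcf_text)
print(vectors)
```

For the two-site example, Samp1 carries 0 and then 2 ALT copies, and Samp2 carries 1 at both sites.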
| Parameter | Type | Description |
|---|---|---|
| filepath | str | Path to a VCF file (.vcf). |
| grouping | dict[str, list], optional | A mapping from group/species names to lists of sample names that belong to that group. Used for the BiMarkers pipeline where multiple individuals map to a single species. Defaults to None. |
| missing_value | str, optional | The character to use for missing genotype data (./.). Defaults to "?". |
Raises:

- FileNotFoundError: If the file does not exist.
- IOError: If the file cannot be parsed or contains no variant data.

Read only the metadata and header from a VCF file without loading all variant data. Useful for inspecting what samples and fields are available before a full parse.
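A header-only pass can stop as soon as the `#CHROM` line has been seen, which is what makes it cheap on large files. The sketch below shows the idea. `read_vcf_header` and the shape of the returned dictionary are hypothetical, since the source does not say what the library function returns.

```python
import io
from typing import Iterable


def read_vcf_header(lines: Iterable[str]) -> dict:
    """Collect '##' metadata lines and sample names, then stop (hypothetical helper)."""
    meta: list[str] = []
    samples: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("##"):
            meta.append(line[2:])               # e.g. 'fileformat=VCFv4.1'
        elif line.startswith("#CHROM"):
            samples = line.split("\t")[9:]      # columns after FORMAT are samples
            break                               # variant records are never read
        elif line.strip():
            break                               # data reached without a header line
    return {"meta": meta, "samples": samples}


columns = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
           "INFO", "FORMAT", "Samp1", "Samp2"]
handle = io.StringIO(
    "##fileformat=VCFv4.1\n"
    + "\t".join(columns) + "\n"
    + "\t".join(["chr1", "100", ".", "A", "T", "30", "PASS", ".", "GT", "0/0", "0/1"]) + "\n"
)
header = read_vcf_header(handle)
print(header)
```

Because the function iterates lazily over the handle, the remaining variant lines stay unread after the `break`.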
| Parameter | Type | Description |
|---|---|---|
| filepath | str | Path to a VCF file. |
Raises:

- FileNotFoundError: If the file does not exist.
- IOError: If the file cannot be read.

Write an MSA of SNP/allele-count data to a simplified VCF file. This produces a minimal VCF where each site in the MSA becomes a variant record, and each DataSequence becomes a sample column. The allele count values (0, 1, 2, ...) are converted back to VCF genotype notation (e.g., 0/0, 0/1, 1/1).

Note: Because the MSA does not store chromosome position, reference alleles, or other VCF-specific metadata, this output is a simplified reconstruction. It is suitable for round-tripping SNP data or creating test files, but will not preserve full VCF metadata from an original file.
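The reverse mapping from allele counts to genotype strings can be sketched as below. `msa_to_vcf_lines` is a hypothetical name and a plain dictionary of per-sample count vectors stands in for the MSA. The `.` placeholders for ID, QUAL, FILTER, and INFO, the `VCFv4.1` header line, and the treatment of any value outside 0, 1, 2 as missing are all assumptions.

```python
GENOTYPES = {0: "0/0", 1: "0/1", 2: "1/1"}


def msa_to_vcf_lines(samples: dict, chrom: str = "chr1", start_pos: int = 1,
                     ref_allele: str = "A", alt_allele: str = "T") -> list[str]:
    """Build simplified VCF lines from per-sample ALT counts (hypothetical helper)."""
    if not samples:
        raise IOError("no records to write")
    names = list(samples)
    n_sites = len(samples[names[0]])
    fixed = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
    lines = ["##fileformat=VCFv4.1", "\t".join(fixed + names)]
    for site in range(n_sites):
        # Assumption: anything other than 0, 1, or 2 is written as missing.
        calls = [GENOTYPES.get(samples[name][site], "./.") for name in names]
        record = [chrom, str(start_pos + site), ".", ref_allele, alt_allele,
                  ".", ".", ".", "GT"]
        lines.append("\t".join(record + calls))
    return lines


vcf_lines = msa_to_vcf_lines({"Samp1": [0, 2], "Samp2": [1, "?"]}, start_pos=100)
print("\n".join(vcf_lines))
```

Positions increase by 1 per site from `start_pos`, matching the parameter description, and every record shares the same chromosome and allele characters because the MSA carries none of that metadata.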
| Parameter | Type | Description |
|---|---|---|
| msa | MSA | The MSA containing allele count data (values like 0, 1, 2 per site). |
| filepath | str | The output VCF file path. |
| chrom | str, optional | Chromosome name for all records. Defaults to "chr1". |
| start_pos | int, optional | Starting position for the first variant. Each subsequent variant increments by 1. Defaults to 1. |
| ref_allele | str, optional | Reference allele character. Defaults to "A". |
| alt_allele | str, optional | Alternate allele character. Defaults to "T". |
Raises:

- IOError: If the MSA has no records, or the file cannot be written.

Parse a single newick/extended-newick string into a PhyNetPy Network. Supports standard newick features (branch lengths, internal node names) as well as the extended newick format for phylogenetic networks (reticulation nodes prefixed with '#', gamma inheritance comments).

Examples of accepted strings:

    ((A:0.1,B:0.2):0.3,C:0.4);
    ((A:0.1,(B:0.2)#H1:0.3):0.4,(#H1:0.5,C:0.6):0.7);
| Parameter | Type | Description |
|---|---|---|
| newick_str | str | A newick or extended-newick string. Trailing semicolons are handled automatically. |
Raises:

- IOError: If the string cannot be parsed.

Read a file containing one or more newick strings (one per line) and parse each into a PhyNetPy Network. Blank lines and lines starting with '#' are skipped.
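The line-filtering rule can be sketched in isolation from the parser. `newick_lines` is a hypothetical helper that returns only the raw strings that would be handed on for parsing; raising IOError on an empty result mirrors the documented error condition.

```python
def newick_lines(text: str) -> list[str]:
    """Return candidate newick strings, one per non-comment line (hypothetical helper)."""
    kept: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' lines are skipped
        kept.append(line)
    if not kept:
        raise IOError("no valid newick strings found")
    return kept


file_text = """# gene trees, one per line

((A:0.1,B:0.2):0.3,C:0.4);
  ((A:0.1,(B:0.2)#H1:0.3):0.4,(#H1:0.5,C:0.6):0.7);
"""
candidates = newick_lines(file_text)
print(candidates)
```

A reticulation label such as `#H1` inside a string is unaffected, because only lines whose first non-blank character is `#` are dropped.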
| Parameter | Type | Description |
|---|---|---|
| filepath | str | Path to a file containing newick strings. |
| return_type | str | ``"networks"`` (default) returns a list of Network objects. ``"genetrees"`` validates each network as a rooted binary tree and wraps them in a GeneTrees object. |
| species_gene_mapping | dict, optional | Explicit species -> gene label mapping. Only used when *return_type* is ``"genetrees"``. |
| naming_rule | Callable, optional | Gene-label-to-species callable. Only used when *return_type* is ``"genetrees"`` and no explicit mapping is given. |
Raises:

- FileNotFoundError: If the file does not exist.
- IOError: If no valid newick strings are found, or parsing fails.

Convert a PhyNetPy Network into a newick string. Delegates to the Network's built-in ``newick()`` method, which produces extended-newick notation for networks with reticulation nodes.
| Parameter | Type | Description |
|---|---|---|
| network | Network | A PhyNetPy Network object. |
Write one or more Networks to a file as newick strings, one per line.
| Parameter | Type | Description |
|---|---|---|
| networks | list[Network] | Networks to write. |
| filepath | str | Output file path. Will be created or overwritten. |
Raises:

- IOError: If the list is empty or the file cannot be written.

Read a nexus file and parse all trees/networks in the TREES block into PhyNetPy Network objects. This replicates the core functionality of ``NetworkParser`` as a standalone function, making it easy to call without instantiating a class.

A typical nexus file looks like:

    #NEXUS
    BEGIN TAXA;
    DIMENSIONS NTAX=3;
    TAXALABELS A B C;
    END;
    BEGIN TREES;
    Tree t1 = ((A:0.1,B:0.2):0.3,C:0.4);
    Tree t2 = ((B:0.1,C:0.2):0.3,A:0.4);
    END;
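To show which part of such a file is relevant here, the sketch below pulls the `Tree name = newick;` statements out of the TREES block with a regular expression. It is not how ``NetworkParser`` works internally; `trees_from_nexus` is a hypothetical helper, and it assumes each tree statement sits on a single line.

```python
import re

# Matches 'Tree <name> = <newick>;' (assumption: one statement per line).
TREE_LINE = re.compile(r"^tree\s+(\S+)\s*=\s*(.+;)$", re.IGNORECASE)


def trees_from_nexus(text: str) -> dict[str, str]:
    """Map tree names to newick strings found inside the TREES block."""
    trees: dict[str, str] = {}
    in_trees = False
    for raw in text.splitlines():
        line = raw.strip()
        upper = line.upper()
        if upper.startswith("BEGIN TREES"):
            in_trees = True
        elif in_trees and upper.startswith("END;"):
            in_trees = False
        elif in_trees:
            match = TREE_LINE.match(line)
            if match:
                trees[match.group(1)] = match.group(2)
    return trees


nexus_text = """#NEXUS
BEGIN TAXA;
DIMENSIONS NTAX=3;
TAXALABELS A B C;
END;
BEGIN TREES;
Tree t1 = ((A:0.1,B:0.2):0.3,C:0.4);
Tree t2 = ((B:0.1,C:0.2):0.3,A:0.4);
END;
"""
trees = trees_from_nexus(nexus_text)
print(trees)
```

Each extracted string could then be passed to a newick parser; the TAXA block is ignored because it lies outside the TREES block.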
| Parameter | Type | Description |
|---|---|---|
| filepath | str | Path to a nexus file (.nex, .nexus). |
| validate_input | bool, optional | If True, run NexusValidator on the file before parsing. Defaults to False. |
| print_validation_summary | bool, optional | If True and validate_input is True, print the validation summary. Defaults to False. |
| return_type | str | ``"networks"`` (default) returns a list of Network objects. ``"genetrees"`` validates each network as a rooted binary tree and wraps them in a GeneTrees object. |
| species_gene_mapping | dict, optional | Explicit species -> gene label mapping. Only used when *return_type* is ``"genetrees"``. |
| naming_rule | Callable, optional | Gene-label-to-species callable. Only used when *return_type* is ``"genetrees"`` and no explicit mapping is given. |
Raises:

- FileNotFoundError: If the file does not exist.
- IOError: If the file cannot be parsed or contains no trees.

Read the sequence data (DATA/CHARACTERS block) from a nexus file and return it as an MSA object. This is a convenience wrapper around the MSA constructor's built-in nexus parsing. Use this when you want the alignment data rather than the tree topology.
| Parameter | Type | Description |
|---|---|---|
| filepath | str | Path to a nexus file containing a DATA or CHARACTERS block. |
Raises:

- FileNotFoundError: If the file does not exist.
- IOError: If no sequence data is found.

Write one or more Networks to a nexus file with TAXA and TREES blocks. This replicates the functionality of the ``NexusTemplate`` class as a standalone function.

The generated file follows the standard nexus format:

    #NEXUS
    BEGIN TAXA;
    DIMENSIONS NTAX=3;
    TAXALABELS A B C ;
    END;
    BEGIN TREES;
    Tree net1 = ((A:0.1,B:0.2):0.3,C:0.4);
    Tree net2 = ...;
    END;
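The layout above can be produced from plain newick strings with a few lines of string formatting. The sketch below is not the ``NexusTemplate`` implementation: `format_nexus` is a hypothetical helper, it requires the taxa to be passed explicitly rather than inferring them, and the numbering of trees from 1 is taken from the `net1`, `net2` labels in the example.

```python
def format_nexus(newicks: list[str], taxa: set[str], tree_prefix: str = "net") -> str:
    """Render newick strings as nexus text with TAXA and TREES blocks."""
    if not newicks:
        raise IOError("no networks to write")
    labels = sorted(taxa)  # sorted only to make the output deterministic
    lines = [
        "#NEXUS",
        "BEGIN TAXA;",
        f"DIMENSIONS NTAX={len(labels)};",
        f"TAXALABELS {' '.join(labels)} ;",
        "END;",
        "BEGIN TREES;",
    ]
    for index, newick in enumerate(newicks, start=1):
        body = newick.strip().rstrip(";")  # avoid a doubled semicolon
        lines.append(f"Tree {tree_prefix}{index} = {body};")
    lines.append("END;")
    return "\n".join(lines) + "\n"


nexus_out = format_nexus(
    ["((A:0.1,B:0.2):0.3,C:0.4);", "((B:0.1,C:0.2):0.3,A:0.4)"],
    taxa={"A", "B", "C"},
)
print(nexus_out)
```

Stripping any trailing semicolon before formatting means inputs with and without one yield the same tree line.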
| Parameter | Type | Description |
|---|---|---|
| networks | list[Network] | The networks to write. |
| filepath | str | Output file path. |
| taxa | set[str], optional | An explicit set of taxa labels. If None, taxa are inferred from the newick strings. Defaults to None. |
| tree_prefix | str, optional | Label prefix for each tree line. Defaults to "net". |
| overwrite | bool, optional | If False, raises IOError if the file already exists. Defaults to True. |
| phylonet_cmds | list[str], optional | A list of PhyloNet commands to include in a PHYLONET block. Defaults to None. |
Raises:

- IOError: If the list is empty, or the file cannot be written, or the file already exists and overwrite is False.

Auto-detect which newick convention a string uses based on its formatting. The detection heuristic is:

1. If the string contains ``#Name:len::gamma`` double-colon notation on a reticulation node → **Phylonet**
2. If the string starts with ``[&R]`` or ``[&U]`` → **Beast**
3. If the string contains ``[&...gamma=...]`` → **PhyNetPy**
4. Otherwise (plain newick) → **PhyNetPy** (default)
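The four-step heuristic can be expressed as an ordered chain of checks. The sketch below is an illustration under assumptions, not the library's code: `detect_convention` is a hypothetical name, the regular expressions are one possible reading of the patterns listed above, and returning plain strings is an assumption about the result type.

```python
import re

# '#Name:len::gamma' on a reticulation node (the length may be empty).
PHYLONET_GAMMA = re.compile(r"#[A-Za-z0-9_]+:[^:,()\[\];]*::[^:,()\[\];]+")
# A bracket comment that contains 'gamma=' somewhere inside '[&...]'.
BRACKET_GAMMA = re.compile(r"\[&[^\]]*gamma=[^\]]*\]")


def detect_convention(newick_str: str) -> str:
    """Classify a newick string as 'Phylonet', 'Beast', or 'PhyNetPy'."""
    text = newick_str.strip()
    if PHYLONET_GAMMA.search(text):            # step 1
        return "Phylonet"
    if text.startswith(("[&R]", "[&U]")):      # step 2
        return "Beast"
    if BRACKET_GAMMA.search(text):             # step 3
        return "PhyNetPy"
    return "PhyNetPy"                          # step 4: plain newick default


print(detect_convention("((C:.1,(B:.05)#H0:.05::.7)I1:.1,(A:.1,#H0:.05)I2:.1)I3;"))
print(detect_convention("[&R] ((C:.1,(B:.05)#H0[&gamma=.7]:.05)I1:.1,(A:.1,#H0:.05)I2:.1)I3;"))
print(detect_convention("((A:0.1,B:0.2):0.3,C:0.4);"))
```

The order matters: the Beast check runs before the bracket-gamma check, so a string with both a `[&R]` prefix and a `[&gamma=...]` comment is reported as Beast.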
| Parameter | Type | Description |
|---|---|---|
| newick_str | str | A newick or extended-newick string. |
Convert a newick/extended-newick string between different software conventions. The three supported standards differ primarily in how they encode inheritance probabilities (gamma) on reticulation edges:

**PhyNetPy** uses BioPython-style bracket comments:

    ((C:.1,(B:.05)#H0[&gamma=.7]:.05)I1:.1,(A:.1,#H0:.05)I2:.1)I3;

**Phylonet** uses Rich Newick double-colon notation:

    ((C:.1,(B:.05)#H0:.05::.7)I1:.1,(A:.1,#H0:.05)I2:.1)I3;

**Beast** uses the same annotation as PhyNetPy but prefixes the string with ``[&R]`` for rooted trees (or ``[&U]`` for unrooted):

    [&R] ((C:.1,(B:.05)#H0[&gamma=.7]:.05)I1:.1,(A:.1,#H0:.05)I2:.1)I3;

The function auto-detects the input convention and converts to the target. Non-gamma metadata (e.g. ``[&posterior=0.95]``) on non-reticulation nodes is preserved in all conversions.
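The difference between the conventions comes down to where gamma sits relative to the branch length, so the core rewrite can be sketched with two regular-expression substitutions. The helper names are hypothetical, the patterns handle only a bracket comment that holds gamma alone, and this is not the library's conversion routine.

```python
import re

# Phylonet: '#H0:.05::.7'  (name, optional length, gamma)
PHYLONET_RETIC = re.compile(r"(#[A-Za-z0-9_]+):([^:,()\[\];]*)::([^:,()\[\];]+)")
# PhyNetPy: '#H0[&gamma=.7]:.05'  (name, gamma, optional length)
PHYNETPY_RETIC = re.compile(r"(#[A-Za-z0-9_]+)\[&gamma=([^\],]+)\](?::([^:,()\[\];]+))?")


def phylonet_to_phynetpy(newick_str: str) -> str:
    """Move gamma from double-colon notation into a bracket comment."""
    def rewrite(match: re.Match) -> str:
        name, length, gamma = match.groups()
        branch = f":{length}" if length else ""
        return f"{name}[&gamma={gamma}]{branch}"
    return PHYLONET_RETIC.sub(rewrite, newick_str)


def phynetpy_to_phylonet(newick_str: str) -> str:
    """Move gamma from a bracket comment into double-colon notation."""
    def rewrite(match: re.Match) -> str:
        name, gamma, length = match.groups()
        return f"{name}:{length or ''}::{gamma}"
    return PHYNETPY_RETIC.sub(rewrite, newick_str)


def phynetpy_to_beast(newick_str: str, rooted: bool = True) -> str:
    """Prefix a PhyNetPy-style string with the rooting marker."""
    return f"[&R] {newick_str}" if rooted else f"[&U] {newick_str}"


phylonet = "((C:.1,(B:.05)#H0:.05::.7)I1:.1,(A:.1,#H0:.05)I2:.1)I3;"
phynetpy = phylonet_to_phynetpy(phylonet)
print(phynetpy)
print(phynetpy_to_phylonet(phynetpy))
print(phynetpy_to_beast(phynetpy))
```

Only the reticulation edge that carries gamma is rewritten; the second occurrence of `#H0`, which has a branch length but no gamma, passes through unchanged in both directions.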
| Parameter | Type | Description |
|---|---|---|
| newick_str | str | A newick or extended-newick string in any of the three conventions. |
| standard | str, optional | Target convention. One of ``"PhyNetPy"`` (default), ``"Phylonet"``, or ``"Beast"``. |
Raises:

- ValueError: If ``standard`` is not one of the three valid options.
- IOError: If the input string is empty.