Library for the Development and Use of Phylogenetic Network Methods
The MSA module provides classes for handling Multiple Sequence Alignments, including file I/O from Nexus files, sequence grouping, and distance matrix computation.
An individual sequence record defined by the data sequence, a name/identifier, and optionally a group ID.
| Parameter | Type | Description |
|---|---|---|
| sequence | list | Sequence data, typically a list of characters |
| name | str | Sequence label/name |
| gid | int | Group ID (-1 means no group) |
Get the sequence name/label.
Get the sequence data (typically a list of characters).
Get the sequence as hexadecimal integers (for SNP data).
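As a rough illustration of what a hexadecimal-integer view of SNP data could look like, the sketch below parses each character as a single base-16 digit. The helper `to_hex_ints` is hypothetical and not part of the documented API, and the encoding the library actually uses may differ.

```python
def to_hex_ints(chars):
    """Parse each one-character symbol as a base-16 digit (assumed encoding)."""
    return [int(ch, 16) for ch in chars]


# Hypothetical SNP row stored as characters
snp_row = ["0", "1", "2", "a", "F"]
hex_values = to_hex_ints(snp_row)
print(hex_values)
```

Under this assumption, symbols above `9` map to the values 10 through 15, which is one way a single character can carry counts larger than nine.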
Get the group ID.
Set the ploidy level (for biallelic marker, or "bimarker", data).
Get the ploidy level.
Calculate pairwise distance to another sequence (count of differences).
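A count-of-differences distance can be sketched in a few lines. The function name `count_differences` is hypothetical, and raising an error on unequal lengths is an assumption rather than documented behavior.

```python
def count_differences(seq_a, seq_b):
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        # Assumption: aligned sequences are expected to have the same length
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


s1 = ['A', 'C', 'G', 'T']
s2 = ['A', 'C', 'T', 'T']
s3 = ['A', 'T', 'G', 'T']

d12 = count_differences(s1, s2)  # differ at one site
d13 = count_differences(s1, s3)  # differ at one site
d23 = count_differences(s2, s3)  # differ at two sites
print(d12, d13, d23)
```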
Class for managing Multiple Sequence Alignments. Handles file I/O from Nexus files, sequence grouping, and provides iteration over sequences.
| Parameter | Type | Description |
|---|---|---|
| filename | str | Path to Nexus file with matrix block |
| data | list[DataSequence] | Alternative: provide sequences directly |
| grouping | dict | Map from group names to sequence names |
| grouping_auto_detect | bool | Auto-group by name similarity |
Retrieve all sequences in the alignment.
Get a sequence by its exact name.
Add a sequence to the MSA.
Get the number of groups in the MSA.
Get all sequences with a given group ID.
Get the group ID for a sequence name.
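To make the relationship between a `grouping` mapping and per-sequence group IDs concrete, here is a minimal sketch that inverts a group-name-to-members dictionary into a name-to-ID lookup. The function `build_gid_lookup`, the group names, and the choice to number groups in insertion order are all assumptions made for illustration.

```python
def build_gid_lookup(grouping):
    """Invert {group name: [sequence names]} into {sequence name: group ID}."""
    lookup = {}
    # Assumption: groups are numbered 0, 1, 2, ... in insertion order
    for gid, members in enumerate(grouping.values()):
        for seq_name in members:
            lookup[seq_name] = gid
    return lookup


grouping = {"cladeA": ["Species1", "Species2"], "cladeB": ["Species3"]}
gid_of = build_gid_lookup(grouping)

# -1 stands for "no group", matching the convention in the parameter table
print(gid_of.get("Species3", -1))
print(gid_of.get("Unknown", -1))
```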
Group sequences by name similarity.
Assign group IDs to ungrouped sequences using auto-detection.
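The similarity rule used for auto-detection is not spelled out here. As one possible interpretation, the sketch below treats names that share a prefix before a trailing numeric suffix as belonging to the same group. The function `group_by_name_prefix` and the prefix rule are assumptions, not the library's method.

```python
import re


def group_by_name_prefix(names):
    """Assign a group ID to each name based on its prefix (assumed rule)."""
    prefix_to_gid = {}
    gids = {}
    for name in names:
        # Strip an optional separator plus trailing digits, e.g. "human_1" -> "human"
        prefix = re.sub(r"[_\-.]?\d+$", "", name)
        if prefix not in prefix_to_gid:
            prefix_to_gid[prefix] = len(prefix_to_gid)
        gids[name] = prefix_to_gid[prefix]
    return gids


names = ["human_1", "human_2", "chimp_1", "gorilla"]
gids = group_by_name_prefix(names)
print(gids)
```

Names without a numeric suffix, such as `gorilla` above, end up in a group of their own under this rule.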
Set the ploidy for each group. If no ploidy values are provided, ploidy is auto-detected from the maximum SNP value.
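Detection from the maximum SNP value can be pictured as follows, assuming each SNP value counts copies of one allele so that the largest value observed in a group is taken as that group's ploidy. The function `detect_group_ploidy` and the input layout are hypothetical.

```python
def detect_group_ploidy(snp_rows_by_group):
    """Return {group ID: ploidy}, using the largest SNP value seen in each group."""
    return {
        gid: max(max(row) for row in rows)
        for gid, rows in snp_rows_by_group.items()
    }


# Hypothetical integer SNP rows, keyed by group ID
snp_rows_by_group = {
    0: [[0, 1, 2, 1], [2, 0, 1, 1]],
    1: [[0, 1, 1, 0]],
}
ploidy = detect_group_ploidy(snp_rows_by_group)
print(ploidy)
```

A group in which the highest allele count never appears would be underestimated by this heuristic, which is a reason to pass explicit ploidy values when they are known.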
Total number of allele samples across all sequences.
Total samples within a specific group.
Return the alignment dimensions as a tuple `(num_sequences, sequence_length)`.
Compute pairwise distances between all sequence pairs.
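The all-pairs computation can be sketched independently of the library with `itertools.combinations`. This version keys the result by name pairs and uses a count-of-differences distance. The function `pairwise_distances` and the name-keyed input are assumptions for illustration.

```python
from itertools import combinations


def pairwise_distances(named_seqs):
    """Map each unordered pair of names to the number of differing sites."""
    distances = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(named_seqs.items(), 2):
        distances[(name_a, name_b)] = sum(a != b for a, b in zip(seq_a, seq_b))
    return distances


named_seqs = {
    "Species1": ['A', 'C', 'G', 'T'],
    "Species2": ['A', 'C', 'T', 'T'],
    "Species3": ['A', 'T', 'G', 'T'],
}
D = pairwise_distances(named_seqs)
for (a, b), dist in D.items():
    print(f"{a} - {b}: {dist}")
```

For n sequences this visits n(n-1)/2 pairs, so each unordered pair appears exactly once in the result.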
from PhyNetPy.MSA import MSA, DataSequence
# Load MSA from Nexus file
msa = MSA(filename="alignment.nex")
# Get dimensions
rows, cols = msa.dim()
print(f"Alignment: {rows} taxa, {cols} sites")
# Iterate over sequences
for seq in msa:
    print(f"{seq.get_name()}: {len(seq)} bp")
# Get specific sequence
human = msa.seq_by_name("Human")
print(human.get_seq()[:10])
# Get all records
all_seqs = msa.get_records()
# Create MSA from DataSequence objects
seq1 = DataSequence(['A','C','G','T'], "Species1", gid=0)
seq2 = DataSequence(['A','C','T','T'], "Species2", gid=0)
seq3 = DataSequence(['A','T','G','T'], "Species3", gid=1)
custom_msa = MSA(data=[seq1, seq2, seq3])
# Compute pairwise distances
D = msa.distance_matrix()
for (s1, s2), dist in D.items():
    print(f"{s1.get_name()} - {s2.get_name()}: {dist}")
# Auto-detect grouping by name similarity
msa_grouped = MSA(filename="data.nex", grouping_auto_detect=True)
print(f"Found {msa_grouped.num_groups()} groups")
# Set ploidy for SNP data
msa.set_sequence_ploidy([2, 2, 2])  # All diploid
# Or auto-detect from max SNP values
msa.set_sequence_ploidy()