Library for the Development and Use of Phylogenetic Network Methods
The Matrix module provides a class for storing and reducing Multiple Sequence Alignment (MSA) data. It supports site pattern compression for DNA data, which improves computational efficiency by storing each distinct alignment column only once.
Raised when there is an error parsing data into the matrix or during matrix operations.
Class that stores MSA data and reduces it to its unique site patterns. Compression is currently implemented only for DNA data, but the class accepts any data type defined by the Alphabet class.
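To make the role of an alphabet concrete, here is a minimal sketch of how aligned characters can be encoded as integer states before any compression happens. `DNA_STATES` and `encode_alignment` are hypothetical illustrations, not PhyNetPy's Alphabet API; the real class may assign different state values and also handle ambiguity codes and gaps.

```python
import numpy as np

# Hypothetical 4-state DNA encoding, for illustration only. PhyNetPy's Alphabet
# may use different integer states and also encode ambiguity codes and gaps.
DNA_STATES = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_alignment(seqs: list[str], states: dict[str, int]) -> np.ndarray:
    """Map every character of every aligned sequence to its integer state.

    Rows are taxa, columns are alignment sites. An unknown character raises KeyError.
    """
    lengths = {len(s) for s in seqs}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same, nonzero count")
    return np.array([[states[c] for c in seq.upper()] for seq in seqs], dtype=int)


encoded = encode_alignment(["ACGT", "ACGA"], DNA_STATES)
print(encoded)
```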
Create a Matrix from an MSA object.
| Parameter | Type | Description |
|---|---|---|
| alignment | MSA | Multiple Sequence Alignment object |
| alphabet | Alphabet | Alphabet for character mapping (default DNA) |
| Attribute | Type | Description |
|---|---|---|
| unique_sites | int | Number of distinct site patterns |
| data | np.ndarray | Compressed matrix of state values |
| locations | list | Maps original column index to unique pattern index |
| count | list | Number of times each unique site pattern occurs |
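
The relationship between `data`, `locations`, and `count` can be illustrated with a minimal sketch of site pattern compression. `compress_sites` is a hypothetical helper, not PhyNetPy code; it keeps patterns in order of first appearance, which may not match the ordering the Matrix class actually uses.

```python
import numpy as np


def compress_sites(states: np.ndarray) -> tuple[np.ndarray, list[int], list[int]]:
    """Reduce a (taxa x sites) state matrix to its unique site patterns.

    Returns (data, locations, count):
      data      -- one column per distinct pattern, in order of first appearance
      locations -- for each original column j, the index of its pattern in data
      count     -- how many original columns share each pattern
    """
    pattern_index: dict[tuple[int, ...], int] = {}
    unique_cols: list[np.ndarray] = []
    locations: list[int] = []
    count: list[int] = []
    for j in range(states.shape[1]):
        key = tuple(states[:, j].tolist())  # hashable copy of column j
        if key not in pattern_index:
            pattern_index[key] = len(unique_cols)
            unique_cols.append(states[:, j])
            count.append(0)
        idx = pattern_index[key]
        locations.append(idx)
        count[idx] += 1
    data = np.column_stack(unique_cols)  # assumes at least one site
    return data, locations, count


states = np.array([[0, 1, 0, 2],
                   [0, 1, 0, 3]])
data, locations, count = compress_sites(states)
```

Indexing `data[:, locations]` rebuilds the original state matrix, which is why `locations` is enough to recover per-site information after compression, while `len(count)` plays the role of `unique_sites`.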
Returns the data point (state value) at row i, column j.
Get the character at row i, column j in the character matrix.
Returns the i-th column of the data matrix.
Get the character array for a given taxon.
Get the numerical data for a given taxon.
Get the row index for a taxon with the given name.
Get the taxon name associated with a row index.
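`row_given_name` and `name_given_row` behave like inverse lookups between taxon names and row indices. The sketch below assumes taxon names are unique; `TaxonIndex` is a hypothetical class, and how the real Matrix methods handle an unknown name is not documented here.

```python
class TaxonIndex:
    """Hypothetical two-way map between taxon names and matrix row indices."""

    def __init__(self, names: list[str]) -> None:
        self._names = list(names)                        # row index -> name
        self._rows = {name: i for i, name in enumerate(self._names)}  # name -> row
        if len(self._rows) != len(self._names):
            raise ValueError("taxon names must be unique")

    def row_given_name(self, name: str) -> int:
        return self._rows[name]  # raises KeyError for an unknown taxon

    def name_given_row(self, row: int) -> str:
        return self._names[row]  # raises IndexError for an out-of-range row


taxa = TaxonIndex(["Human", "Chimp", "Gorilla"])
```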
Returns the number of unique sites in the data.
Get the number of taxa in the matrix.
Get the data type (e.g., "DNA", "PROTEIN").
Reconstruct the character matrix from the state matrix by reverse-mapping each state to its character.
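The reverse mapping amounts to inverting the alphabet's character-to-state table and applying it to every cell. This sketch uses a hypothetical 4-state DNA table (`DNA_STATES`) and helper (`states_to_chars`); it assumes each state corresponds to exactly one character, which may not hold for alphabets with ambiguity codes.

```python
import numpy as np

# Hypothetical DNA encoding; PhyNetPy's actual state values may differ.
DNA_STATES = {"A": 0, "C": 1, "G": 2, "T": 3}


def states_to_chars(states: np.ndarray, mapping: dict[str, int]) -> np.ndarray:
    """Reverse-map an integer state matrix back to a character matrix."""
    reverse = {state: char for char, state in mapping.items()}  # state -> char
    # int() converts NumPy integer scalars before the dict lookup.
    return np.array([[reverse[int(s)] for s in row] for row in states])


chars = states_to_chars(np.array([[0, 1, 2, 3], [0, 1, 2, 0]]), DNA_STATES)
```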
For DNA data, the Matrix class automatically compresses redundant site patterns:
- `count` tracks how many times each unique pattern appears
- `locations` maps original column positions to compressed pattern indices

```python
from PhyNetPy.Matrix import Matrix
from PhyNetPy.MSA import MSA
from PhyNetPy.Alphabet import Alphabet, DNA, PROTEIN

# Create MSA from file
msa = MSA(filename="alignment.nex")

# Create Matrix with DNA alphabet (default)
matrix = Matrix(msa)

# Get data info
print(f"Taxa: {matrix.get_num_taxa()}")
print(f"Original sites: {matrix.seq_len}")
print(f"Unique sites: {matrix.site_count()}")
print(f"Data type: {matrix.get_type()}")

# Access data
state = matrix.get_ij(0, 0)      # State value at row 0, col 0
char = matrix.get_ij_char(0, 0)  # Character at row 0, col 0

# Get sequence for a taxon
seq = matrix.get_seq("Human")
num_seq = matrix.get_number_seq("Human")

# Get column data
col = matrix.get_column_at(0)

# Lookup by name
row_idx = matrix.row_given_name("Chimp")
name = matrix.name_given_row(0)

# Get site pattern counts (useful for likelihood calculations)
for i in range(matrix.site_count()):
    pattern = matrix.get_column_at(i)
    count = matrix.count[i]
    print(f"Pattern {i}: appears {count} times")

# Use protein alphabet
protein_matrix = Matrix(msa, alphabet=Alphabet(PROTEIN))
```
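The counts pay off in likelihood calculations: assuming sites evolve independently under one shared model, identical columns have identical site likelihoods, so each unique pattern needs to be evaluated only once and then weighted by its count. A minimal sketch with toy numbers follows; `total_log_likelihood` is a hypothetical helper, and the per-pattern likelihoods are made-up inputs rather than PhyNetPy output.

```python
import math


def total_log_likelihood(pattern_likelihoods: list[float], count: list[int]) -> float:
    """Sum per-pattern log-likelihoods, weighting each pattern by its multiplicity.

    Equivalent to summing log-likelihoods over every original site, but only
    one likelihood evaluation per unique pattern is needed.
    """
    if len(pattern_likelihoods) != len(count):
        raise ValueError("need exactly one likelihood per unique pattern")
    return sum(c * math.log(lik) for lik, c in zip(pattern_likelihoods, count))


# Toy inputs: 2 unique patterns covering 4 original sites.
likelihoods = [0.5, 0.25]   # hypothetical per-pattern site likelihoods
count = [3, 1]              # pattern 0 occurs 3 times, pattern 1 once
locations = [0, 0, 1, 0]    # original site -> pattern index

compressed = total_log_likelihood(likelihoods, count)
uncompressed = sum(math.log(likelihoods[k]) for k in locations)
assert math.isclose(compressed, uncompressed)
```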