
PhyNetPy Documentation

Library for the Development and Use of Phylogenetic Network Methods

Matrix Module v1.0.0

The Matrix module provides a class for storing and reducing Multiple Sequence Alignment (MSA) data. For DNA data, it supports site pattern compression, which improves computational efficiency.

Author: Mark Kessler
Last Edit: 3/11/25
Source: Matrix.py

Exceptions

exception MatrixError(Exception)

Raised when there is an error parsing data into the matrix or during matrix operations.

Matrix Class

class Matrix

Stores MSA data and reduces it to its unique site patterns. Accepts any data type defined by the Alphabet class, but compression is currently implemented for DNA data only.

Constructor

__init__(self, alignment: MSA, alphabet: Alphabet = Alphabet(DNA))

Create a Matrix from an MSA object.

Parameter   Type       Description
alignment   MSA        Multiple Sequence Alignment object
alphabet    Alphabet   Alphabet for character mapping (default: DNA)

Attributes

Attribute      Type         Description
unique_sites   int          Number of distinct site patterns
data           np.ndarray   Compressed matrix of state values
locations      list         Maps original column index to unique pattern index
count          list         Count of each unique site pattern
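
The relationship between data, locations, and count can be illustrated with a small standalone sketch. It does not use PhyNetPy itself. The values are made up, and it assumes data is laid out with taxa as rows and unique patterns as columns; the helper expand_row is hypothetical.

```python
# Hypothetical compressed alignment: 3 taxa (rows) x 2 unique patterns (columns).
data = [
    [0, 1],
    [0, 1],
    [0, 2],
]
locations = [0, 1, 0, 0, 1]  # original site index -> unique pattern index
count = [3, 2]               # how often each unique pattern occurs

def expand_row(row, locations):
    """Rebuild one taxon's full-length state sequence from its compressed row."""
    return [row[pattern] for pattern in locations]

# Recover the uncompressed alignment, one row per taxon.
original = [expand_row(row, locations) for row in data]

# count can be re-derived from locations, so the two stay consistent.
recounted = [locations.count(k) for k in range(len(count))]
```

Under these assumptions, len(locations) equals the original number of sites and sum(count) gives the same number.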

Data Access Methods

get_ij(self, i: int, j: int) -> int

Returns the data point (state value) at row i, column j.

get_ij_char(self, i: int, j: int) -> str

Get the character at row i, column j in the character matrix.

get_column_at(self, i: int) -> np.ndarray

Returns the ith column of the data matrix.

get_seq(self, label: str) -> np.ndarray

Get the character array for a given taxon.

get_number_seq(self, label: str) -> np.ndarray

Get the numerical data for a given taxon.

Lookup Methods

row_given_name(self, label: str) -> int

Get the row index for a taxon with the given name.

name_given_row(self, index: int) -> str

Get the taxon name associated with a row index.

Info Methods

site_count(self) -> int

Returns the number of unique site patterns in the data.

get_num_taxa(self) -> int

Get the number of taxa in the matrix.

get_type(self) -> str

Get the data type (e.g., "DNA", "PROTEIN").

char_matrix(self) -> np.ndarray

Get the character matrix from the state matrix using reverse mapping.
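
As a rough illustration of what a reverse mapping involves, the sketch below encodes characters to integer states and decodes them again. The mapping table is hypothetical and is not PhyNetPy's actual Alphabet table; encode and decode are illustrative helpers, not library functions.

```python
# Hypothetical character-to-state table (NOT the library's real Alphabet mapping).
char_to_state = {"A": 0, "C": 1, "G": 2, "T": 3, "-": 4}

# The reverse mapping inverts the table: state -> character.
state_to_char = {state: char for char, state in char_to_state.items()}

def encode(seq):
    """Turn a character sequence into a list of integer states."""
    return [char_to_state[c] for c in seq]

def decode(states):
    """Turn a list of integer states back into a character sequence."""
    return "".join(state_to_char[s] for s in states)

states = encode("ACG-T")
decoded = decode(states)
```

A reverse lookup like this is only unambiguous if each state corresponds to exactly one character in the table.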

Site Pattern Compression

For DNA data, the Matrix class automatically compresses redundant site patterns:

  • Identical columns are stored only once
  • count tracks how many times each unique pattern appears
  • locations maps original positions to compressed indices
  • This can significantly reduce memory and computation for large alignments
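
The compression described above can be sketched in plain Python as follows. This is a minimal illustration of the idea, not the code in Matrix.py; the function compress_columns and the toy alignment are assumptions made for the example.

```python
def compress_columns(columns):
    """Collapse identical columns; return (unique, locations, count)."""
    index_of = {}   # pattern -> index in the compressed list
    unique = []     # each distinct column, in first-seen order
    locations = []  # original column index -> compressed index
    count = []      # occurrences of each distinct column
    for col in columns:
        key = tuple(col)
        if key not in index_of:
            index_of[key] = len(unique)
            unique.append(key)
            count.append(0)
        idx = index_of[key]
        locations.append(idx)
        count[idx] += 1
    return unique, locations, count

# Toy alignment: three taxa (rows), five sites (columns).
rows = ["ACAAC", "ACAAC", "AGAAG"]
columns = list(zip(*rows))  # transpose rows into per-site columns

unique, locations, count = compress_columns(columns)
```

In this toy case five sites reduce to two patterns, so any per-site computation needs to run only twice.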

Usage Examples

from PhyNetPy.Matrix import Matrix
from PhyNetPy.MSA import MSA
from PhyNetPy.Alphabet import Alphabet, DNA, PROTEIN

# Create MSA from file
msa = MSA(filename="alignment.nex")

# Create Matrix with DNA alphabet (default)
matrix = Matrix(msa)

# Get data info
print(f"Taxa: {matrix.get_num_taxa()}")
print(f"Original sites: {matrix.seq_len}")
print(f"Unique sites: {matrix.site_count()}")
print(f"Data type: {matrix.get_type()}")

# Access data
state = matrix.get_ij(0, 0)  # State value at row 0, col 0
char = matrix.get_ij_char(0, 0)  # Character at row 0, col 0

# Get sequence for a taxon
seq = matrix.get_seq("Human")
num_seq = matrix.get_number_seq("Human")

# Get column data
col = matrix.get_column_at(0)

# Lookup by name
row_idx = matrix.row_given_name("Chimp")
name = matrix.name_given_row(0)

# Get each unique site pattern and its count (useful for likelihood calculations)
for i in range(matrix.site_count()):
    pattern = matrix.get_column_at(i)
    count = matrix.count[i]
    print(f"Pattern {i}: {pattern} appears {count} times")

# Use a protein alphabet instead (the MSA passed in should contain protein sequences)
protein_matrix = Matrix(msa, alphabet=Alphabet(PROTEIN))
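
The pattern counts are useful for likelihood calculations because, when sites are treated as independent, the alignment's log-likelihood is a sum over sites, and identical sites contribute identical terms. The sketch below shows that weighting with made-up per-pattern likelihoods. It does not call PhyNetPy, and weighted_log_likelihood is a hypothetical helper.

```python
import math

def weighted_log_likelihood(pattern_likelihoods, count):
    """Sum log-likelihoods over unique patterns, weighting each by its count."""
    return sum(c * math.log(p) for p, c in zip(pattern_likelihoods, count))

# Made-up values: likelihood of each unique pattern under some model.
pattern_likelihoods = [0.5, 0.25]
count = [3, 2]               # occurrences of each unique pattern
locations = [0, 1, 0, 0, 1]  # original site index -> unique pattern index

total = weighted_log_likelihood(pattern_likelihoods, count)

# The same quantity computed site by site over the uncompressed alignment.
per_site = sum(math.log(pattern_likelihoods[p]) for p in locations)
```

Both totals agree, but the weighted form evaluates the per-pattern likelihood once per unique pattern rather than once per site.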

See Also

  • MSA - Multiple sequence alignment input
  • Alphabet - Character-state mappings
  • GTR - Substitution models for likelihood
