PhyNetPy Documentation

Library for the Development and Use of Phylogenetic Network Methods

Matrix Module v1.0.0

Data matrix storage and reduction for sequence alignments with unique site pattern compression.

Author:
Mark Kessler
Last Edit:
3/11/25
Source:
Matrix.py

Contents

Exceptions

exception MatrixError(Exception)

Raised when there is an error either while parsing data into a Matrix object or during any other operation on the matrix.

Matrix

class Matrix

Stores multiple sequence alignment (MSA) data and reduces it to the unique site patterns it contains. So far, the only reduction mechanism applies to DNA; data of all other types is stored unreduced in a 2D NumPy matrix. Accepts any data type defined by the Alphabet class in Alphabet.py.

Constructor

__init__(alignment: MSA, alphabet: Alphabet = Alphabet(DNA)) -> None

Takes a single MSA object and an optional Alphabet object, whose type is one of DNA, RNA, PROTEIN, CODON, or USER. The alphabet defaults to DNA.

Parameter  Type                Description
alignment  MSA                 The multiple sequence alignment (MSA) to store.
alphabet   Alphabet, optional  An alphabet for mapping characters to numeric states. Defaults to Alphabet(DNA).
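The alphabet's role is to map alignment characters to numeric states. The following is a minimal sketch of that encoding step, assuming a hypothetical DNA mapping (A=0, C=1, G=2, T=3, gap=4) and a hypothetical helper named encode_alignment. PhyNetPy's actual Alphabet class may assign different state values and also supports ambiguity codes and the RNA, PROTEIN, CODON, and USER types, none of which are modeled here.

```python
import numpy as np

# Hypothetical DNA mapping for illustration only; the real Alphabet class in
# Alphabet.py may assign different numeric states and supports more symbols.
DNA_STATES = {"A": 0, "C": 1, "G": 2, "T": 3, "-": 4}


def encode_alignment(seqs: dict[str, str], mapping: dict[str, int]) -> tuple[list[str], np.ndarray]:
    """Encode aligned sequences into a (taxa x sites) integer matrix.

    Returns the taxon names in row order and the encoded matrix.
    """
    names = list(seqs)
    length = len(seqs[names[0]])
    # Every row of an alignment must have the same number of sites.
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("all aligned sequences must have the same length")
    data = np.empty((len(names), length), dtype=np.int8)
    for row, name in enumerate(names):
        # Upper-case first so 'a' and 'A' map to the same state.
        data[row] = [mapping[ch] for ch in seqs[name].upper()]
    return names, data


names, data = encode_alignment({"human": "ACGT", "chimp": "acga"}, DNA_STATES)
print(names)  # ['human', 'chimp']
print(data)   # [[0 1 2 3]
              #  [0 1 2 0]]
```

Rows follow the insertion order of the input dictionary, so the name list doubles as the row index for later lookups. The sketch raises ValueError for ragged input; the library itself documents MatrixError for parsing failures.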

Methods

populate_data -> None

Stores and simplifies the MSA data.

simplify -> None

Reduces the data matrix by removing duplicate site patterns (columns), keeping one copy of each unique pattern and recording its location and count in the original alignment.
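A minimal sketch of unique-site-pattern compression on a (taxa x sites) integer matrix, using NumPy's np.unique over columns. It illustrates the idea described above (one copy of each pattern, plus its first location and its count) and is not PhyNetPy's own code; the function name compress_sites is hypothetical.

```python
import numpy as np


def compress_sites(data: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Return (unique_columns, first_positions, counts) for a taxa x sites matrix."""
    # axis=1 treats each column (one site pattern across all taxa) as a unit.
    unique_cols, first_pos, counts = np.unique(
        data, axis=1, return_index=True, return_counts=True
    )
    # np.unique sorts the patterns; reorder them by first appearance instead.
    order = np.argsort(first_pos)
    return unique_cols[:, order], first_pos[order], counts[order]


data = np.array([[2, 0, 2, 1],
                 [2, 0, 2, 3]])
reduced, positions, counts = compress_sites(data)
print(reduced)    # [[2 0 1]
                  #  [2 0 3]]
print(positions)  # [0 1 3]
print(counts)     # [2 1 1]
```

Columns 0 and 2 share the pattern (2, 2), so it is kept once with a count of 2. The counts always sum to the original number of sites, which is what allows the reduced matrix to stand in for the full alignment.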

get_ij(i: int, j: int) -> int

Returns the data point at row i and column j.

Parameter  Type  Description
i          int   row index
j          int   column index
Returns: int: the data point.

get_ij_char(i: int, j: int) -> str

Gets the character at row i, column j of the character matrix associated with the data.

Parameter  Type  Description
i          int   row index
j          int   column index
Returns: str: the character at [i][j].

row_given_name(label: str) -> int

Retrieves the row index of the taxon named 'label'.

Parameter  Type  Description
label      str   name of a taxon
Returns: int: a row index.
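Looking up a row by taxon name reduces to a dictionary access if the names are indexed once. A minimal sketch, assuming the matrix keeps an ordered list of taxon names whose positions match the data rows; the helpers lookup_row and lookup_name are hypothetical, and raising KeyError for an unknown name is an assumption (the real method may raise MatrixError or behave differently).

```python
# Hypothetical taxon names, listed in the same order as the rows of the data.
taxa = ["human", "chimp", "gorilla"]

# Build the name -> row index table once so each lookup is a dict access.
row_index = {name: row for row, name in enumerate(taxa)}


def lookup_row(label: str) -> int:
    """Row index of the taxon named `label` (KeyError if it is absent)."""
    return row_index[label]


def lookup_name(index: int) -> str:
    """Taxon name stored at row `index` (the inverse lookup)."""
    return taxa[index]


print(lookup_row("chimp"))  # 1
print(lookup_name(2))       # gorilla
```

This relies on taxon names being unique within the alignment; with duplicate names the dictionary would silently keep only the last matching row.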
get_seq(label: str) -> np.ndarray

Gets the array of characters for a given taxon.

Parameter  Type  Description
label      str   name of a taxon
Returns: np.ndarray: an array of characters, with data type 'U1'.

get_number_seq(label: str) -> np.ndarray

Gets the numerical data for the taxon named 'label'.

Parameter  Type  Description
label      str   name of a taxon
Returns: np.ndarray: a one-dimensional integer array (the exact integer dtype is implementation specific).
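The two per-taxon accessors return parallel views of the same row: one as single characters (dtype 'U1'), one as integer states. A minimal sketch, assuming a hypothetical character matrix and a row-aligned numeric matrix built with the same illustrative A/C/G/T = 0/1/2/3 encoding; the arrays and names are not taken from PhyNetPy.

```python
import numpy as np

# Hypothetical data: one row per taxon, one single-character string per site.
taxa = ["human", "chimp"]
char_matrix = np.array([list("ACGT"), list("ACGA")], dtype="U1")
number_matrix = np.array([[0, 1, 2, 3], [0, 1, 2, 0]], dtype=np.int8)

row = taxa.index("chimp")
chars = char_matrix[row]    # character view of the taxon's sequence
numbers = number_matrix[row]  # numeric states for the same sites

print(chars)           # ['A' 'C' 'G' 'A']
print("".join(chars))  # ACGA
print(numbers)         # [0 1 2 0]
```

The 'U1' dtype stores exactly one Unicode character per element, so joining a row recovers the original sequence string.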
get_column(i: int, data: np.ndarray, sites: int) -> np.ndarray

Returns the ith column of a data matrix, with 'sites' elements.

Parameter  Type        Description
i          int         column index
data       np.ndarray  a data matrix
sites      int         number of elements in the column
Returns: np.ndarray: the data at column i, with length 'sites'.

get_column_at(i: int) -> np.ndarray

Returns the ith column of this matrix's data.

Parameter  Type  Description
i          int   column index
Returns: np.ndarray: the data at column i.
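In NumPy, column i of a 2D array is data[:, i]. A minimal sketch, assuming rows are taxa and columns are sites, so one column holds a single site pattern with one entry per taxon. Reading get_column's length argument as "number of rows to take", which would correspond to data[:n, i], is an assumption, not something this page confirms.

```python
import numpy as np

# Hypothetical (taxa x sites) state matrix.
data = np.array([[0, 1, 2, 3],
                 [0, 1, 2, 0],
                 [0, 3, 2, 0]], dtype=np.int8)

col = data[:, 3]      # site pattern at column 3: one entry per taxon
first_two = data[:2, 3]  # only the first two rows of column 3

print(col)        # [3 0 0]
print(first_two)  # [3 0]
```

Basic slicing returns a view that shares memory with the matrix, so call .copy() on the result if the column will be modified independently.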
site_count -> int

Returns the number of unique sites (site patterns) in the data.

Returns: int: the number of unique sites.

populate_counts(new_data: np.ndarray) -> None

Generates a count list that maps the ith distinct column to the number of times it appears in the original alignment matrix.

Parameter  Type        Description
new_data   np.ndarray  the simplified data matrix, containing only distinct columns
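The count list can also be built in pure Python: treat each column as a tuple, tally the tuples with collections.Counter, and read the tallies off in the order of the distinct columns. This is a minimal sketch of the mapping described above, not PhyNetPy's implementation; the function name count_patterns is hypothetical.

```python
from collections import Counter

import numpy as np


def count_patterns(original: np.ndarray, distinct: np.ndarray) -> list[int]:
    """counts[k] = number of times distinct[:, k] appears as a column of original."""
    # Iterating over the transpose yields the columns; tuples are hashable keys.
    tally = Counter(tuple(col) for col in original.T)
    return [tally[tuple(col)] for col in distinct.T]


original = np.array([[2, 0, 2, 1],
                     [2, 0, 2, 3]])
distinct = np.array([[2, 0, 1],
                     [2, 0, 3]])
counts = count_patterns(original, distinct)
print(counts)  # [2, 1, 1]
```

The number of distinct columns (here 3) is what site_count reports, while the counts sum to the original number of sites (here 4). In likelihood calculations on compressed data, each pattern's log-likelihood is typically weighted by its count, which is why the counts must be kept alongside the reduced matrix.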
char_matrix -> np.ndarray

Get the character matrix corresponding to the matrix of alphabet states.

Returns: np.ndarray: the character matrix, which has the same dimensions as the state matrix.
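Converting a state matrix back to characters can be done in one step with NumPy integer-array indexing into a lookup table. A minimal sketch, assuming a hypothetical table in which index k holds the character for state k (A/C/G/T/gap = 0..4); the real Alphabet class may encode states differently.

```python
import numpy as np

# Hypothetical state -> character table: index k holds the character for state k.
STATE_CHARS = np.array(["A", "C", "G", "T", "-"], dtype="U1")

states = np.array([[0, 1, 2, 3],
                   [0, 1, 4, 0]], dtype=np.int8)

# Integer-array indexing maps every state to its character and keeps the shape.
chars = STATE_CHARS[states]
print(chars)
# [['A' 'C' 'G' 'T']
#  ['A' 'C' '-' 'A']]
```

Because the result takes the shape of the index array, the character matrix automatically has the same dimensions as the state matrix.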
get_num_taxa -> int

Get the number of taxa represented in this matrix.

Returns: int: the number of taxa.

name_given_row(index: int) -> str

Get the name of the taxon associated with the row of data at 'index'.

Parameter  Type  Description
index      int   a row index
Returns: str: the taxon name.

get_type -> str

Get the data type of this matrix.

Returns: str: the data type.
