class SubstitutionModelError(Exception)

Exception raised when there is an error in the formulation of a substitution model, either because inputs do not adhere to requirements or because a computation fails.

__init__(self, message) -> None:

Create a SubstitutionModelError with a custom message. To be used in situations where substitution model calculations are irrecoverably in error.

Inputs:

message (str, optional): Custom error message. Defaults to "Unknown substitution model error".

Returns:

(None): --
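
As a rough illustration of the behavior described above, the following sketch shows an exception with a default message and a small validation helper that raises it. The names SubstitutionModelErrorSketch and check_frequencies are hypothetical stand-ins for this example and are not part of the module.

```python
class SubstitutionModelErrorSketch(Exception):
    """Hypothetical stand-in for SubstitutionModelError."""

    def __init__(self, message: str = "Unknown substitution model error") -> None:
        # Keep the message on the instance and pass it to Exception.
        self.message = message
        super().__init__(self.message)


def check_frequencies(freqs: list[float]) -> None:
    """Hypothetical helper: raise if the frequencies do not sum to 1."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise SubstitutionModelErrorSketch("Base frequencies must sum to 1")
```

Calling check_frequencies([0.25, 0.25, 0.25, 0.25]) returns silently, while a list that does not sum to 1 raises the exception with the custom message.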


class GTR

General superclass for time-reversible substitution models. This is the Generalized Time Reversible (GTR) model. It implements eigenvalue decomposition for computing e^(Q*t); special-case subclasses attempt to improve on the time complexity of the matrix exponential operation.

__init__(self, base_freqs) -> None:

Create a GTR substitution model object with the required parameters. Raises: SubstitutionModelError: If the base frequency or transition arrays are malformed.

Inputs:

base_freqs (list[float]): An array of floats of 'states' length. Must sum to 1.

transitions (list[float]): An array of floats that is ('states'^2 - 'states') / 2 long.

states (int, optional): Number of possible data states. Defaults to 4 (For DNA, {A, C, G, T}).

Returns:

(None): --

getQ(self) -> np.ndarray:

Get the Q matrix.

Inputs:

N/A

Returns:

np.ndarray: numpy array object that represents the Q matrix

set_hyperparams(self, params) -> None:

Change any of the base frequencies/states/transitions parameters, and recompute the Q matrix accordingly. Raises: SubstitutionModelError: If parameters are malformed/invalid.

Inputs:

params (dict[str, Any]): A mapping from GTR parameter names to their values. For the GTR superclass, names must be limited to ["states", "base frequencies", "transitions"]. The value for "states" is an int; the values for "base frequencies" and "transitions" are of type list[float].

Returns:

(None): --

get_hyperparams(self) -> tuple[list[float], list[float]]:

Gets the base frequency and transition arrays.

Inputs:

N/A

Returns:

tuple[list[float], list[float]]: Tuple that contains the base frequencies in the first element, and the transitions in the second.

state_count(self) -> int:

Get the number of states for this substitution model.

Inputs:

N/A

Returns:

int: Number of states.

buildQ(self) -> np.ndarray:

Populate the normalized Q matrix with the correct values. Based on (1).

Inputs:

N/A

Returns:

np.ndarray: A numpy ndarray that represents the just built Q matrix.
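
One common way to build a normalized time-reversible Q matrix is sketched below. It assumes that the transitions array lists the upper-triangle exchangeabilities in row-major order (for DNA: AC, AG, AT, CG, CT, GT) and that Q is scaled to one expected substitution per unit time. Neither assumption is confirmed by this documentation, and build_q_sketch is a hypothetical name.

```python
import numpy as np


def build_q_sketch(base_freqs, transitions):
    """Hypothetical sketch of a normalized GTR rate matrix."""
    pi = np.asarray(base_freqs, dtype=float)
    n = len(pi)
    q = np.zeros((n, n))

    # Upper-triangle index pairs in row-major order (assumed ordering).
    rows, cols = np.triu_indices(n, k=1)
    for rate, i, j in zip(transitions, rows, cols):
        q[i, j] = rate * pi[j]  # rate of i -> j
        q[j, i] = rate * pi[i]  # rate of j -> i

    # Each diagonal entry makes its row sum to zero.
    np.fill_diagonal(q, -q.sum(axis=1))

    # Scale so the expected substitution rate at stationarity is 1.
    expected_rate = -np.dot(pi, np.diag(q))
    return q / expected_rate
```

With this construction, pi[i] * Q[i, j] equals pi[j] * Q[j, i] for every pair of states, which is the time-reversibility condition.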

expt(self, t) -> np.ndarray:

Compute the matrix exponential e^(Q*t) and store the result. If the solution has already been computed and the Q matrix has not changed, return the stored value.

Inputs:

t (float): Time, in coalescent units or any other unit. Generally a positive number for phylogenetic applications.

Returns:

np.ndarray: A numpy ndarray that is the result of the matrix exponential with respect to Q and time t.
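
The eigendecomposition approach with result reuse can be sketched as follows. This is a minimal illustration, not the class's actual code: ExptCacheSketch, set_q, and the choice to cache results per value of t are assumptions made for the example.

```python
import numpy as np


class ExptCacheSketch:
    """Hypothetical sketch: e^(Q*t) via eigendecomposition, with caching."""

    def __init__(self, q):
        self.q = np.asarray(q, dtype=float)
        self._decomp = None  # (eigenvalues, eigenvectors, inverse)
        self._cache = {}     # maps t -> previously computed matrix

    def set_q(self, q):
        # A new Q invalidates both the decomposition and cached results.
        self.q = np.asarray(q, dtype=float)
        self._decomp = None
        self._cache.clear()

    def expt(self, t):
        if t in self._cache:
            return self._cache[t]
        if self._decomp is None:
            vals, vecs = np.linalg.eig(self.q)
            self._decomp = (vals, vecs, np.linalg.inv(vecs))
        vals, vecs, inv = self._decomp
        # e^(Q*t) = V * diag(exp(lambda * t)) * V^-1
        result = np.real(vecs @ np.diag(np.exp(vals * t)) @ inv)
        self._cache[t] = result
        return result
```

Once Q has been decomposed, each new value of t costs only two matrix products, and a repeated value of t returns the stored matrix.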

_is_valid(self, transitions) -> None:

Ensure frequencies and transitions are well formed. Raises: SubstitutionModelError: If transitions or frequencies are malformed.

Inputs:

transitions (list[float]): Transition list.

freqs (list[float]): Base frequency list. Must sum to 1.

states (int): Number of states.

Returns:

(None): --
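
The checks implied by the parameter descriptions above can be sketched as follows. This is an assumption-laden illustration: validate_gtr_inputs is a hypothetical name, the tolerance is arbitrary, and ValueError stands in for SubstitutionModelError so that the example is self-contained.

```python
import math


def validate_gtr_inputs(freqs, transitions, states=4):
    """Hypothetical sketch of GTR input validation."""
    if len(freqs) != states:
        raise ValueError(f"Expected {states} base frequencies, got {len(freqs)}")

    if not math.isclose(sum(freqs), 1.0, abs_tol=1e-9):
        raise ValueError("Base frequencies must sum to 1")

    # One transition value per unordered pair of distinct states.
    expected = (states * states - states) // 2
    if len(transitions) != expected:
        raise ValueError(f"Expected {expected} transitions, got {len(transitions)}")
```

For 4 states the expected transitions length is (16 - 4) / 2 = 6, which matches the "4 states, 6 transitions" note on the DNA-only subclasses.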


class K80(GTR)

For DNA only (4 states, 6 transitions). Kimura 2-parameter model from (2), also known as K80. Parameterized by alpha and beta, the transversion and transition parameters, respectively. Base frequencies are assumed to be all equal at 0.25. Transition probabilities are [alpha, beta, alpha, alpha, beta, alpha].

__init__(self, alpha, beta) -> None:

Initialize K80 model. Raises: SubstitutionModelError: if alpha and beta do not sum to 1.

Inputs:

alpha (float): transversion param

beta (float): transition param

Returns:

(None): --

set_hyperparams(self, params) -> None:

Change any of the alpha/beta parameters, and recompute the Q matrix accordingly. Raises: SubstitutionModelError: If parameters are malformed/invalid.

Inputs:

params (dict[str, float]): A mapping from parameter names to their values. For the K80 class, names must be limited to ["alpha", "beta"].

Returns:

(None): --

expt(self, t) -> np.ndarray:

Compute the matrix exponential e^(Q*t) and store the result. If the solution has already been computed and the Q matrix has not changed, return the stored value. For K2P (K80), a closed-form solution for e^(Q*t) exists, so no numerical exponentiation is needed.

Inputs:

t (float): Time, in coalescent units or any other unit. Generally a positive number for phylogenetic applications.

Returns:

np.ndarray: A numpy ndarray that is the result of the matrix exponential with respect to Q and time t.
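
The standard closed form for the Kimura 2-parameter model is sketched below. The function takes the per-pair transition and transversion rates as they appear in Q. How this class maps alpha and beta onto those rates, and how it normalizes Q, is not stated here, so the function name, the A, C, G, T state order, and the rate arguments are all assumptions of the sketch.

```python
import numpy as np

# Assumed state order: A=0, C=1, G=2, T=3. Transitions are A<->G and C<->T.
TRANSITION_PAIRS = {(0, 2), (2, 0), (1, 3), (3, 1)}


def k80_probabilities(transition_rate, transversion_rate, t):
    """Hypothetical sketch of the K80 closed form for e^(Q*t)."""
    e1 = np.exp(-4.0 * transversion_rate * t)
    e2 = np.exp(-2.0 * (transition_rate + transversion_rate) * t)

    p_same = 0.25 + 0.25 * e1 + 0.5 * e2
    p_transition = 0.25 + 0.25 * e1 - 0.5 * e2
    p_transversion = 0.25 - 0.25 * e1

    # Start with transversions everywhere, then overwrite the rest.
    p = np.full((4, 4), p_transversion)
    for i, j in TRANSITION_PAIRS:
        p[i, j] = p_transition
    np.fill_diagonal(p, p_same)
    return p
```

Only two scalar exponentials are evaluated, which is where the saving over a general eigendecomposition comes from.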


class F81(GTR)

For DNA only (4 states, 6 transitions). Formulated by Felsenstein in 1981, this substitution model assumes that all base frequencies are free, but all transition probabilities are equal. A closed form for the matrix exponential e^(Q*t) exists.

__init__(self, bases) -> None:

Initialize the F81 model with a list of base frequencies of length 4. Transition probabilities will all be the same. Raises: SubstitutionModelError: If the base frequencies given do not sum to 1 or if the list does not have exactly 4 elements.

Inputs:

bases (list[float]): a list of 4 base frequency values.

Returns:

(None): --

set_hyperparams(self, params) -> None:

Change the base frequency parameter, and recompute the Q matrix accordingly. Raises: SubstitutionModelError: If the base frequencies given do not sum to 1 or the list is over 4 elements long.

Inputs:

params (dict[str, list[float]]): A mapping from parameter names to their values. For the F81 class, names must be limited to ["base frequencies"].

Returns:

(None): --
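
The closed form mentioned in the F81 class description can be sketched as follows, assuming Q is normalized to one expected substitution per unit time. That normalization and the name f81_probabilities are assumptions of this example, not statements about the class.

```python
import numpy as np


def f81_probabilities(base_freqs, t):
    """Hypothetical sketch of the F81 closed form for e^(Q*t).

    P[i, j] = pi[j] + (delta_ij - pi[j]) * exp(-mu * t),
    with mu = 1 / (1 - sum(pi ** 2)) under the assumed normalization.
    """
    pi = np.asarray(base_freqs, dtype=float)
    n = len(pi)
    mu = 1.0 / (1.0 - np.sum(pi ** 2))
    decay = np.exp(-mu * t)

    # Every row of the stationary matrix equals the base frequencies.
    stationary = np.tile(pi, (n, 1))
    return decay * np.eye(n) + (1.0 - decay) * stationary
```

As t grows, the decay term vanishes and every row converges to the base frequencies.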


class JC(F81)

For DNA only (4 states, 6 transitions). The Jukes-Cantor model is the simplest of all time-reversible models, in which all parameters (transitions, base frequencies) are assumed to be equal. A closed form for the matrix exponential, e^(Q*t), exists.

__init__(self) -> None:

No arguments need to be provided, as the JC Q matrix is fixed.

Inputs:

N/A

Returns:

(None): --
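
The Jukes-Cantor closed form can be sketched in a few lines. The sketch assumes Q is scaled to one expected substitution per unit time (off-diagonal entries of 1/3, diagonal entries of -1); jc_probabilities is a hypothetical name used only for illustration.

```python
import numpy as np


def jc_probabilities(t):
    """Hypothetical sketch of the JC closed form for e^(Q*t)."""
    decay = np.exp(-4.0 * t / 3.0)
    # Probability of ending in any one specific different state.
    p = np.full((4, 4), 0.25 - 0.25 * decay)
    # Probability of ending in the starting state.
    np.fill_diagonal(p, 0.25 + 0.75 * decay)
    return p
```

At t = 0 this gives the identity matrix, and for large t every entry approaches 0.25.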


class HKY(GTR)

For DNA only (4 states, 6 transitions). Developed by Hasegawa et al. Transversion parameters are assumed to be equal and the transition parameters are assumed to be equal. Base frequency parameters are free.

__init__(self, base_freqs) -> None:

Initialize the HKY model with 4 base frequencies that sum to 1, and a transition array of length 6 with the equivalency pattern [a, b, a, a, b, a]. Raises: SubstitutionModelError: If inputs are malformed in any way.

Inputs:

base_freqs (list[float]): Array of 4 values that sum to 1.

transitions (list[float]): Array of length 6 with the equivalency pattern [a, b, a, a, b, a].

Returns:

(None): --

set_hyperparams(self, params) -> None:

Change any of the base frequencies/transitions parameters, and recompute the Q matrix accordingly. Raises: SubstitutionModelError: If parameters are malformed/invalid.

Inputs:

params (dict[str, list[float]]): A mapping from parameter names to their values. For the HKY class, names must be limited to ["base frequencies", "transitions"].

Returns:

(None): --

_is_valid(self, transitions) -> None:

Ensure frequencies and transitions are well formed. Raises: SubstitutionModelError: If parameters are malformed/invalid.

Inputs:

transitions (list[float]): Transition list. Must be of length 6; all transitions must be equal to each other, and all transversions must be equal to each other.

freqs (list[float]): Base frequency list. Must be of length 4 and sum to 1.

states (int): Number of states.

Returns:

(None): --
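
A check for the [a, b, a, a, b, a] pattern might look like the sketch below. The variable names assume the six entries are ordered AC, AG, AT, CG, CT, GT, which is consistent with the pattern but not stated in this documentation; follows_hky_pattern and the tolerance are hypothetical.

```python
import math


def follows_hky_pattern(transitions, tol=1e-12):
    """Hypothetical sketch: True if values match [a, b, a, a, b, a]."""
    if len(transitions) != 6:
        return False

    # Assumed ordering of the six unordered state pairs.
    ac, ag, at, cg, ct, gt = transitions

    # Positions 0, 2, 3, 5 are transversions and must share one value.
    transversions_equal = all(
        math.isclose(x, ac, abs_tol=tol) for x in (at, cg, gt)
    )
    # Positions 1 and 4 are transitions and must share one value.
    transitions_equal = math.isclose(ag, ct, abs_tol=tol)
    return transversions_equal and transitions_equal
```

A validation method could raise SubstitutionModelError whenever a check of this kind returns False.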


class K81(GTR)

For DNA only (4 states, 6 transitions). Developed by Kimura in 1981. Base frequencies are assumed to be equal, and transition probabilities are assumed to be parameterized by the pattern [a, b, c, c, b, a].

__init__(self, transitions) -> None:

Initialize with a list of 6 transition probabilities that follow the pattern [a, b, c, c, b, a]. All base frequencies are assumed to be equal. Raises: SubstitutionModelError: If the transition probabilities are not of correct pattern.

Inputs:

transitions (list[float]): A list of floats, 6 long.

Returns:

(None): --

set_hyperparams(self, params) -> None:

Change the transitions parameters, and recompute the Q matrix accordingly. Raises: SubstitutionModelError: If the parameters are malformed/invalid.

Inputs:

params (dict[str, list[float]]): A mapping from parameter names to their values. For the K81 class, names must be limited to ["transitions"].

Returns:

(None): --

_is_valid(self, transitions) -> None:

Ensure frequencies and transitions are well formed. Raises: SubstitutionModelError: If the parameters are malformed/invalid.

Inputs:

transitions (list[float]): Transition list. Must be of length 6 and must follow the equivalency pattern [a, b, c, c, b, a].

freqs (list[float]): Unused for this function.

states (int): Unused for this function.

Returns:

(None): --
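
Equivalency patterns such as [a, b, c, c, b, a] can also be checked generically by writing the pattern as a string of labels. The helper below is a hypothetical sketch, not the class's method; it compares values with exact equality, and it does not require that different labels hold different values.

```python
def matches_pattern(values, pattern):
    """Hypothetical sketch: True if equal labels carry equal values.

    `pattern` is a string such as "abccba"; positions that share a
    letter must hold the same value.
    """
    if len(values) != len(pattern):
        return False

    seen = {}
    for value, label in zip(values, pattern):
        # Record the first value seen for a label, then compare later ones.
        if seen.setdefault(label, value) != value:
            return False
    return True
```

For example, matches_pattern(transitions, "abccba") expresses the K81 constraint, and the same helper works for other six-letter patterns.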


class SYM(GTR)

For DNA only (4 states, 6 transitions). Developed by Zharkikh in 1994, this model assumes that all base frequencies are equal, and all transition probabilities are free.

__init__(self, transitions) -> None:

Initialize with a list of 6 free transition probabilities. Base frequencies are all equal. Raises: SubstitutionModelError: if the transitions array is not of length 6.

Inputs:

transitions (list[float]): A list of 6 transition rates.

Returns:

(None): --

set_hyperparams(self, params) -> None:

Change the transitions parameter, and recompute the Q matrix accordingly. Raises: SubstitutionModelError: If the transitions array is not of length 6.

Inputs:

params (dict[str, list[float]]): A mapping from parameter names to their values. For the SYM class, names must be limited to ["transitions"].

Returns:

(None): --


class TN93(GTR)

For DNA only (4 states, 6 transitions). Developed by Tamura and Nei in 1993. Similar to HKY, but two different transition parameters are used instead of one (indices 0, 2, 3, and 5 of the transitions array are equal; indices 1 and 4 differ). Base frequency parameters are free.

__init__(self, base_freqs) -> None:

Initialize with a list of 4 free base frequencies, and 6 transitions that follow the pattern [a, b, a, a, c, a]. Raises: SubstitutionModelError: If the transitions or base frequency lists are malformed.

Inputs:

base_freqs (list[float]): A list of 4 base frequencies.

transitions (list[float]): A list of 6 transitions that follow the above pattern.

Returns:

(None): --

set_hyperparams(self, params) -> None:

Change any of the base frequencies/transitions parameters, and recompute the Q matrix accordingly. Raises: SubstitutionModelError: If the new parameters are invalid.

Inputs:

params (dict[str, list[float]]): A mapping from parameter names to their values. For the TN93 class, names must be limited to ["base frequencies", "transitions"].

Returns:

(None): --

_is_valid(self, transitions) -> None:

Ensure frequencies and transitions are well formed. Raises: SubstitutionModelError: If any of the inputs are malformed/invalid.

Inputs:

transitions (list[float]): Transition rate list.

freqs (list[float]): Base frequency list. Must sum to 1.

states (int): Number of states. For DNA, 4.

Returns:

(None): --