PhyNetPy Documentation

Library for the Development and Use of Phylogenetic Network Methods

GTR Module v1.0.0

Time-reversible nucleotide substitution models (GTR, JC, K80, F81, HKY, K81, SYM, TN93).

Author:
Mark Kessler
Last Edit:
3/11/25
Source:
GTR.py

Exceptions

exception SubstitutionModelError(Exception)

Raised when a substitution model is formulated incorrectly, either because its inputs do not meet the model's requirements or because a computation fails.

GTR

class GTR

General superclass for time-reversible substitution models. It computes the matrix exponential e^(Q*t) via eigenvalue decomposition; special-case subclasses attempt to reduce the time complexity of this operation. This class implements the General Time Reversible (GTR) model.

Constructor

__init__(base_freqs: list[float], transitions: list[float], states: int = 4) -> None

Create a GTR substitution model object with the required/needed parameters.

Parameter Type Description
base_freqs list[float] A list of 'states' floats. Must sum to 1.
transitions list[float] A list of ('states'^2 - 'states') / 2 floats, one per unordered pair of states (6 for DNA).
states int, optional Number of possible data states. Defaults to 4 (for DNA, {A, C, G, T}).
Raises: SubstitutionModelError: If the base frequency or transition arrays are malformed.
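
The constructor's input requirements can be sketched as a standalone check. This is a minimal illustration, not PhyNetPy's code: validate_gtr_inputs is a hypothetical helper, SubstitutionModelError is a local stand-in for the library's exception, and the floating-point tolerance is an assumption.

```python
import math


class SubstitutionModelError(Exception):
    """Local stand-in for PhyNetPy's SubstitutionModelError."""


def validate_gtr_inputs(base_freqs: list[float],
                        transitions: list[float],
                        states: int = 4) -> None:
    """Hypothetical check of the documented GTR constructor requirements."""
    if len(base_freqs) != states:
        raise SubstitutionModelError(
            f"expected {states} base frequencies, got {len(base_freqs)}")
    # Compare with a tolerance, since floating-point sums rarely hit 1.0 exactly
    if not math.isclose(sum(base_freqs), 1.0, abs_tol=1e-9):
        raise SubstitutionModelError("base frequencies must sum to 1")
    # One rate per unordered pair of distinct states: (n^2 - n) / 2
    expected = (states * states - states) // 2
    if len(transitions) != expected:
        raise SubstitutionModelError(
            f"expected {expected} transitions, got {len(transitions)}")


# DNA example: 4 frequencies summing to 1 and 6 rates, so no exception is raised
validate_gtr_inputs([0.1, 0.2, 0.3, 0.4], [1.0, 2.0, 1.0, 1.0, 2.0, 1.0])
```

For amino-acid data (states=20), the same formula asks for 190 transition rates.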

Methods

getQ -> np.ndarray

Get the Q matrix.

Returns: np.ndarray: A numpy array representing the Q matrix.

set_hyperparams(params: dict[str, Any]) -> None

Change any of the base frequencies/states/transitions parameters, and recompute the Q matrix accordingly.

Parameter Type Description
params dict[str, Any] A mapping from GTR parameter names to their values. For the GTR superclass, names must be limited to ["states", "base frequencies", "transitions"]. The value for "states" is an int; the values for "base frequencies" and "transitions" are list[float].
Raises: SubstitutionModelError: If parameters are malformed/invalid.
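
The documented naming and typing rules for params can be illustrated with a hypothetical checker. The function name, the local exception stand-in, and the choice to accept ints inside the lists are assumptions of this sketch; the library's own validation may differ.

```python
from typing import Any


class SubstitutionModelError(Exception):
    """Local stand-in for PhyNetPy's SubstitutionModelError."""


GTR_PARAM_NAMES = {"states", "base frequencies", "transitions"}


def check_gtr_params(params: dict[str, Any]) -> None:
    """Hypothetical check of the documented set_hyperparams rules."""
    unknown = set(params) - GTR_PARAM_NAMES
    if unknown:
        raise SubstitutionModelError(f"unsupported parameter names: {sorted(unknown)}")
    if "states" in params and not isinstance(params["states"], int):
        raise SubstitutionModelError('"states" must be an int')
    for name in ("base frequencies", "transitions"):
        if name in params:
            value = params[name]
            if not isinstance(value, list) or not all(
                    isinstance(x, (int, float)) for x in value):
                raise SubstitutionModelError(f'"{name}" must be a list of floats')


# A valid partial update: only the transitions are replaced
check_gtr_params({"transitions": [1.0, 4.0, 1.0, 1.0, 4.0, 1.0]})
```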

get_hyperparams -> tuple[list[float], list[float]]

Gets the base frequency and transition arrays.

Returns: tuple[list[float], list[float]]: A tuple whose first element is the list of base frequencies and whose second element is the list of transitions.

state_count -> int

Get the number of states for this substitution model.

Returns: int: Number of states.

buildQ -> np.ndarray

Populate the normalized Q matrix with the correct values, based on (1).

Returns: np.ndarray: A numpy ndarray representing the newly built Q matrix.
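
One common construction of a normalized, time-reversible rate matrix is sketched below in plain Python. It assumes that the transitions array lists the rates for the upper triangle of Q in row-major order (AC, AG, AT, CG, CT, GT for DNA), that q_ij = r_ij * pi_j off the diagonal, and that "normalized" means scaled to one expected substitution per unit time. Reference (1) may define the normalization differently, and build_normalized_q is a hypothetical name.

```python
def build_normalized_q(base_freqs: list[float], rates: list[float]) -> list[list[float]]:
    """Sketch of a normalized, time-reversible rate matrix Q."""
    n = len(base_freqs)
    q = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            # One exchangeability per unordered pair, weighted by the target frequency
            q[i][j] = rates[k] * base_freqs[j]
            q[j][i] = rates[k] * base_freqs[i]
            k += 1
    for i in range(n):
        # Each row of a rate matrix sums to zero
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    # Expected substitution rate at equilibrium: mu = -sum_i pi_i * q_ii
    mu = -sum(base_freqs[i] * q[i][i] for i in range(n))
    return [[x / mu for x in row] for row in q]


# Equal frequencies and rates (JC): off-diagonals become 1/3, diagonal becomes -1
q = build_normalized_q([0.25] * 4, [1.0] * 6)
```

Because q_ij = r_ij * pi_j, the result satisfies detailed balance (pi_i q_ij = pi_j q_ji), which is what makes the model time reversible.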

expt(t: float) -> np.ndarray

Compute the matrix exponential e^(Q*t) and store the result. If the result has already been computed and the Q matrix has not changed since, the stored value is returned instead.

Parameter Type Description
t float Time; typically positive in phylogenetic applications. May be in coalescent units or any other unit.
Returns: np.ndarray: A numpy ndarray that is the result of the matrix exponential with respect to Q and time t.
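
The eigendecomposition approach can be sketched with NumPy. Because a reversible Q satisfies pi_i q_ij = pi_j q_ji, the matrix S = D^(1/2) Q D^(-1/2), with D = diag(pi), is symmetric, so numpy.linalg.eigh applies and e^(Q*t) = D^(-1/2) V e^(Lambda*t) V^T D^(1/2). The ReversibleExpm class and its per-t dictionary cache are hypothetical design choices of this sketch; PhyNetPy may, for example, store only the most recent result.

```python
import numpy as np


class ReversibleExpm:
    """Hypothetical helper computing e^(Q t) for a reversible Q, with caching."""

    def __init__(self, q: np.ndarray, pi: np.ndarray) -> None:
        self.set_q(q, pi)

    def set_q(self, q: np.ndarray, pi: np.ndarray) -> None:
        q = np.asarray(q, dtype=float)
        self.sqrt_pi = np.sqrt(np.asarray(pi, dtype=float))
        # S = D^(1/2) Q D^(-1/2) is symmetric when Q is time reversible
        s = self.sqrt_pi[:, None] * q / self.sqrt_pi[None, :]
        s = (s + s.T) / 2.0  # remove round-off asymmetry before eigh
        self.evals, self.evecs = np.linalg.eigh(s)
        self._cache = {}  # Q changed, so any stored results are stale

    def expt(self, t: float) -> np.ndarray:
        if t not in self._cache:
            # e^(S t) = V diag(e^(lambda t)) V^T
            est = (self.evecs * np.exp(self.evals * t)) @ self.evecs.T
            # Undo the similarity transform: e^(Q t) = D^(-1/2) e^(S t) D^(1/2)
            self._cache[t] = est / self.sqrt_pi[:, None] * self.sqrt_pi[None, :]
        return self._cache[t]


# Usage: the normalized JC rate matrix (off-diagonals 1/3, diagonal -1)
q_jc = np.full((4, 4), 1.0 / 3.0)
np.fill_diagonal(q_jc, -1.0)
model = ReversibleExpm(q_jc, np.full(4, 0.25))
p_half = model.expt(0.5)
```

Symmetrizing first lets the sketch use a symmetric eigensolver, which returns real eigenvalues and orthonormal eigenvectors, so no matrix inverse is needed.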

K80

class K80(GTR)

For DNA only (4 states, 6 transitions). The Kimura 2-parameter model from (2), also known as K80 (or K2P). It is parameterized by alpha and beta, the transversion and transition parameters, respectively. All base frequencies are assumed to equal 0.25, and the transitions array is [alpha, beta, alpha, alpha, beta, alpha].

Constructor

__init__(alpha: float, beta: float) -> None

Initialize K80 model.

Parameter Type Description
alpha float The transversion parameter.
beta float The transition parameter.
Raises: SubstitutionModelError: if alpha and beta do not sum to 1.

Methods

set_hyperparams(params: dict[str, float]) -> None

Change the alpha and/or beta parameters, and recompute the Q matrix accordingly.

Parameter Type Description
params dict[str, float] A mapping from parameter names to their values. For the K80 class, names must be limited to ["alpha", "beta"].
Raises: SubstitutionModelError: If parameters are malformed/invalid.

expt(t: float) -> np.ndarray

Compute the matrix exponential e^(Q*t) and store the result. If the result has already been computed and the Q matrix has not changed since, the stored value is returned instead. For K2P, a closed-form solution for e^(Q*t) exists, so no numerical matrix exponentiation is needed.

Parameter Type Description
t float Time; typically positive in phylogenetic applications. May be in coalescent units or any other unit.
Returns: np.ndarray: A numpy ndarray that is the result of the matrix exponential with respect to Q and time t.
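
A closed form for the K80 transition-probability matrix is sketched below. It assumes states ordered A, C, G, T, a transitions array ordered AC, AG, AT, CG, CT, GT (consistent with the [alpha, beta, alpha, alpha, beta, alpha] pattern, so beta falls on the A<->G and C<->T transitions), and a Q normalized to one expected substitution per unit time. The function name k80_pmatrix is hypothetical.

```python
import math


def k80_pmatrix(alpha: float, beta: float, t: float) -> list[list[float]]:
    """Closed-form K80 P(t); alpha = transversion, beta = transition parameter."""
    # Per-pair rates after normalizing Q (base frequencies all 0.25)
    ts = beta / (beta + 2 * alpha)   # rate to the single transition partner
    tv = alpha / (beta + 2 * alpha)  # rate to each of the two transversion partners
    e1 = math.exp(-4 * tv * t)
    e2 = math.exp(-2 * (ts + tv) * t)
    p_same = 0.25 + 0.25 * e1 + 0.5 * e2
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2
    p_tv = 0.25 - 0.25 * e1
    partner = {0: 2, 1: 3, 2: 0, 3: 1}  # A<->G and C<->T are transitions
    return [[p_same if i == j else p_ts if partner[i] == j else p_tv
             for j in range(4)] for i in range(4)]
```

Because of the normalization, only the ratio beta / alpha affects the result; with alpha equal to beta, the matrix reduces to the Jukes-Cantor one.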

F81

class F81(GTR)

For DNA only (4 states, 6 transitions). Formulated by Felsenstein in 1981, this substitution model allows all base frequencies to be free but assumes that all transition probabilities are equal. A closed form for the matrix exponential e^(Q*t) exists.

Constructor

__init__(bases: list[float]) -> None

Initialize the F81 model with a list of base frequencies of length 4. Transition probabilities will all be the same.

Parameter Type Description
bases list[float] a list of 4 base frequency values.
Raises: SubstitutionModelError: If the base frequencies given do not sum to 1 or if the list does not have exactly 4 elements.
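
For F81, e^(Q*t) has the well-known closed form P_ij(t) = e^(-mu*t) [i = j] + (1 - e^(-mu*t)) pi_j. The sketch below assumes Q is normalized to one expected substitution per unit time, which gives mu = 1 / (1 - sum_k pi_k^2); f81_pmatrix is a hypothetical name.

```python
import math


def f81_pmatrix(pi: list[float], t: float) -> list[list[float]]:
    """Closed-form F81 P(t) for base frequencies pi."""
    mu = 1.0 / (1.0 - sum(p * p for p in pi))  # normalizing constant
    decay = math.exp(-mu * t)
    n = len(pi)
    # Stay with probability decay; otherwise draw the end state from pi
    return [[(decay if i == j else 0.0) + (1.0 - decay) * pi[j]
             for j in range(n)] for i in range(n)]
```

As t grows, every row approaches the base frequencies, which is the stationary distribution of the model.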

Methods

set_hyperparams(params: dict[str, list[float]]) -> None

Change the base frequency parameter, and recompute the Q matrix accordingly.

Parameter Type Description
params dict[str, list[float]] A mapping from GTR parameter names to their values. For the F81 class, names must be limited to ["base frequencies"].
Raises: SubstitutionModelError: If the base frequencies given do not sum to 1 or the list is over 4 elements long.

JC

class JC(GTR)

For DNA only (4 states, 6 transitions). The Jukes-Cantor model is the simplest of all time-reversible models: all parameters (transitions and base frequencies) are assumed to be equal. A closed form for the matrix exponential, e^(Q*t), exists.

Constructor

__init__() -> None

No arguments need to be provided, as the JC Q matrix is fixed.
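
Because every rate and base frequency is equal, the closed form has only two distinct entries. The sketch assumes a Q normalized to one expected substitution per unit time (off-diagonal rates of 1/3); jc_pmatrix is a hypothetical name.

```python
import math


def jc_pmatrix(t: float) -> list[list[float]]:
    """Closed-form Jukes-Cantor P(t) for a normalized Q."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay  # probability of ending in the starting state
    diff = 0.25 - 0.25 * decay  # probability of ending in any one other state
    return [[same if i == j else diff for j in range(4)] for i in range(4)]
```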

HKY

class HKY(GTR)

For DNA only (4 states, 6 transitions). Developed by Hasegawa et al. All transversion parameters are assumed to be equal to one another, as are all transition parameters. Base frequency parameters are free.

Constructor

__init__(base_freqs: list[float], transitions: list[float]) -> None

Initialize the HKY model with 4 base frequencies that sum to 1, and a transition array of length 6 with the equivalency pattern [a, b, a, a, b, a].

Parameter Type Description
base_freqs list[float] Array of 4 values that sum to 1.
transitions list[float] Array of length 6 with the equivalency pattern [a, b, a, a, b, a].
Raises: SubstitutionModelError: If inputs are malformed in any way.
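
HKY is often written in terms of a single transition/transversion rate ratio, kappa. The hypothetical helpers below build a transitions array in the documented [a, b, a, a, b, a] pattern from kappa (taking a = 1, which is harmless when Q is normalized, since a common scale factor cancels) and check whether an arbitrary array follows the pattern. The kappa convention is an assumption, not necessarily how PhyNetPy's HKY constructor is meant to be called.

```python
import math


def hky_transitions(kappa: float) -> list[float]:
    """Build the 6-long array [a, b, a, a, b, a] with a = 1 and b = kappa."""
    a, b = 1.0, kappa
    return [a, b, a, a, b, a]


def follows_hky_pattern(transitions: list[float]) -> bool:
    """True if entries 0, 2, 3, 5 agree and entries 1, 4 agree."""
    if len(transitions) != 6:
        return False
    a, b = transitions[0], transitions[1]
    return (all(math.isclose(transitions[i], a) for i in (2, 3, 5))
            and math.isclose(transitions[4], b))
```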

Methods

set_hyperparams(params: dict[str, list[float]]) -> None

Change the base frequencies and/or transitions parameters, and recompute the Q matrix accordingly.

Parameter Type Description
params dict[str, list[float]] A mapping from GTR parameter names to their values. For the HKY class, names must be limited to ["base frequencies", "transitions"].
Raises: SubstitutionModelError: If parameters are malformed/invalid.

K81

class K81(GTR)

For DNA only (4 states, 6 transitions). Developed by Kimura in 1981. Base frequencies are assumed to be equal, and transition probabilities are assumed to be parameterized by the pattern [a, b, c, c, b, a].

Constructor

__init__(transitions: list[float]) -> None

Initialize with a list of 6 transition probabilities that follow the pattern [a, b, c, c, b, a]. All base frequencies are assumed to be equal.

Parameter Type Description
transitions list[float] A list of floats, 6 long.
Raises: SubstitutionModelError: If the transition probabilities do not follow the required pattern.

Methods

set_hyperparams(params: dict[str, list[float]]) -> None

Change the transitions parameters, and recompute the Q matrix accordingly.

Parameter Type Description
params dict[str, list[float]] A mapping from GTR parameter names to their values. For the K81 class, names must be limited to ["transitions"].
Raises: SubstitutionModelError: If the parameters are malformed/invalid.
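
Under an AC, AG, AT, CG, CT, GT ordering of the transitions array (an assumption of this sketch), the K81 pattern [a, b, c, c, b, a] groups the six nucleotide pairs into three classes: b covers the transitions A<->G and C<->T, while a and c split the transversions. The helper names below are hypothetical.

```python
PAIRS = ["AC", "AG", "AT", "CG", "CT", "GT"]  # assumed order of the 6-long array


def k81_transitions(a: float, b: float, c: float) -> list[float]:
    """Expand the three K81 parameters into the pattern [a, b, c, c, b, a]."""
    return [a, b, c, c, b, a]


def rates_by_pair(transitions: list[float]) -> dict[str, float]:
    """Label each rate with the nucleotide pair it applies to."""
    return dict(zip(PAIRS, transitions))


print(rates_by_pair(k81_transitions(1.0, 2.0, 3.0)))
# {'AC': 1.0, 'AG': 2.0, 'AT': 3.0, 'CG': 3.0, 'CT': 2.0, 'GT': 1.0}
```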

SYM

class SYM(GTR)

For DNA only (4 states, 6 transitions). Developed by Zharkikh in 1994, this model assumes that all base frequencies are equal, and all transition probabilities are free.

Constructor

__init__(transitions: list[float]) -> None

Initialize with a list of 6 free transition probabilities. Base frequencies are all equal.

Parameter Type Description
transitions list[float] A list of 6 transition rates.
Raises: SubstitutionModelError: if the transitions array is not of length 6.
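
With all base frequencies equal, q_ij = r_ij * pi_j = r_ij / 4 for i != j, so the rate matrix is symmetric. The unnormalized sketch below makes this visible; sym_q is a hypothetical name, and the upper-triangle ordering AC, AG, AT, CG, CT, GT is assumed.

```python
def sym_q(transitions: list[float]) -> list[list[float]]:
    """Unnormalized SYM rate matrix with all base frequencies equal to 0.25."""
    pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]  # AC, AG, AT, CG, CT, GT
    q = [[0.0] * 4 for _ in range(4)]
    for (i, j), r in zip(pairs, transitions):
        q[i][j] = q[j][i] = 0.25 * r  # equal frequencies make both directions equal
    for i in range(4):
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums the off-diagonals
    return q
```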

Methods

set_hyperparams(params: dict[str, list[float]]) -> None

Change the transitions parameter and recompute the Q matrix accordingly.

Parameter Type Description
params dict[str, list[float]] A mapping from GTR parameter names to their values. For the SYM class, names must be limited to ["transitions"].
Raises: SubstitutionModelError: If the transitions array is not of length 6.

TN93

class TN93(GTR)

For DNA only (4 states, 6 transitions). Developed by Tamura and Nei in 1993. Similar to HKY, but two different transition parameters are used instead of one: entries 0, 2, 3, and 5 of the transitions array are equal, while entries 1 and 4 may differ. Base frequency parameters are free.

Constructor

__init__(base_freqs: list[float], transitions: list[float]) -> None

Initialize with a list of 4 free base frequencies, and 6 transitions that follow the pattern [a, b, a, a, c, a].

Parameter Type Description
base_freqs list[float] A list of 4 base frequencies.
transitions list[float] A list of 6 transitions that follow the above pattern.
Raises: SubstitutionModelError: If the transitions or base frequency lists are malformed.
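
The TN93 pattern [a, b, a, a, c, a] can be checked with a small validator. Under the AC, AG, AT, CG, CT, GT ordering assumed here, entries 0, 2, 3, and 5 are the transversions and share one value, while entry 1 (A<->G) and entry 4 (C<->T) are the two free transition parameters. follows_tn93_pattern is a hypothetical name, not the library's own check.

```python
import math


def follows_tn93_pattern(transitions: list[float]) -> bool:
    """True if the array has the form [a, b, a, a, c, a]."""
    if len(transitions) != 6:
        return False
    a = transitions[0]
    # b (index 1) and c (index 4) are unconstrained; only the a-entries must agree
    return all(math.isclose(transitions[i], a) for i in (2, 3, 5))
```

Any HKY-style array, where b equals c, also passes this check, since HKY is the special case of TN93 with a single transition parameter.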

Methods

set_hyperparams(params: dict[str, list[float]]) -> None

Change any of the base frequencies/transitions parameters, and recompute the Q matrix accordingly.

Parameter Type Description
params dict[str, list[float]] A mapping from GTR parameter names to their values. For the TN93 class, names must be limited to ["base frequencies", "transitions"].
Raises: SubstitutionModelError: If the new parameters are invalid.
