PhyNetPy Documentation

Library for the Development and Use of Phylogenetic Network Methods

Alphabet Module v1.0.0

Character-to-state mapping for biological sequence data (DNA, RNA, Protein, Codon, SNP).

Author: Mark Kessler
Last Edit: 11/6/25
Source: Alphabet.py

Constants

DNA : AlphabetMapping = AlphabetMapping('DNA', {'-': 0, 'A': 1, 'C': 2, 'M': 3, 'G': 4, 'R': 5, 'S': 6, 'V': 7, 'T': 8, 'W': 9, 'Y': 10, 'H': 11...
RNA : AlphabetMapping = AlphabetMapping('RNA', {'-': 0, 'A': 1, 'C': 2, 'M': 3, 'G': 4, 'R': 5, 'S': 6, 'V': 7, 'U': 8, 'W': 9, 'Y': 10, 'H': 11...
PROTEIN : AlphabetMapping = AlphabetMapping('PROTEIN', {'-': 0, 'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6, 'G': 7, 'H': 8, 'I': 9, 'J': 10, 'K'...
CODON : AlphabetMapping = AlphabetMapping('CODON', {'-': 0, 'A': 1, 'C': 2, 'M': 3, 'G': 4, 'R': 5, 'S': 6, 'V': 7, 'T': 8, 'W': 9, 'Y': 10, 'H': ...

Exceptions

exception AlphabetError(Exception)

Error class for all errors relating to alphabet mappings.

AlphabetMapping

class AlphabetMapping

Immutable mapping from character symbols to integer indices.
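
The constants listed under Constants are built as AlphabetMapping(name, {symbol: index}). This page does not show how AlphabetMapping enforces immutability. The following is a minimal sketch of one common Python approach, assuming only a (name, dict) pair: a frozen dataclass whose table is wrapped in types.MappingProxyType. The class name MappingSketch is hypothetical, and this is not PhyNetPy's implementation.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class MappingSketch:
    """Hypothetical stand-in for AlphabetMapping (not PhyNetPy's code):
    an alphabet name plus a read-only symbol -> index table."""

    name: str
    table: Mapping[str, int]

    def __post_init__(self) -> None:
        # Copy the caller's dict so later edits to it cannot leak in, then
        # wrap the copy in a read-only proxy that rejects item assignment.
        # object.__setattr__ is required because the dataclass is frozen.
        object.__setattr__(self, "table", MappingProxyType(dict(self.table)))
```

With this design, reading MappingSketch("DNA", {"-": 0, "A": 1}).table["A"] returns 1, assigning to an entry of table raises TypeError, and rebinding name or table raises dataclasses.FrozenInstanceError.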

Alphabet

class Alphabet

Class that maps characters to state values, each of which has a partial likelihood vector associated with it. The mapping is based on base-10 to binary conversion: the binary digits of a state's decimal value form its partial likelihood vector, with the first vector entry as the least significant bit (A = 1, C = 2, G = 4, T = 8). The decimal values are thus a generalized version of one-hot encoding: an unambiguous base sets a single bit, and an ambiguity code sets one bit for each base it may stand for, so its value is the sum of those bases' values.

DNA MAPPING INFORMATION

Symbol(s)  Name      Partial Likelihood
A          Adenine   [1,0,0,0] -> 1
C          Cytosine  [0,1,0,0] -> 2
G          Guanine   [0,0,1,0] -> 4
T, U       Thymine   [0,0,0,1] -> 8

Symbol(s)  Name      Bases     Partial Likelihood
X          Any       A C G T   [1,1,1,1] -> 15
V          Not T     A C G     [1,1,1,0] -> 7
H          Not G     A C T     [1,1,0,1] -> 11
D          Not C     A G T     [1,0,1,1] -> 13
B          Not A     C G T     [0,1,1,1] -> 14
M          Amino     A C       [1,1,0,0] -> 3
R          Purine    A G       [1,0,1,0] -> 5
W          Weak      A T       [1,0,0,1] -> 9
S          Strong    C G       [0,1,1,0] -> 6
Y          ...
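
To make the base-10 to binary conversion concrete, here is a small standalone sketch that turns a set of bases into a state value and back, assuming the bit order shown in the DNA table (A = 1, C = 2, G = 4, T = 8). The names encode, decode, and possible_bases are hypothetical; this is not PhyNetPy code.

```python
# Assumed bit order, read off the DNA table: A is the least significant bit.
BASES = ("A", "C", "G", "T")


def encode(bases: set[str]) -> int:
    """Sum 2**i over the possible bases to get the decimal state value."""
    return sum(1 << i for i, b in enumerate(BASES) if b in bases)


def decode(state: int) -> list[int]:
    """Expand a state value into its partial likelihood vector [A, C, G, T]."""
    return [(state >> i) & 1 for i in range(len(BASES))]


def possible_bases(state: int) -> str:
    """List the bases whose bit is set in the state value."""
    return "".join(b for b, bit in zip(BASES, decode(state)) if bit)
```

For example, encode({"A", "G"}) gives 5 (R, Purine), decode(13) gives [1, 0, 1, 1] (D, Not C), and possible_bases(10), the index the DNA constant assigns to 'Y', gives "CT". Because each base owns one bit, combining ambiguity sets is a bitwise OR: encode({"A"}) | encode({"G"}) is also 5.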

Constructor

__init__(mapping: AlphabetMapping) -> None

Initialize this Alphabet with a mapping of choice: one of the predefined mappings {DNA, RNA, PROTEIN, CODON}, or a user-defined alphabet. For SNP data, generate a custom mapping with the helper function snp_alphabet, passing the upper bound on ploidy in your data, and pass that mapping to this constructor.

Parameter Type Description
mapping AlphabetMapping One of the predefined alphabets {DNA, RNA, PROTEIN, CODON}, or a user-defined alphabet.

Methods

map(char: str) -> int

Return the state value for a character encountered in a NEXUS file.

Parameter Type Description
char str a data point (character) from the matrix of a NEXUS file
Returns: int: the integer corresponding to char in the alphabet mapping
Raises: AlphabetError: if the char encountered is undefined for the data mapping.
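
A minimal sketch of the lookup that map(char) describes, assuming the alphabet is backed by a plain dict. AlphabetError is redefined locally so the block runs on its own, and map_char and DNA_EXCERPT are hypothetical names; the indices in DNA_EXCERPT are copied from the DNA constant above. The sketch is case-sensitive; this page does not say whether map normalizes case.

```python
class AlphabetError(Exception):
    """Local stand-in for PhyNetPy's AlphabetError."""


# Hypothetical excerpt of the DNA table (indices taken from the DNA constant).
DNA_EXCERPT = {"-": 0, "A": 1, "C": 2, "G": 4, "T": 8}


def map_char(table: dict[str, int], char: str) -> int:
    """Return the state for char, raising AlphabetError if it is undefined."""
    try:
        return table[char]
    except KeyError:
        # Replace the KeyError with the domain-specific error the docs promise.
        raise AlphabetError(f"character {char!r} is not in this alphabet") from None
```

Here map_char(DNA_EXCERPT, "G") returns 4, while an undefined symbol such as "Z" raises AlphabetError instead of a bare KeyError.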

get_type() -> str

Return the name of the alphabet constant in use. For example, if this Alphabet was built from the DNA mapping, this function returns "DNA".

Returns: str: the type of alphabet being used

reverse_map(state: int) -> str

Return the character that maps to state in this alphabet.

Parameter Type Description
state int a value in the alphabet map
Returns: str: the key that maps to "state"
Raises: AlphabetError: if the provided state is not a valid one in the alphabet
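
A minimal sketch of the reverse lookup, assuming a dict-backed alphabet. AlphabetError is redefined locally and reverse_lookup is a hypothetical name. The constants on this page are truncated, so it is not known whether two symbols ever share a state; this sketch scans the table and returns the first matching symbol in insertion order.

```python
class AlphabetError(Exception):
    """Local stand-in for PhyNetPy's AlphabetError."""


def reverse_lookup(table: dict[str, int], state: int) -> str:
    """Return the first symbol (in insertion order) whose index equals state."""
    for symbol, index in table.items():
        if index == state:
            return symbol
    # No symbol maps to this state, so it is not valid in this alphabet.
    raise AlphabetError(f"state {state} is not defined in this alphabet")
```

A precomputed inverse such as {v: k for k, v in table.items()} gives constant-time lookups, but if two symbols share a state it silently keeps the last one; the linear scan keeps the first and costs nothing to set up for small alphabets.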

Module Functions

snp_alphabet(ploidy: int) -> AlphabetMapping

Create an SNP alphabet mapping. For data sets in which the maximum ploidy is Xn, pass X as ploidy. For phased SNP data, use 1; for unphased SNP data, use 2.

Parameter Type Description
ploidy int The ploidy of the species (e.g., humans = 2, some plants > 2)
Returns: AlphabetMapping: an SNP alphabet mapping that maps str(i) -> i for 0 <= i <= ploidy, plus additional character mappings.
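
The numeric part of the SNP mapping can be sketched directly from the description above. This is a hypothetical helper (snp_states), not snp_alphabet itself: it returns a plain dict rather than an AlphabetMapping, and because this page does not list the extra character mappings, it takes them as a caller-supplied parameter instead of guessing them.

```python
from typing import Dict, Optional


def snp_states(ploidy: int, extras: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Map "0".."ploidy" to 0..ploidy, then merge any caller-supplied extras.

    extras stands in for the library's additional character mappings,
    which are not documented on this page.
    """
    # Assumption: a ploidy below 1 is meaningless, so reject it early.
    if ploidy < 1:
        raise ValueError("ploidy must be at least 1")
    mapping = {str(i): i for i in range(ploidy + 1)}
    if extras:
        mapping.update(extras)
    return mapping
```

Following the guidance above, phased data (ploidy 1) yields {"0": 0, "1": 1}, and unphased data (ploidy 2) yields {"0": 0, "1": 1, "2": 2}.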
