Alphabet Module v1.0.0

Character-to-state mapping for biological sequence data (DNA, RNA, Protein, Codon, SNP).

Author:: Mark Kessler
Last Edit:: 11/6/25
Source:: Alphabet.py

Constants

DNA : AlphabetMapping = AlphabetMapping('DNA', {'-': 0, 'A': 1, 'C': 2, 'M': 3, 'G': 4, 'R': 5, 'S': 6, 'V': 7, 'T': 8, 'W': 9, 'Y': 10, 'H': 11...

RNA : AlphabetMapping = AlphabetMapping('RNA', {'-': 0, 'A': 1, 'C': 2, 'M': 3, 'G': 4, 'R': 5, 'S': 6, 'V': 7, 'U': 8, 'W': 9, 'Y': 10, 'H': 11...

PROTEIN : AlphabetMapping = AlphabetMapping('PROTEIN', {'-': 0, 'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6, 'G': 7, 'H': 8, 'I': 9, 'J': 10, 'K'...

CODON : AlphabetMapping = AlphabetMapping('CODON', {'-': 0, 'A': 1, 'C': 2, 'M': 3, 'G': 4, 'R': 5, 'S': 6, 'V': 7, 'T': 8, 'W': 9, 'Y': 10, 'H': ...

Exceptions

exception AlphabetError(Exception)

Error class for all errors relating to alphabet mappings.

AlphabetMapping

class AlphabetMapping

Immutable mapping from character symbols to integer indices.

Alphabet

class Alphabet

Maps characters to state values that have partial likelihood values associated with them. The state mapping is based on base-10 to binary conversion, so the decimal numbers become a generalized version of the one-hot encoding scheme. In the vectors below, positions are ordered [A, C, G, T] and the first position is the least significant bit.

DNA MAPPING INFORMATION

Symbol(s)	Name	Partial Likelihood
A	Adenine	[1,0,0,0] -> 1
C	Cytosine	[0,1,0,0] -> 2
G	Guanine	[0,0,1,0] -> 4
T U	Thymine	[0,0,0,1] -> 8

Symbol(s)	Name	Partial Likelihood
X	Any	A C G T ([1,1,1,1] -> 15)
V	Not T	A C G ([1,1,1,0] -> 7)
H	Not G	A C T ([1,1,0,1] -> 11)
D	Not C	A G T ([1,0,1,1] -> 13)
B	Not A	C G T ([0,1,1,1] -> 14)
M	Amino	A C ([1,1,0,0] -> 3)
R	Purine	A G ([1,0,1,0] -> 5)
W	Weak	A T ([1,0,0,1] -> 9)
S	Strong	C G ([0,1,1,0] -> 6)
Y ...
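The bit-flag scheme described above can be sketched in a few lines. This is an illustration only, not PhyNetPy's implementation: the names BASE_BITS, encode, and partial_likelihood are hypothetical, and only the four unambiguous bases with the values listed in the DNA mapping are used.

```python
from typing import List

# Hypothetical illustration of the encoding; not PhyNetPy code.
# One bit per unambiguous base, using the values from the DNA mapping.
BASE_BITS = {"A": 1, "C": 2, "G": 4, "T": 8}


def encode(bases: str) -> int:
    """Combine the one-hot values of the given bases into a single state."""
    state = 0
    for base in bases:
        state |= BASE_BITS[base]
    return state


def partial_likelihood(state: int) -> List[int]:
    """Expand a state into its [A, C, G, T] indicator vector."""
    return [(state >> bit) & 1 for bit in range(4)]


print(encode("AG"))            # 5, the state listed for R (purine)
print(partial_likelihood(5))   # [1, 0, 1, 0]
```

Because each ambiguity code is the bitwise OR of its member bases, a state's indicator vector can be recovered from the integer alone, which is what makes the decimal values a generalization of one-hot encoding.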

Constructor

__init__(mapping: AlphabetMapping) -> None

Initialize this Alphabet object with a mapping of your choice. The mapping may be one of the predefined constants {DNA, RNA, PROTEIN, CODON} or a custom, user-defined alphabet. For SNP alphabets, use the helper function 'snp_alphabet' with your desired ploidy upper bound to generate a custom mapping.

Parameter	Type	Description
mapping	AlphabetMapping	One of the predefined alphabet constants (DNA, RNA, PROTEIN, CODON), or a user-defined alphabet.

Methods

map(char: str) -> int

Return the state that a character encountered in a NEXUS file maps to.

Parameter	Type	Description
char	str	A single data point (character) from a NEXUS file matrix.

Returns: int: the integer corresponding to char in the alphabet mapping

Raises: AlphabetError: if the char encountered is undefined for the data mapping.

get_type() -> str

Returns a string equal to the alphabet constant's name; e.g., if the DNA alphabet is in use, this function returns "DNA".

Returns: str: the type of alphabet being used

reverse_map(state: int) -> str

Get the character that maps to "state" in the given alphabet

Parameter	Type	Description
state	int	a value in the alphabet map

Returns: str: the key that maps to "state"

Raises: AlphabetError: if the provided state is not a valid one in the alphabet
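
The documented contract of map, get_type, and reverse_map can be sketched with a small stand-in class. Everything here is an assumption for illustration: SketchAlphabet is hypothetical and is not the library's Alphabet, the AlphabetError defined below is a local stand-in, and the mapping is only the visible prefix of the DNA constant. The sketch also assumes each state has exactly one character, which may not hold for the real alphabets.

```python
from typing import Dict


class AlphabetError(Exception):
    """Local stand-in for the module's AlphabetError."""


class SketchAlphabet:
    """Hypothetical stand-in that mirrors the documented method contract."""

    def __init__(self, name: str, mapping: Dict[str, int]) -> None:
        self._name = name
        self._forward = dict(mapping)
        # Assumption: one character per state, so the inverse is unambiguous.
        self._reverse = {state: char for char, state in mapping.items()}

    def map(self, char: str) -> int:
        """Return the state for char, or raise AlphabetError if undefined."""
        try:
            return self._forward[char]
        except KeyError:
            raise AlphabetError(f"{char!r} is undefined for {self._name}") from None

    def get_type(self) -> str:
        """Return the alphabet's name."""
        return self._name

    def reverse_map(self, state: int) -> str:
        """Return the character for state, or raise AlphabetError if invalid."""
        try:
            return self._reverse[state]
        except KeyError:
            raise AlphabetError(f"{state} is not a valid {self._name} state") from None


# Visible prefix of the DNA constant's mapping; the rest is elided in the docs.
dna = SketchAlphabet("DNA", {"-": 0, "A": 1, "C": 2, "M": 3, "G": 4,
                             "R": 5, "S": 6, "V": 7, "T": 8})

print(dna.map("G"))          # 4
print(dna.reverse_map(4))    # G
print(dna.get_type())        # DNA
```

Raising a single error type for both unknown characters and unknown states lets callers handle malformed input with one except clause.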

Module Functions

snp_alphabet(ploidy: int) -> AlphabetMapping

Builds an alphabet mapping for SNP data. For data sets in which the maximum ploidy is Xn, pass X as ploidy. For phased SNP data, use 1. For unphased SNP data, use 2.

Parameter	Type	Description
ploidy	int	The ploidy of a species (e.g., humans = 2, some plants > 2).

Returns: AlphabetMapping: an SNP alphabet mapping that maps str(i) -> i for 0 <= i <= ploidy, plus the various extra character mappings.
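
The numeric part of the SNP mapping described above can be sketched as follows. This is a hedged illustration rather than the library's code: sketch_snp_states is a hypothetical name, it returns a plain dict, the negative-ploidy check is an assumption, and the extra character mappings are omitted because this page does not list them.

```python
from typing import Dict


def sketch_snp_states(ploidy: int) -> Dict[str, int]:
    """Map str(i) -> i for 0 <= i <= ploidy (numeric states only).

    Hypothetical helper; the real snp_alphabet also adds extra
    character mappings that are not reproduced here.
    """
    if ploidy < 0:
        # Assumption: a negative ploidy is treated as invalid input.
        raise ValueError("ploidy must be non-negative")
    return {str(i): i for i in range(ploidy + 1)}


print(sketch_snp_states(1))  # {'0': 0, '1': 1}
print(sketch_snp_states(2))  # {'0': 0, '1': 1, '2': 2}
```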

PhyNetPy Documentation
