function snp_alphabet


def snp_alphabet(ploidy) -> dict[str, int]:

Builds an alphabet for SNP data. For data sets in which the maximum ploidy is Xn, use X as ploidy. For phased SNP data, use 1; for unphased SNP data, use 2.

Inputs:

ploidy (int): The ploidy of the species (e.g., humans = 2, some plants > 2).

Returns:

dict[str, int]: An SNP alphabet that maps str(i) -> i for every integer i with 0 <= i <= ploidy, plus the various extra character mappings.
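
The digit portion of the returned map can be sketched as follows. This is a minimal illustration, not the library's implementation: the name snp_alphabet_sketch is hypothetical, and the extra character mappings are omitted because they are not listed here.

```python
def snp_alphabet_sketch(ploidy: int) -> dict[str, int]:
    """Hypothetical stand-in for snp_alphabet covering the digit states only."""
    # Map "0", "1", ..., str(ploidy) to the matching integer state.
    # The extra character mappings mentioned above are deliberately left out.
    return {str(count): count for count in range(ploidy + 1)}


phased = snp_alphabet_sketch(1)    # phased SNP data
unphased = snp_alphabet_sketch(2)  # unphased SNP data
```

With ploidy 2 the sketch yields three keys, "0", "1", and "2", each mapped to its own integer value.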


class AlphabetError(Exception)

Error class for all errors relating to alphabet mappings.

__init__(self, message) -> None:

Initialize an AlphabetError with a message.

Inputs:

message (str): error message

Returns:

(None): --


class Alphabet

Class that handles the mapping from characters to state values, each of which has an associated partial likelihood vector. The state mapping is based on base-10 to binary conversion: each decimal state, read as a bit mask over the positions [A, C, G, T], is a generalized version of the one-hot encoding scheme.

DNA MAPPING INFORMATION

Symbol(s)   Name         Partial Likelihood
A           Adenine      [1,0,0,0] -> 1
C           Cytosine     [0,1,0,0] -> 2
G           Guanine      [0,0,1,0] -> 4
T U         Thymine      [0,0,0,1] -> 8

Symbol(s)   Name         Partial Likelihood
N ? X       Any          A C G T ([1,1,1,1] -> 15)
V           Not T        A C G   ([1,1,1,0] -> 7)
H           Not G        A C T   ([1,1,0,1] -> 11)
D           Not C        A G T   ([1,0,1,1] -> 13)
B           Not A        C G T   ([0,1,1,1] -> 14)
M           Amino        A C     ([1,1,0,0] -> 3)
R           Purine       A G     ([1,0,1,0] -> 5)
W           Weak         A T     ([1,0,0,1] -> 9)
S           Strong       C G     ([0,1,1,0] -> 6)
Y           Pyrimidine   C T     ([0,1,0,1] -> 10)
K           Keto         G T     ([0,0,1,1] -> 12)
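
The encoding in the DNA mapping can be reproduced with a few bitwise operations. The sketch below is illustrative only: BASE_BITS, state_of, and partial_likelihood are hypothetical names, not part of the Alphabet class. It assumes, as the listed values imply, that A occupies the least significant bit.

```python
# One bit per base, with A as the least significant bit.
BASE_BITS = {"A": 1, "C": 2, "G": 4, "T": 8}


def state_of(bases: str) -> int:
    """Combine the bits of every base that a symbol may stand for."""
    state = 0
    for base in bases:
        state |= BASE_BITS[base]
    return state


def partial_likelihood(state: int) -> list[int]:
    """Expand a state into its [A, C, G, T] indicator vector."""
    return [(state >> position) & 1 for position in range(4)]


purine = state_of("AG")                     # symbol R
purine_vector = partial_likelihood(purine)  # indicator vector for R
```

Here state_of("AG") gives 5 and partial_likelihood(5) gives [1, 0, 1, 0], matching the entry for R.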

__init__(self, alphabet) -> None:

Initialize this Alphabet object with a mapping of choice: either one of the predefined mappings {DNA, RNA, PROTEIN, CODON} or a user-defined alphabet. For SNP alphabets, call the helper function 'snp_alphabet' with the desired ploidy upper bound and pass the resulting custom alphabet.

Inputs:

alphabet (dict[str, int]): Any of the predefined alphabet constants (from the set {DNA, RNA, PROTEIN, CODON}), or a user-defined alphabet.

Returns:

(None): --

map(self, char) -> int:

Return the mapping for a character encountered in a NEXUS file.

Raises:

AlphabetError: if the character encountered is undefined for the data mapping.

Inputs:

char (str): a data point from the NEXUS file matrix

Returns:

int: the integer corresponding to char in the alphabet mapping
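
The lookup-or-raise behavior described for map can be sketched with a plain dictionary. Everything below is a hypothetical stand-in: map_char and DNA_EXCERPT are illustrative names, AlphabetError is redefined locally so the snippet runs on its own, and whether the real method normalizes case is not stated here, so the sketch does not.

```python
class AlphabetError(Exception):
    """Local stand-in for the documented AlphabetError class."""


# Excerpt of the DNA mapping for illustration; not the library's full constant.
DNA_EXCERPT = {"A": 1, "C": 2, "G": 4, "T": 8, "U": 8, "N": 15, "?": 15, "X": 15}


def map_char(alphabet: dict[str, int], char: str) -> int:
    """Return the state for char, or raise AlphabetError if it is undefined."""
    try:
        return alphabet[char]
    except KeyError:
        raise AlphabetError(f"'{char}' is not defined in this alphabet") from None


guanine_state = map_char(DNA_EXCERPT, "G")
```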

get_type(self) -> str:

Returns a string equal to the alphabet constant name. For example, with the DNA alphabet, this function returns "DNA".

Inputs:

N/A

Returns:

str: the type of alphabet being used

reverse_map(self, state) -> str:

Get the character that maps to "state" in the given alphabet.

Raises:

AlphabetError: if the provided state is not a valid one in the alphabet.

Inputs:

state (int): a value in the alphabet map

Returns:

str: the key that maps to "state"
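
A reverse lookup over a dictionary can be sketched as below. This is an assumption-laden illustration: reverse_map_sketch is a hypothetical name, AlphabetError is redefined locally, and because several symbols can share one state (for example T and U), the sketch simply returns the first matching key in insertion order. How the real method breaks such ties is not specified here.

```python
class AlphabetError(Exception):
    """Local stand-in for the documented AlphabetError class."""


def reverse_map_sketch(alphabet: dict[str, int], state: int) -> str:
    """Return the first key whose value equals state, else raise AlphabetError."""
    for char, value in alphabet.items():
        if value == state:
            return char
    raise AlphabetError(f"{state} is not a valid state in this alphabet")


# Illustrative excerpt only; T is inserted before U, so T wins the tie.
excerpt = {"A": 1, "C": 2, "G": 4, "T": 8, "U": 8}
symbol = reverse_map_sketch(excerpt, 8)
```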