function snp_alphabet
def snp_alphabet(ploidy) -> dict[str, int]:
Create an SNP alphabet mapping. For data sets in which the maximum ploidy is Xn, use X as the ploidy. For phased SNP data, use 1; for unphased SNP data, use 2.
Inputs:
ploidy (int): The ploidy of the species (e.g., humans = 2; some plants > 2).
Returns:
dict[str, int]: An SNP alphabet map that maps str(i) -> i for every integer 0 <= i <= ploidy, plus various extra character mappings.
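A minimal sketch of the core numeric mapping, assuming it is exactly str(i) -> i for each i from 0 to ploidy. The extra character mappings are not specified in this documentation, so the sketch omits them; `snp_alphabet_sketch` is a hypothetical name, not the library function.

```python
def snp_alphabet_sketch(ploidy: int) -> dict[str, int]:
    """Hypothetical sketch: map "0".."ploidy" to 0..ploidy.

    The real snp_alphabet also adds extra character mappings,
    which are not reproduced here because they are not documented.
    """
    return {str(i): i for i in range(ploidy + 1)}


# Unphased diploid data (ploidy = 2): genotype values 0, 1 and 2
print(snp_alphabet_sketch(2))  # {'0': 0, '1': 1, '2': 2}
```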
class AlphabetError(Exception)
Error class for all errors relating to alphabet mappings.
__init__(self, message) -> None:
Initialize an AlphabetError with a message.
Inputs:
message (str): error message
Returns:
(None): --
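A minimal sketch of such an exception class, assuming it simply forwards the message to `Exception`; keeping the message as a `message` attribute is an assumption of the sketch, not documented behavior.

```python
class AlphabetError(Exception):
    """Sketch: an exception carrying a human-readable message."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        # Assumption: the message is also kept as an attribute.
        self.message = message


try:
    raise AlphabetError("Undefined character 'Z' for this alphabet")
except AlphabetError as err:
    print(err)  # Undefined character 'Z' for this alphabet
```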
class Alphabet
Class that handles the mapping from characters to state values, each of which has partial likelihood values associated with it. The mapping is based on base-10 to binary conversion: each state's partial likelihood vector is read as a binary number (first entry = least significant bit), so the decimal state values form a generalized one-hot encoding in which ambiguity codes set more than one bit.

DNA MAPPING INFORMATION

Symbol(s)   Name                Partial Likelihood
A           Adenine             [1,0,0,0] -> 1
C           Cytosine            [0,1,0,0] -> 2
G           Guanine             [0,0,1,0] -> 4
T U         Thymine / Uracil    [0,0,0,1] -> 8

Symbol(s)   Name         Bases      Partial Likelihood
N ? X       Any          A C G T    [1,1,1,1] -> 15
V           Not T        A C G      [1,1,1,0] -> 7
H           Not G        A C T      [1,1,0,1] -> 11
D           Not C        A G T      [1,0,1,1] -> 13
B           Not A        C G T      [0,1,1,1] -> 14
M           Amino        A C        [1,1,0,0] -> 3
R           Purine       A G        [1,0,1,0] -> 5
W           Weak         A T        [1,0,0,1] -> 9
S           Strong       C G        [0,1,1,0] -> 6
Y           Pyrimidine   C T        [0,1,0,1] -> 10
K           Keto         G T        [0,0,1,1] -> 12
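The table can be checked with a short sketch: each base owns one bit (A = 1, C = 2, G = 4, T/U = 8), and an ambiguity code's state is the bitwise OR of the bits of the bases it stands for. The names `BASE_BITS`, `state_for`, and `likelihood_vector` are illustrative only, not part of the class.

```python
# One bit per nucleotide, in the column order [A, C, G, T] used by the table.
BASE_BITS = {"A": 1, "C": 2, "G": 4, "T": 8, "U": 8}


def state_for(bases: str) -> int:
    """OR together the bits of every base the symbol may stand for."""
    state = 0
    for base in bases:
        state |= BASE_BITS[base]
    return state


def likelihood_vector(state: int) -> list[int]:
    """Expand a state back into its [A, C, G, T] partial likelihood vector."""
    return [(state >> i) & 1 for i in range(4)]


print(state_for("ACG"))       # 7  (V: not T)
print(state_for("CT"))        # 10 (Y: pyrimidine)
print(likelihood_vector(13))  # [1, 0, 1, 1]  (D: not C)
```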
__init__(self, alphabet) -> None:
Initialize this Alphabet object with a mapping of choice: either one of the predefined mappings {DNA, RNA, PROTEIN, CODON} or a user-defined alphabet. For SNP alphabets, generate a custom alphabet with the helper function 'snp_alphabet', passing your desired ploidy upper bound.
Inputs:
alphabet (dict[str, int]): Any of the constant alphabets (from the set {DNA, RNA, PROTEIN, CODON}), or a user-defined alphabet.
Returns:
(None): --
map(self, char) -> int:
Return the state value for a character encountered in a Nexus file.
Raises:
AlphabetError: if the character is undefined in the data mapping.
Inputs:
char (str): a data point from the matrix of a Nexus file
Returns:
int: the integer corresponding to char in the alphabet mapping
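A minimal sketch of the lookup this method performs, assuming the alphabet is stored as a plain `dict[str, int]`. `map_char`, `TOY_DNA`, and the local `AlphabetError` stand-in are hypothetical; the sketch is case-sensitive, and whether the real method is case-sensitive is not documented.

```python
class AlphabetError(Exception):
    """Local stand-in for the module's AlphabetError."""


# Hypothetical subset of the DNA mapping documented above.
TOY_DNA = {"A": 1, "C": 2, "G": 4, "T": 8, "N": 15, "?": 15}


def map_char(alphabet: dict[str, int], char: str) -> int:
    """Return the state value for char, or raise AlphabetError if undefined."""
    try:
        return alphabet[char]
    except KeyError:
        raise AlphabetError(f"Undefined character '{char}' for this alphabet") from None


print(map_char(TOY_DNA, "G"))  # 4
```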
get_type(self) -> str:
Return the name of the alphabet constant in use; e.g., if the DNA alphabet is in use, this function returns "DNA".
Inputs:
N/A
Returns:
str: the type of alphabet being used
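How the name is recovered is not documented. One plausible approach, sketched here under that assumption, is to compare the stored mapping against a registry of named constants; `NAMED_ALPHABETS`, `alphabet_type`, the abbreviated mappings, and the "USER" fallback for custom alphabets are all hypothetical. Recording the name once at construction time would be an equally valid design.

```python
# Hypothetical registry of named alphabet constants (mappings abbreviated).
NAMED_ALPHABETS = {
    "DNA": {"A": 1, "C": 2, "G": 4, "T": 8},
    "RNA": {"A": 1, "C": 2, "G": 4, "U": 8},
}


def alphabet_type(alphabet: dict[str, int]) -> str:
    """Return the constant name whose mapping equals alphabet.

    Falling back to "USER" for custom alphabets is an assumption of this sketch.
    """
    for name, mapping in NAMED_ALPHABETS.items():
        if mapping == alphabet:
            return name
    return "USER"


print(alphabet_type({"A": 1, "C": 2, "G": 4, "T": 8}))  # DNA
```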
reverse_map(self, state) -> str:
Get the character that maps to "state" in the given alphabet.
Raises:
AlphabetError: if the provided state is not valid in the alphabet.
Inputs:
state (int): a value in the alphabet map
Returns:
str: the key that maps to "state"
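Because several characters can share a state (T and U map to 8; N, ? and X map to 15), a reverse mapping is not one-to-one. The sketch below assumes a plain `dict[str, int]` and returns the first matching key in insertion order; which character the real method returns in such cases is not documented. `reverse_lookup`, `TOY_DNA`, and the local `AlphabetError` stand-in are hypothetical.

```python
class AlphabetError(Exception):
    """Local stand-in for the module's AlphabetError."""


# Hypothetical subset of the DNA mapping: note that T and U share state 8.
TOY_DNA = {"A": 1, "C": 2, "G": 4, "T": 8, "U": 8, "N": 15}


def reverse_lookup(alphabet: dict[str, int], state: int) -> str:
    """Return the first key (in insertion order) that maps to state."""
    for char, value in alphabet.items():
        if value == state:
            return char
    raise AlphabetError(f"State {state} is not valid in this alphabet")


print(reverse_lookup(TOY_DNA, 8))  # T  (U is never returned)
```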