PhyNetPy Documentation

Library for the Development and Use of Phylogenetic Network Methods

Alphabet Module v1.0.0

The Alphabet module provides character-to-state mappings for phylogenetic data types, including DNA, RNA, protein, and codon sequences. The mappings use a generalized one-hot encoding based on binary conversions.

Author: Mark Kessler
Last Edit: 11/6/25
Source: Alphabet.py

Module Constants

@dataclass AlphabetMapping

A frozen dataclass that stores a name and mapping dictionary.

Attribute   Type             Description
name        str              Name of the alphabet mapping
mapping     dict[str, int]   Character to state mapping
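To illustrate what "frozen" means for a mapping like this, here is a minimal sketch. `AlphabetMappingSketch` is a hypothetical stand-in that only mirrors the two documented attributes; it is not the library's source, and the `BINARY` mapping is made up for the example.

```python
from dataclasses import dataclass, FrozenInstanceError


# Hypothetical stand-in mirroring the documented attributes (not the library source).
@dataclass(frozen=True)
class AlphabetMappingSketch:
    name: str
    mapping: dict[str, int]


# An illustrative custom mapping; the name and states are assumptions.
binary = AlphabetMappingSketch(name="BINARY", mapping={"0": 0, "1": 1})

try:
    binary.name = "OTHER"  # frozen dataclasses reject attribute assignment
    reassigned = True
except FrozenInstanceError:
    reassigned = False

print(binary.name, binary.mapping["1"], reassigned)
```

Note that freezing only blocks rebinding the attributes. The dictionary held in `mapping` is still an ordinary mutable `dict`, so code that relies on immutability should avoid modifying it in place.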

Predefined Alphabets

Constant   Type              Description
DNA        AlphabetMapping   DNA nucleotide mapping (A, C, G, T, ambiguity codes)
RNA        AlphabetMapping   RNA nucleotide mapping (A, C, G, U, ambiguity codes)
PROTEIN    AlphabetMapping   Amino acid mapping (20+ amino acids)
CODON      AlphabetMapping   Codon mapping for coding sequences

DNA Mapping Information

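The DNA alphabet uses a generalized one-hot encoding: each integer state, converted from base 10 to binary, gives a 4-bit indicator vector over (A, C, G, T), written below with the least significant bit first. IUPAC ambiguity codes are supported, and the state of an ambiguity code is the sum of the states of the nucleotides it can represent (for example, M = A + C = 1 + 2 = 3):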

  • A (Adenine) = 1 [1,0,0,0]
  • C (Cytosine) = 2 [0,1,0,0]
  • G (Guanine) = 4 [0,0,1,0]
  • T/U (Thymine/Uracil) = 8 [0,0,0,1]
  • M (Amino: A or C) = 3
  • R (Purine: A or G) = 5
  • W (Weak: A or T) = 9
  • S (Strong: C or G) = 6
  • Y (Pyrimidine: C or T) = 10
  • K (Keto: G or T) = 12
  • X (Any) = 15 [1,1,1,1]
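The arithmetic behind the list above can be checked with a short, self-contained sketch. It does not import PhyNetPy; `BASE_STATES`, `to_indicator`, and `bases_in` are hypothetical names introduced here, and only the four single-nucleotide states are taken from the list.

```python
# Single-nucleotide states as listed above: one bit per base.
BASE_STATES = {"A": 1, "C": 2, "G": 4, "T": 8}


def to_indicator(state: int) -> list[int]:
    """Expand a state into [A, C, G, T] indicator bits, least significant bit first."""
    return [(state >> bit) & 1 for bit in range(4)]


def bases_in(state: int) -> str:
    """Return the bases whose bit is set in the given state."""
    return "".join(base for base, value in BASE_STATES.items() if state & value)


# R (purine) stands for A or G, so its state is the bitwise OR of the two.
r_state = BASE_STATES["A"] | BASE_STATES["G"]
print(r_state, to_indicator(r_state), bases_in(r_state))  # 5 [1, 0, 1, 0] AG
print(to_indicator(15), bases_in(15))                     # [1, 1, 1, 1] ACGT
```

Because each base owns one bit, testing whether an observed character is compatible with a base reduces to a single bitwise AND.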

Exceptions

exception AlphabetError(Exception)

Error class for all errors relating to alphabet mappings.

Helper Functions

def snp_alphabet(ploidy: int) -> AlphabetMapping

Generate an SNP alphabet mapping for a given ploidy level.

Parameter   Type   Description
ploidy      int    Maximum ploidy value (e.g., 2 for diploid organisms)

Returns: AlphabetMapping - SNP alphabet mapping
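As a rough picture of what such a mapping might contain, the sketch below builds a plain dictionary for a given ploidy. It is an assumption inferred from the diploid case in the usage examples on this page ('0', '1', '2', and '-' for missing data); `snp_mapping_sketch` is a hypothetical helper, and the real `snp_alphabet` may behave differently, particularly for other ploidy values or invalid input.

```python
def snp_mapping_sketch(ploidy: int) -> dict[str, int]:
    """Hypothetical sketch: allele counts 0..ploidy, then '-' for missing data."""
    if ploidy < 1:
        # Input validation is an assumption of this sketch.
        raise ValueError("ploidy must be a positive integer")
    mapping = {str(count): count for count in range(ploidy + 1)}
    mapping["-"] = ploidy + 1  # assumed: missing data takes the next free state
    return mapping


diploid = snp_mapping_sketch(2)
print(diploid)
```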

Alphabet Class

class Alphabet

Class that handles mapping from characters to state values with partial likelihood values associated with them.

Constructor

__init__(self, mapping: AlphabetMapping)

Initialize with a predefined or custom alphabet mapping.

Parameter   Type              Description
mapping     AlphabetMapping   One of {DNA, RNA, PROTEIN, CODON} or a custom mapping

Methods

map(self, char: str) -> int

Return the integer state mapping for a character.

Parameter   Type   Description
char        str    A character from sequence data

Returns: int - The integer state corresponding to the character
Raises: AlphabetError - If the character is undefined for this alphabet

reverse_map(self, state: int) -> str

Get the character that maps to a given state.

Parameter   Type   Description
state       int    A state value in the alphabet

Returns: str - The character mapped to the state
Raises: AlphabetError - If the state is undefined for this alphabet

get_type(self) -> str

Returns the name of the alphabet type (e.g., "DNA", "PROTEIN").

Returns: str - The alphabet type name
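The three methods can be pictured with a small stand-in class. This is a sketch under stated assumptions, not the library's implementation: `AlphabetSketch` and `AlphabetErrorSketch` are hypothetical names, the constructor takes a name and a plain dict rather than an AlphabetMapping so the example runs on its own, and the tie-breaking rule for `reverse_map` is a guess.

```python
class AlphabetErrorSketch(Exception):
    """Hypothetical stand-in for the documented AlphabetError."""


class AlphabetSketch:
    """Hypothetical stand-in for Alphabet; takes a name and a plain dict."""

    def __init__(self, name: str, mapping: dict[str, int]) -> None:
        self._name = name
        self._mapping = dict(mapping)
        # Build the inverse once. If several characters share a state,
        # the first one listed wins (an assumption of this sketch).
        self._reverse: dict[int, str] = {}
        for char, state in self._mapping.items():
            self._reverse.setdefault(state, char)

    def map(self, char: str) -> int:
        if char not in self._mapping:
            raise AlphabetErrorSketch(f"{char!r} is undefined for {self._name}")
        return self._mapping[char]

    def reverse_map(self, state: int) -> str:
        if state not in self._reverse:
            raise AlphabetErrorSketch(f"state {state} is undefined for {self._name}")
        return self._reverse[state]

    def get_type(self) -> str:
        return self._name


dna = AlphabetSketch("DNA", {"A": 1, "C": 2, "G": 4, "T": 8})
print(dna.map("G"), dna.reverse_map(8), dna.get_type())

try:
    dna.map("Z")  # not part of the mapping, so the lookup is rejected
except AlphabetErrorSketch as err:
    print("rejected:", err)
```

Raising a dedicated exception type, rather than letting a `KeyError` escape, lets callers distinguish malformed sequence data from unrelated dictionary errors.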

Usage Examples

from PhyNetPy.Alphabet import Alphabet, DNA, RNA, PROTEIN, snp_alphabet

# Create a DNA alphabet
dna_alpha = Alphabet(DNA)

# Map characters to states
state_a = dna_alpha.map('A')  # Returns 1
state_t = dna_alpha.map('T')  # Returns 8
state_x = dna_alpha.map('X')  # Returns 15 (any nucleotide)

# Reverse mapping
char = dna_alpha.reverse_map(1)  # Returns 'A'

# Get alphabet type
print(dna_alpha.get_type())  # "DNA"

# Create SNP alphabet for diploid organisms
snp_alpha = snp_alphabet(ploidy=2)
snp = Alphabet(snp_alpha)

# Map SNP values
snp.map('0')  # 0
snp.map('1')  # 1
snp.map('2')  # 2
snp.map('-')  # 3 (missing data)

See Also

  • Matrix - Uses Alphabet for MSA data storage
  • MSA - Multiple sequence alignment handling
