PhyNetPy Documentation

Library for the Development and Use of Phylogenetic Network Methods

Validation Module v1.0.0

Comprehensive file format validation for phylogenetic data files (Newick, Nexus, FASTA, PHYLIP, etc.).

Author: Mark Kessler
Last Edit: 2/10/26
Source: Validation.py

Exceptions

exception ValidationError(Exception)

Base exception for validation errors.

exception FileFormatError(ValidationError)

Exception raised when file format is invalid or corrupted.

exception DataIntegrityError(ValidationError)

Exception raised when data integrity checks fail.
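
Because both specific exceptions derive from ValidationError, a caller can handle them individually or catch the base class as a fallback. The sketch below illustrates that ordering. The three classes are local stand-ins that mirror the documented hierarchy rather than imports from PhyNetPy, and `classify` is a hypothetical helper introduced only for this example.

```python
class ValidationError(Exception):
    """Stand-in for the documented base exception."""


class FileFormatError(ValidationError):
    """Stand-in: file format is invalid or corrupted."""


class DataIntegrityError(ValidationError):
    """Stand-in: data integrity checks failed."""


def classify(exc: Exception) -> str:
    """Return a coarse label for an exception (hypothetical helper)."""
    try:
        raise exc
    except FileFormatError:
        return "format"
    except DataIntegrityError:
        return "integrity"
    except ValidationError:
        # Catches the base class and any subclass not handled above.
        return "validation"
    except Exception:
        return "other"


labels = [
    classify(e)
    for e in (
        FileFormatError("unbalanced parentheses"),
        DataIntegrityError("duplicate taxon label"),
        ValidationError("generic failure"),
        ValueError("unrelated"),
    )
]
print(labels)
```

The more specific handlers must come before the `ValidationError` handler; otherwise the base-class clause would intercept every validation exception.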

ValidationSummary

class ValidationSummary

Container for validation results and summary information. Tracks errors, warnings, and summary statistics produced during validation of a single phylogenetic file. A summary whose :attr:`is_valid` flag is ``False`` contains at least one error.

Attributes:
file_path (str): Absolute or relative path to the validated file.
file_format (str): Human-readable format label (e.g. ``"Newick"``).
is_valid (bool): ``True`` until :meth:`add_error` is called.
errors (List[str]): Accumulated error messages.
warnings (List[str]): Accumulated warning messages.
summary_stats (Dict[str, Any]): Free-form key/value statistics.

Constructor

__init__(file_path: str, file_format: str)

Initialize a ValidationSummary.

Parameter Type Description
file_path str Path to the file being validated.
file_format str Name of the file format (e.g. ``"Newick"``, ``"FASTA"``).

Methods

add_error(error: str) -> None

Add an error message and mark validation as failed.

Parameter Type Description
error str Human-readable description of the validation error.

add_warning(warning: str) -> None

Add a warning message. Warnings do not affect :attr:`is_valid`; they flag non-fatal issues such as missing branch lengths or low taxa counts.

Parameter Type Description
warning str Human-readable description of the warning.

add_stat(key: str, value: Any) -> None

Add a summary statistic. Statistics are stored in :attr:`summary_stats` and rendered in the human-readable report produced by :meth:`__str__`.

Parameter Type Description
key str Label for the statistic (e.g. ``"Number of Taxa"``).
value Any The statistic value. Typically a ``str``, ``int``, ``float``, or ``list``, but any printable type is accepted.

__str__() -> str

Return formatted summary report.

Returns: str: Multi-line, human-readable report containing the file path, format, validity status, statistics, warnings, and errors.
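
The interaction between errors, warnings, and the validity flag can be sketched as follows. `SummarySketch` is a local stand-in written from the attribute and method descriptions above, not the library class itself, and the file name and messages are made up for illustration.

```python
from typing import Any, Dict, List


class SummarySketch:
    """Local stand-in mirroring the documented ValidationSummary behaviour."""

    def __init__(self, file_path: str, file_format: str) -> None:
        self.file_path = file_path
        self.file_format = file_format
        self.is_valid = True  # stays True until add_error is called
        self.errors: List[str] = []
        self.warnings: List[str] = []
        self.summary_stats: Dict[str, Any] = {}

    def add_error(self, error: str) -> None:
        # Errors are fatal: record the message and mark validation as failed.
        self.errors.append(error)
        self.is_valid = False

    def add_warning(self, warning: str) -> None:
        # Warnings are non-fatal and leave is_valid untouched.
        self.warnings.append(warning)

    def add_stat(self, key: str, value: Any) -> None:
        self.summary_stats[key] = value


summary = SummarySketch("trees.nwk", "Newick")
summary.add_stat("Number of Taxa", 4)
summary.add_warning("Missing branch lengths")
valid_after_warning = summary.is_valid

summary.add_error("Unbalanced parentheses")
valid_after_error = summary.is_valid

print(valid_after_warning, valid_after_error)
```

In this sketch a summary can carry any number of warnings and still be valid, whereas a single error is enough to invalidate it.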

GeneTreeReport

class GeneTreeReport

Container for per-gene-tree diagnostic results. Each gene tree parsed from a Nexus file gets its own GeneTreeReport, which records rooted or unrooted status, missing and duplicate taxa, whether the tree is binary or multifurcating, branch length statistics, and basic tree size metrics. These reports are embedded within a ValidationSummary so callers can inspect them programmatically or print the human-readable summary.

Constructor

__init__(tree_index: int, tree_name: str) -> None

Initialize a GeneTreeReport for a single gene tree.

Parameter Type Description
tree_index int Zero-based index of this tree in the file.
tree_name str The label/name of this tree from the Nexus file.

Methods

__str__() -> str

Return a formatted single-tree report string.

GeneTreeAggregateSummary

class GeneTreeAggregateSummary

Aggregate summary across all gene trees in a Nexus file. Provides high-level statistics about the entire collection of gene trees so that a biologist can quickly assess the overall quality and characteristics of a gene tree dataset.

Constructor

__init__() -> None

Initialize an empty aggregate summary.

Methods

add_report(report: GeneTreeReport) -> None

Incorporate a single GeneTreeReport into the aggregate.

Parameter Type Description
report GeneTreeReport A per-tree diagnostic report.

finalize() -> None

Compute final aggregate statistics after all reports have been added. Call this after all tree reports have been incorporated.

__str__() -> str

Return a formatted aggregate summary string.
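
The accumulate-then-finalize workflow described above might look roughly like this. Everything in the sketch is an assumption made for illustration: `TreeReportSketch` and `AggregateSketch` are local stand-ins, and the field names `is_rooted`, `num_taxa`, `fraction_rooted`, and `mean_taxa` are hypothetical, since the real attributes of GeneTreeReport and GeneTreeAggregateSummary are not listed in this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreeReportSketch:
    """Stand-in for a per-tree report; field names are hypothetical."""
    tree_index: int
    tree_name: str
    is_rooted: bool
    num_taxa: int


class AggregateSketch:
    """Stand-in aggregate: collect counts first, derive ratios in finalize()."""

    def __init__(self) -> None:
        self.num_trees = 0
        self.num_rooted = 0
        self._taxa_counts: List[int] = []
        # Derived values stay None until finalize() is called.
        self.fraction_rooted: Optional[float] = None
        self.mean_taxa: Optional[float] = None

    def add_report(self, report: TreeReportSketch) -> None:
        self.num_trees += 1
        self.num_rooted += int(report.is_rooted)
        self._taxa_counts.append(report.num_taxa)

    def finalize(self) -> None:
        if self.num_trees == 0:
            return  # nothing to summarise
        self.fraction_rooted = self.num_rooted / self.num_trees
        self.mean_taxa = sum(self._taxa_counts) / self.num_trees


agg = AggregateSketch()
for i, (rooted, taxa) in enumerate([(True, 5), (False, 5), (True, 8)]):
    agg.add_report(TreeReportSketch(i, f"gene{i}", rooted, taxa))
agg.finalize()

print(agg.num_trees, agg.fraction_rooted, agg.mean_taxa)
```

Deferring the derived statistics to `finalize` keeps `add_report` cheap and makes it explicit that ratios are only meaningful once every report has been added.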

BaseValidator

class BaseValidator(ABC)

Abstract base class for file format validators. Subclasses set three class-level attributes and implement ``_parse``:

* ``format_name`` – human-readable format label, e.g. ``"FASTA"``.
* ``_has_dependency`` – ``True`` when the optional library needed for this format is importable at runtime.
* ``_dependency_msg`` – error text shown when the library is absent.

The public :meth:`validate` method is a *template method*; subclasses should not override it.

Constructor

__init__()

Initialize the validator with an empty set of supported extensions. Subclasses should populate :attr:`supported_extensions` in their own ``__init__`` after calling ``super().__init__()``.

Methods

validate(file_path: str) -> ValidationSummary

Validate a file and return a summary.

Parameter Type Description
file_path str Path to the file to validate.

Returns: ValidationSummary: Validation results and summary.
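
The template-method arrangement described for BaseValidator can be sketched as below. This is a minimal illustration under stated assumptions, not the library's implementation: `ResultSketch`, `BaseValidatorSketch`, `ToyFastaValidator`, and `NoDependencyValidator` are hypothetical stand-ins, the `_parse` signature is assumed, and the order of the checks inside `validate` is a guess. The toy parser only counts header lines so that the example needs no third-party library.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Set


class ResultSketch:
    """Tiny stand-in for ValidationSummary (not the library class)."""

    def __init__(self, file_path: str, file_format: str) -> None:
        self.file_path = file_path
        self.file_format = file_format
        self.is_valid = True
        self.errors: List[str] = []
        self.stats: Dict[str, Any] = {}

    def add_error(self, error: str) -> None:
        self.errors.append(error)
        self.is_valid = False


class BaseValidatorSketch(ABC):
    format_name = "Unknown"
    _has_dependency = True
    _dependency_msg = ""

    def __init__(self) -> None:
        self.supported_extensions: Set[str] = set()

    def validate(self, file_path: str) -> ResultSketch:
        """Template method: fixed steps, with parsing delegated to _parse."""
        if not os.path.isfile(file_path):
            raise FileNotFoundError(file_path)
        result = ResultSketch(file_path, self.format_name)
        if not self._has_dependency:
            result.add_error(self._dependency_msg)
            return result
        self._parse(file_path, result)
        return result

    @abstractmethod
    def _parse(self, file_path: str, result: ResultSketch) -> None:
        """Format-specific checks; this signature is an assumption."""


class ToyFastaValidator(BaseValidatorSketch):
    format_name = "FASTA"

    def __init__(self) -> None:
        super().__init__()
        self.supported_extensions = {".fasta", ".fas", ".fa"}

    def _parse(self, file_path: str, result: ResultSketch) -> None:
        with open(file_path) as handle:
            headers = [line for line in handle if line.startswith(">")]
        result.stats["Number of Sequences"] = len(headers)
        if not headers:
            result.add_error("No FASTA header lines found")


class NoDependencyValidator(ToyFastaValidator):
    _has_dependency = False
    _dependency_msg = "A hypothetical parser library is required"


with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "demo.fasta")
    with open(good_path, "w") as handle:
        handle.write(">taxonA\nACGT\n>taxonB\nACGA\n")
    empty_path = os.path.join(tmp, "empty.fasta")
    open(empty_path, "w").close()

    good = ToyFastaValidator().validate(good_path)
    bad = ToyFastaValidator().validate(empty_path)
    skipped = NoDependencyValidator().validate(good_path)

print(good.is_valid, bad.is_valid, skipped.errors)
```

Keeping the shared checks in `validate` means each subclass only supplies format-specific parsing, which is why overriding `validate` itself is discouraged.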

NewickValidator

class NewickValidator(BaseValidator)

Validator for Newick format files (.nwk, .newick, .tre, .tree).

Constructor

__init__()

NexusValidator

class NexusValidator(BaseValidator)

Validator for Nexus format files (.nex, .nexus).

Constructor

__init__()

FastaValidator

class FastaValidator(BaseValidator)

Validator for FASTA format files (.fasta, .fas, .fa).

Constructor

__init__()

PhylipValidator

class PhylipValidator(BaseValidator)

Validator for PHYLIP format files (.phy, .phylip).

Constructor

__init__()

ClustalValidator

class ClustalValidator(BaseValidator)

Validator for Clustal format files (.aln, .clustal).

Constructor

__init__()

XMLValidator

class XMLValidator(BaseValidator)

Validator for XML format files (.xml).

Constructor

__init__()

GenBankValidator

class GenBankValidator(BaseValidator)

Validator for GenBank format files (.gb, .gbk, .genbank).

Constructor

__init__()

PhylogeneticValidator

class PhylogeneticValidator

Main validator class that handles multiple file formats. Maintains a registry of :class:`BaseValidator` subclasses keyed by format name and a reverse mapping from file extension to format. Use :meth:`validate_file` for a single file or :meth:`validate_directory` for batch validation.

Attributes:
validators (Dict[str, BaseValidator]): Format name to validator instance mapping.
extension_map (Dict[str, str]): File extension (e.g. ``".nwk"``) to format name mapping.

Constructor

__init__()

Initialize the validator registry. Instantiates one :class:`BaseValidator` subclass per supported format and builds the extension-to-format lookup table.
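
The extension-to-format lookup can be pictured with a small sketch. It is a simplified illustration rather than the library's code: the registry here maps format names directly to extension sets instead of validator instances, the `"nexus"` key is assumed by analogy with the documented `"newick"` and `"fasta"` keys, lower-casing the extension is an assumption, and `detect_format` and `supported_formats` are hypothetical helper names.

```python
import os
from typing import Dict, List, Optional, Set

# Extension sets taken from the validator descriptions in this page.
FORMAT_EXTENSIONS: Dict[str, Set[str]] = {
    "newick": {".nwk", ".newick", ".tre", ".tree"},
    "nexus": {".nex", ".nexus"},  # key name is an assumption
    "fasta": {".fasta", ".fas", ".fa"},
}

# Reverse mapping: file extension -> format name.
extension_map: Dict[str, str] = {
    ext: fmt for fmt, exts in FORMAT_EXTENSIONS.items() for ext in exts
}


def detect_format(file_path: str, format_hint: Optional[str] = None) -> Optional[str]:
    """Pick a format from the hint if given, else from the file extension."""
    if format_hint is not None:
        return format_hint if format_hint in FORMAT_EXTENSIONS else None
    ext = os.path.splitext(file_path)[1].lower()  # case folding is assumed
    return extension_map.get(ext)


def supported_formats() -> Dict[str, List[str]]:
    """Format name -> sorted list of extensions."""
    return {fmt: sorted(exts) for fmt, exts in FORMAT_EXTENSIONS.items()}


print(detect_format("data/genes.tre"))
print(detect_format("notes.txt"))
print(detect_format("notes.txt", format_hint="fasta"))
print(supported_formats()["newick"])
```

A `None` result corresponds to the unrecognised-format case, for which the documented `validate_file` returns a summary containing an error.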

Methods

validate_file(file_path: str, format_hint: Optional[str] = None) -> ValidationSummary

Validate a phylogenetic file. Determines the appropriate :class:`BaseValidator` subclass from the file extension (or *format_hint*) and delegates to it.

Parameter Type Description
file_path str Path to the file to validate.
format_hint str, optional Override for automatic format detection. Should be a key from :attr:`validators` (e.g. ``"newick"``, ``"fasta"``).

Returns: ValidationSummary: Validation results and summary. If the format is unrecognised, the summary will contain an error and ``is_valid`` will be ``False``.

get_supported_formats() -> Dict[str, List[str]]

Get dictionary of supported formats and their extensions.

Returns: Dict[str, List[str]]: Mapping of format name (e.g. ``"newick"``) to a sorted list of file extensions (e.g. ``[".newick", ".nwk", ".tre", ".tree"]``).

validate_directory(directory_path: str, recursive: bool = False) -> List[ValidationSummary]

Validate all supported files in a directory. Iterates over files whose extensions appear in :attr:`extension_map` and validates each one.

Parameter Type Description
directory_path str Path to the directory to scan.
recursive bool If ``True``, descend into subdirectories via :func:`os.walk`. Defaults to ``False``.

Returns: List[ValidationSummary]: One :class:`ValidationSummary` per file that was validated, in discovery order.

Raises:
FileNotFoundError: If *directory_path* does not exist.
NotADirectoryError: If *directory_path* is not a directory.
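
The directory scan can be sketched with the standard library alone. This is an illustration of the documented behaviour under assumptions, not the library's code: `find_supported_files` is a hypothetical helper that only collects paths (it does not validate them), `KNOWN_EXTENSIONS` is a reduced extension set, and the sorting is added here only to make the order deterministic.

```python
import os
import tempfile
from typing import List

KNOWN_EXTENSIONS = {".nwk", ".nex", ".fasta"}  # reduced set for the example


def find_supported_files(directory_path: str, recursive: bool = False) -> List[str]:
    """Return paths under directory_path whose extension is recognised."""
    if not os.path.exists(directory_path):
        raise FileNotFoundError(directory_path)
    if not os.path.isdir(directory_path):
        raise NotADirectoryError(directory_path)

    found: List[str] = []
    if recursive:
        for root, dirs, files in os.walk(directory_path):
            dirs.sort()  # in-place sort fixes the traversal order
            for name in sorted(files):
                if os.path.splitext(name)[1].lower() in KNOWN_EXTENSIONS:
                    found.append(os.path.join(root, name))
    else:
        for name in sorted(os.listdir(directory_path)):
            full = os.path.join(directory_path, name)
            if os.path.isfile(full) and os.path.splitext(name)[1].lower() in KNOWN_EXTENSIONS:
                found.append(full)
    return found


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for rel in ("a.nwk", "notes.txt", os.path.join("sub", "b.fasta")):
        with open(os.path.join(tmp, rel), "w") as handle:
            handle.write("")

    flat = [os.path.basename(p) for p in find_supported_files(tmp)]
    deep = [os.path.basename(p) for p in find_supported_files(tmp, recursive=True)]

print(flat)
print(deep)
```

With `recursive=False` only the top-level `a.nwk` is picked up; enabling recursion also reaches `sub/b.fasta`, while `notes.txt` is skipped in both cases because its extension is not recognised.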

Module Functions

validate_file(file_path: str, format_hint: Optional[str] = None, print_summary: bool = True) -> ValidationSummary

Convenience function to validate a single file. Creates a :class:`PhylogeneticValidator`, validates the file, and optionally prints the human-readable summary.

Parameter Type Description
file_path str Path to the file to validate.
format_hint str, optional Override for automatic format detection (e.g. ``"newick"``). When ``None``, the format is inferred from the file extension.
print_summary bool If ``True`` (the default), print the formatted :class:`ValidationSummary` to stdout.

Returns: ValidationSummary: Validation results and summary statistics.

Raises: FileNotFoundError: If *file_path* does not exist (propagated from the underlying validator).

validate_directory(directory_path: str, recursive: bool = False, print_summaries: bool = True) -> List[ValidationSummary]

Convenience function to validate all files in a directory. Creates a :class:`PhylogeneticValidator` and validates every file in *directory_path* whose extension is recognised.

Parameter Type Description
directory_path str Path to the directory to scan.
recursive bool If ``True``, descend into subdirectories. Defaults to ``False``.
print_summaries bool If ``True`` (the default), print each :class:`ValidationSummary` to stdout.

Returns: List[ValidationSummary]: One summary per validated file, in discovery order.

Raises:
FileNotFoundError: If *directory_path* does not exist.
NotADirectoryError: If *directory_path* is not a directory.

get_supported_formats() -> Dict[str, List[str]]

Get dictionary of supported formats and their extensions.

Returns: Dict[str, List[str]]: Mapping of format name (e.g. ``"newick"``, ``"fasta"``) to a sorted list of recognised file extensions (e.g. ``[".fa", ".fas", ".fasta"]``).
