Library for the Development and Use of Phylogenetic Network Methods
Comprehensive file format validation for phylogenetic data files (Newick, Nexus, FASTA, PHYLIP, etc.).
Base exception for validation errors.
Exception raised when file format is invalid or corrupted.
Exception raised when data integrity checks fail.
Container for validation results and summary information. Tracks errors, warnings, and summary statistics produced during validation of a single phylogenetic file. A summary whose :attr:`is_valid` flag is ``False`` contains at least one error.
| Attribute | Type | Description |
|---|---|---|
| file_path | str | Absolute or relative path to the validated file. |
| file_format | str | Human-readable format label (e.g. ``"Newick"``). |
| is_valid | bool | ``True`` until :meth:`add_error` is called. |
| errors | List[str] | Accumulated error messages. |
| warnings | List[str] | Accumulated warning messages. |
| summary_stats | Dict[str, Any] | Free-form key/value statistics. |
Initialize a ValidationSummary.
| Parameter | Type | Description |
|---|---|---|
| file_path | str | Path to the file being validated. |
| file_format | str | Name of the file format (e.g. ``"Newick"``, ``"FASTA"``). |
Add an error message and mark validation as failed.
| Parameter | Type | Description |
|---|---|---|
| error | str | Human-readable description of the validation error. |
Add a warning message. Warnings do not affect :attr:`is_valid`; they flag non-fatal issues such as missing branch lengths or low taxa counts.
| Parameter | Type | Description |
|---|---|---|
| warning | str | Human-readable description of the warning. |
Add a summary statistic. Statistics are stored in :attr:`summary_stats` and rendered in the human-readable report produced by :meth:`__str__`.
| Parameter | Type | Description |
|---|---|---|
| key | str | Label for the statistic (e.g. ``"Number of Taxa"``). |
| value | Any | The statistic value. Typically a ``str``, ``int``, ``float``, or ``list``, but any printable type is accepted. |
Return a formatted summary report.
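
The behaviour described above (errors flip :attr:`is_valid`, warnings do not, statistics are free-form) can be illustrated with a minimal stand-in class. This is a sketch written from the descriptions in this section, not the library's actual implementation; in particular, the report layout produced by ``__str__`` is an assumption.

```python
from typing import Any, Dict, List


class ValidationSummary:
    """Illustrative stand-in that mirrors the documented attributes."""

    def __init__(self, file_path: str, file_format: str) -> None:
        self.file_path = file_path
        self.file_format = file_format
        self.is_valid = True  # stays True until add_error is called
        self.errors: List[str] = []
        self.warnings: List[str] = []
        self.summary_stats: Dict[str, Any] = {}

    def add_error(self, error: str) -> None:
        # Recording an error marks the whole validation as failed.
        self.errors.append(error)
        self.is_valid = False

    def add_warning(self, warning: str) -> None:
        # Warnings are recorded but leave is_valid untouched.
        self.warnings.append(warning)

    def add_stat(self, key: str, value: Any) -> None:
        self.summary_stats[key] = value

    def __str__(self) -> str:
        # Assumed layout: one header line, then stats, warnings, errors.
        status = "VALID" if self.is_valid else "INVALID"
        lines = [f"{self.file_format} file: {self.file_path} [{status}]"]
        lines += [f"  {key}: {value}" for key, value in self.summary_stats.items()]
        lines += [f"  WARNING: {w}" for w in self.warnings]
        lines += [f"  ERROR: {e}" for e in self.errors]
        return "\n".join(lines)


summary = ValidationSummary("trees/example.nwk", "Newick")
summary.add_stat("Number of Taxa", 3)
summary.add_warning("Missing branch lengths")
valid_after_warning = summary.is_valid  # still True: warnings are non-fatal
summary.add_error("Unbalanced parentheses")
print(summary)
```

Capturing ``is_valid`` after the warning and again after the error makes the difference between the two kinds of message visible.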
Container for per-gene-tree diagnostic results. Each gene tree parsed from a Nexus file gets its own GeneTreeReport, which captures rooted/unrooted status, missing or duplicate taxa, whether the tree is binary or multifurcating, branch length statistics, and basic tree size metrics. These reports are embedded within a ValidationSummary so callers can inspect them programmatically or print the human-readable summary.
Initialize a GeneTreeReport for a single gene tree.
| Parameter | Type | Description |
|---|---|---|
| tree_index | int | Zero-based index of this tree in the file. |
| tree_name | str | The label/name of this tree from the nexus file. |
Return a formatted single-tree report string.
Aggregate summary across all gene trees in a nexus file. Provides high-level statistics about the entire collection of gene trees so a biologist can quickly understand the overall quality and characteristics of their gene tree dataset.
Initialize an empty aggregate summary.
Incorporate a single GeneTreeReport into the aggregate.
| Parameter | Type | Description |
|---|---|---|
| report | GeneTreeReport | A per-tree diagnostic report. |
Compute final aggregate statistics. Call this after all tree reports have been incorporated.
Return a formatted aggregate summary string.
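
The add-then-finalize workflow described above can be sketched as follows. Every name here is hypothetical: the class name ``GeneTreeAggregate``, the method names ``add_report`` and ``finalize``, and the report fields ``is_rooted``, ``is_binary``, and ``num_taxa`` are assumptions made for illustration and are not taken from the library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneTreeReport:
    """Hypothetical per-tree report; only the first two fields are documented."""

    tree_index: int
    tree_name: str
    is_rooted: bool = False  # assumed field name
    is_binary: bool = True   # assumed field name
    num_taxa: int = 0        # assumed field name


class GeneTreeAggregate:
    """Hypothetical aggregate: accumulate counts, derive means at the end."""

    def __init__(self) -> None:
        self.num_trees = 0
        self.num_rooted = 0
        self.num_binary = 0
        self.mean_taxa = 0.0
        self._taxa_counts: List[int] = []

    def add_report(self, report: GeneTreeReport) -> None:
        # Running counters are updated as each report arrives.
        self.num_trees += 1
        self.num_rooted += int(report.is_rooted)
        self.num_binary += int(report.is_binary)
        self._taxa_counts.append(report.num_taxa)

    def finalize(self) -> None:
        # Derived statistics need the full collection, so compute them last.
        if self._taxa_counts:
            self.mean_taxa = sum(self._taxa_counts) / len(self._taxa_counts)


aggregate = GeneTreeAggregate()
aggregate.add_report(GeneTreeReport(0, "gene1", is_rooted=True, is_binary=True, num_taxa=4))
aggregate.add_report(GeneTreeReport(1, "gene2", is_rooted=False, is_binary=False, num_taxa=6))
aggregate.finalize()
print(aggregate.num_trees, aggregate.num_rooted, aggregate.mean_taxa)
```

Separating accumulation from finalization keeps each incoming report cheap to process and avoids recomputing means after every tree.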
Abstract base class for file format validators. Subclasses set three class-level attributes and implement ``_parse``:

* ``format_name`` – human-readable format label, e.g. ``"FASTA"``.
* ``_has_dependency`` – ``True`` when the optional library needed for this format is importable at runtime.
* ``_dependency_msg`` – error text shown when the library is absent.

The public :meth:`validate` method is a *template method*; subclasses should not override it.
Initialize the validator with an empty set of supported extensions. Subclasses should populate :attr:`supported_extensions` in their own ``__init__`` after calling ``super().__init__()``.
Validate a file and return a summary.
| Parameter | Type | Description |
|---|---|---|
| file_path | str | Path to the file to validate. |
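
The template-method arrangement described for the base class can be sketched as below. This is a simplified illustration under stated assumptions: the real :meth:`validate` returns a summary object, whereas this sketch returns a plain list of error strings, and the ``_parse`` signature, the two ``Toy...`` subclasses, and the error messages are all hypothetical.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import List, Set


class BaseValidator(ABC):
    format_name: str = "Unknown"
    _has_dependency: bool = True
    _dependency_msg: str = ""

    def __init__(self) -> None:
        self.supported_extensions: Set[str] = set()

    def validate(self, file_path: str) -> List[str]:
        """Template method: shared checks first, then the subclass hook."""
        if not self._has_dependency:
            return [self._dependency_msg]
        if not os.path.isfile(file_path):
            raise FileNotFoundError(file_path)
        with open(file_path, encoding="utf-8") as handle:
            return self._parse(handle.read())

    @abstractmethod
    def _parse(self, text: str) -> List[str]:
        """Return format-specific error messages (empty if none)."""


class ToyFastaValidator(BaseValidator):
    format_name = "FASTA"

    def __init__(self) -> None:
        super().__init__()
        self.supported_extensions = {".fasta", ".fas", ".fa"}

    def _parse(self, text: str) -> List[str]:
        # Toy check only: the first record should begin with a '>' header.
        if not text.lstrip().startswith(">"):
            return ["First record does not start with '>'"]
        return []


class ToyMissingDependencyValidator(ToyFastaValidator):
    # Simulates a format whose optional parsing library is not installed.
    _has_dependency = False
    _dependency_msg = "Optional parser library is not installed."


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "seqs.fasta")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(">taxon_a\nACGT\n")
    ok_errors = ToyFastaValidator().validate(path)
    dep_errors = ToyMissingDependencyValidator().validate(path)

print(ok_errors, dep_errors)
```

Because the shared checks live only in ``validate``, a subclass supplies just its extensions and its ``_parse`` hook.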
Validator for Newick format files (.nwk, .newick, .tre, .tree).
Validator for Nexus format files (.nex, .nexus).
Validator for FASTA format files (.fasta, .fas, .fa).
Validator for PHYLIP format files (.phy, .phylip).
Validator for Clustal format files (.aln, .clustal).
Validator for XML format files (.xml).
Validator for GenBank format files (.gb, .gbk, .genbank).
Main validator class that handles multiple file formats. Maintains a registry of :class:`BaseValidator` subclasses keyed by format name and a reverse mapping from file extension to format. Use :meth:`validate_file` for a single file or :meth:`validate_directory` for batch validation.
| Attribute | Type | Description |
|---|---|---|
| validators | Dict[str, BaseValidator] | Format name to validator instance mapping. |
| extension_map | Dict[str, str] | File extension (e.g. ``".nwk"``) to format name mapping. |
Initialize the validator registry. Instantiates one :class:`BaseValidator` subclass per supported format and builds the extension-to-format lookup table.
Validate a phylogenetic file. Determines the appropriate :class:`BaseValidator` subclass from the file extension (or *format_hint*) and delegates to it.
| Parameter | Type | Description |
|---|---|---|
| file_path | str | Path to the file to validate. |
| format_hint | str, optional | Override for automatic format detection. Should be a key from :attr:`validators` (e.g. ``"newick"``, ``"fasta"``). |
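
Format resolution by extension, with an optional hint taking precedence, might look like the following sketch. The extensions are the ones listed for each validator in this section; the registry layout, the lowercase ``"nexus"`` and ``"phylip"`` keys, the helper name ``resolve_format``, and the choice to return ``None`` for an unknown format are assumptions, not the library's confirmed behaviour.

```python
import os
from typing import Dict, Optional, Tuple

# Hypothetical registry: format name -> extensions handled by that format.
FORMAT_EXTENSIONS: Dict[str, Tuple[str, ...]] = {
    "newick": (".nwk", ".newick", ".tre", ".tree"),
    "nexus": (".nex", ".nexus"),
    "fasta": (".fasta", ".fas", ".fa"),
    "phylip": (".phy", ".phylip"),
}

# Reverse lookup: extension -> format name.
EXTENSION_MAP: Dict[str, str] = {
    ext: name for name, exts in FORMAT_EXTENSIONS.items() for ext in exts
}


def resolve_format(file_path: str, format_hint: Optional[str] = None) -> Optional[str]:
    """Return the format name for a file, or None if it cannot be determined."""
    if format_hint is not None:
        # An explicit hint overrides extension-based detection.
        key = format_hint.lower()
        return key if key in FORMAT_EXTENSIONS else None
    extension = os.path.splitext(file_path)[1].lower()
    return EXTENSION_MAP.get(extension)


print(resolve_format("trees/species.NWK"))
print(resolve_format("alignment.txt", format_hint="fasta"))
```

Lowercasing the extension before the lookup lets ``.NWK`` and ``.nwk`` resolve to the same format.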
Get a dictionary of supported formats and their extensions.
Validate all supported files in a directory. Iterates over files whose extensions appear in :attr:`extension_map` and validates each one.
| Parameter | Type | Description |
|---|---|---|
| directory_path | str | Path to the directory to scan. |
| recursive | bool | If ``True``, descend into subdirectories via :func:`os.walk`. Defaults to ``False``. |
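
The directory scan can be sketched with :func:`os.walk`, stopping after the top level when ``recursive`` is ``False``. The helper name ``find_supported_files`` and its ``extensions`` parameter are hypothetical; the sketch only collects matching paths and leaves the per-file validation step out.

```python
import os
import tempfile
from typing import Iterable, List


def find_supported_files(
    directory_path: str, extensions: Iterable[str], recursive: bool = False
) -> List[str]:
    """Collect files under directory_path whose extension is recognised."""
    if not os.path.exists(directory_path):
        raise FileNotFoundError(f"Directory not found: {directory_path}")
    if not os.path.isdir(directory_path):
        raise NotADirectoryError(f"Not a directory: {directory_path}")

    wanted = {ext.lower() for ext in extensions}
    matches: List[str] = []
    for root, _dirs, files in os.walk(directory_path):
        for name in files:
            if os.path.splitext(name)[1].lower() in wanted:
                matches.append(os.path.join(root, name))
        if not recursive:
            break  # os.walk yields the top directory first; stop there
    return sorted(matches)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "nested"))
    for rel in ("a.nwk", "notes.txt", os.path.join("nested", "b.fasta")):
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as handle:
            handle.write("")
    exts = {".nwk", ".fasta"}
    top_only = [os.path.relpath(p, tmp) for p in find_supported_files(tmp, exts)]
    everything = [
        os.path.relpath(p, tmp)
        for p in find_supported_files(tmp, exts, recursive=True)
    ]

print(top_only, everything)
```

Sorting the result makes the order independent of how the file system lists directory entries.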
Raises ``FileNotFoundError`` if *directory_path* does not exist, and ``NotADirectoryError`` if *directory_path* is not a directory.

Convenience function to validate a single file. Creates a :class:`PhylogeneticValidator`, validates the file, and optionally prints the human-readable summary.
| Parameter | Type | Description |
|---|---|---|
| file_path | str | Path to the file to validate. |
| format_hint | str, optional | Override for automatic format detection (e.g. ``"newick"``). When ``None``, the format is inferred from the file extension. |
| print_summary | bool | If ``True`` (the default), print the formatted :class:`ValidationSummary` to stdout. |
Raises ``FileNotFoundError`` if *file_path* does not exist (propagated from the underlying validator).

Convenience function to validate all files in a directory. Creates a :class:`PhylogeneticValidator` and validates every file in *directory_path* whose extension is recognised.
| Parameter | Type | Description |
|---|---|---|
| directory_path | str | Path to the directory to scan. |
| recursive | bool | If ``True``, descend into subdirectories. Defaults to ``False``. |
| print_summaries | bool | If ``True`` (the default), print each :class:`ValidationSummary` to stdout. |
Raises ``FileNotFoundError`` if *directory_path* does not exist, and ``NotADirectoryError`` if *directory_path* is not a directory.

Get a dictionary of supported formats and their extensions.