Library for the Development and Use of Phylogenetic Network Methods
The Validation module provides comprehensive file format validation for common phylogenetic data formats including Newick, Nexus, FASTA, PHYLIP, Clustal, XML, and GenBank.
Base exception for validation errors.
Raised when file format is invalid or corrupted.
Raised when data integrity checks fail.
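The three descriptions above suggest a small hierarchy: one base class and two more specific errors. The following is a minimal sketch of that pattern, not the library's code. The class names `ValidationError`, `FormatError`, and `DataIntegrityError` and the helpers `check_header` and `safe_check` are hypothetical, chosen only for illustration; the names PhyNetPy actually exports may differ.

```python
# Hypothetical names throughout; this is an illustration of the pattern,
# not PhyNetPy's actual exception classes.
class ValidationError(Exception):
    """Base exception for validation errors."""


class FormatError(ValidationError):
    """Raised when a file's format is invalid or corrupted."""


class DataIntegrityError(ValidationError):
    """Raised when data integrity checks fail."""


def check_header(first_line: str) -> None:
    """Toy check: a FASTA file is expected to start with a '>' header."""
    if not first_line.startswith(">"):
        raise FormatError("expected a '>' header line")


def safe_check(first_line: str) -> str:
    """Catch the base class so both specific errors are handled in one place."""
    try:
        check_header(first_line)
    except ValidationError as exc:
        return f"{type(exc).__name__}: {exc}"
    return "ok"
```

Catching the base class is enough when the caller only needs to know that validation failed; catching a subclass lets it distinguish a malformed file from inconsistent data.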
Container for validation results and summary information.
| Attribute | Type | Description |
|---|---|---|
| file_path | str | Path to validated file |
| file_format | str | Detected file format |
| is_valid | bool | Overall validation status |
| errors | List[str] | List of error messages |
| warnings | List[str] | List of warning messages |
| summary_stats | Dict | Summary statistics |
Add an error message and mark validation as failed.
Add a warning message (does not affect is_valid).
Add a summary statistic.
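The behaviour described above (errors flip `is_valid`, warnings do not) can be sketched with a small dataclass. The attribute names come from the table; the class name `SummarySketch` and the method names `add_error`, `add_warning`, and `add_stat` are assumptions made for this example and may not match the library's own.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SummarySketch:
    """Illustrative stand-in for the validation result container."""
    file_path: str
    file_format: str
    is_valid: bool = True
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    summary_stats: Dict[str, Any] = field(default_factory=dict)

    def add_error(self, message: str) -> None:
        # An error is recorded and the overall status becomes invalid.
        self.errors.append(message)
        self.is_valid = False

    def add_warning(self, message: str) -> None:
        # A warning is recorded; is_valid is left untouched.
        self.warnings.append(message)

    def add_stat(self, key: str, value: Any) -> None:
        # Store one named summary statistic.
        self.summary_stats[key] = value


# Usage: a warning alone keeps the file valid, an error does not.
demo = SummarySketch("network.nex", "Nexus")
demo.add_warning("Trees contain taxa not in taxa block: ['X']")
demo.add_stat("Number of Trees/Networks", 3)
valid_after_warning = demo.is_valid
demo.add_error("Missing END; for trees block")
valid_after_error = demo.is_valid
```

Using `field(default_factory=...)` gives every instance its own lists and dictionary, so results from different files never share state.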
Main validator class that handles multiple file formats automatically.
Validate a single file. The format is auto-detected from the file extension.
Validate all supported files in a directory.
Get dictionary of supported formats and their extensions.
| Validator | Extensions | Description |
|---|---|---|
| NewickValidator | .nwk, .newick, .tre, .tree | Validates Newick tree files |
| NexusValidator | .nex, .nexus | Validates Nexus files (taxa, trees, data blocks) |
| FastaValidator | .fasta, .fas, .fa, .fna, .ffn, .faa | Validates FASTA sequence files |
| PhylipValidator | .phy, .phylip | Validates PHYLIP alignment files |
| ClustalValidator | .aln, .clustal | Validates Clustal alignment files |
| XMLValidator | .xml | Validates XML files |
| GenBankValidator | .gb, .gbk, .genbank | Validates GenBank files |
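Extension-based detection over the table above can be sketched as a reverse lookup from extension to format. This is an illustration under stated assumptions, not the library's implementation: the names `EXTENSIONS`, `EXT_TO_FORMAT`, and `detect_format` are hypothetical, and the case-insensitive matching is a design choice of this sketch rather than documented behaviour.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Format name -> extensions, copied from the table of validators.
EXTENSIONS: Dict[str, List[str]] = {
    "Newick": [".nwk", ".newick", ".tre", ".tree"],
    "Nexus": [".nex", ".nexus"],
    "FASTA": [".fasta", ".fas", ".fa", ".fna", ".ffn", ".faa"],
    "PHYLIP": [".phy", ".phylip"],
    "Clustal": [".aln", ".clustal"],
    "XML": [".xml"],
    "GenBank": [".gb", ".gbk", ".genbank"],
}

# Invert the mapping once so each lookup is a single dictionary access.
EXT_TO_FORMAT: Dict[str, str] = {
    ext: fmt for fmt, exts in EXTENSIONS.items() for ext in exts
}


def detect_format(path: str) -> Optional[str]:
    """Return the format name for a path, or None if the extension is unknown.

    Lower-casing the suffix is an assumption of this sketch.
    """
    return EXT_TO_FORMAT.get(Path(path).suffix.lower())
```

Because only the final suffix is inspected, a name such as `trees.nex.bak` would not be recognised; a real validator may inspect the file contents as well.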
Convenience function to validate a single file.
Convenience function to validate all files in a directory.
Get dictionary of all supported formats and extensions.
from PhyNetPy.Validation import validate_file, validate_directory, get_supported_formats
# Validate a single file
summary = validate_file("network.nex")
print(summary)
# Check results programmatically
if summary.is_valid:
    print("File is valid!")
else:
    print("Errors found:", summary.errors)
# Access statistics
print(f"Number of taxa: {summary.summary_stats.get('Number of Taxa')}")
# Validate without printing
summary = validate_file("data.fasta", print_summary=False)
# Validate a directory
summaries = validate_directory("/path/to/data/", recursive=True)
valid_count = sum(1 for s in summaries if s.is_valid)
print(f"{valid_count}/{len(summaries)} files are valid")
# Get supported formats
formats = get_supported_formats()
for fmt, exts in formats.items():
    print(f"{fmt}: {', '.join(exts)}")
# Use specific validator
from PhyNetPy.Validation import NexusValidator
validator = NexusValidator()
summary = validator.validate("network.nex")
# Output example:
# ============================================================
# VALIDATION SUMMARY: network.nex
# ============================================================
# Format: Nexus
# Status: VALID
#
# SUMMARY STATISTICS:
# --------------------
# File Size (bytes): 1234
# Has Taxa Block: True
# Has Trees Block: True
# Number of Trees/Networks: 3
#
# WARNINGS:
# ----------
# Trees contain taxa not in taxa block: ['X']
#
# ============================================================