
PhyNetPy Documentation

Library for the Development and Use of Phylogenetic Network Methods

Validation Module v1.0.0

The Validation module checks files in common phylogenetic data formats (Newick, Nexus, FASTA, PHYLIP, Clustal, XML, and GenBank) and reports errors, warnings, and summary statistics for each file.

Author: Mark Kessler
Last Edit: 3/11/25
Source: Validation.py

Exceptions

exception ValidationError(Exception)

Base exception for validation errors.

exception FileFormatError(ValidationError)

Raised when file format is invalid or corrupted.

exception DataIntegrityError(ValidationError)

Raised when data integrity checks fail.
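
Because both specific exceptions derive from ValidationError, one except clause for the base class handles either of them. The sketch below redefines the three classes locally as stand-ins so that it runs without PhyNetPy installed. It assumes the real classes take a plain message argument, and check_header and describe are hypothetical helpers that are not part of the module.

```python
# Stand-in definitions mirroring the documented hierarchy.
# Assumption: the real classes accept a single message argument.
class ValidationError(Exception):
    """Base exception for validation errors."""


class FileFormatError(ValidationError):
    """Raised when file format is invalid or corrupted."""


class DataIntegrityError(ValidationError):
    """Raised when data integrity checks fail."""


def check_header(first_line: str) -> None:
    # Hypothetical check: a Nexus file is expected to start with "#NEXUS".
    if not first_line.strip().upper().startswith("#NEXUS"):
        raise FileFormatError("missing #NEXUS header")


def describe(first_line: str) -> str:
    try:
        check_header(first_line)
    except ValidationError as exc:  # also catches both subclasses
        return f"{type(exc).__name__}: {exc}"
    return "ok"


print(describe("#NEXUS"))
print(describe("BEGIN TAXA;"))
```

Catching the base class keeps calling code short; catch FileFormatError or DataIntegrityError separately only when the two cases need different handling.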

ValidationSummary Class

class ValidationSummary

Container for validation results and summary information.

Attributes

Attribute      Type       Description
file_path      str        Path to the validated file
file_format    str        Detected file format
is_valid       bool       Overall validation status
errors         List[str]  List of error messages
warnings       List[str]  List of warning messages
summary_stats  Dict       Summary statistics

Methods

add_error(self, error: str) -> None

Add an error message and mark validation as failed.

add_warning(self, warning: str) -> None

Add a warning message (does not affect is_valid).

add_stat(self, key: str, value: Any) -> None

Add a summary statistic.
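
The documented behavior of the three methods can be pictured with a small stand-in class. This is a minimal sketch and not the library's implementation: SummarySketch is a hypothetical name, and the default values (is_valid starting as True, empty lists and dict) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SummarySketch:
    """Hypothetical stand-in with the attributes documented above."""
    file_path: str
    file_format: str = "unknown"      # assumed default
    is_valid: bool = True             # assumed to start as valid
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    summary_stats: Dict[str, Any] = field(default_factory=dict)

    def add_error(self, error: str) -> None:
        # An error is recorded and marks validation as failed.
        self.errors.append(error)
        self.is_valid = False

    def add_warning(self, warning: str) -> None:
        # A warning is recorded but leaves is_valid untouched.
        self.warnings.append(warning)

    def add_stat(self, key: str, value: Any) -> None:
        self.summary_stats[key] = value


s = SummarySketch("network.nex", "Nexus")
s.add_stat("Number of Trees/Networks", 3)
s.add_warning("Trees contain taxa not in taxa block: ['X']")
valid_after_warning = s.is_valid
s.add_error("example error message")
valid_after_error = s.is_valid
print(valid_after_warning, valid_after_error)
```

The point of the sketch is the asymmetry: warnings accumulate without changing the overall status, while the first error flips is_valid to False.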

Validators

PhylogeneticValidator

class PhylogeneticValidator

Main validator class that handles multiple file formats automatically.

Methods

validate_file(self, file_path: str, format_hint: str = None) -> ValidationSummary

Validate a single file. When format_hint is not given, the format is auto-detected from the file extension.

validate_directory(self, directory_path: str, recursive: bool = False) -> List[ValidationSummary]

Validate all supported files in a directory.

get_supported_formats(self) -> Dict[str, List[str]]

Get dictionary of supported formats and their extensions.
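
The recursive flag of validate_directory distinguishes between scanning only the top level of a directory and descending into its subdirectories. The sketch below shows one way such file collection could work with the standard library. collect_files is a hypothetical helper and the module's actual traversal may differ.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def collect_files(directory: str, extensions: Iterable[str],
                  recursive: bool = False) -> List[Path]:
    """Return files under `directory` whose suffix is in `extensions`."""
    root = Path(directory)
    wanted = {ext.lower() for ext in extensions}
    # rglob descends into subdirectories; glob stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates
                  if p.is_file() and p.suffix.lower() in wanted)


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "a.nex").write_text("#NEXUS\n")
    (base / "notes.txt").write_text("not phylogenetic data\n")
    (base / "sub").mkdir()
    (base / "sub" / "b.nwk").write_text("(A,B);\n")

    exts = [".nex", ".nwk"]
    top_level = [p.name for p in collect_files(tmp, exts)]
    everything = [p.name for p in collect_files(tmp, exts, recursive=True)]

print(top_level)
print(everything)
```

In this sketch, files with unsupported extensions are skipped silently rather than reported as invalid.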

Format-Specific Validators

Validator         Extensions                           Description
NewickValidator   .nwk, .newick, .tre, .tree           Validates Newick tree files
NexusValidator    .nex, .nexus                         Validates Nexus files (taxa, trees, data blocks)
FastaValidator    .fasta, .fas, .fa, .fna, .ffn, .faa  Validates FASTA sequence files
PhylipValidator   .phy, .phylip                        Validates PHYLIP alignment files
ClustalValidator  .aln, .clustal                       Validates Clustal alignment files
XMLValidator      .xml                                 Validates XML files
GenBankValidator  .gb, .gbk, .genbank                  Validates GenBank files

Convenience Functions

def validate_file(file_path: str, format_hint: str = None, print_summary: bool = True) -> ValidationSummary

Convenience function to validate a single file.

def validate_directory(directory_path: str, recursive: bool = False, print_summaries: bool = True) -> List[ValidationSummary]

Convenience function to validate all files in a directory.

def get_supported_formats() -> Dict[str, List[str]]

Get dictionary of all supported formats and extensions.

Usage Examples

from PhyNetPy.Validation import validate_file, validate_directory, get_supported_formats

# Validate a single file
summary = validate_file("network.nex")
print(summary)

# Check results programmatically
if summary.is_valid:
    print("File is valid!")
else:
    print("Errors found:", summary.errors)

# Access statistics
print(f"Number of taxa: {summary.summary_stats.get('Number of Taxa')}")

# Validate without printing
summary = validate_file("data.fasta", print_summary=False)

# Validate a directory
summaries = validate_directory("/path/to/data/", recursive=True)
valid_count = sum(1 for s in summaries if s.is_valid)
print(f"{valid_count}/{len(summaries)} files are valid")

# Get supported formats
formats = get_supported_formats()
for fmt, exts in formats.items():
    print(f"{fmt}: {', '.join(exts)}")

# Use specific validator
from PhyNetPy.Validation import NexusValidator

validator = NexusValidator()
summary = validator.validate("network.nex")

# Output example:
# ============================================================
# VALIDATION SUMMARY: network.nex
# ============================================================
# Format: Nexus
# Status: VALID
#
# SUMMARY STATISTICS:
# --------------------
#   File Size (bytes): 1234
#   Has Taxa Block: True
#   Has Trees Block: True
#   Number of Trees/Networks: 3
#
# WARNINGS:
# ----------
#   Trees contain taxa not in taxa block: ['X']
#
# ============================================================

See Also

  • NetworkParser - Uses validation before parsing
  • MSA - Sequence alignment handling
